Difference between revisions of "Splunk"

From Nuclear Physics Group Documentation Pages
 

Latest revision as of 14:30, 14 May 2015

Splunk is a flexible data aggregation system. In layman's terms, Splunk combs through log files (and anything else containing structured information that you want to throw at it) and presents the results in a summarized format. It is really a pretty neat thing. See the Splunk website (http://www.splunk.com).

Splunk at NPG

At NPG we have the following setup:

Taro = Splunk indexer, splunk deployment server, splunk web server

Endeavour, Gourd, Einstein, Roentgen, Lentil, ... = Splunk forwarding servers.

So the other systems all forward their data to Taro, where it is indexed. The instructions telling these forwarding hosts what to do are also provided by Taro (the deployment server). Additional data can be indexed by adding it to the deployment class on Taro. You can also add data directly on a node by editing its config files.

Note: This means that even if you start the web interface on one of the other nodes, you won't see anything, since all the data is sent to Taro.
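
To confirm that a node really is set up as a forwarder in this arrangement, the Splunk command line on that node can be asked where it sends data and which deployment server it polls. The following is a minimal sketch, not a tested site procedure: the function name check_forwarder and the default path /opt/splunk/splunk are assumptions, and the two splunk subcommands may prompt for the admin credentials.

```shell
#!/bin/sh
# Sketch: report which deployment server a forwarder polls and which
# indexer it forwards to. The default install path is an assumption;
# pass the real SPLUNK_HOME as the first argument.

check_forwarder() {
    splunk_home="${1:-/opt/splunk/splunk}"
    splunk_bin="$splunk_home/bin/splunk"

    if [ ! -x "$splunk_bin" ]; then
        echo "no splunk binary at $splunk_bin" >&2
        return 1
    fi

    # Deployment server this node polls for instructions (expected: Taro).
    "$splunk_bin" show deploy-poll
    # Indexers this node forwards its data to (expected: Taro).
    "$splunk_bin" list forward-server
}
```

Running check_forwarder on a forwarding host should list Taro in both answers if the node was set up as described here.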

Splunk 6.2.x

We are currently using Splunk version 6.2.2, which will be upgraded only as needed.

Accessing Splunk

The Splunk web interface is available on port 8000 on Taro. This port is not open on the firewall, so use an ssh port forward to access it:

ssh -L 8001:localhost:8000 username@pumpkin

Then point your browser to https://localhost:8001. (You can change 8001 to any free local port, as long as you use the same port in both the ssh command and the URL.)

If we are still using an "enterprise" license, you will be asked for a user name and password. Try "admin" with the password "changeme" or "splunkitnow".
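
Since the local port has to match in the ssh command and in the URL, it can help to generate both from a single value. This is only an illustrative sketch: the function names tunnel_command, tunnel_url and show_tunnel are made up for this example, and the default login is the placeholder from the command above.

```shell
#!/bin/sh
# Sketch: derive the ssh port-forward command and the matching browser
# URL from one local port, so the two cannot drift apart.

tunnel_command() {
    local_port="${1:-8001}"            # any free, non-privileged local port
    login="${2:-username@pumpkin}"     # placeholder login from the text above
    echo "ssh -L ${local_port}:localhost:8000 ${login}"
}

tunnel_url() {
    local_port="${1:-8001}"
    echo "https://localhost:${local_port}"
}

# Print both lines: run the ssh line in a terminal, then open the URL.
show_tunnel() {
    tunnel_command "$@"
    tunnel_url "$1"
}
```

For example, show_tunnel 9000 yourname@pumpkin prints the ssh command and the URL for local port 9000.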

Using Splunk

A good place to start when using splunk is the "search". There are saved searches, which allow you to start exploring. Some of the saved searches are exported as dashboard apps.

Click on "Dashboards" in the green bar at the top. Then choose a dashboard (e.g. "Errors last 24h") and click its title. The resulting bar graph shows the errors color coded by machine. You can now click on the colored bars to explore what these errors were and then take action.

Installing Splunk

The installation procedure changes with every release; fortunately, it gets easier.

On Taro, Splunk is installed in /data/splunk/splunk-xxx, with a link to the latest version. Taro also hosts the tar file.


Adding Forwarders

Installing a forwarder is pretty simple. Forwarders do not need a web interface, so we turn it off. Use the following commands:

  1. cd /opt # (or /data for systems with a data drive)
  2. mkdir splunk
  3. cd splunk
  4. tar xzvf /net/data/taro/splunk/splunk-xxxx.tgz
  5. cd splunk
  6. bin/splunk start ## (agree to license)
  7. bin/splunk edit user admin -password splunkitnow -role admin -auth admin:changeme
  8. bin/splunk set deploy-poll 10.0.0.247:8089 ## (set deployment server as Taro)
  9. bin/splunk enable boot-start
  10. bin/splunk disable webserver
  11. bin/splunk restart

That's all folks.
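
The numbered steps above can also be collected into one shell function. What follows is a hedged sketch rather than a tested installer: the names run, install_forwarder, DRY_RUN and NEW_ADMIN_PW are hypothetical, the tar file path must be supplied by the caller, and --accept-license is used in place of agreeing to the license interactively. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the forwarder steps as one function.
# DRY_RUN=1 (the default) prints each command instead of running it.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_forwarder() {
    base="${1:-/opt}"                       # or /data on systems with a data drive
    tarball="$2"                            # Splunk tar file hosted on Taro
    deploy_server="${3:-10.0.0.247:8089}"   # Taro, as in the steps above

    if [ -z "$tarball" ]; then
        echo "usage: install_forwarder BASE TARBALL [SERVER]" >&2
        return 2
    fi
    if [ -z "$NEW_ADMIN_PW" ]; then
        echo "set NEW_ADMIN_PW to the new admin password first" >&2
        return 2
    fi

    run mkdir -p "$base/splunk"
    run tar xzf "$tarball" -C "$base/splunk"

    # The tar file unpacks into its own splunk/ directory.
    splunk="$base/splunk/splunk/bin/splunk"

    run "$splunk" start --accept-license
    run "$splunk" edit user admin -password "$NEW_ADMIN_PW" -role admin -auth admin:changeme
    run "$splunk" set deploy-poll "$deploy_server"
    run "$splunk" enable boot-start
    run "$splunk" disable webserver
    run "$splunk" restart
}
```

Review the printed commands first, then set DRY_RUN=0 and call the function as root to actually perform the installation.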

Documentation

These links are useful references when setting up Splunk.

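Download Splunk: http://www.splunk.com/download/

Example Deployment server: http://docs.splunk.com/Documentation/Splunk/latest/Updating/Extendedexampledeployseveralstandardforwarders

Configure Deployment Clients (CLI examples): http://docs.splunk.com/Documentation/Splunk/6.2.3/Updating/Configuredeploymentclients

Install on Linux: http://docs.splunk.com/Documentation/Splunk/latest/Installation/InstallonLinux

Deploy *nix Universal Forwarder: http://docs.splunk.com/Documentation/Splunk/4.3/Deploy/Deployanixdfmanually

Change Admin pw from command line: http://docs.splunk.com/Documentation/Splunk/latest/Admin/Changedefaultvalues#Changing_the_admin_default_password

Add files to monitor from command line: http://docs.splunk.com/Documentation/Splunk/latest/Data/MonitorfilesanddirectoriesusingtheCLI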