Splunk

Splunk is a flexible data aggregation system. In layman's terms, Splunk is a system that combs through log files (and anything else containing structured information that you want to throw at it) and presents the results in a summarized format. It is really a pretty neat thing. See the Splunk website at http://www.splunk.com.

Splunk at NPG

At NPG we have the following setup:

Taro = Splunk indexer, splunk deployment server, splunk web server

Endeavour, Gourd, Einstein, Roentgen, Lentil, ... = Splunk forwarding servers.

So, the other systems all forward their data to Taro, where it is indexed. The instructions for what these forwarding hosts should do are also provided by Taro (the deployment server). Additional data can be indexed by adding it to the deployment class on Taro, or by editing the config files directly on a node.
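For example, a minimal sketch of pushing one extra directory out from the deployment server could look like the following. The app name "npg_extra_logs" and the monitored path are made-up examples, and the path to Taro's install is an assumption; only the deployment-apps layout and the reload command are standard Splunk.

  # On Taro: create a deployment app that tells clients to monitor an extra directory.
  cd /opt/splunk/etc/deployment-apps        # adjust if Taro's install lives under /data/splunk
  mkdir -p npg_extra_logs/local
  printf '[monitor:///var/log/httpd]\ndisabled = false\n' > npg_extra_logs/local/inputs.conf
  # Map the app to the relevant server class in etc/system/local/serverclass.conf, then reload:
  /opt/splunk/bin/splunk reload deploy-server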

Note: This means that even if you start the web interface on one of the other nodes, you won't see anything, since all the data was sent to Taro.

Splunk 6.2.x

We are currently using Splunk version 6.2.2, which will be upgraded only as needed.

Accessing Splunk

The splunk web interface is available via port 8000 on Taro. This port is not open on the firewall, so an ssh port forward should be used to access it. To do so, use the following command:

ssh -L 8001:localhost:8000 username@taro

Then direct your browser to https://localhost:8001 (you can change 8001 to any free local port, as long as you use the same port in both the ssh command and the URL).
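If you open this tunnel a lot, an entry in your ~/.ssh/config saves some typing. This is just a convenience sketch; the host alias and the fully qualified host name are assumptions, so adjust them to whatever actually resolves for Taro.

  # ~/.ssh/config  (alias and FQDN are assumptions)
  Host splunk-tunnel
      HostName taro.farm.physics.unh.edu
      User username
      LocalForward 8001 localhost:8000

  # then the tunnel is simply:
  ssh splunk-tunnel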

If we are still using an "enterprise" license, you will be asked to enter a user name and password. Try user "admin" with password "changeme" or "splunkitnow".

Using Splunk

A good place to start when using splunk is the search view. There are saved searches which let you start exploring, and some of the saved searches are exported as dashboard apps.
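You can also run a search from the command line on Taro, which is handy for a quick check over ssh. The query below is only a made-up example (index, time range, credentials and search terms are placeholders); splunk search itself is a standard CLI command.

  # Count error messages from the last 24 hours per host (adjust the path and credentials):
  /opt/splunk/bin/splunk search 'index=main error earliest=-24h | stats count by host' -auth admin:changeme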

Click on "Dashboards" in the green bar on the top. Then choose a dashboard (i.e. "Errors last 24h") and click the title. The resulting bar graph shows the errors color coded by machine. You can now click on the colored bars to explore what these errors were and then take action.

Installing Splunk

The installation procedure changes with every release; fortunately, it keeps getting easier.

On Taro, Splunk is installed in /data/splunk/splunk-xxx, with a symlink pointing at the latest version. Taro also hosts the installation tarball.
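A rough sketch of an install following that layout is shown below. The version number and tarball name are placeholders, and the /opt/splunk link name is an assumption based on the convention used for the forwarders; adjust as needed.

  cd /data/splunk
  tar xzvf /net/data/taro/splunk/splunk-x.x.x.tgz   # tarball name is a placeholder
  mv splunk splunk-x.x.x                            # the tarball unpacks into ./splunk; keep it versioned
  ln -sfn /data/splunk/splunk-x.x.x /opt/splunk     # the "link to the latest version"
  /opt/splunk/bin/splunk start --accept-license
  /opt/splunk/bin/splunk enable boot-start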


Adding Forwarders

The process for installing a forwarder is pretty simple. Forwarders don't need a web interface (we turn it off), so use the following commands; a few optional checks are shown after the list:

  1. cd /opt    # (or /data for systems with a data drive)
  2. mkdir splunk
  3. cd splunk
  4. tar xzvf /net/data/taro/splunk/splunk-xxxx.tgz
  5. cd splunk
  6. bin/splunk start    # (agree to the license)
  7. bin/splunk edit user admin -password splunkitnow -role admin -auth admin:changeme
  8. bin/splunk set deploy-poll 10.0.0.247:8089    # (set the deployment server to Taro)
  9. bin/splunk enable boot-start
  10. bin/splunk disable webserver
  11. bin/splunk restart
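Once the forwarder has restarted, a couple of optional sanity checks confirm that it picked up the deployment server and is forwarding data. These are standard Splunk CLI commands; the credentials are the ones set above.

  # On the new forwarder (run from its splunk directory):
  bin/splunk show deploy-poll -auth admin:splunkitnow      # should report 10.0.0.247:8089
  bin/splunk list forward-server -auth admin:splunkitnow   # shows where data is being sent
  # On Taro, list the deployment clients that have phoned home (adjust path and credentials):
  /opt/splunk/bin/splunk list deploy-clients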

That's all folks.

Documentation

These links are useful references when setting up Splunk.

Download Splunk: http://www.splunk.com/download/

Example Deployment server: http://docs.splunk.com/Documentation/Splunk/latest/Updating/Extendedexampledeployseveralstandardforwarders

Configure Deployment Clients (CLI examples): http://docs.splunk.com/Documentation/Splunk/6.2.3/Updating/Configuredeploymentclients

Install on Linux: http://docs.splunk.com/Documentation/Splunk/latest/Installation/InstallonLinux

Deploy *nix Universal Forwarder

Change Admin pw from command line

Add files to monitor from command line: http://docs.splunk.com/Documentation/Splunk/latest/Data/MonitorfilesanddirectoriesusingtheCLI