Preventing license violations in Splunk

Splunk is a great log file search engine that I have been using for a few years now.  Gone are the days of opening up 8 different PuTTY sessions to discover which server a user was on.  It is euphoric to search across all servers (even those your team does not own) and get a bird's-eye view of what is going down in your environment.

Sometimes, though, something goes awry and you exceed that license of yours.  If you exceed your license max (e.g. 10G of data per day) 5 times in a rolling 30-day period, you lose the ability to search within Splunk.  So far we have been lucky: this only happens a couple of times a year, and our Splunk reps are good about getting us a temp license that revives our search capabilities.

Over the years we have come up with a few different ways to help prevent the license violations from occurring in the first place.  In summary, we remove all forms of log file entries that contain “Debug” when we hit 85% of our license max (e.g. 8.5G).  This is accomplished through a saved search that runs every 30 minutes to check how much data has been processed for the day.


host=splunk* index=_internal group="per_index_thruput" NOT series="_*" NOT series="history" NOT series="summary" | eval mb=kb/1024  | stats sum(mb) as MB_indexed | where MB_indexed > 8500
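
The 8500 in the where clause is just 85% of the license expressed in MB. Here is a minimal sketch of that arithmetic, assuming the post's round numbers (10G treated as 10000 MB); the function name threshold_mb is made up for illustration.

```shell
#!/bin/sh
# Sketch: derive the MB cutoff used in the saved search's "where" clause.
# Assumption: the license is given in MB using round numbers
# (10G treated as 10000 MB, so 85% gives 8500).
threshold_mb() {
    license_mb=$1   # daily license volume in MB (hypothetical input)
    percent=$2      # trigger point as a whole-number percentage
    echo $((license_mb * percent / 100))
}

# Example: print the cutoffs for an 85% and a 95% trigger.
echo "85% cutoff: $(threshold_mb 10000 85) MB"
echo "95% cutoff: $(threshold_mb 10000 95) MB"
```

If your license is a different size, plug in your own number and paste the result into the search.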

When the daily indexed total is above 8.5G, Splunk runs a script that swaps in a props.conf referencing a transforms.conf stanza, which filters out anything containing the various forms of the word “Debug”.  The forwarders still send the data, but the settings below keep the debug-related log entries from being indexed by routing them to the nullQueue.  The nullQueue is where you send data to die before it is indexed, and only indexed data counts against your license.


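$SPLUNK_HOME/bin/scripts/filterDebugEntries.sh:
# Swap in the props.conf that enables the debug filter
cp $SPLUNK_HOME/etc/apps/search/local/props.conf.filterDebug $SPLUNK_HOME/etc/apps/search/local/props.conf

# Restart Splunk so the new props.conf is picked up
$SPLUNK_HOME/bin/splunk restart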

props.conf.filterDebug:

[host::*]
TRANSFORMS-null=setnull_allDebug

transforms.conf:

[setnull_allDebug]
REGEX = DEBUG|Debug|debug|.debug
DEST_KEY = queue
FORMAT = nullQueue
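
Before trusting a filter like this, it helps to see which lines it would actually swallow. The sketch below uses grep -E as a stand-in for Splunk's regex engine, which should behave the same for a plain alternation like this one. The sample log lines are invented, and the function name would_be_dropped is hypothetical. The fourth alternative, .debug, is left out because anything it matches is already matched by debug.

```shell
#!/bin/sh
# Sketch: report whether a log line matches the alternation used in
# [setnull_allDebug]. grep -E only approximates Splunk's regex engine,
# which is good enough for a simple alternation.
would_be_dropped() {
    printf '%s\n' "$1" | grep -Eq 'DEBUG|Debug|debug'
}

# Hypothetical sample lines, not taken from real logs.
for line in \
    "2011-03-01 10:00:01 DEBUG cache refreshed" \
    "2011-03-01 10:00:02 ERROR connection reset" \
    "2011-03-01 10:00:03 INFO log4j.debug=false"
do
    if would_be_dropped "$line"; then
        echo "nullQueue: $line"
    else
        echo "indexed:   $line"
    fi
done
```

Note that the third line is dropped too: the regex matches the word anywhere in the event, not just in the log level column.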

We also set this saved search to sleep for 8 hours once the indexed total is found to be greater than 8.5G.  This keeps Splunk out of a yo-yo cycle of stops and starts, since filterDebugEntries.sh ends by restarting Splunk for the changes to take effect.
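
A possible extra safety net, which is not something we actually run, is to make the script itself skip the restart when the filter is already in place. The sketch below assumes cmp is available; the function name apply_filter is made up for illustration.

```shell
#!/bin/sh
# Sketch: copy the filter file into place only when it differs from the
# active props.conf.
# Returns 0 if props.conf was changed, 1 if it was already identical.
apply_filter() {
    src=$1   # e.g. props.conf.filterDebug
    dst=$2   # e.g. props.conf
    if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
        return 1
    fi
    cp "$src" "$dst"
}

# Assumed usage inside a filter script (paths as used in this post):
#   LOCAL="$SPLUNK_HOME/etc/apps/search/local"
#   apply_filter "$LOCAL/props.conf.filterDebug" "$LOCAL/props.conf" \
#       && "$SPLUNK_HOME/bin/splunk" restart
```

With a guard like this, a second trigger on the same day would do nothing instead of bouncing Splunk again.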

We have a second saved search, also run every 30 minutes, that checks whether our daily indexed total has reached 95% of our license.  It is the same Splunk search as above with 8500 replaced by 9500.  When it fires, it kicks off a different script and then sleeps for 8 hours:


$SPLUNK_HOME/bin/scripts/filterAllLogEntries.sh:
cp $SPLUNK_HOME/etc/apps/search/local/props.conf.filterEverything $SPLUNK_HOME/etc/apps/search/local/props.conf

$SPLUNK_HOME/bin/splunk restart

props.conf.filterEverything:

[source::*]
TRANSFORMS-nullhost = nullhost


transforms.conf:
[nullhost]
REGEX=.
DEST_KEY = queue
FORMAT = nullQueue

After the restart, no more data is being indexed since all data from every host is going to the nullQueue.  Be careful to schedule the 85% and 95% saved searches to run at different times. You will still be able to search across the data that has been indexed, but you can bask in the glory knowing that you prevented a license violation today.

Using Splunk to monitor a UNC path

You can have Splunk reference a UNC path with the following configuration:

etc\apps\search\local\inputs.conf

[monitor://\\SANCIFS_TDC_NETAPP01A.SAN.MyCompany.Com\CIFS_COGNOS$\TestLogs]
disabled = false
host = sancifs_test
index = default
sourcetype = motio_test

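The main thing to be cognizant of is who is running Splunkd, especially on Windows. On this particular Windows machine, I had it set up to run as "Local System Account", and that is probably not what you want: the Local System account normally has no rights to a network share, so Splunk cannot read the UNC path.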

I had to reconfigure the Windows Service to run as a domain account that can read the share: COMPANY_DOMAIN\admin_user

Splunk’s Interactive Field Extraction (IFX)

Cognos has some very precarious logs that we have to search through from time to time, and during these times I want to poke my eyes out with a rusty nail. However, Splunk has made this entire process much easier through their search capabilities – especially the IFX interface.

We were able to find a pattern for the PID, SessionID, and RequestID in the logs, but they do not fall into a key-value pair pattern. They are separated by what appear to be spaces in the logs. Splunk’s IFX allowed us to easily go through the data and select which "columns" were for the PID, SessionID, and RequestID. So now when we search through the data, we are automatically given data counts for all of the aforementioned fields.

Their tutorial was extremely easy to follow as well:

http://www.splunk.com/base/Documentation/latest/User/Fieldsextractiontutorial

Splunk savedsearch via the CLI

The syntax of this can be a little tricky:

$SPLUNK_HOME/bin/splunk search '| savedsearch "Splunk errors last 24 hours"'

The primary reason I am looking to run Saved Searches from the Command Line Interface (CLI) is to test for a server restart.  The idea is to have a scheduled query kick off a script after an error occurs; the script sleeps for a few minutes and then runs a search similar to the one above.

So when my WebSphere App Server throws a java.lang.OutOfMemoryError, I want to wait a few minutes and make sure that the server restarted. There may be a better design than what I have put together, but so far this is the only way I can think of to solve the problem. I am still searching for a better solution.
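
Here is a rough sketch of the sleep-then-check idea as a shell function. It assumes the CLI prints nothing when the saved search returns no events, which you should verify on your own version. The function name check_after_delay and the saved search name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: wait, then run a check command and succeed only if it prints
# at least one non-blank character.
# Returns 0 if output was found, 1 if the output was empty,
# 2 if the check command itself failed.
check_after_delay() {
    delay=$1
    shift
    sleep "$delay"
    output=$("$@" 2>/dev/null) || return 2
    [ -n "$(printf '%s' "$output" | tr -d '[:space:]')" ]
}

# Assumed usage; the saved search name is hypothetical and must exist:
#   if check_after_delay 300 "$SPLUNK_HOME/bin/splunk" search \
#        '| savedsearch "WebSphere restarts last 10 minutes"'; then
#       echo "server restart found"
#   else
#       echo "no restart found"
#   fi
```

Keeping the delay and the command as arguments means the same function can be reused for other "did it recover?" checks.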

UPDATE

Something else to be aware of is the permissions of the Saved Search.  If you created the Saved Search as yourself (bobsmith), but you try to run it as an admin account from the CLI, then the admin account will not have access unless you give permissions to the admin (or everyone) to run it.

Splunk: Changing the Default Search Page

Splunk is a freakin’ awesome tool.  However, I really, really hate how selecting the Search application takes you directly to the Summary page.  The Summary page is inherently slow and resource-intensive because it pulls a high-level view of your Splunk environment.

Well, no longer do you have to wait for the Summary page.

1 – Navigate to: Manager -> User interface -> Navigation menus -> default

2 – Move the default attribute from the “dashboard” view to the “flashtimeline” view

...
<view name="dashboard" />
<view name="flashtimeline" default='true'  />
...

3 – Revel in knowing that you no longer see the Dashboard, and now see the Search page by default