Sunday, October 7, 2018

Troubleshooting Data Sources with Incorrect Times using Splunk

By Tony Lee

Have you ever had a data source that you suspected was sending data with the wrong time? This can be a problem because Splunk tries to parse and use the event time instead of the ingest time, which can make ingested data hard to find. If you suspect this is the case, you may be experiencing one of the following scenarios:

Systems not using NTP that experience clock drift
Systems using broken or faulty NTP
Systems using the wrong timezone (ex: sending events in Central time but labeling them as GMT)

Depending on the time range selected, this can result in data not showing up within Splunk (or any SIEM) because the data may appear to be in the past or the future. For example, events that are lagging current time by 5 hours will not show up if "Last 4 hours" is selected for the time range. In a similar fashion, events that are sent with a future date and time fall outside any relative range that ends at "now" and will typically only show up when "All Time" is selected in the time range selector.
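
To make the time range effect concrete, here is a minimal sketch that can be pasted into the search bar. It uses makeresults with two made-up timestamps (one 5 hours behind, one 1 hour ahead; both are illustrative values, not real data) and checks whether each falls inside a "Last 4 hours" window:

```spl
| makeresults
| eval window_start = relative_time(now(), "-4h"), window_end = now()
| eval lagging_event = now() - (5 * 3600), future_event = now() + 3600
| eval lagging_visible = if(lagging_event >= window_start AND lagging_event <= window_end, "yes", "no")
| eval future_visible = if(future_event >= window_start AND future_event <= window_end, "yes", "no")
| table lagging_visible future_visible
```

Both columns should come back as "no", which is exactly why events with skewed timestamps seem to vanish from a relative time range.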

Enough about the problems; let's walk through building one possible solution. As a bonus, we provide the code for the dashboard shown below at the bottom of the article.

Figure 1: Last Communicated Calculator

Dashboard Components

To assist in usability, we provide a drop-down input at the top that contains a list of the indexes. This list is populated dynamically using the dbinspect command, which returns data about existing indexes within Splunk. The following search populates the drop-down input in the dashboard:

| dbinspect index=* | where NOT match(index, "^_") | table index | dedup index

The upper (host detail) panel consists of columns indicating the host, total count, first written time, last written time and so on--perfect information to determine time issues. This information can be found using the metadata command which can quickly query info about hosts, sources, and sourcetypes. In this case, we care about the hosts.

| metadata index=<index we care about> type=hosts
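
Besides firstTime and lastTime, the metadata command also returns recentTime, the index time of the most recent event seen from each host. As a hedged sketch (this is our variation, not the search used in the dashboard), subtracting lastTime from recentTime gives a rough per-host skew without touching raw events:

```spl
| metadata index=<index we care about> type=hosts
| eval skew_seconds = recentTime - lastTime
| eval skew_hours = round(skew_seconds / 3600, 2)
| convert ctime(lastTime) ctime(recentTime)
| table host totalCount lastTime recentTime skew_hours
| sort - skew_hours
```

For a host that is actively sending, a large positive skew_hours suggests event times in the past, and a negative value suggests event times in the future. Treat it as a hint rather than proof, since lastTime is only the latest timestamp seen for the host.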

The lower panel (a time-based area chart) represents the volume of data at a given time for a given host. We used the tstats command that we covered in a previous article; the search looks like the following:

| tstats prestats=t count where index=<index we care about> AND host=<host we care about> by host, _time | timechart useother=false count by host

It is certainly noteworthy that every search on this dashboard reads index metadata rather than raw events, and that is why it is so quick to discover these details. As a result, you will probably notice that there is no time wasted waiting for the search to return, as the data renders almost instantly.
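
Once the dashboard points at a suspect host, a raw-event search can confirm the offset by comparing each event's parsed time (_time) with the time it was indexed (_indextime). This is a minimal sketch that is not part of the dashboard; it assumes the indexer's own clock is trustworthy, and the placeholders follow the convention used above. Run it over "All Time" so that future-dated events are included:

```spl
index=<index we care about> host=<host we care about>
| eval lag_seconds = _indextime - _time
| stats count min(lag_seconds) AS min_lag max(lag_seconds) AS max_lag avg(lag_seconds) AS avg_lag by host
| sort - max_lag
```

A consistently large positive lag (for example, roughly 18000 seconds for a five-hour timezone mismatch) points to event times in the past, while a negative lag points to event times in the future. Because this search reads raw events, it is slower than the dashboard searches, so keep the host filter narrow.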

Conclusion

Splunk provides decent visibility into various features within the Monitoring Console / DMC (Distributed Management Console), but we found this flexible and customizable dashboard to be quite helpful for gaining additional insight into the last time a host communicated. This can be used to identify, troubleshoot, and finally confirm the time being reported by devices. We hope this article helps you troubleshoot these very frustrating issues. Enjoy!

Dashboard XML code

Below is the dashboard code needed to see the Last Communicated Times for hosts by Index. Feel free to modify the dashboard as needed:

<form>
<label>Last Communicated Calculator</label>
<description>Select an Index (or Indexes) - High Number is bad...</description>
<fieldset submitButton="true">
<input type="time" token="time">
<label>Time Range</label>
<default>
<earliest>-24h@h</earliest>
<latest>now</latest>
</default>
</input>
<input type="multiselect" token="index">
<label>Index</label>
<fieldForLabel>index</fieldForLabel>
<fieldForValue>index</fieldForValue>
<search>
<query>| dbinspect index=* | where NOT match(index, "^_") | table index | dedup index | sort index</query>
<earliest>-30d@d</earliest>
<latest>now</latest>
</search>
<valuePrefix>index=</valuePrefix>
<delimiter> OR </delimiter>
</input>
<input type="text" token="host">
<label>Host</label>
<default>*</default>
</input>
</fieldset>
<row>
<panel>
<table>
<title>Hosts</title>
<search>
<query>| metadata $index$ type=hosts | dedup host | eval currentTime=now() | eval seconds=now()-lastTime | eval minutes=(seconds/60) | eval hours=(minutes/60) | convert ctime(lastTime) ctime(firstTime) ctime(currentTime) | table host, totalCount, firstTime, lastTime, currentTime, hours, minutes, seconds | sort - seconds | rename hours AS "Last Comm (in hrs)", minutes AS "Last Comm (in mins)", seconds AS "Last Comm (in secs)" | search host=$host$</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="count">20</option>
<option name="dataOverlayMode">none</option>
<option name="drilldown">none</option>
<option name="percentagesRow">false</option>
<option name="rowNumbers">true</option>
<option name="totalsRow">false</option>
<option name="wrap">true</option>
</table>
</panel>
</row>
<row>
<panel>
<chart>
<title>Visual (Keep in mind your time range. Anything beyond the time range will not show up)</title>
<search>
<query>| tstats prestats=t count where $index$ AND host=$host$ by host, _time | timechart useother=false count by host</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="charting.chart">area</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
</form>

Wednesday, September 19, 2018

Spelunking your Splunk – Part V (Splunk Stats)

By Tony Lee

Welcome to the fifth article of the Spelunking your Splunk series, all designed to help you understand your Splunk environment at a quick glance. Here is a quick recap of the previous articles:

Spelunking your Splunk Part I (Exploring Your Data) - A clever dashboard that can be used to quickly understand the indexes, sources, sourcetypes, and hosts in any Splunk environment.
Spelunking your Splunk – Part II (Disk Usage) - A dashboard that can be used to monitor data distribution across multiple indexers.
Spelunking your Splunk – Part III (License Usage) - A dashboard to understand license usage over time.
Spelunking your Splunk – Part IV (User Metrics) - A dashboard to provide insight into user activity

This article focuses on understanding your Splunk environment at a high-level. Have you ever wondered the following?

How many events were ingested over a user-defined time period
How that equates to events per second (EPS)
The distinct host count
Number of indexes with data
Number of sourcetypes
Number of sources
Visually what the data ingest looks like by total event count and by index

This dashboard will give it to you and do it fast! As a bonus we will provide the dashboard code at the end of the article.
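
The EPS figure is simply the event count divided by the number of seconds in the selected time range (the dashboard code gets those boundaries from addinfo). As a worked sketch with made-up numbers, 60,480,000 events over 7 days looks like this:

```spl
| makeresults
| eval count = 60480000
| eval diff = 7 * 86400
| eval EPS = count / diff
| table count diff EPS
```

Seven days is 604800 seconds, so this works out to 100 events per second.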

Figure 1: Splunk Stats dashboard

Finding detailed index information quickly

There are at least two places within Splunk to discover index information. The first uses a RESTful call and provides detailed information about indexes. The second requires more calculation and is less efficient. For this exercise, let's try copying and pasting the following RESTful search into your Splunk search bar to see what data is returned:

| rest /services/data/indexes-extended

Figure 2: Results of the RESTful search (remember to scroll right)

Next, try the second approach, the dbinspect command:

| dbinspect index=*

Figure 3: Column headers from dbinspect (remember to scroll right)

Now try the following, which combines both (thank you Splunk!):

| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime >= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", " "), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", " "), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days"
| rename title AS index] | fields index raw_size_gb event_count buckets minTime maxTime retention
| rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event" maxTime AS "Latest Event" retention AS Retention
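
Two small conversions in that search are worth isolating: raw bytes to gigabytes, and frozenTimePeriodInSecs to days. The sketch below repeats just that arithmetic with makeresults; the byte count is a made-up value, and 188697600 is used because it is Splunk's default frozenTimePeriodInSecs:

```spl
| makeresults
| eval raw_size = 5368709120, frozenTimePeriodInSecs = 188697600
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024, 2)
| eval retention = round(frozenTimePeriodInSecs / 86400, 0)." Days"
| table raw_size raw_size_gb frozenTimePeriodInSecs retention
```

This works out to 5 GB and a retention of 2184 Days (roughly six years).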

Now that you understand the basics, the sky is the limit. :-)

Finding source, sourcetype, and host data quickly

You may remember a command called tstats from the first article of this series, Spelunking your Splunk Part I (Exploring Your Data). In a nutshell, tstats can perform statistical queries on indexed fields very, very quickly. These indexed fields by default are index, source, sourcetype, and host. It just so happens that these are the fields that we need to understand the environment. Best of all, even on an underpowered environment or one with lots of data ingested per day, these commands will still outperform the rest of your typical searches even over long periods of time. This works great for our dashboard!

Conclusion

Dashboard XML code

Below is the dashboard code needed to enumerate your Splunk stats. Feel free to modify the dashboard as needed:

<form>
<label>Splunk Stats</label>
<fieldset submitButton="true" autoRun="true">
<input type="time" token="time">
<label>Time Range Selector</label>
<default>
<earliest>-7d@h</earliest>
<latest>now</latest>
</default>
</input>
</fieldset>
<row>
<panel>
<single>
<title>Total Events</title>
<search>
<query>| tstats count where index=*</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</single>
</panel>
<panel>
<single>
<title>Events Per Second (EPS)</title>
<search>
<query>| tstats count where index=* | addinfo | eval diff = info_max_time - info_min_time | eval EPS = count / diff | table EPS</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="drilldown">none</option>
</single>
</panel>
<panel>
<single>
<title>Distinct Hosts</title>
<search>
<query>| tstats dc(host) where index=*</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</single>
</panel>
<panel>
<single>
<title>Distinct Indexes with Data</title>
<search>
<query>| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime >= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", " "), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", " "), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days"
| rename title AS index] | fields index raw_size_gb event_count buckets minTime maxTime retention
| rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event" maxTime AS "Latest Event" retention AS Retention | stats count</query>
<earliest>0</earliest>
<latest></latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</single>
</panel>
<panel>
<single>
<title>Distinct Sourcetypes</title>
<search>
<query>| tstats dc(sourcetype) where index=*</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</single>
</panel>
<panel>
<single>
<title>Distinct Sources</title>
<search>
<query>| tstats dc(source) where index=*</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</single>
</panel>
</row>
<row>
<panel>
<chart>
<title>Total Event Count Over Time</title>
<search>
<query>| tstats prestats=t count where index=* by _time | timechart count</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="charting.chart">area</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<chart>
<title>Event Count by Index Over Time</title>
<search>
<query>| tstats prestats=t count where index=* by index, _time | timechart count by index</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="charting.chart">area</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<table>
<title>Indexes with Data</title>
<search>
<query>| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime >= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", " "), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", " "), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days"
| rename title AS index] | fields index raw_size_gb event_count buckets minTime maxTime retention
| rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event" maxTime AS "Latest Event" retention AS Retention</query>
<earliest>0</earliest>
<latest></latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
</form>

Sunday, October 15, 2017

Spelunking your Splunk – Part I (Explore Your Data)

By Tony Lee

Introduction

Have you ever inherited a Splunk instance that you did not build? This means that you probably have no idea what data sources are being sent into Splunk. You probably don’t know much about where the data is being stored. And you certainly do not know who the highest volume hosts are within the environment.

As a consultant, this is reality for nearly every engagement we encounter: We did not build the environment and documentation is sparse or inaccurate if we are lucky enough to even have it. So, what do we do? We could run some fairly complex queries to figure this out, but many of those queries are not efficient enough to search over vast amounts of data or long periods of time—even on highly optimized environments. All is not lost though, we have some tricks (and a handy dashboard) that we would like to share.

Note: Maybe you did build the environment, but you need a sanity check to make sure you don’t have any misconfigured or run-away hosts. You will also find value here.

tstats to the rescue!

If you have not discovered or used the tstats command, we recommend that you become familiar with it even if it is at a very high-level. In a nutshell, tstats can perform statistical queries on indexed fields—very very quickly. These indexed fields by default are index, source, sourcetype, and host. It just so happens that these are the fields that we need to understand the environment. Best of all, even on an underpowered environment or one with lots of data ingested per day, these commands will still outperform the rest of your typical searches even over long periods of time. Ok, time to answer some questions!

Common questions

These are common questions we ask during consulting engagements and this is how we get answers FAST. Most of the time 7 days’ worth of data is enough to give us a good understanding of the environment and weed out anomalies.

How many events are we ingesting per day?

| tstats count where index=* by _time span=1d

Figure 1: Events per day

What are my most active indexes (events per day)?

| tstats prestats=t count where index=* by index, _time span=1d | timechart span=1d count by index

Figure 2: Most active indexes

What are my most active sourcetypes (events per day)?

| tstats prestats=t count where index=* by sourcetype, _time span=1d | timechart span=1d count by sourcetype

Figure 3: Most active sourcetypes

What are my most active sources (events per day)?

| tstats prestats=t count where index=* by source, _time span=1d | timechart span=1d count by source

Figure 4: Most active sources

What is the noisiest host (events per day)?

| tstats prestats=t count where index=* by host, _time span=1d | timechart span=1d count by host

Figure 5: Most active hosts
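
When there are many hosts, the timechart can get crowded. As a minimal variation on the search above (our addition, not part of the dashboard), a ranked list answers the "noisiest host" question directly for whatever time range is selected:

```spl
| tstats count where index=* by host
| sort - count
| head 10
```

Swap host for index, sourcetype, or source to rank the other fields the same way.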

Dashboard Code

To make things even easier for you, try this dashboard out (code at the bottom) that combines the searches we provided above and as a bonus adds a filter to specify the index and time range.

Figure 6: Data Explorer dashboard

Conclusion

Splunk is a very powerful search platform but it can grow to be a complicated beast--especially over time. Feel free to use the searches and dashboard provided to regain control and really understand your environment. This will allow you to trim the waste and regain efficiency. Happy Splunking.

Dashboard XML code is below:

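<form>
<label>Data Explorer</label>
<!-- The fieldset opening tags and default values were lost from the original listing. The input types and defaults below are assumptions; adjust as needed. -->
<fieldset submitButton="true">
<input type="time" token="time">
<label>Time Range Selector</label>
<default>
<earliest>-7d@d</earliest>
<latest>now</latest>
</default>
</input>
<input type="text" token="index">
<label>Index</label>
<default>*</default>
</input>
</fieldset>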
<row>
<panel>
<chart>
<title>Most Active Indexes</title>
<search>
<query>| tstats prestats=t count where index=$index$ by index, _time span=1d | timechart span=1d count by index</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="charting.chart">column</option>
</chart>
</panel>
</row>
<row>
<panel>
<chart>
<title>Most Active Sourcetypes</title>
<search>
<query>| tstats prestats=t count where index=$index$ by sourcetype, _time span=1d | timechart span=1d count by sourcetype</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="charting.chart">column</option>
</chart>
</panel>
</row>
<row>
<panel>
<chart>
<title>Most Active Sources</title>
<search>
<query>| tstats prestats=t count where index=$index$ by source, _time span=1d | timechart span=1d count by source</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="charting.chart">column</option>
</chart>
</panel>
</row>
<row>
<panel>
<chart>
<title>Most Active Hosts</title>
<search>
<query>| tstats prestats=t count where index=$index$ by host, _time span=1d | timechart span=1d count by host</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="charting.chart">column</option>
</chart>
</panel>
</row>
</form>

Monday, February 15, 2016

Processing Mandiant Redline Files Using Splunk

By Tony Lee

Introduction

Do you use Mandiant's Redline (https://www.fireeye.com/services/freeware/redline.html) for performing host investigation? Do you use Splunk for centralized log collection and monitoring? How about using these two tools together? The team behind the Splunk Forensic Investigator app (https://splunkbase.splunk.com/app/2895/) is experimenting with ingesting Redline collections. We have made good progress on proving that it is possible to automate the ingestion of Redline collections and use Splunk to carve and display data from multiple hosts at the same time. However, we were wondering how many people would find this capability useful enough to see the work completed. Check out the prototype below and let us know if you would find this useful by leaving a comment below (account not necessary).

We have example output below:

System info displayed in Redline

System info displayed in Splunk

Driver modules displayed in Redline

Driver modules displayed in Splunk

Above and beyond replication

Recreating the Redline output is all well and good; however, keep in mind that ingesting the data into Splunk allows you to filter, search, and carve across multiple systems at the same time. Additionally, it allows you to use Splunk's big data crunching capabilities. It is very simple to ask Splunk to apply statistical analysis to large data sets to help look for anomalies within hosts such as:

Drive letters/mappings that don't meet corporate standards
Logged in/on users that occur infrequently (such as service accounts)
Forgotten operating systems that may be weak points or exploited first within a network

Or when analyzing drivers on multiple hosts, an investigator could glance at a dashboard and determine any of the following and more:

Number of drivers per host
Largest driver
Smallest driver
Most common driver file name
Most common driver path
Least common driver file name
Least common driver path

Conclusion

These are just some examples of interesting data one might pull from analyzing many collections. The possibilities are probably endless. Let us know what you think. Thanks.