Monday, November 12, 2018

Monitoring USB Storage Activity with Splunk – Part II (Read/Write/Delete/Modify events)

By Tony Lee

Welcome to Part two in our series on Monitoring USB Storage Activity. In the first article (http://www.securitysynapse.com/2018/11/monitoring-usb-storage-activity-part-1.html), we examined what is required to monitor USB Storage connect and disconnect events. But how about activity that happens after the drives are connected? The good news is that this is also possible using Microsoft Windows Event logs and a bit of data crunching effort. In this article we will again use Splunk to aggregate, process, and display the logs. As a bonus, we will not only outline the steps to accomplish this task, but we will also provide working dashboard code at the end of the article.

Note: The Audit Removable Storage policy is only available in Windows 8 / Server 2012 and above. It is not available in Windows 7 / Server 2008 R2 or earlier.  ☹

Figure 1:  Dashboard provided at the end of the article


High-level steps

There are two main steps needed to accomplish this task. We need to generate and collect the Windows event logs and then we need to process and display the logs within Splunk. Each is outlined below.

Windows Event Generation
For Windows 8 / Server 2012 hosts and above, Microsoft USB activity logs can be enabled manually one machine at a time or via Group Policy (see the references section below for instructions). For this demo, we will show how to enable it on one machine using Local Security Policy:  Advanced Audit Policy Configuration > System Audit Policies - Local Group Policy Object > Object Access > Audit Removable Storage

Figure 2:  Enabling Audit of Removable Storage

Double-click the policy and enable auditing for both Success and Failure. After enabling auditing, we rebooted for good measure, because hey, this is Windows.

Activity Event IDs
Now that Audit Removable Storage is enabled, open Event Viewer > Windows Logs > Security.  Select Filter Current Log on the right-hand side and type in 4663 for event ID and click OK.  Insert a USB device and click the Refresh button on the right-hand side. If all is well, there should be multiple 4663 success events. Note that Event ID 4656 is used for failures.


Figure 3:  Testing 4663 and 4656 event visibility

Feel free to explore the data within each event but take note that for USB auditing the events that we care about have a Task Category of “Removable Storage”. For convenience we provide a file delete event below:

XX/XX/XXXX 05:54:43 PM
LogName=Security
SourceName=Microsoft Windows security auditing.
EventCode=4663
EventType=0
Type=Information
ComputerName=DESKTOP-8HSPO8Q
TaskCategory=Removable Storage
OpCode=Info
RecordNumber=1211
Keywords=Audit Success
Message=An attempt was made to access an object.

Subject:
Security ID: S-1-5-21-XXXXXXX-XXXXXXXXX-XXXXXXXXXX-XXXX
Account Name: User
Account Domain: DESKTOP-8HSPO8Q
Logon ID: 0x229E9

Object:
Object Server: Security
Object Type: File
Object Name: \Device\HarddiskVolume7\New Microsoft Word Document.docx
Handle ID: 0x1404
Resource Attributes:

Process Information:
Process ID: 0x17b4
Process Name: C:\Windows\explorer.exe

Access Request Information:
Accesses: DELETE

Access Mask: 0x10000
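
The Accesses and Access Mask fields describe the same request in two forms. As a rough illustration, and not part of the original setup, the Python sketch below decodes a mask string into the access names that the dashboard later filters on. The function name decode_access_mask is hypothetical, and only the four file access bits used in this article are mapped.

```python
# Subset of Windows file access-right bits, limited to the values that
# appear as dashboard choices in this article.
ACCESS_BITS = {
    0x1: "ReadData (or ListDirectory)",
    0x2: "WriteData (or AddFile)",
    0x4: "AppendData (or AddSubdirectory or CreatePipeInstance)",
    0x10000: "DELETE",
}


def decode_access_mask(mask_text):
    """Turn a hex string such as '0x10000' into a list of access names."""
    mask = int(mask_text, 16)  # base 16 accepts the leading '0x'
    return [name for bit, name in ACCESS_BITS.items() if mask & bit]


if __name__ == "__main__":
    print(decode_access_mask("0x10000"))
```

For the sample event above, decode_access_mask("0x10000") returns a list containing only DELETE, which matches the Accesses field.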



Windows Event Collection
Now that the logs are being generated, they need to be forwarded from the endpoints to a central location, in this case Splunk. This task could be accomplished using a number of methods such as Windows Event Collector (WEC), a Splunk Universal Forwarder agent, or some other forwarding method. For this demo, we will use a Splunk Universal Forwarder, shown in the next section.

Splunk

While we are assuming a functional Splunk Enterprise installation exists, we still need to collect the logs. We provide a sample Splunk Universal Forwarder configuration file below to help those using the Splunk Universal Forwarder. Note: we will be placing the events into an index called wineventlog. If this index does not already exist, you will first need to create it.

inputs.conf 
Located on the Windows endpoint (Usually found here:  C:\Program Files\SplunkUniversalForwarder\etc\apps\SplunkUniversalForwarder\local\inputs.conf)

[WinEventLog://Security]
index = wineventlog
checkpointInterval = 5
current_only = 0
disabled = 0
start_from = oldest
whitelist = 4663, 4656
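
Before restarting the forwarder, it can help to sanity-check the stanza. The sketch below is one possible way to do that with Python's standard configparser module. It assumes the file is simple enough to parse as INI, which is not guaranteed for every Splunk .conf file, and the helper name whitelisted_event_codes is hypothetical.

```python
import configparser

# Copy of the stanza shown above, used here as test input.
SAMPLE_INPUTS_CONF = """
[WinEventLog://Security]
index = wineventlog
checkpointInterval = 5
current_only = 0
disabled = 0
start_from = oldest
whitelist = 4663, 4656
"""


def whitelisted_event_codes(conf_text, stanza):
    """Return the event codes listed in the stanza's whitelist setting."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    raw = parser.get(stanza, "whitelist")
    return [int(code) for code in raw.split(",")]


if __name__ == "__main__":
    print(whitelisted_event_codes(SAMPLE_INPUTS_CONF, "WinEventLog://Security"))
```

This only confirms that the stanza header and whitelist parse as expected; it does not validate the settings against Splunk itself.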


Once the inputs.conf file is properly configured (and the universal forwarder restarted) to collect these logs from the endpoint, we need to verify that the logs are reaching Splunk. Try running the following Splunk search:

index=wineventlog 


If you see results, try something more specific, such as either of the following:

index=wineventlog EventCode=4663
index=wineventlog EventCode=4656


Conclusion

Now that we have the proper event IDs flowing into Splunk, we created a Removable Storage Activity dashboard. The dashboard provides statistical analysis for top accounts, hostnames, actions, and processes. It even includes events over time by hostname and action along with the details needed to investigate USB storage activity. Because some applications within an environment may scan or otherwise interact with removable storage, it may be necessary to add filters, customized for each environment, to reduce noise. For your convenience, we included the dashboard code below.

Acknowledgement and References

https://www.eventtracker.com/tech-articles/tracking-removable-storage-windows-security-log/
https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-R2-and-2012/jj574128(v=ws.11)


Dashboard Code

The following dashboard assumes that the appropriate logs are being collected and sent to Splunk. Additionally, the dashboard code assumes an index of wineventlog. Feel free to adjust as necessary. Splunk dashboard code provided below:


<form>
  <label>Removable Storage Activity</label>
  <description>index=wineventlog EventCode=4663 TaskCategory="Removable Storage"</description>
  <fieldset autoRun="true" submitButton="true">
    <input type="time" token="time">
      <label>Time Range</label>
      <default>
        <earliest>0</earliest>
        <latest></latest>
      </default>
    </input>
    <input type="text" token="wild">
      <label>Wildcard Search</label>
      <default>*</default>
      <initialValue>*</initialValue>
    </input>
    <input type="multiselect" token="Accesses">
      <label>Actions (Accesses)</label>
      <choice value="*">All</choice>
      <choice value="ReadData (or ListDirectory)">ReadData (or ListDirectory)</choice>
      <choice value="WriteData (or AddFile)">WriteData (or AddFile)</choice>
      <choice value="AppendData (or AddSubdirectory or CreatePipeInstance)">AppendData (or AddSubdirectory or CreatePipeInstance)</choice>
      <choice value="DELETE">DELETE</choice>
      <default>*</default>
      <initialValue>*</initialValue>
      <valuePrefix>Accesses="</valuePrefix>
      <valueSuffix>"</valueSuffix>
      <delimiter> OR </delimiter>
    </input>
  </fieldset>
  <row>
    <panel>
      <single>
        <title>Total events</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | table _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | stats count</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">all</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <table>
        <title>Top Account_Domain</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | table _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | top limit=0 Account_Domain</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top ComputerName</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | table _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | top limit=0 ComputerName</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Account_Name</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | table _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | top limit=0 Account_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Top Accesses</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | table _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | top limit=0 Accesses</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Process_Name</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | table _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | top limit=0 Process_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Activity Over Time</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | eval ComputerAction = ComputerName + ":" + Accesses | timechart count(ComputerAction) by ComputerAction</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">column</option>
        <option name="charting.drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Details</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | table _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</form>



Tuesday, November 6, 2018

Monitoring USB Storage Activity with Splunk – Part 1 (Connectivity events)

By Tony Lee

Have you ever wanted to monitor what goes on with removable media in your environment, but maybe lack the money or manpower to run a Data Loss Prevention (DLP) tool to monitor the USB devices? The good news is that you can do this on the cheap using Microsoft Windows Event logs and a bit of data crunching effort. In this article we will provide a few ways to collect the logs, but we will ultimately use Splunk to aggregate, process, and display the information. As a bonus, we will not only outline the steps to accomplish this task, but we will also provide working dashboard code at the end of the article.

Figure 1: Dashboard provided at the end of the article

High-level steps

There are two main steps needed to accomplish this task. We need to generate and collect the Windows event logs and then we need to process and display the logs within Splunk. Each is outlined below.

Windows Event Generation
Microsoft logs USB connect and disconnect actions in the following Windows Event Viewer location:  Application and Services Logs > Microsoft > Windows > DriverFrameworks-UserMode > Operational

Unfortunately, this log is disabled by default. Administrators can manually enable it per machine or take action on a larger scale using a login script or other mechanism outlined in the References section below. For this article, we will enable the logs manually by right-clicking on “Operational” and selecting “Properties”, which shows that the log is disabled. Check the box to enable these logs. After checking the box, we rebooted for good measure, because hey, this is Windows.

Figure 2:  DriverFrameworks-UserMode enablement and log path


Connect Event IDs
Now that USB connectivity logging is enabled, insert a USB drive and click the refresh button to see some events. You will notice that there are quite a few event IDs associated with connecting a USB device, but fortunately for our situation, not all of them are important. For example, some of the event IDs pertain to USB functions needed to ready the device. For the sake of completeness, the event IDs associated with connecting a device are the following:

  • 2003 – This is a unique event created upon connecting a USB device which contains helpful data
  • 2004
  • 2006
  • 2010
  • 2100
  • 2101
  • 2105
  • 2106


Disconnect Event IDs
Fortunately, there are far fewer event IDs associated with disconnecting a USB device.

  • 2100
  • 2102 – This is a unique event created upon disconnecting a USB device which contains helpful data


Feel free to explore the data within each event, but note that we have called out the two Event IDs that contain the most data pertaining to connection (2003) and disconnection (2102).

Windows Event Collection
Now that the logs are being generated, they need to be forwarded from the endpoints to a central location, in this case Splunk. This task could be accomplished using a number of methods such as Windows Event Collector (WEC), a Splunk Universal Forwarder agent, or some other forwarding method. For this demo, we will use a Splunk Universal Forwarder, shown in the next section.

Splunk
While we are assuming a functional Splunk Enterprise installation exists, we still need to collect the logs. We provide a sample Splunk Universal Forwarder configuration file below to help those using the Splunk Universal Forwarder. Note: we will be placing the events into an index called wineventlog. If this index does not already exist, you will first need to create it.

inputs.conf 
Located on the Windows endpoint (Usually found here:  C:\Program Files\SplunkUniversalForwarder\etc\apps\SplunkUniversalForwarder\local\inputs.conf)

[WinEventLog://Microsoft-Windows-DriverFrameworks-UserMode/Operational]
index = wineventlog
checkpointInterval = 5
current_only = 0
disabled = 0
start_from = oldest
whitelist = 2003, 2102


Once the inputs.conf file is properly configured (and the universal forwarder restarted) to collect these logs from the endpoint, we need to verify that the logs are reaching Splunk. Try running the following Splunk search:

index=wineventlog 

If you see results, try something more specific, such as either of the following:

index=wineventlog EventCode=2003
index=wineventlog EventCode=2102

Field Extraction

Now that we have the proper Windows Event IDs, we need to make sure we can reference the fields. Unfortunately, Windows event logs are a hybrid between human readable and machine readable, which usually means that no one likes to read them. As a result, we need to perform some manual extraction within Splunk to pull out key information such as the USB vendor, product, serial number, and GUID. Within Splunk (Settings -> Fields -> Field extractions) we added the following regex string to enable this parsing:

.*?VEN_(?<vendor>.*?)\&PROD_(?<product>.*?)\&.*?#(?<serialNumber>.*?)&.*?{(?<guid>.*?)}
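
To see what the expression captures outside of Splunk, here is a small Python sketch. It adapts the regex to Python syntax (named groups written as (?P<name>...), braces escaped, and the leading .*? dropped because search() scans the string anyway). The device string is a made-up example, and the helper name extract_usb_fields is hypothetical.

```python
import re

# The article's extraction regex, adapted to Python's named-group syntax.
USB_DEVICE_RE = re.compile(
    r"VEN_(?P<vendor>.*?)&PROD_(?P<product>.*?)&.*?"
    r"#(?P<serialNumber>.*?)&.*?\{(?P<guid>.*?)\}"
)

# Illustrative device string only; vendor, product, and serial are invented.
SAMPLE_DEVICE = (
    r"SWD\WPDBUSENUM\_??_USBSTOR#DISK&VEN_ACME&PROD_FLASHDRIVE&REV_1.00"
    r"#0123456789ABCDEF&0#{53F56307-B6BF-11D0-94F2-00A0C91EFB8B}"
)


def extract_usb_fields(message):
    """Return vendor, product, serialNumber, and guid, or None if no match."""
    match = USB_DEVICE_RE.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(extract_usb_fields(SAMPLE_DEVICE))
```

Testing the pattern this way against a few real event messages from your own environment is a quick way to catch devices whose identifiers do not follow the VEN_/PROD_ layout.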


Figure 3:  Example Field Extraction

Figure 4:  Example Event ID 2003 showing fields are properly extracted


Conclusion

Now that we have the proper event IDs flowing into Splunk and the necessary fields extracted, we created a Removable Storage Connections dashboard. The dashboard provides statistical analysis for connects, disconnects, top vendors, products, serial numbers, and hosts.  It even includes events over time by action and serial number along with the details needed to investigate USB connections. For your convenience, we included the dashboard code below.

Caveats

Per Greg Shultz, “If you find an Event ID 2003 event record for a specific USB flash drive but don't find a corresponding Event ID 2102 event record, that either means that the USB flash drive is still attached to the system or the system was shut down before the device was removed.”
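
The pairing logic behind this caveat can be sketched in a few lines of Python. This is only an illustration of the idea, not something the dashboard does: it assumes events have already been exported as tuples of (epoch time, computer name, serial number, event code), and the function name devices_without_disconnect is hypothetical.

```python
CONNECT, DISCONNECT = 2003, 2102


def devices_without_disconnect(events):
    """Return {(computer, serial): connect_time} for connects with no later disconnect."""
    still_open = {}
    for epoch_time, computer, serial, code in sorted(events):
        key = (computer, serial)
        if code == CONNECT:
            still_open[key] = epoch_time
        elif code == DISCONNECT:
            still_open.pop(key, None)
    return still_open


# Made-up events for illustration.
SAMPLE_EVENTS = [
    (100, "HOST-A", "SERIAL-1", 2003),
    (160, "HOST-A", "SERIAL-1", 2102),
    (200, "HOST-B", "SERIAL-2", 2003),
]

if __name__ == "__main__":
    print(devices_without_disconnect(SAMPLE_EVENTS))
```

Any entry left in the result is a device that, per the caveat above, is either still attached or was attached when the system shut down.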

Acknowledgement and References

Big thanks to the following articles which were quite useful:
https://www.techrepublic.com/article/how-to-track-down-usb-flash-drive-usage-in-windows-10s-event-viewer/ 
https://df-stream.com/2014/01/the-windows-7-event-log-and-usb-device/

Dashboard Code

The following dashboard assumes that the appropriate logs are being collected and sent to Splunk. Additionally, the dashboard code assumes an index of wineventlog. Feel free to adjust as necessary. Splunk dashboard code provided below:


<form>
  <label>Removable Storage Connections</label>
  <description>index=wineventlog EventCode=2003 &amp; 2102 - Microsoft-Windows-DriverFrameworks-UserMode/Operational</description>
  <fieldset autoRun="true" submitButton="true">
    <input type="time" token="time">
      <label>Time Range</label>
      <default>
        <earliest>0</earliest>
        <latest></latest>
      </default>
    </input>
    <input type="text" token="wild">
      <label>Wildcard Search</label>
      <default>*</default>
      <initialValue>*</initialValue>
    </input>
  </fieldset>
  <row>
    <panel>
      <single>
        <title>Number of Connect Events</title>
        <search>
          <query>index=wineventlog EventCode=2003 USBSTOR $wild$ | table _time, ComputerName, EventCode, User, vendor, product, serialNumber | stats count</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">all</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Number of Disconnect Events</title>
        <search>
          <query>index=wineventlog EventCode=2102 USBSTOR $wild$ | transaction maxspan=5s EventCode, ComputerName, serialNumber | dedup _time, ComputerName, serialNumber | table _time, ComputerName, EventCode, User, vendor, product, serialNumber | stats count</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">all</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <table>
        <title>Top Hosts with USB Activity</title>
        <search>
          <query>index=wineventlog (EventCode=2003 OR EventCode=2102) USBSTOR $wild$ | transaction maxspan=5s EventCode, ComputerName, serialNumber | table _time, ComputerName, EventCode, User, vendor, product, serialNumber | top limit=0 ComputerName</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Removable Storage Vendors</title>
        <search>
          <query>index=wineventlog (EventCode=2003 OR EventCode=2102) USBSTOR $wild$ | dedup serialNumber | table _time, ComputerName, EventCode, User, vendor, product, serialNumber | top limit=0 vendor</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Removable Storage Products</title>
        <search>
          <query>index=wineventlog (EventCode=2003 OR EventCode=2102) USBSTOR $wild$ | dedup serialNumber | table _time, ComputerName, EventCode, User, vendor, product, serialNumber | top limit=0 product</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Serial Numbers</title>
        <search>
          <query>index=wineventlog (EventCode=2003 OR EventCode=2102) USBSTOR $wild$ | dedup serialNumber | table _time, ComputerName, EventCode, User, vendor, product, serialNumber | top limit=0 serialNumber</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Events Over Time</title>
        <search>
          <query>index=wineventlog (EventCode=2003 OR EventCode=2102) USBSTOR $wild$ | eval action=case(EventCode == 2003, "Connect", EventCode == 2102, "Disconnect") | table _time, ComputerName, action, EventCode, User, vendor, product, serialNumber | eval ActionSerial = action + ":" + serialNumber | timechart dc(serialNumber) by ActionSerial</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">column</option>
        <option name="charting.drilldown">none</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Connect Events (EventCode=2003)</title>
        <search>
          <query>index=wineventlog EventCode=2003 USBSTOR $wild$ | table _time, ComputerName, EventCode, User, vendor, product, serialNumber</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Disconnect Events (EventCode=2102)</title>
        <search>
          <query>index=wineventlog EventCode=2102 USBSTOR $wild$ | transaction maxspan=5s EventCode, ComputerName, serialNumber | dedup _time, ComputerName, serialNumber | table _time, ComputerName, EventCode, User, vendor, product, serialNumber</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
</form>


Sunday, October 7, 2018

Troubleshooting Data Sources with Incorrect Times using Splunk

By Tony Lee

Have you ever had a data source that you thought was sending data using the wrong time? This can be a problem because Splunk tries to parse and use the event time instead of the ingest time, which can make ingested data hard to find. If you suspect this is the case, you may be experiencing one of the following scenarios:
  • Systems not using NTP that experience clock drift
  • Systems using broken or faulty NTP
  • Systems using the wrong timezone (ex: sending events in Central Time while specifying GMT)
Depending on the time range selected, this can result in data not showing up within Splunk (or any SIEM) because the data may appear to be in the past or the future.  For example, events that are lagging current time by 5 hours will not show up if "Last 4 hours" is selected for the time range.  In a similar fashion, events that are sent with a future date and time will only show up when the time range selector of "All Time" is selected.
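
The effect is easy to reproduce with plain timestamps. The Python sketch below treats a time range as a simple window ending at the current time, which is a simplification of how Splunk's time picker snaps ranges. The function name visible_in_range and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def visible_in_range(event_time, now, lookback):
    """True if event_time falls inside the window [now - lookback, now]."""
    return now - lookback <= event_time <= now


# Fixed reference time so the example is repeatable.
NOW = datetime(2018, 10, 7, 12, 0, tzinfo=timezone.utc)
lagging_event = NOW - timedelta(hours=5)  # timestamp 5 hours in the past
future_event = NOW + timedelta(hours=1)   # timestamp 1 hour in the future

if __name__ == "__main__":
    print(visible_in_range(lagging_event, NOW, timedelta(hours=4)))
    print(visible_in_range(lagging_event, NOW, timedelta(hours=24)))
    print(visible_in_range(future_event, NOW, timedelta(hours=24)))
```

The lagging event falls outside a 4-hour window but inside a 24-hour one, while the future-dated event is missed by any window that ends at the current time.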

Enough about the problems; let's walk through building one possible solution. As a bonus, we provide the dashboard shown below at the bottom of the article.

Figure 1:  Last Communicated Calculator

Dashboard Components

To assist in usability, we provide a drop-down input at the top that contains a list of the indexes.  This list is populated dynamically using the dbinspect command, which returns data about existing indexes within Splunk. The following search populates the drop-down input in the dashboard.

| dbinspect index=* | where NOT match(index, "^_") | table index | dedup index


The upper (host detail) panel consists of columns indicating the host, total count, first written time, last written time and so on--perfect information to determine time issues.  This information can be found using the metadata command which can quickly query info about hosts, sources, and sourcetypes. In this case, we care about the hosts.

| metadata index=<index we care about> type=hosts
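
The dashboard at the end of the article turns the lastTime value returned by metadata into a "last communicated" lag in seconds, minutes, and hours. The same arithmetic is sketched below in Python for clarity. The function name last_comm_lag is hypothetical, and both inputs are assumed to be epoch seconds.

```python
def last_comm_lag(last_time_epoch, now_epoch):
    """Return how long ago a host last reported, in three units."""
    seconds = now_epoch - last_time_epoch
    minutes = seconds / 60
    hours = minutes / 60
    return {"seconds": seconds, "minutes": minutes, "hours": hours}


if __name__ == "__main__":
    # A host whose newest event is stamped five hours before "now".
    print(last_comm_lag(last_time_epoch=0, now_epoch=18000))
```

A large positive lag suggests a host that is behind or silent, while a negative lag means the newest event is stamped in the future.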


The lower panel (a time-based area chart) represents the volume of data at a given time for a given host. We used the tstats command, which we covered in a previous article; the search looks like the following:

| tstats prestats=t count where index=<index we care about> AND host=<host we care about> by host, _time | timechart useother=false count by host


It is worth noting that every search on this dashboard relies on index metadata rather than raw events, which is why it discovers these details so quickly. As a result, you will probably notice that the data renders almost instantly.

Conclusion

Splunk provides decent visibility into various features within the Monitoring Console / DMC (Distributed Management Console), but we found this flexible and customizable dashboard to be quite helpful for gaining additional insight into the last time a host communicated. This can be used to identify, troubleshoot, and finally confirm the time being reported by devices. We hope this article helps you troubleshoot these very frustrating issues. Enjoy!

Dashboard XML code

Below is the dashboard code needed to see the Last Communicated Times for hosts by Index.  Feel free to modify the dashboard as needed:

<form>
  <label>Last Communicated Calculator</label>
  <description>Select an Index (or Indexes) - High Number is bad...</description>
  <fieldset submitButton="true">
    <input type="time" token="time">
      <label>Time Range</label>
      <default>
        <earliest>-24h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="multiselect" token="index">
      <label>Index</label>
      <fieldForLabel>Index</fieldForLabel>
      <fieldForValue>index</fieldForValue>
      <search>
        <query>| dbinspect index=* | where NOT match(index, "^_") | table index | dedup index | sort index</query>
        <earliest>-30d@d</earliest>
        <latest>now</latest>
      </search>
      <valuePrefix>index=</valuePrefix>
      <delimiter> OR </delimiter>
    </input>
    <input type="text" token="host">
      <label>Host</label>
      <default>*</default>
    </input>
  </fieldset>
  <row>
    <panel>
      <table>
        <title>Hosts</title>
        <search>
          <query>| metadata $index$ type=hosts | dedup host | eval currentTime=now() | eval seconds=now()-lastTime | eval minutes=(seconds/60) | eval hours=(minutes/60) | convert ctime(lastTime) ctime(firstTime) ctime(currentTime) | table host, totalCount, firstTime, lastTime, currentTime, hours, minutes, seconds | sort - seconds | rename hours AS "Last Comm (in hrs)", minutes AS "Last Comm (in mins)", seconds AS "Last Comm (in secs)" | search host=$host$</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">20</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">none</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">true</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Visual (Keep in mind your time range.  Anything beyond the time range will not show up)</title>
        <search>
          <query>| tstats prestats=t count where $index$ AND host=$host$ by host, _time | timechart useother=false count by host</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">area</option>
        <option name="charting.drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </chart>
    </panel>
  </row>
</form>

Wednesday, September 19, 2018

Spelunking your Splunk – Part V (Splunk Stats)

By Tony Lee

Welcome to the fifth article of the Spelunking your Splunk series, all designed to help you understand your Splunk environment at a quick glance.  Here is a quick recap of the previous articles:


This article focuses on understanding your Splunk environment at a high-level.  Have you ever wondered the following?


  • How many events were ingested over a user-defined time period
  • How that equates to events per second (EPS)
  • The distinct host count
  • Number of indexes with data
  • Number of sourcetypes
  • Number of sources
  • Visually what the data ingest looks like by total event count and by index

This dashboard will give it to you and do it fast!  As a bonus we will provide the dashboard code at the end of the article.
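
The events per second (EPS) figure listed above is plain arithmetic: the event count divided by the length of the time period in seconds. A minimal Python sketch follows. The function name events_per_second is hypothetical, and the time bounds are assumed to be epoch seconds.

```python
def events_per_second(event_count, earliest_epoch, latest_epoch):
    """Average ingest rate over the period [earliest_epoch, latest_epoch]."""
    span_seconds = latest_epoch - earliest_epoch
    if span_seconds <= 0:
        raise ValueError("latest_epoch must be greater than earliest_epoch")
    return event_count / span_seconds


if __name__ == "__main__":
    # 8,640,000 events over one day (86,400 seconds).
    print(events_per_second(8_640_000, 0, 86_400))
```

Because this is an average over the whole period, short bursts of ingest will not be visible in the number.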

Figure 1:  Splunk Stats dashboard


Finding detailed index information quickly

There are at least two places within Splunk to discover index information. The first uses a RESTful call and provides detailed information about indexes. The second requires more calculation and is less efficient. For this exercise, let's try copying and pasting the following RESTful search into your Splunk search bar to see what data is returned:

| rest /services/data/indexes-extended


Figure 2:  Results of the RESTful search (remember to scroll right)


The second place is the dbinspect command. Try it the same way:

| dbinspect index=*

Figure 3:  Column headers from dbinspect (remember to scroll right)

Now try the following, which combines both (thank you, Splunk!):

| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended 
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime >= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", " "), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", " "), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days" 
| rename title AS index] | fields index raw_size_gb event_count buckets  minTime maxTime retention
            | rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event"  maxTime AS "Latest Event"   retention AS Retention


Now that you understand the basics, the sky is the limit.  :-)
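
As one small example of building on the searches above, the sketch below pulls only the retention setting per index and converts it from seconds to days. It assumes the REST endpoint returns the title and frozenTimePeriodInSecs fields used in the combined search; the retention_secs and retention_days field names are our own and purely illustrative.

```spl
| rest /services/data/indexes-extended
| stats max(frozenTimePeriodInSecs) AS retention_secs BY title
| eval retention_days = round(retention_secs / 86400, 0)
| rename title AS index
| table index retention_days
```

Dividing by 86400 (the number of seconds in a day) is the same conversion the combined search uses for its Retention column.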

Finding source, sourcetype, and host data quickly 

You may remember a command called tstats from the first article of this series, Spelunking your Splunk Part I (Exploring Your Data).  In a nutshell, tstats can perform statistical queries on indexed fields very quickly.  By default, these indexed fields are index, source, sourcetype, and host.  It just so happens that these are the fields we need to understand the environment.  Best of all, even in an underpowered environment or one with lots of data ingested per day, these commands will still outperform your typical searches, even over long time ranges.  This works great for our dashboard!
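
To get a feel for tstats, here is a minimal sketch that counts events by index and sourcetype using only the default indexed fields. Treat it as a starting point rather than a tuned search; narrow index=* to your own indexes if you prefer.

```spl
| tstats count where index=* by index, sourcetype
| sort - count
```

The same approach can estimate events per second. The search below mirrors the EPS panel in the dashboard code, with rounding added for readability (the rounding is our assumption, not part of the original panel). The addinfo command adds info_min_time and info_max_time, the boundaries of the selected time range, so the result is only meaningful when the time range is bounded.

```spl
| tstats count where index=*
| addinfo
| eval EPS = round(count / (info_max_time - info_min_time), 2)
| table count EPS
```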


Conclusion

Splunk provides decent visibility into various features within the Monitoring Console / DMC (Distributed Management Console), but we found this flexible and customizable dashboard to be quite helpful for gaining additional insight.  We hope this helps you too.  Enjoy!


Dashboard XML code


Below is the dashboard code needed to enumerate your Splunk stats.  Feel free to modify the dashboard as needed:

<form>
  <label>Splunk Stats</label>
  <fieldset submitButton="true" autoRun="true">
    <input type="time" token="time">
      <label>Time Range Selector</label>
      <default>
        <earliest>-7d@h</earliest>
        <latest>now</latest>
      </default>
    </input>
  </fieldset>
  <row>
    <panel>
      <single>
        <title>Distinct Events</title>
        <search>
          <query>| tstats count where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Events Per Second (EPS)</title>
        <search>
          <query>| tstats count where index=* | addinfo | eval diff = info_max_time - info_min_time | eval EPS = count / diff | table EPS</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">none</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Hosts</title>
        <search>
          <query>| tstats dc(host) where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Indexes with Data</title>
        <search>
          <query>| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended 
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime &gt;= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", " "), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", " "), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days" 
| rename title AS index] | fields index raw_size_gb event_count buckets  minTime maxTime retention
            | rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event"  maxTime AS "Latest Event"   retention AS Retention | stats count</query>
          <earliest>0</earliest>
          <latest></latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Sourcetypes</title>
        <search>
          <query>| tstats dc(sourcetype) where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Sources</title>
        <search>
          <query>| tstats dc(source) where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Total Event Count Over Time</title>
        <search>
          <query>| tstats prestats=t count where index=* by _time | timechart  count</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">area</option>
        <option name="charting.drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </chart>
    </panel>
    <panel>
      <chart>
        <title>Event Count by Index Over Time</title>
        <search>
          <query>| tstats prestats=t count where index=* by index, _time | timechart count by index</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">area</option>
        <option name="charting.drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Indexes with Data</title>
        <search>
          <query>| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended 
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime &gt;= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", " "), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", " "), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days" 
| rename title AS index] | fields index raw_size_gb event_count buckets  minTime maxTime retention
            | rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event"  maxTime AS "Latest Event"   retention AS Retention</query>
          <earliest>0</earliest>
          <latest></latest>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
</form>