Wednesday, September 19, 2018

Spelunking your Splunk – Part V (Splunk Stats)

By Tony Lee

Welcome to the fifth article of the Spelunking your Splunk series, all designed to help you understand your Splunk environment at a quick glance.  Here is a quick recap of the previous articles:


This article focuses on understanding your Splunk environment at a high level.  Have you ever wondered about the following?


  • How many events were ingested over a user-defined time period
  • How that equates to events per second (EPS)
  • The distinct host count
  • Number of indexes with data
  • Number of sourcetypes
  • Number of sources
  • Visually what the data ingest looks like by total event count and by index

This dashboard will give it to you and do it fast!  As a bonus we will provide the dashboard code at the end of the article.
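
To make the EPS figure concrete: it is the event count divided by the number of seconds in the selected time range. For example, 60,480,000 events over seven days (604,800 seconds) works out to 100 EPS. A minimal sketch of that arithmetic follows, assuming the search runs over a bounded time range (with an all-time range, addinfo may not report a finite upper bound, so the division is not meaningful). The seconds field name and the rounding to two decimals are our additions.

```spl
| tstats count where index=*
| addinfo
| eval seconds = info_max_time - info_min_time
| eval EPS = round(count / seconds, 2)
| table count seconds EPS
```

Here addinfo supplies info_min_time and info_max_time, the boundaries of the time range chosen in the time picker.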

Figure 1:  Splunk Stats dashboard


Finding detailed index information quickly

There are at least two places within Splunk to discover index information. The first uses a RESTful call and provides detailed information about indexes. The second requires more calculation and is less efficient. For this exercise, let's try copying and pasting the following RESTful search into your Splunk search bar to see what data is returned:

| rest /services/data/indexes-extended


Figure 2:  Results of the RESTful search (remember to scroll right)
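
The REST endpoint returns many columns, so it can help to narrow the output to the fields used by the combined search further down. A minimal sketch using only those fields; the retention_days name is ours, and 86400 is the number of seconds in a day:

```spl
| rest /services/data/indexes-extended
| table title minTime maxTime frozenTimePeriodInSecs
| eval retention_days = round(frozenTimePeriodInSecs / 86400, 0)
```

In a distributed environment you may see more than one row per index, which is presumably why the combined search aggregates its REST results BY title.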


The second place is the dbinspect command, which returns bucket-level information for each index:

| dbinspect index=*

Figure 3:  Column headers from dbinspect (remember to scroll right)
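
Because dbinspect returns one row per bucket, a quick roll-up by index makes the output easier to read. This is a minimal sketch using the rawSize, eventCount, and bucketId columns. Unlike the combined search below, it skips the first stats pass that collapses rows BY bucketId, so the totals may be inflated wherever the same bucket is reported more than once (our reading of why that pass exists, not something the article states).

```spl
| dbinspect index=*
| stats sum(rawSize) AS raw_size sum(eventCount) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024, 2)
| fields index raw_size_gb event_count buckets
```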

Now try the following, which combines both (thank you Splunk!):

| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended 
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime >= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", " "), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", " "), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days" 
| rename title AS index] | fields index raw_size_gb event_count buckets  minTime maxTime retention
| rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event" maxTime AS "Latest Event" retention AS Retention


Now that you understand the basics, the sky is the limit.  :-)

Finding source, sourcetype, and host data quickly 

You may remember a command called tstats from the first article of this series, Spelunking your Splunk Part I (Exploring Your Data).  In a nutshell, tstats performs statistical queries on indexed fields, and it does so very quickly.  By default, these indexed fields are index, source, sourcetype, and host, which happen to be exactly the fields we need to understand the environment.  Best of all, even in an underpowered environment or one that ingests a lot of data per day, these commands will still outperform your typical searches, even over long time periods.  This works great for our dashboard!
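
As a quick illustration, here is a tstats search over those default indexed fields. It is a sketch rather than one of the exact dashboard panels, and the hosts name is ours. It counts events and distinct hosts for every index and sourcetype combination, largest first:

```spl
| tstats count dc(host) AS hosts where index=* by index, sourcetype
| sort - count
```

Swapping dc(host) for dc(source) or dc(sourcetype), and dropping the by clause, gives the single-value totals that the dashboard panels display.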


Conclusion

Splunk provides decent visibility into various features within the Monitoring Console / DMC (Distributed Management Console), but we found this flexible and customizable dashboard to be quite helpful for gaining additional insight.  We hope this helps you too.  Enjoy!


Dashboard XML code


Below is the dashboard code needed to enumerate your Splunk stats.  Feel free to modify the dashboard as needed:

<form>
  <label>Splunk Stats</label>
  <fieldset submitButton="true" autoRun="true">
    <input type="time" token="time">
      <label>Time Range Selector</label>
      <default>
        <earliest>-7d@h</earliest>
        <latest>now</latest>
      </default>
    </input>
  </fieldset>
  <row>
    <panel>
      <single>
        <title>Total Events</title>
        <search>
          <query>| tstats count where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Events Per Second (EPS)</title>
        <search>
          <query>| tstats count where index=* | addinfo | eval diff = info_max_time - info_min_time | eval EPS = count / diff | table EPS</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">none</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Hosts</title>
        <search>
          <query>| tstats dc(host) where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Indexes with Data</title>
        <search>
          <query>| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended 
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime &gt;= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", " "), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", " "), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days" 
| rename title AS index] | fields index raw_size_gb event_count buckets  minTime maxTime retention
            | rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event"  maxTime AS "Latest Event"   retention AS Retention | stats count</query>
          <earliest>0</earliest>
          <latest></latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Sourcetypes</title>
        <search>
          <query>| tstats dc(sourcetype) where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Sources</title>
        <search>
          <query>| tstats dc(source) where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Total Event Count Over Time</title>
        <search>
          <query>| tstats prestats=t count where index=* by _time | timechart  count</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">area</option>
        <option name="charting.drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </chart>
    </panel>
    <panel>
      <chart>
        <title>Event Count by Index Over Time</title>
        <search>
          <query>| tstats prestats=t count where index=* by index, _time | timechart count by index</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">area</option>
        <option name="charting.drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Indexes with Data</title>
        <search>
          <query>| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended 
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime &gt;= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", " "), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", " "), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days" 
| rename title AS index] | fields index raw_size_gb event_count buckets  minTime maxTime retention
            | rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event"  maxTime AS "Latest Event"   retention AS Retention</query>
          <earliest>0</earliest>
          <latest></latest>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
</form>

Tuesday, September 4, 2018

Troubleshooting Windows Account Lockouts with Splunk - Part III

By Tony Lee

Welcome to part III of the series dedicated to troubleshooting Windows account lockouts using Splunk. In this article we will give you a dashboard that we affectionately named Lockout Hunter. It combines the knowledge (and some dashboard panels) from both part I and part II of this series into a single interactive dashboard that allows users to drill down on data without leaving the dashboard. You will notice in the screenshot below that the first row contains the event ID 4740 panels. The two panels on the far right are clickable and will repopulate the second row of event ID 4625 panels for the selected user or computer. This filter can be cleared by clicking the "Reset Filters" link or by clicking on a different user or computer.

Figure 1:  Lockout Hunter Dashboard

Background

In part I (http://securitysynapse.com/2018/08/troubleshooting-windows-account-lockout-part-i.html) of the series, we highlighted and examined a 4740 event pulled from a domain controller. This 4740 event contained the following information:
  • The domain controller that handled the authentication request and reported the lockout
  • Domain name
  • Account name
  • The original host where the account attempted authentication
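
A search along these lines surfaces those fields. This is a sketch that assumes the wineventlog index and the field names (Account_Domain, user, dvc, Caller_Computer_Name) used by the dashboard code later in this article; adjust them to match your environment:

```spl
index=wineventlog source="WinEventLog:Security" EventCode="4740"
| stats count BY Account_Domain, user, dvc, Caller_Computer_Name
| sort - count
```

Note that stats drops events in which any of the BY fields is empty, so a missing field extraction will show up as fewer rows than expected.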

In part II (http://securitysynapse.com/2018/08/troubleshooting-windows-account-lockout-part-ii.html) of the series, we highlighted and examined a 4625 event (and Event ID 529 for EOL operating systems) pulled from workstations. The most important takeaways from this event are:

  • Why the authentication attempt is failing
  • The actual process (caller process name) failing authentication

When combined, these two log sources are quite powerful.
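
Once a 4740 event points at a user, the matching 4625 events can be summarized by failure reason and calling process. This is a sketch under the same assumptions as the dashboard code in this article (wineventlog index; user, src, Failure_Reason, and Caller_Process_Name fields). REPLACE_WITH_USERNAME is a placeholder, not a real account:

```spl
index=wineventlog source="WinEventLog:Security" (EventCode="4625" OR EventCode="529") user="REPLACE_WITH_USERNAME"
| stats count BY Failure_Reason, Caller_Process_Name, src
| sort - count
```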

Conclusion

We wanted to take lockout hunting up one more notch by releasing the Lockout Hunter dashboard. Our original intention was to help security practitioners find brute force attempts via account lockouts; however, it ended up having a huge impact on ITOps as well. These dashboards have saved help desks quite a few hours in determining the root cause of account lockout tickets. We hope you find them useful too. Happy Splunking!


Dashboard Code

The following dashboard code relies on the index name of wineventlog.  If this is not your Windows event log index, just change it to suit your needs. Also, the past few cases we worked had either a Qualys or Nessus scanner generating some noise. We left the Qualys filter in but disabled it.  Feel free to also tweak that as needed.  Be sure to name the dashboard lockout_hunter so the "Reset Filters" link works properly.
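
To see how the inputs shape the searches, here is roughly what the Top User panel runs when the Wildcard Search input is left at its default of * and Exclude Qualys is set to Yes. This is a hand expansion of the $wild$ and $notqualys$ tokens, shown only for illustration:

```spl
index=wineventlog * NOT Qualys source="WinEventLog:Security" EventCode="4740"
| table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name
| top limit=0 user
```

With Exclude Qualys left at No, the NOT Qualys term becomes a second *, which matches everything.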

<form>
  <label>Lockout Hunter - 4740 &amp; 4625</label>
  <description>Click on Top User or Top Caller_Computer_Name to pivot on the next row</description>
  <fieldset submitButton="true">
    <input type="time" searchWhenChanged="true" token="time">
      <label>Time Range</label>
      <default>
        <earliest>-4h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="text" token="user" searchWhenChanged="true">
      <label>User</label>
      <default>*</default>
    </input>
    <input type="text" token="src" searchWhenChanged="true">
      <label>Source</label>
      <default>*</default>
    </input>
    <input type="text" searchWhenChanged="true" token="wild">
      <label>Wildcard Search</label>
      <default>*</default>
    </input>
    <input type="radio" searchWhenChanged="true" token="notqualys">
      <label>Exclude Qualys</label>
      <choice value="NOT Qualys">Yes</choice>
      <choice value="*">No</choice>
      <default>*</default>
      <initialValue>*</initialValue>
    </input>
  </fieldset>
  <row>
    <panel>
      <html>
        <u1><h3>Event ID 4740 row - Click a user or host below to drill in on the second row</h3></u1>
        <a href="lockout_hunter?form.user=*&amp;form.src=*" style="margin-left:0px">Reset Filters</a>
      </html>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Top Domain</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 Account_Domain</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Reporting Server</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 dvc</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top User (Click pivots to 4625)</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 user</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <drilldown>
          <set token="form.user">$click.value$</set>
        </drilldown>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Caller_Computer_Name (Click pivots to 4625)</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 Caller_Computer_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <drilldown>
          <set token="form.src">$click.value$</set>
        </drilldown>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <html>
        <u1><h3>Event ID 4625 and 529 logs from the hosts</h3></u1>
      </html>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Top Failure_Reason</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Failure_Reason</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Domain</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Account_Domain</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top User</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 user</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top src</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 src</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Top Process</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Caller_Process_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Status</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Status</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <title>10 Day Glance of Total Lockouts (Independent of Dashboard Time Range Input) :</title>
      <chart>
        <title>Unique Lockouts per 2 minutes</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | bin _time span=2min | dedup user _time | timechart count span=1h</query>
          <earliest>-10d@d</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisLabelsY.majorUnit">25</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.maximumNumber">285</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">column</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="refresh.display">progressbar</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
  </row>
</form>