Sunday, October 7, 2018

Troubleshooting Data Sources with Incorrect Times using Splunk

By Tony Lee

Have you ever had a data source that you thought was sending data using the wrong time? This can be a problem since Splunk tries to parse and use the event time instead of the ingest time, this can cause issues when trying to find ingested data. If you suspect this is the case you may be experiencing one of the following scenarios:
  • Systems not using NTP that experience clock drift
  • Systems using broken or faulty NTP
  • Systems using the wrong timezone (ex: Sending events in central time, but specifies GMT)
Depending on the time range selected, this can result in data not showing up within Splunk (or any SIEM) because the data may appear to be in the past or the future.  For example, events that are lagging current time by 5 hours will not show up if "Last 4 hours" is selected for the time range.  In a similar fashion, events that are sent with a future date and time will only show up when the time range selector of "All Time" is selected.

Enough about the problems, let's walk through building one possible solution. As a bonus we provide the dashboard shown below at the bottom of the article.

Figure 1:  Last Communicated Calculator

Dashboard Components

To assist in usability, we provide a drop down input at the top that contains a list of the indexes.  This is list of indexes is populated dynamically. This is derived using the dbinspect command which contains data about existing indexes within Splunk. The following creates the drop down input in the dashboard.

| dbinspect index=* | where NOT match(index, "^_") | table index | dedup index


The upper (host detail) panel consists of columns indicating the host, total count, first written time, last written time and so on--perfect information to determine time issues.  This information can be found using the metadata command which can quickly query info about hosts, sources, and sourcetypes. In this case, we care about the hosts.

| metadata index=<index we care about> type=hosts


The lower panel (a time-based area chart), represents the volume of data at a given time for a given host. We used the tstats command that we covered in previous article, but looks like the following:

| tstats prestats=t count where index=<index we care about> AND host=<host we care about> by host, _time | timechart useother=false count by host


It is certainly noteworthy that every search on this dashboard uses metadata and that's why it is so quick to discover these details. As a result, you will probably notice that there is no time wasted waiting for the search to return as the data renders almost instantly.

Conclusion

Splunk provides decent visibility into various features within Monitoring Console / DMC (Distributed management console), but we found this flexible and customizable dashboard to be quite helpful for gaining additional insight into the last time a host communicated. This can be used to identify, troubleshoot, and finally confirm the time being reported by devices. We hope this article helps you troubleshoot these very frustrating issues. Enjoy!

Dashboard XML code

Below is the dashboard code needed to see the Last Communicated Times for hosts by Index.  Feel free to modify the dashboard as needed:

<form>
  <label>Last Communicated Calculator</label>
  <description>Select an Index (or Indexes) - High Number is bad...</description>
  <fieldset submitButton="true">
    <input type="time" token="time">
      <label>Time Range</label>
      <default>
        <earliest>-24h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="multiselect" token="index">
      <label>Index</label>
      <fieldForLabel>Index</fieldForLabel>
      <fieldForValue>index</fieldForValue>
      <search>
        <query>| dbinspect index=* | where NOT match(index, "^_") | table index | dedup index</query>
        <earliest>-30d@d</earliest>
        <latest>now</latest>
      </search>
      <valuePrefix>index=</valuePrefix>
      <delimiter> OR </delimiter>
    </input>
    <input type="text" token="host">
      <label>Host</label>
      <default>*</default>
    </input>
  </fieldset>
  <row>
    <panel>
      <table>
        <title>Hosts</title>
        <search>
          <query>| metadata $index$ type=hosts | dedup host | eval currentTime=now() | eval seconds=now()-lastTime | eval minutes=(seconds/60) | eval hours=(minutes/60) | convert ctime(lastTime) ctime(firstTime) ctime(currentTime) | table host, totalCount, firstTime, lastTime, currentTime, hours, minutes, seconds | sort - seconds | rename hours AS "Last Comm (in hrs)", minutes AS "Last Comm (in mins)", seconds AS "Last Comm (in secs)" | search host=$host$</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">20</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">none</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">true</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Visual (Keep in mind your time range.  Anything beyond the time range will not show up)</title>
        <search>
          <query>| tstats prestats=t count where $index$ AND host=$host$ by host, _time | timechart useother=false count by host</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">area</option>
        <option name="charting.drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </chart>
    </panel>
  </row>
</form>

Wednesday, September 19, 2018

Spelunking your Splunk – Part V (Splunk Stats)

By Tony Lee

Welcome to the fifth article of the Spelunking your Splunk series, all designed to help you understand your Splunk environment at a quick glance.  Here is a quick recap of the previous articles:


This article focuses on understanding your Splunk environment at a high-level.  Have you ever wondered the following?


  • How many events ingested over a user-defined time period
  • How that equates to events per second (EPS)
  • The distinct host count
  • Number of indexes with data
  • Number of sourcetypes
  • Number of sources
  • Visually what the data ingest looks like by total event count and by index

This dashboard will give it to you and do it fast!  As a bonus we will provide the dashboard code at the end of the article.

Figure 1:  Splunk Stats dashboard


Finding detailed index information quickly

There are at least two places within Splunk to discover index information. The first uses a RESTful call and provides detailed information about indexes. The second requires more calculation and is less efficient. For this exercise, lets try copying and pasting the following RESTful search into your Splunk search bar to see what data is returned:

| rest /services/data/indexes-extended


Figure 2:  Results of the restful search (remember to scroll right)


| dbinspect index=*

Figure 3:  Column headers from dbinspect (remember to scroll right)

Now try the following which combines both (thank you Splunk!):

| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended 
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime >= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", " "), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", " "), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days" 
| rename title AS index] | fields index raw_size_gb event_count buckets  minTime maxTime retention
            | rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event"  maxTime AS "Latest Event"   retention AS Retention


Now that you understand the basics, the sky is the limit.  :-)

Finding source, sourcetype, and host data quickly 

You may remember from the first article of this series (Spelunking your Splunk Part I (Exploring Your Data) called tstats.  In a nutshell, tstats can perform statistical queries on indexed fields—very very quickly.  These indexed fields by default are index, source, sourcetype, and host.  It just so happens that these are the fields that we need to understand the environment.  Best of all, even on an underpowered environment or one with lots of data ingested per day, these commands will still outperform the rest of your typical searches even over long periods of time.  This works great for our dashboard!


Conclusion

Splunk provides decent visibility into various features within Monitoring Console / DMC (Distributed management console), but we found this flexible and customizable dashboard to be quite helpful for gaining additional insight.  We hope this helps you too.  Enjoy!


Dashboard XML code


Below is the dashboard code needed to enumerate your Splunk stats.  Feel free to modify the dashboard as needed:

<form>
  <label>Splunk Stats</label>
  <fieldset submitButton="true" autoRun="true">
    <input type="time" token="time">
      <label>Time Range Selector</label>
      <default>
        <earliest>-7d@h</earliest>
        <latest>now</latest>
      </default>
    </input>
  </fieldset>
  <row>
    <panel>
      <single>
        <title>Distinct Events</title>
        <search>
          <query>| tstats count where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Events Per Second (EPS)</title>
        <search>
          <query>| tstats count where index=* | addinfo | eval diff = info_max_time - info_min_time | eval EPS = count / diff | table EPS</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">none</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Hosts</title>
        <search>
          <query>| tstats dc(host) where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Indexes with Data</title>
        <search>
          <query>| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended 
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime &gt;= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", " "), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", " "), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days" 
| rename title AS index] | fields index raw_size_gb event_count buckets  minTime maxTime retention
            | rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event"  maxTime AS "Latest Event"   retention AS Retention | stats count</query>
          <earliest>0</earliest>
          <latest></latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Sourcetypes</title>
        <search>
          <query>| tstats dc(sourcetype) where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Sources</title>
        <search>
          <query>| tstats dc(source) where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Total Event Count Over Time</title>
        <search>
          <query>| tstats prestats=t count where index=* by _time | timechart  count</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">area</option>
        <option name="charting.drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </chart>
    </panel>
    <panel>
      <chart>
        <title>Event Count by Index Over Time</title>
        <search>
          <query>| tstats prestats=t count where index=* by index, _time | timechart count by index</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">area</option>
        <option name="charting.drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Indexes with Data</title>
        <search>
          <query>| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended 
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime &gt;= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", " "), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", " "), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days" 
| rename title AS index] | fields index raw_size_gb event_count buckets  minTime maxTime retention
            | rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event"  maxTime AS "Latest Event"   retention AS Retention</query>
          <earliest>0</earliest>
          <latest></latest>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
</form>

Tuesday, September 4, 2018

Troubleshooting Windows Account Lockouts with Splunk - Part III

By Tony Lee

Welcome to part III of the series dedicated to troubleshooting Windows account lockouts using Splunk. In this article we will give you a dashboard that we affectionately named Lockout Hunter. It combines the knowledge (and some dashboard panels) from both part I and part II of this series into a single interactive dashboard that allows users to drilldown on data without leaving the dashboard. You will notice in the screenshot below that the first row is event ID 4740 related panels. The far right two panels are hyperlink clickable and will cause the second row of event ID 4625 events to populate. This filter can be cleared by clicking the "Reset Filters" link or clicking on a different user or computer.

Figure 1:  Lockout Hunter Dashboard

Background

In part I (http://securitysynapse.com/2018/08/troubleshooting-windows-account-lockout-part-i.html) of the series, we highlighted and examined a 4740 event pulled from a domain controller. This 4740 event contained the following information:
  • The domain controller that handled the authentication request and reported the lockout
  • Domain name
  • Account name
  • The original host where the account attempted authentication

In part II (http://securitysynapse.com/2018/08/troubleshooting-windows-account-lockout-part-ii.html) of the series, we highlighted and examined a 4625 event (and Event ID 529 for EOL operating systems) pulled from workstations. The most important takeaways from this event are:

  • Why the authentication attempt is failing
  • The actual process (caller process name) failing authentication
When combined these two log sources are quite powerful.

Conclusion

We wanted to take lockout hunting up one more notch by releasing the lockout hunter dashboard. Our original intention was to help security practitioners find brute force attempts via account lockouts, however it ended up having a huge impact with ITOps. These dashboards have saved help desks quite a few hours in determining the root cause for account lockout tickets. We hope you find them useful too. Happy Splunking!


Dashboard Code

The following dashboard code relies on the index name of wineventlog.  If this is not your Windows event log index, just change it to suit your needs. Also, the past few cases we worked had either a Qualys on Nessus scanner generating some noise. We left the Qualys filter in but disabled it.  Feel free to also tweak that as needed.  Be sure to name the dashboard lockout_hunter so the "Reset Filters" link works properly.

<form>
  <label>Lockout Hunter - 4740 &amp; 4625</label>
  <description>Click on Top User or Top Caller_Computer_Name to pivot on the next row</description>
  <fieldset submitButton="true">
    <input type="time" searchWhenChanged="true" token="time">
      <label>Time Range</label>
      <default>
        <earliest>-4h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="text" token="user" searchWhenChanged="true">
      <label>User</label>
      <default>*</default>
    </input>
    <input type="text" token="src" searchWhenChanged="true">
      <label>Source</label>
      <default>*</default>
    </input>
    <input type="text" searchWhenChanged="true" token="wild">
      <label>Wildcard Search</label>
      <default>*</default>
    </input>
    <input type="radio" searchWhenChanged="true" token="notqualys">
      <label>Exclude Qualys</label>
      <choice value="NOT Qualys">Yes</choice>
      <choice value="*">No</choice>
      <default>*</default>
      <initialValue>*</initialValue>
    </input>
  </fieldset>
       <row>
     <panel>
       <html>
       <u1><h3>Event ID 4740 row - Click a user or host below to drill in on the second row</h3></u1>
       <a href="lockout_hunter?form.user=*&amp;form.src=*" style="margin-left:0px">Reset Filters</a>      
     </html>
     </panel>
   </row>
  <row>
    <panel>
      <table>
        <title>Top Domain</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 Account_Domain</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Reporting Server</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 dvc</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top User (Click pivots to 4625)</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 user</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <drilldown>
          <set token="form.user">$click.value$</set>
        </drilldown>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Caller_Computer_Name (Click pivots to 4625)</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 Caller_Computer_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <drilldown>
          <set token="form.src">$click.value$</set>
        </drilldown>
      </table>
    </panel>
  </row>
     <row>
     <panel>
       <html>
       <u1><h3>Event ID 4625 and 529 logs from the hosts</h3></u1>
     </html>
     </panel>
   </row>
  <row>
    <panel>
      <table>
        <title>Top Failure_Reason</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Failure_Reason</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Domain</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Account_Domain</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top User</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 user</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top src</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 src</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Top Process</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Caller_Process_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Status</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Status</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <title>10 Day Glance of Total Lockouts (Independent of Dashboard Time Range Input) :</title>
      <chart>
        <title>Unique Lockouts per 2 minutes</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" |bin _time span=2min|dedup user _time| timechart count span=1h</query>
          <earliest>-10d@d</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisLabelsY.majorUnit">25</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.maximumNumber">285</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">column</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="refresh.display">progressbar</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
  </row>
</form>

Thursday, August 30, 2018

Troubleshooting Windows Account Lockouts with Splunk - Part II

By Tony Lee

Welcome to part II of the series dedicated to troubleshooting Windows account lockouts using Splunk. In part I (http://securitysynapse.com/2018/08/troubleshooting-windows-account-lockout-part-i.html) of the series, we highlighted and examined a 4740 event pulled from a domain controller. This 4740 event contained the following information:

  • The domain controller that handled the authentication request and reported the lockout
  • Domain name
  • Account name
  • The original host where the account attempted authentication

In this article we will look at a 4625 event from the originating host because it will contain further authentication details such as the reason for failure and the application that is attempting to authenticate. Our dashboard provided at the end of the article with also include searches for Event ID 529 to include Windows operating systems that are end of life (EOL).


Figure 1:  Combining Event ID 4740 and Event ID 4625 to gain more insight into account lockout

Examine the Problem

As we did with the 4740 event, we will now examine a fictional 4625 event and we will highlight and summarize the key points below. This fictional 4625 event was pulled from a host indicated by the 4740 event pulled from the domain controller.

LogName=Security
SourceName=Microsoft Windows security auditing.
EventCode=4625
EventType=0
Type=Information
ComputerName=WIN-R9H5Y.MYFAKEDOMAIN.COM
TaskCategory=Logon
OpCode=Info
RecordNumber=267153
Keywords=Audit Failure
Message=An account failed to log on.

Subject:
Security ID: S-1-5-18
Account Name: WIN-R9H5Y$
Account Domain: MYFAKEDOMAIN
Logon ID: 0x3E7

Logon Type: 8

Account For Which Logon Failed:
Security ID: S-1-0-0
Account Name: John
Account Domain: MyFakeDomain.com

Failure Information:
Failure Reason: Unknown user name or bad password.
Status: 0xC000006D
Sub Status: 0xC000006A

Process Information:
Caller Process ID: 0x5aac
Caller Process Name: C:\Windows\System32\inetsrv\w3wp.exe

Network Information:
Workstation Name: WIN-R9H5Y
Source Network Address: 192.1.1.100
Source Port: 49770

Detailed Authentication Information:
Logon Process: Advapi  
Authentication Package: Negotiate
Transited Services: -
Package Name (NTLM only): -
Key Length: 0


The most important takeaways from this event are:

  • Failure Reason:  In this case it was an unknown username or password.  We know the username is correct, so it must be a bad password
  • Caller Process Name:  A quick Google search for w3wp.exe shows that it is most likely associated with an Exchange server running IIS.
After pulling a few more events, we see several more bad passwords and then the eventual lockout. Common causes for account lockouts indicated by this process are mobile devices (phone or tablet) that contain stale credentials. The mobile device continues to attempt to authenticate until it locks out the account. Mystery solved!


Conclusion

Even though we presented fictional event logs, this example is based on real situations. Fortunately we had the 4740 events from the domain controllers and we were collecting the 4625 logs from the rest of the servers (and some workstations). It would be very difficult and time consuming to perform this sort of correlation without a central point of aggregation such as Splunk. Even if you were to do this manually for one or two instances, you would not want to do it for the entire enterprise. To make your life easier, we are including dashboard code in the section below to display the 4625 events. We eventually added some workflow integration between the 4740 dashboard provided in the previous article and the 4625 dashboard below, but we will leave that exercise up to the reader. Have fun and happy Splunking.


Dashboard Code

The following dashboard code relies on the index name of wineventlog.  If this is not your Windows event log index, just change it to suit your needs. Also, the past few cases we worked had either a Qualys on Nessus scanner generating some noise. We left the Qualys filter in but disabled it.  Feel free to also tweak that as needed.

<form>
  <label>Auth Examination - 4625</label>
  <description>Event ID 4625 or 529</description>
  <fieldset submitButton="true">
    <input type="time" token="time" searchWhenChanged="true">
      <label>Time Range</label>
      <default>
        <earliest>-4h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="text" token="wild" searchWhenChanged="true">
      <label>Wildcard Search</label>
      <default>*</default>
    </input>
    <input type="radio" token="notqualys" searchWhenChanged="true">
      <label>Exclude Qualys</label>
      <choice value="NOT Qualys">Yes</choice>
      <choice value="*">No</choice>
      <default>*</default>
      <initialValue>*</initialValue>
    </input>
  </fieldset>
  <row>
    <panel>
      <table>
        <title>Top Failure_Reason</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Failure_Reason</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Domain</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Account_Domain</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top User</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 user</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top src</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 src</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Top Process</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Caller_Process_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Status</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Status</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <title>Timechart by Account_Name</title>
      <chart>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529"| timechart count by user</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">area</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
    <panel>
      <title>Timechart by reporting host</title>
      <chart>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529"| timechart count by dvc</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">area</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <title>Timechart by Account_Domain</title>
      <chart>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529"| timechart count by Account_Domain</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">area</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
    <panel>
      <title>Timechart by src</title>
      <chart>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529"| timechart count by src</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">area</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <title>Details</title>
      <table>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">cell</option>
      </table>
    </panel>
  </row>
</form>