Monday, August 6, 2018

Detecting Data Feed Issues with Splunk - Part II

by Tony Lee

As a Splunk admin, you don’t always control the devices that generate your data. As a result, you may only have control of the data once it reaches Splunk. But what happens when that data stops being sent to Splunk? How long does it take anyone to notice and how much data is lost in the meantime?

We have seen many customers struggle with monitoring and detecting data feed issues, so we figured we would shed some light on the subject. Part I of this series (http://www.securitysynapse.com/2017/11/detecting-data-feed-issues-with-splunk.html) discusses the challenges and the steps required to build a potential solution. We highly recommend a quick read, since it lays the groundwork for the dashboard in this article.

In this article, we build on that work and provide a handy dashboard (screenshot shown below) that can be used for heads-up awareness.


Figure 1:  Data Feed Monitor dashboard

Dashboard Explanation

The search that generates the percentage drop is similar to the one we created in Part I of this series. It looks back over the past two full days (midnight two days ago until midnight today) and counts each index's events per day. It then calculates the percentage drop from the day before yesterday to yesterday. Any data source whose traffic dropped by more than 50% is displayed as a tile. Notice that we also exclude a few indexes (test, main, and lastchanceindex) as well as the catch_all data source. These exclusions can be customized depending on your needs.
 
| tstats prestats=t count where earliest=-2d@d latest=-0d@d index!=test index!=main index!=lastchanceindex index=* by index, _time span=1d | timechart useother=false limit=0 span=1d count by index | eval _time=strftime(_time,"%Y-%m-%d") | transpose | rename column AS DataSource, "row 1" AS TwoDaysAgo, "row 2" AS Yesterday | eval PercentageDiff=(100-((Yesterday/TwoDaysAgo)*100)) | where PercentageDiff>50 AND DataSource!="catch_all" | table DataSource, PercentageDiff | eval tmp="anything" | xyseries tmp DataSource PercentageDiff | fields - tmp | sort PercentageDiff
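To make the arithmetic concrete, here is a minimal Python sketch of the same percentage-drop calculation. It assumes the per-index daily counts have already been exported from Splunk into two dictionaries. The function name and sample index names are hypothetical, and the sketch illustrates the formula rather than replacing the search.

```python
from typing import Dict


def percentage_drops(two_days_ago: Dict[str, int],
                     yesterday: Dict[str, int],
                     threshold: float = 50.0,
                     excluded_sources=("catch_all",)) -> Dict[str, float]:
    """Return {data_source: percent_drop} for sources that fell by more than `threshold` percent."""
    drops: Dict[str, float] = {}
    for source, before in two_days_ago.items():
        if source in excluded_sources:
            continue
        if before == 0:
            # In SPL, dividing by zero typically leaves PercentageDiff null,
            # so the where clause would not keep this row either.
            continue
        after = yesterday.get(source, 0)  # a missing source counts as zero events yesterday
        percent_diff = 100 - (after / before) * 100
        if percent_diff > threshold:
            drops[source] = percent_diff
    return drops


# Hypothetical daily event counts per index
before = {"firewall": 1000, "proxy": 800, "dns": 500, "catch_all": 400}
after = {"firewall": 950, "proxy": 200, "dns": 0, "catch_all": 0}
print(percentage_drops(before, after))  # {'proxy': 75.0, 'dns': 100.0}
```

A source that went completely silent shows up as a 100% drop. That is exactly the case this dashboard is meant to catch.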


The dashboard code uses a trellis layout, where a tile is dynamically created for each data source whose percentage drop exceeds 50%. Range colors then indicate severity: anything below 50% (which typically is not shown) is green, 50-80% is orange, and anything over 80% is red. These thresholds can also be customized to fit your needs.
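The color banding can be sketched the same way. This hypothetical helper uses the rangeValues [50,80] and rangeColors from the dashboard code. It assumes each range value is the inclusive upper edge of its band, so Splunk's exact behavior right at 50 or 80 may differ.

```python
RANGE_VALUES = [50, 80]                               # from the dashboard's rangeValues option
RANGE_COLORS = ["0x65a637", "0xf58f39", "0xd93f3c"]   # green, orange, red (rangeColors option)


def tile_color(percent_drop: float) -> str:
    """Pick a tile color, assuming each range value is the inclusive top of its band."""
    for upper, color in zip(RANGE_VALUES, RANGE_COLORS):
        if percent_drop <= upper:
            return color
    return RANGE_COLORS[-1]  # anything above the last range value


for drop in (30, 65, 95):
    print(drop, tile_color(drop))
```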

Conclusion

This dashboard can be one more tool to help detect data loss. It is not as real-time as it could be, but making it more real-time can produce false positives when legitimate dips in traffic occur (e.g., employees go home for the day). Because you have the code, you are welcome to adjust it to fit your situation. Enjoy!

Dashboard Code

<dashboard>
  <label>Data Feed Monitor</label>
  <description>Percentage Drop Shown Below</description>
  <row>
    <panel>
      <single>
        <search>
          <query>| tstats prestats=t count where earliest=-2d@d latest=-0d@d index!=test index!=main index!=lastchanceindex index=* by index, _time span=1d | timechart useother=false limit=0 span=1d count by index | eval _time=strftime(_time,"%Y-%m-%d") | transpose | rename column AS DataSource, "row 1" AS TwoDaysAgo, "row 2" AS Yesterday | eval PercentageDiff=(100-((Yesterday/TwoDaysAgo)*100)) | where PercentageDiff&gt;50 AND DataSource!="catch_all" | table DataSource, PercentageDiff | eval tmp="anything" | xyseries tmp DataSource PercentageDiff | fields - tmp | sort PercentageDiff</query>
          <earliest>-48h@h</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="colorBy">value</option>
        <option name="colorMode">block</option>
        <option name="drilldown">none</option>
        <option name="numberPrecision">0</option>
        <option name="rangeColors">["0x65a637","0xf58f39","0xd93f3c"]</option>
        <option name="rangeValues">[50,80]</option>
        <option name="refresh.display">progressbar</option>
        <option name="showSparkline">1</option>
        <option name="showTrendIndicator">1</option>
        <option name="trellis.enabled">1</option>
        <option name="trellis.scales.shared">0</option>
        <option name="trellis.size">medium</option>
        <option name="trellis.splitBy">DataSource</option>
        <option name="trendColorInterpretation">standard</option>
        <option name="trendDisplayMode">absolute</option>
        <option name="unit">%</option>
        <option name="unitPosition">after</option>
        <option name="useColors">1</option>
        <option name="useThousandSeparators">1</option>
      </single>
    </panel>
  </row>
</dashboard>

Monday, July 23, 2018

Splunk Vulnerability Lookup Tool Using the Qualys Knowledge Base

By Tony Lee

Are you a Splunk + Qualys customer? If so, are you downloading the Qualys Knowledge Base data? Hint: this is usually accomplished by enabling the Qualys TA knowledge base input. Chances are pretty good that you are, since that data is used by the Qualys Splunk app to map Qualys QID codes to human-readable vulnerability names.

While this is very useful for the Qualys app's dashboards, we took the by-product of the mapping to the next level by creating a Vulnerability Lookup dashboard (see Figure 1 below) to be used by humans in a more flexible way that has nothing to do with the Qualys scans themselves. This dashboard provides SOC analysts the ability to search the knowledge base by QID, title of the vulnerability, CVE, and even vendor reference numbers such as MS or KB numbers.  Best of all, we included the code at the bottom of the article for anyone to use.  :-)


Figure 1:  Vulnerability Lookup dashboard


Understanding the Data

Once the Knowledge Base data has been downloaded to the search head (per the Qualys instructions), try searching for it. Copy and paste the following into a Splunk search box:

| inputlookup qualys_kb_lookup

If you see results, you are all set to use the dashboard code at the bottom of the article.

Figure 2:  Sample KB data.  If you see data returned with this query, you should be good to go.
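Conceptually, the dashboard's search narrows the lookup with one wildcard pattern per input. The Python sketch below mimics that filtering on a list of rows. It assumes the rows carry the QID, TITLE, CVE, and VENDOR_REFERENCE columns used in the dashboard, and the sample rows are made up. Python's fnmatch also treats ? and [ ] as wildcards, which Splunk's search command does not, so this is only an approximation.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

FIELDS = ("QID", "TITLE", "CVE", "VENDOR_REFERENCE")


def kb_lookup(rows: List[Dict[str, str]], qid: str = "*", title: str = "*",
              cve: str = "*", vendor_ref: str = "*") -> List[Dict[str, str]]:
    """Return rows whose fields match every wildcard pattern (case-insensitive)."""
    patterns = dict(zip(FIELDS, (qid, title, cve, vendor_ref)))
    matches = []
    for row in rows:
        # The search runs fillnull first because field=* does not match a missing field;
        # filling empty values with "0" keeps such rows visible under the default "*".
        values = {field: str(row.get(field) or "0") for field in FIELDS}
        if all(fnmatchcase(values[f].lower(), p.lower()) for f, p in patterns.items()):
            matches.append(row)
    return matches


# Hypothetical knowledge base rows (not real Qualys data)
kb = [
    {"QID": "100001", "TITLE": "Example Browser Update",
     "CVE": "CVE-2099-0001,CVE-2099-0002", "VENDOR_REFERENCE": "KB0000001"},
    {"QID": "100002", "TITLE": "Example Server Hardening Check",
     "CVE": "", "VENDOR_REFERENCE": ""},
]
print([r["QID"] for r in kb_lookup(kb, cve="*2099-0002*")])
```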


Conclusion

If you are going to spend the time and resources downloading the Qualys Knowledge Base, you might as well benefit twice by getting a handy localized vulnerability lookup tool at no extra cost. We hope this proves useful to others.  Enjoy!



Dashboard Code

<form>
  <label>Vulnerability Lookup</label>
  <description>Enter the known field below</description>
  <!-- Add time range picker -->
  <fieldset autoRun="false" submitButton="true">
    <input type="text" searchWhenChanged="true" token="qid">
      <label>Enter the QID.  ex: 90464</label>
      <default>*</default>
    </input>
    <input type="text" searchWhenChanged="true" token="title">
      <label>Enter the Title.  ex: *August 2017*</label>
      <default>*</default>
    </input>
    <input type="text" searchWhenChanged="true" token="cve">
      <label>Enter the CVE.  ex: *2017-0272*</label>
      <default>*</default>
    </input>
    <input type="text" searchWhenChanged="true" token="vr">
      <label>Enter the Vendor Reference (MS or KB).  ex: *08-067* or *4022747*</label>
      <default>*</default>
    </input>
  </fieldset>
  <row>
    <panel>
      <table>
        <title>Details</title>
        <search>
          <query>| inputlookup qualys_kb_lookup | rename VULN_TYPE as TYPE | table  QID, SEVERITY, TYPE, TITLE, CATEGORY, PATCHABLE, CVSS_BASE, CVSS_TEMPORAL, CVE, VENDOR_REFERENCE, PUBLISHED_DATETIME | fillnull | search TITLE="$title$" QID=$qid$ CVE=$cve$ VENDOR_REFERENCE=$vr$</query>
          <earliest>0</earliest>
          <latest></latest>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="rowNumbers">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</form>

Sunday, July 8, 2018

Spelunking your Splunk – Part IV (User Metrics)

By Tony Lee

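Welcome to the fourth article of the Spelunking your Splunk series, all designed to help you understand your Splunk environment at a quick glance. Here is a quick recap of the previous articles:

- Part I (Exploring Your Data): understanding the indexes, sources, sourcetypes, and hosts in the environment
- Part II (Disk Usage): monitoring indexer disk capacity and data distribution across indexers
- Part III (License Usage): understanding license usage
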
This article focuses on understanding the users within the environment, even when they are spread across a search head cluster. We will show that it is possible to check the number of concurrent Splunk users, how much they are searching, their successful and failed logins, and aged accounts. This information is useful not only from an accountability perspective, but also from a resource perspective. When a search head (or cluster) becomes overloaded with users, it may be a good time to consider horizontal scaling.

Finding and understanding user information

There are at least two places within Splunk to discover user information. The first requires a RESTful call and provides information about authenticated users. The second is a search against the _audit index filtering on user activity. Try copying and pasting the following two searches into your Splunk search bar one at a time to see what data is returned:

| rest /services/authentication/httpauth-tokens splunk_server=*

Figure 1:  Current authenticated users via httpauth-tokens
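The Current Active Users and Current Logged in Users panels in the dashboard reduce these token records to one row per user. Here is a minimal Python sketch of that reduction. It assumes each record is a dict with userName and a comparable timeAccessed value (epoch seconds here), and the sample data is invented.

```python
from typing import Dict, List

SYSTEM_USER = "splunk-system-user"  # excluded by the dashboard searches


def latest_activity(tokens: List[dict]) -> Dict[str, float]:
    """Map each human user to their most recent timeAccessed, newest first."""
    latest: Dict[str, float] = {}
    for token in tokens:
        user = token["userName"]
        if user == SYSTEM_USER:
            continue
        accessed = token["timeAccessed"]
        if user not in latest or accessed > latest[user]:
            latest[user] = accessed
    return dict(sorted(latest.items(), key=lambda item: item[1], reverse=True))


# Invented token records
tokens = [
    {"userName": "alice", "timeAccessed": 1000.0},
    {"userName": "alice", "timeAccessed": 1300.0},
    {"userName": "bob", "timeAccessed": 1200.0},
    {"userName": SYSTEM_USER, "timeAccessed": 1400.0},
]
activity = latest_activity(tokens)
print("Total Users:", len(activity))  # the dc(userName) equivalent
print(activity)
```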


index=_audit user=*

Figure 2:  _audit index with a focus on user activity
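The login panels count _audit events by user where action is "login attempt" and info is "succeeded" or "failed". Here is a Python sketch of that tally. It assumes the events have been exported as dicts with user, action, and info fields, and the sample events are invented.

```python
from collections import Counter
from typing import Iterable, Tuple


def login_counts(events: Iterable[dict]) -> Tuple[Counter, Counter]:
    """Count successful and failed login attempts per user."""
    successes: Counter = Counter()
    failures: Counter = Counter()
    for event in events:
        if event.get("action") != "login attempt":
            continue  # ignore searches and other audit actions
        if event.get("info") == "succeeded":
            successes[event["user"]] += 1
        elif event.get("info") == "failed":
            failures[event["user"]] += 1
    return successes, failures


# Invented audit events
audit = [
    {"user": "alice", "action": "login attempt", "info": "succeeded"},
    {"user": "bob", "action": "login attempt", "info": "failed"},
    {"user": "bob", "action": "login attempt", "info": "failed"},
    {"user": "bob", "action": "login attempt", "info": "succeeded"},
    {"user": "alice", "action": "search", "info": "granted"},
]
ok, bad = login_counts(audit)
print(ok.most_common())   # sorted by count, like "sort - Successes"
print(bad.most_common())  # sorted by count, like "sort - Failures"
```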

Now that you understand the basics, the sky is the limit. You can audit each user or display the statistics for all users. Take a look at our dashboard below to see what is possible. If you find it useful, we provide the code for it at the bottom of this article. Give it a try and let us know what you think.
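The aged-accounts idea is worth spelling out. Keep each user's most recent successful login, then report users whose last login is at least 15 days old. Here is a Python sketch, assuming login events are dicts with user and an epoch-seconds _time (hypothetical data). Python's round() rounds exact halves to even, which may differ slightly from Splunk's round().

```python
import time
from typing import Dict, List, Optional

SECONDS_PER_DAY = 60 * 60 * 24


def aged_accounts(logins: List[dict], min_age_days: int = 15,
                  now: Optional[float] = None) -> Dict[str, int]:
    """Return {user: days since last successful login} for users at or past min_age_days."""
    now = time.time() if now is None else now
    last_login: Dict[str, float] = {}
    for event in logins:
        # Equivalent of "dedup user": keep only each user's most recent login
        user, ts = event["user"], event["_time"]
        if ts > last_login.get(user, float("-inf")):
            last_login[user] = ts
    aged = {}
    for user, ts in last_login.items():
        age_days = round((now - ts) / SECONDS_PER_DAY)
        if age_days >= min_age_days:
            aged[user] = age_days
    return dict(sorted(aged.items(), key=lambda item: item[1], reverse=True))


fixed_now = 100 * SECONDS_PER_DAY  # a fixed "current time" for the example
logins = [
    {"user": "alice", "_time": 98 * SECONDS_PER_DAY},
    {"user": "bob", "_time": 60 * SECONDS_PER_DAY},
    {"user": "carol", "_time": 70 * SECONDS_PER_DAY},
    {"user": "carol", "_time": 80 * SECONDS_PER_DAY},
]
print(aged_accounts(logins, now=fixed_now))
```

One consequence is easy to see here. If only the last 15 days of logins are passed in, nobody can ever be old enough to report. The input window therefore has to reach back further than the age threshold.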

Figure 3:  User Metrics dashboard with all panels



Conclusion

Splunk provides decent visibility into various features via the Monitoring Console / DMC (Distributed Management Console), but we found this flexible and customizable dashboard quite helpful for monitoring and gaining additional insight. We hope this helps you too. Enjoy!


Dashboard XML code


Below is the dashboard code needed to enumerate your user metrics.  Feel free to modify the dashboard as needed:

<form>
  <label>User Metrics</label>
  <description>Displays Interesting Usage Metrics</description>
  <!-- Add time range picker -->
  <fieldset autoRun="true">
    <input type="time" searchWhenChanged="true">
      <default>
        <earliestTime>-24h@h</earliestTime>
        <latestTime>now</latestTime>
      </default>
    </input>
    <input type="text" token="wild">
      <label>Search</label>
      <default>*</default>
      <suffix/>
    </input>
  </fieldset>
  <row>
    <panel>
      <chart>
        <title>Current Active Users</title>
        <search>
          <query>| rest /services/authentication/httpauth-tokens splunk_server=* | where NOT userName="splunk-system-user" | stats dc(userName) AS "Total Users"</query>
          <earliest>$earliest$</earliest>
          <latest>$latest$</latest>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">false</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">fillerGauge</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">all</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
      </chart>
    </panel>
    <panel>
      <table>
        <title>Current Logged in Users</title>
        <search>
          <query>| rest /services/authentication/httpauth-tokens splunk_server=* | where NOT userName ="splunk-system-user" | stats max(timeAccessed) AS "Latest Activity" by userName | rename userName AS "User" | sort -"Latest Activity"</query>
          <earliest>$earliest$</earliest>
          <latest>$latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="rowNumbers">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Total Searches</title>
        <search>
          <query>index=_audit user=* (action="search" AND info="granted") | where NOT user ="splunk-system-user" | stats count(action) AS Searches by user | sort - Searches</query>
          <earliest>$earliest$</earliest>
          <latest>$latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="rowNumbers">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Successful Logins</title>
        <search>
          <query>index=_audit user=* (action="login attempt" AND info="succeeded") | stats count(action) AS Logins by user | rename user AS User, Logins AS Successes | sort - Successes</query>
          <earliest>$earliest$</earliest>
          <latest>$latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="rowNumbers">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Failed Logins</title>
        <search>
          <query>index=_audit user=* (action="login attempt" AND info="failed") | stats count(action) AS Logins by user | rename user AS User, Logins AS Failures | sort - Failures</query>
          <earliest>$earliest$</earliest>
          <latest>$latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="rowNumbers">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Aged Accounts (15 days or older)</title>
        <search>
          <query>index=_audit user=* (action="login attempt" AND info="succeeded") | dedup user | eval age_days=round((now()-_time)/(60*60*24)) | where age_days &gt;= 15 | eval time=strftime(_time, "%m/%d/%Y %H:%M:%S") | table user, time, age_days | sort -age_days</query>
          <earliest>0</earliest>
          <latest>now</latest>
        </search>
        <option name="wrap">true</option>
        <option name="rowNumbers">false</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="count">10</option>
      </table>
    </panel>
  </row>
</form>

Wednesday, December 20, 2017

Spelunking your Splunk – Part III (License Usage)

By Tony Lee

In our first article of the series, Spelunking your Splunk Part I (Exploring Your Data), we looked at a clever dashboard that can be used to quickly understand the indexes, sources, sourcetypes, and hosts in any Splunk environment.  In our second article of the series, Spelunking your Splunk – Part II (Disk Usage), we provided a dashboard that can be used to monitor data distribution across multiple indexers.  In this article, we will dive into understanding your license usage.

Finding and understanding license usage information

The easiest way to query your Splunk license information is to use the query below in the search bar:

index=_internal source=*license_usage.log type=Usage

This should return raw license usage data, which includes the index (idx), host (h), source (s), sourcetype (st), and number of bytes (b), as shown in the screenshot below.

Figure 1:  License usage fields

If this search returns nothing, you may need to forward the license master's internal data (including the _internal index) to your indexers (search peers), as described in the article below:

https://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/Forwardmasterdata

After figuring out the fields, you can get a little fancier by converting the bytes into GB and displaying that data over time, as shown below. Try this as both a statistics table and a column chart.

index=_internal source=*license_usage.log type=Usage | timechart span=1d eval(round(sum(b)/1024/1024/1024,2)) AS "Total GB Used"
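The timechart above sums the b (bytes) field per day and divides by 1024 three times to get GB. The Python sketch below reproduces that arithmetic on exported events. It assumes each event is a dict with an epoch-seconds _time and a byte count b. It also buckets days in UTC, whereas Splunk buckets by the searching user's time zone.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable

BYTES_PER_GB = 1024 ** 3  # matches b/1024/1024/1024 in the search


def daily_gb(events: Iterable[dict]) -> Dict[str, float]:
    """Sum the b field per calendar day (UTC) and convert to GB rounded to 2 places."""
    totals: Dict[str, int] = defaultdict(int)
    for event in events:
        day = datetime.fromtimestamp(event["_time"], tz=timezone.utc).strftime("%Y-%m-%d")
        totals[day] += event["b"]
    return {day: round(total / BYTES_PER_GB, 2) for day, total in sorted(totals.items())}


# Invented license_usage events (epoch seconds, bytes)
usage = [
    {"_time": 0, "b": BYTES_PER_GB},
    {"_time": 3600, "b": BYTES_PER_GB // 2},
    {"_time": 86400, "b": 3 * BYTES_PER_GB},
]
print(daily_gb(usage))  # {'1970-01-01': 1.5, '1970-01-02': 3.0}
```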

Now that you understand the basics, the sky is the limit.  You can display the license usage per index, source, sourcetype, host, etc.  Take a look at our dashboard at the end of this article and give it a try.
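The per-host, per-source, and per-sourcetype views in the dashboard follow one pattern: sum b by a field, convert to GB, and sort descending. The fields are h for host, s for source, and st for sourcetype. Here is a generic Python sketch of that pattern, with invented sample data and a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def usage_by(events: Iterable[dict], field: str) -> List[Tuple[str, float]]:
    """Sum bytes (b) per value of `field` and return (value, GB) pairs, largest first."""
    totals: Counter = Counter()
    for event in events:
        totals[event[field]] += event["b"]
    return [(value, round(total / 1024 ** 3, 2)) for value, total in totals.most_common()]


GB = 1024 ** 3
usage = [
    {"h": "web01", "st": "access_combined", "b": 2 * GB},
    {"h": "db01", "st": "mysql", "b": GB},
    {"h": "web01", "st": "access_combined", "b": 3 * GB},
]
print(usage_by(usage, "h"))   # like "stats sum(b) AS bytes by h ... | sort -GB"
print(usage_by(usage, "st"))
```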


Figure 2:  One of our favorite dashboards for license usage

Conclusion

Splunk provides decent visibility into license usage via the Monitoring Console / DMC (Distributed Management Console), but we found this visual representation quite helpful for monitoring and gaining additional insight. We hope this helps you too.


Dashboard XML code

Below is the dashboard code needed to enumerate your license usage.  Feel free to modify the dashboard as needed:


<form>
  <label>License Usage</label>
  <fieldset submitButton="false" autoRun="true">
    <input type="time" searchWhenChanged="true" token="time1">
      <label></label>
      <default>
        <earliest>-7d@d</earliest>
        <latest>now</latest>
      </default>
    </input>
  </fieldset>
  <row>
    <panel>
      <chart>
        <title>Daily License Usage by Index</title>
        <search>
          <query>index=_internal source=*license_usage.log type=Usage  | rename idx AS index  | timechart span=1d eval(round(sum(b)/1024/1024/1024,2)) AS "Total GB Used" by index</query>
          <earliest>$time1.earliest$</earliest>
          <latest>$time1.latest$</latest>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.text">Date</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.text">License Usage</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">false</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">column</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">all</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisStart</option>
        <option name="charting.legend.placement">right</option>
        <option name="charting.axisLabelsY.majorUnit">10</option>
        <option name="charting.axisY.maximumNumber">60</option>
        <option name="charting.axisY.minimumNumber">0</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Total Daily License Usage</title>
        <search>
          <query>index=_internal source=*license_usage.log type=Usage  | timechart span=1d eval(round(sum(b)/1024/1024/1024,2)) AS "Total GB Used"</query>
          <earliest>$time1.earliest$</earliest>
          <latest>$time1.latest$</latest>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.text">Date</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.text">GB</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">column</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">all</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisStart</option>
        <option name="charting.legend.placement">right</option>
        <option name="wrap">true</option>
        <option name="rowNumbers">false</option>
        <option name="dataOverlayMode">none</option>
        <option name="charting.axisLabelsY.majorUnit">25</option>
        <option name="charting.chart.showDataLabels">all</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
      </chart>
    </panel>
    <panel>
      <table>
        <title>Daily License Usage by Index Stats</title>
        <search>
          <query>index=_internal source=*license_usage.log type=Usage | rename idx AS index  | timechart span=1d eval(round(sum(b)/1024/1024/1024,2)) AS "Total GB Used" by index</query>
          <earliest>$time1.earliest$</earliest>
          <latest>$time1.latest$</latest>
        </search>
        <option name="wrap">true</option>
        <option name="rowNumbers">false</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="count">10</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>License Usage by Host</title>
        <search>
          <query>index=_internal source=*license_usage.log type=Usage | stats sum(b) AS bytes by h | eval GB= round(bytes/1024/1024/1024,2) | fields h GB | rename h as host | sort -GB</query>
          <earliest>$time1.earliest$</earliest>
          <latest>$time1.latest$</latest>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">false</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">pie</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">all</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisStart</option>
        <option name="charting.legend.placement">right</option>
      </chart>
    </panel>
    <panel>
      <chart>
        <title>License Usage by Sourcetype</title>
        <search>
          <query>index=_internal source=*license_usage.log type=Usage | stats sum(b) AS bytes by st | eval GB= round(bytes/1024/1024/1024,2) | fields st GB | rename st as Sourcetype | sort -GB</query>
          <earliest>$time1.earliest$</earliest>
          <latest>$time1.latest$</latest>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">false</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">pie</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">all</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisStart</option>
        <option name="charting.legend.placement">right</option>
      </chart>
    </panel>
    <panel>
      <chart>
        <title>License Usage by Source</title>
        <search>
          <query>index=_internal source=*license_usage.log type=Usage | stats sum(b) AS bytes by s | eval GB= round(bytes/1024/1024/1024,2) | fields s GB | rename s as Source | sort -GB</query>
          <earliest>$time1.earliest$</earliest>
          <latest>$time1.latest$</latest>
        </search>
        <option name="charting.chart">pie</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>License Usage by Host Stats</title>
        <search>
          <query>index=_internal source=*license_usage.log type=Usage | stats sum(b) AS bytes by h | eval GB= round(bytes/1024/1024/1024,2) | fields h GB | rename h as host | sort -GB</query>
          <earliest>$time1.earliest$</earliest>
          <latest>$time1.latest$</latest>
        </search>
        <option name="wrap">true</option>
        <option name="rowNumbers">false</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="count">10</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>License Usage by Sourcetype Stats</title>
        <search>
          <query>index=_internal source=*license_usage.log type=Usage | stats sum(b) AS bytes by st | eval GB= round(bytes/1024/1024/1024,2) | fields st GB | rename st as Sourcetype | sort -GB</query>
          <earliest>$time1.earliest$</earliest>
          <latest>$time1.latest$</latest>
        </search>
        <option name="wrap">true</option>
        <option name="rowNumbers">false</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="count">10</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>License Usage by Source Stats</title>
        <search>
          <query>index=_internal source=*license_usage.log type=Usage | stats sum(b) AS bytes by s | eval GB= round(bytes/1024/1024/1024,2) | fields s GB | rename s as Source | sort -GB</query>
          <earliest>$time1.earliest$</earliest>
          <latest>$time1.latest$</latest>
        </search>
        <option name="wrap">true</option>
        <option name="rowNumbers">false</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="count">10</option>
      </table>
    </panel>
  </row>
</form>


Wednesday, November 22, 2017

Spelunking your Splunk – Part II (Disk Usage)

By Tony Lee

In our first article of the series, Spelunking your Splunk Part I (Exploring Your Data), we looked at a clever dashboard that can be used to quickly understand the indexes, sources, sourcetypes, and hosts in any Splunk environment.  Now we will examine disk usage!

You may know this already--Splunk stores data on indexers. But have you ever wanted to visually see indexer capacity?  Or in a distributed environment, have you ever wondered how well the data is distributed across the indexers?  We have a solution for both and will provide the code at the bottom of the article.

Finding disk usage information

There are a number of ways to query disk utilization within Splunk. For example, you could create a scripted input that makes a call to the operating system, but Splunk makes it even simpler than that. Try copying and pasting this RESTful query into the search bar:

| rest splunk_server=* /services/server/status/partitions-space | eval usage = round((capacity - free) / 1024, 2) | eval capacity = round(capacity / 1024, 2) | eval compare_usage = usage." / ".capacity | eval pct_usage = round(usage / capacity * 100, 2)  | table updated, splunk_server, mount_point, fs_type, capacity, compare_usage, pct_usage | rename mount_point as "Mount Point", fs_type as "File System Type", compare_usage as "Disk Usage (GB)", capacity as "Capacity (GB)", pct_usage as "Disk Usage (%)" | sort splunk_server


This should result in something that looks like the following screenshot which provides information such as the server name, mount point, file system type, drive capacity, disk usage, and percentage of disk usage. If you receive information from non-indexers or mount points that are not related to your actual indexer mount points, you can either ignore them or filter them out of the search.
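The evals in that search are simple arithmetic. Because the search divides by 1024 to reach GB, it implies the endpoint reports capacity and free space in MB. The Python sketch below assumes the same units, and the sample values are made up. Like the search, it computes the percentage from the already-rounded GB values.

```python
def disk_usage(capacity_mb: float, free_mb: float) -> dict:
    """Reproduce the evals from the partitions-space search (inputs assumed to be MB)."""
    usage_gb = round((capacity_mb - free_mb) / 1024, 2)
    capacity_gb = round(capacity_mb / 1024, 2)
    pct_usage = round(usage_gb / capacity_gb * 100, 2)
    return {
        "Capacity (GB)": capacity_gb,
        "Disk Usage (GB)": f"{usage_gb} / {capacity_gb}",
        "Disk Usage (%)": pct_usage,
    }


print(disk_usage(capacity_mb=102400, free_mb=25600))
```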


Figure 1:  The search that starts it all

Adding a gauge

This is pretty interesting information, especially in a distributed environment, but let's take it up a notch so we can see a visual representation.  The dashboard code at the bottom of the page will give you the basic building blocks to customize gauges on your disk usage page.

Figure 2:  Adding a filler gauge for each indexer

Note:  For the gauges, you should change two values:  splunk_server to match the value in the splunk_server column and mount_point to match the value in the Mount Point column in our original search.

For environments with clustered indexers, just add a gauge for each indexer.  The end result should look something like the following:

Figure 3:  Filler gauges across the index cluster

In this example, it is easy to spot one indexer that is not properly load balanced. The same search can also be saved as an alert that triggers on disk usage.
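If you turn the search into an alert, you need logic equivalent to what the gauges show visually. The Python sketch below is one way to express it. The band edges and colors match the gauge's charting.chart.rangeValues ([0,50,75,100]) and charting.gaugeColors (green, yellow, red) options in the dashboard XML. How values exactly on an edge are banded, the 20-point imbalance threshold, the function names, and the sample values are all hypothetical choices, not part of the original dashboard.

```python
from bisect import bisect_right

# Band edges copied from the gauge's charting.chart.rangeValues option.
RANGE_VALUES = [0, 50, 75, 100]
BANDS = ["green", "yellow", "red"]  # same order as charting.gaugeColors


def usage_band(pct_usage: float) -> str:
    """Map a Disk Usage (%) value to a gauge color band.

    Assumption: a value exactly on an edge (e.g. 50) falls into the upper band.
    """
    # bisect_right counts the edges <= pct_usage; subtract 1 to get a band index
    idx = bisect_right(RANGE_VALUES, pct_usage) - 1
    # clamp so 100% (and anything outside 0-100) still maps to a valid band
    idx = max(0, min(idx, len(BANDS) - 1))
    return BANDS[idx]


def is_imbalanced(pct_by_indexer: dict, max_spread: float = 20.0) -> bool:
    """Flag a cluster whose fullest and emptiest indexers differ by more than max_spread points.

    max_spread is a hypothetical threshold; tune it for your environment.
    """
    values = list(pct_by_indexer.values())
    return max(values) - min(values) > max_spread


# Hypothetical Disk Usage (%) values for a three-indexer cluster
cluster = {"idx1": 41.2, "idx2": 43.8, "idx3": 78.5}
for name, pct in cluster.items():
    print(name, pct, usage_band(pct))
print("imbalanced:", is_imbalanced(cluster))
```

Comparing the spread between the fullest and emptiest indexers, rather than each value against a fixed limit, catches the load-balancing problem even when no single indexer has reached the red band yet.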

Conclusion

Splunk provides good visibility into indexer health via the Monitoring Console (formerly the Distributed Management Console, or DMC). Even so, we found this visual representation especially helpful for monitoring disk usage and indexer cluster load balancing. We hope it helps you too.


Dashboard XML code

The dashboard code below creates one disk usage gauge (set to the placeholder server_name_here and mount point /) and a table that lists every server and mount point. Copy the gauge panel once for each gauge you need:

<dashboard stylesheet="custom.css">
  <label>Disk Usage</label>
  <row>
    <panel>
      <chart>
        <title>Indexer-1</title>
        <search>
          <query>| rest splunk_server=* /services/server/status/partitions-space | search splunk_server=server_name_here mount_point="/" | eval usage = round((capacity - free) / 1024, 2) | eval capacity = round(capacity / 1024, 2) | eval pct_usage = round(usage / capacity * 100, 2) | table pct_usage</query>
          <earliest>0</earliest>
          <latest></latest>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">fillerGauge</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.rangeValues">[0,50,75,100]</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">all</option>
        <option name="charting.gaugeColors">["0x84E900","0xFFE800","0xBF3030"]</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <search>
          <query>| rest splunk_server=* /services/server/status/partitions-space | eval usage = round((capacity - free) / 1024, 2) | eval capacity = round(capacity / 1024, 2) | eval compare_usage = usage." / ".capacity | eval pct_usage = round(usage / capacity * 100, 2)  | table updated, splunk_server, mount_point, fs_type, capacity, compare_usage, pct_usage | rename mount_point as "Mount Point", fs_type as "File System Type", compare_usage as "Disk Usage (GB)", capacity as "Capacity (GB)", pct_usage as "Disk Usage (%)" | sort splunk_server</query>
          <earliest>-15m</earliest>
          <latest>now</latest>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="rowNumbers">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</dashboard>
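Instead of copying the gauge panel by hand, you could generate one panel per indexer. The Python sketch below uses the standard library's xml.etree.ElementTree to build a row of filler-gauge panels from a list of (title, server, mount point) tuples. The server names are placeholders. Only the gauge-specific options are carried over, and the query is a trimmed version of the Indexer-1 gauge search. The output would replace the single Indexer-1 row in the dashboard source. Treat it as a starting point, not a tested dashboard.

```python
import xml.etree.ElementTree as ET

# Gauge-specific options carried over from the Indexer-1 panel.
GAUGE_OPTIONS = {
    "charting.chart": "fillerGauge",
    "charting.chart.rangeValues": "[0,50,75,100]",
    "charting.gaugeColors": '["0x84E900","0xFFE800","0xBF3030"]',
}

# Trimmed Indexer-1 gauge search with the two per-gauge values templated.
QUERY_TEMPLATE = (
    "| rest splunk_server=* /services/server/status/partitions-space "
    '| search splunk_server={server} mount_point="{mount}" '
    "| eval usage = round((capacity - free) / 1024, 2) "
    "| eval capacity = round(capacity / 1024, 2) "
    "| eval pct_usage = round(usage / capacity * 100, 2) "
    "| table pct_usage"
)


def gauge_row(targets):
    """Build a <row> with one filler-gauge <panel> per (title, server, mount) tuple."""
    row = ET.Element("row")
    for title, server, mount in targets:
        chart = ET.SubElement(ET.SubElement(row, "panel"), "chart")
        ET.SubElement(chart, "title").text = title
        search = ET.SubElement(chart, "search")
        ET.SubElement(search, "query").text = QUERY_TEMPLATE.format(server=server, mount=mount)
        ET.SubElement(search, "earliest").text = "0"
        # Each option becomes <option name="...">value</option>
        for name, value in GAUGE_OPTIONS.items():
            ET.SubElement(chart, "option", name=name).text = value
    return row


# Hypothetical indexer names; use values from your splunk_server and Mount Point columns.
targets = [
    ("Indexer-1", "idx1.example.com", "/"),
    ("Indexer-2", "idx2.example.com", "/"),
]
print(ET.tostring(gauge_row(targets), encoding="unicode"))
```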