Thursday, October 15, 2020

VirusTotal Lookups in Microsoft Power BI - Processing JSON

 By Tony Lee

So we are on this Power BI binge... and figured a VirusTotal lookup report and dashboard would be useful as a stand alone tool as well as a drill through fed by other reports. Presented below are the results of our efforts, steps we took, and limitations we encountered (as newbz). In typical form, we released the work for free so everyone can use, contribute, and modify as they see fit.  Enjoy!:

Figure 1:  Resulting VT Lookup report in Power BI


Let's start with what this report provides -- from top left to bottom right:

  • Slicer (filter) to filter on engines that hit or missed the sample
  • Permalink to the results
  • Gauge to indicate number of hits vs engines
  • Verbose message from VT
  • What Resource we actually queried
  • MD5, SHA1, and SHA256 hash values
  • Table of engines, conviction results, malware classification, engine update, and engine version
  • Treemap of conviction results - useful for determining the highest malware family consensus


At a high-level, here is what we did to create the dashboard:

  1. Created a parameter that holds the Resource we want to look up
  2. Used "Get Data" to query the VT API to pull in the JSON response
  3. Used Power Query Editor to manipulate and extract the nested JSON as needed
  4. Built a cool dashboard in Power BI Desktop
Since there are some intricate details in processing JSON data, we figured we would break down the steps in a bit more detail.

Creating the Parameter

We created two parameters to prompt users for data.  The first is the VT API key and the second is the hash they want to query.  Parameters are created in Power Query Editor by clicking on the down arrow for Manager Parameters > New Parameter

Name the field, Add a description, select Text for the Type, and enter a hash you know VT has for the Current Value.  In our example, we used the hash:  0a52fdc2219346d25295c8d6838122bcdd7ddf119386cd1030458b035cd97999

Figure 2:  Creating and managing the parameter that stores the query value

Note:  We also created a parameter for the API key so the user is prompted for the key when opening the Power BI template for the first time. When this is saved as a .pbix, the parameter value is also saved making it easy to use by the team.

Creating the VirusTotal Web API call for Get Data

To create the Web API call perform the following:
  • Get Data > Web
  • Advanced
  • URL parts:
    • apikey=
    • Select Parameter from the dropdown and use API_Key
      • Note:  The free VT API key can be used
    • &resource=
    • Select Parameter from dropdown and use "Resource"

Figure 3:  Creating API call to VirusTotal to get the JSON data

Note:  The data returned will be JSON.  Now we need to make it columnar.

Use Power Query Editor to Manipulate the JSON

When the JSON formatted data is returned from the VirusTotal API query, follow these steps to make it columnar:

Convert > Into Table

Figure 4:  Convert JSON to a table

Transform > Transpose > Use First Row as Headers

Figure 5:  Transposed the table and then used the first row as the header

Take note!  The field called scans has a value of: "Record".  When clicking on this object, it expands below revealing the results of each scan vendor. Notice that those values are also "Record" which means we have two nested JSON objects, one inside the other.  We will need to make both columnar data to use it in Power BI.

Figure 6:  The scans field is a JSON object that contains more JSON objects

Expand the scans column

Here is a trick that we learned from a Guy in a Cube video -- thanks for the videos fellas! There is a little double arrow box in the far right portion of the header for scans. If you click this, it will allow you to expand (convert) the JSON to columnar data once again.  Be sure to uncheck the box at the bottom to avoid prepending a prefix that we would strip away later.

Figure 7:  First expansion of JSON object called "scans"

Highlight the antivirus vendor columns > Unpivot Columns

Figure 8:  Unpivoting the antivirus vendor columns to make each entry its own row

Rename Attribute field to Engine

Expand the Value field that contains more nested JSON

Figure 9:  Renaming column header and expanding second JSON object

Rename new columns to remove "Value." in front of detected, version, result, and update.

Building the dashboard

The dashboard consists of a filter, some tables, a gauge, and a treemap. Please see our other Power BI articles on building visualizations. The toughest portion of this task was taking JSON input and making it columnar so it can be processed by Power BI. As a bonus, we are also templating this project and making it available to everyone.


As we just mentioned, we saved this work as a template and will distribute it for free. When opening the template, simply insert your VT API Key (which can be the free API key) and the hash that you want to query (called a resource).

Figure 10:  Templated VirusTotal lookup report in Power BI


Currently we believe we have the following limitations, but these may be easy to overcome depending on how many ideas and solutions we can pull from our readers:
  1. Unable to modify hash parameter from report / dashboard to refresh data feed with new data natively from Power BI
  2. Drill through lookup using hash from previous report may be limited due to parameter update issue
  3. Potential for AV column rename to fail due to hard coded AV vendor names in power query
  4. Error handling could be improved for instances where VirusTotal does not have the data
    • "response_code": 0 vs. "response_code": 1


We hope you enjoyed this article and have some ideas for us to correct some of the limitations. Feel free to contribute ideas and help us make this report better by using the comments feature below or submitting pull requests via Github. Or just give us a shout out if you are using this solution.  Enjoy!

No comments:

Post a Comment