Make a simple pivot table in Python

One of the first things I do when I open a vulnerability scan is make a pivot table on the title of the vulnerability and the count of that title. Then I do the same for all the systems. It’s easy to do in Excel once you’ve seen someone do it once, but if you have a lot of data it can be crash prone. Here’s how to make a simple pivot table in Python.

This example is for Qualys data but it’s easy to adapt it to another scanner. Just change the names of the columns. Python is so much faster and more reliable for this that I rarely make pivots in Excel anymore. I make them in Python, then load the output file in Excel for viewing.

A barebones example

This is a very bare bones example. You’ll want to pretty it up, and probably want to add another pivot. But it’s simple and it works, even on huge files that Excel would have to truncate.


import pandas as pd
import xlsxwriter
#
inputfile = 'in.csv'
outputfile = 'out.xlsx'
#
writer = pd.ExcelWriter(outputfile, engine='xlsxwriter')
raw = pd.readcsv(inputfile, skiprows=4) # may have to omit the skiprows option on CSVs from sources other than Qualys
raw.to_excel(writer, sheet_name=inputfile, index=False)
#
# make pivot
raw['Count'] = raw['DNS'] pivot1 = pd.pivot_table(raw, index=['DNS'], values=['Count'], aggfunc='count')
pivot1.sort_values(by=['Count'], inplace=True, ascending=False
pivot1.to_excel(writer, sheet_name='Pivot')
writer.save()

How it works

pivot table in python
The code to make a simple pivot table in Python is surprisingly simple.

This simple program uses the Pandas library, which has hundreds of powerful functions for data analysis. It also uses the xlsxwriter library, which handles Excel files.

To use it, you’ll need to install Python for your system if you don’t have it, then use these two commands to install the libraries:

pip install pandas

pip install xlsxwriter

The cool thing about Python is that you can write useful programs that fit in a single screen. Now let’s get into the quick and dirty of the program. Unlike Excel, Pandas doesn’t let you use the same column for both the index and the count. So line 12 makes a copy of the DNS column that I can use for counting. It’s a hack. It works. I’m not worried about it.

Line 13 just makes the pivot, using the DNS name as the index and counting the number of occurrences. Like 14 sorts it in descending order, since I care more about the host with lots of vulnerabilities on it. When I do vulnerability analysis, I find there’s usually a root cause keeping certain systems from getting updates. Fix that and the system mostly fixes itself at that point.

I’ll leave pivoting on the vulnerability title as an exercise to you. When a vulnerability is missing on all systems, that tells me it never got pushed. When it’s missing on some systems but not others… That can be a difficult sysadmin problem, comparable in effort and skill required to that of a forensic investigation. I don’t think a lot of security professionals recognize this, and it leads to unrealistic expectations. That’s my current theory of why most vulnerability management programs aren’t successful. It’s also another topic.

Back to the topic at hand: I first learned to program by typing simple programs out of computer magazines in the 1980s, then enhancing them for my own use. You can learn Python the same way I learned Basic and 6502 assembly language, using examples you find online.

Enhancement ideas

If you really want to be slick, modify the program to make API calls to pull the data straight from the source, rather than a local file. You can also make multiple pivots. To do so in a way that won’t make my computer science professor cringe, use a loop, rather than coding in a straight linear fashion. But if your code works, I won’t judge.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: