Dealing with JSON is a common task when you’re coding in Python. Two functions you may use a lot without fully understanding them are json.loads and json.dumps. Let’s take a look at json.loads vs json.dumps, what they do, and how they can help you, especially when wrangling complex JSON data from APIs.
json.loads and json.dumps are two functions for converting between JSON strings and Python objects. Sometimes one form is easier to work with than the other, and being able to convert between them reduces the trade-offs you have to make.
json.loads vs json.dumps
It can be really hard to remember what json.loads does vs what json.dumps does. They are opposites. json.loads converts a JSON string to a Python object, usually a dictionary or a list. By contrast, json.dumps converts a Python object to a JSON string.
You can remember it as load a string into JSON, and dump JSON to a string. The trailing s in each name stands for string; json.load and json.dump, without the s, work with file objects instead. Or just come back here, that works too. I don’t mind. If you don’t use them every day, being able to instantly look up json.loads vs json.dumps is one of the best things about the world we live in now.
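To make the direction of each call concrete, here is a minimal round-trip sketch. The sample record (hostname, ports, patched) is made up for illustration and isn’t from any real API.

```python
import json

# A JSON document as it might arrive from an API: just a string.
text = '{"hostname": "web01", "ports": [80, 443], "patched": false}'

# loads: JSON string -> Python object (a dict here, because the JSON is an object).
data = json.loads(text)

# dumps: Python object -> JSON string.
round_trip = json.dumps(data)

print(type(data).__name__)        # dict
print(type(round_trip).__name__)  # str
```

Note that the JSON value false comes back as Python’s False, and goes out again as false.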
On the other hand, this would be an awfully good job interview question. If I see Python on someone’s resume, you can bet I’ll probably ask this question because it will tell me a lot about how advanced they are.
What does json.loads do?
When you encounter JSON in a file or in a stream of text from an API, it’s just a plaintext string. It’s a highly formatted and structured plaintext string, but it’s of limited use in that form.
The function json.loads, which you can remember as the call you make to load a JSON string, converts it to a Python object, most often a dictionary. JSON translates extremely well into Python dictionaries and lists, which makes the result a useful data structure because every element becomes addressable.
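Here is a small sketch of what addressable means in practice. The field names (asset, hostname, tags, owner) are hypothetical and only there to show the access pattern.

```python
import json

# Hypothetical API response; the field names are invented for this example.
raw = '{"asset": {"hostname": "db01", "os": "Linux"}, "tags": ["prod", "pci"]}'

record = json.loads(raw)

# Nested objects become nested dicts, and arrays become lists.
hostname = record["asset"]["hostname"]
first_tag = record["tags"][0]

# .get() avoids a KeyError when a field is missing from the data.
owner = record.get("owner", "unknown")

# The result is only a dict when the top-level JSON value is an object.
numbers = json.loads("[1, 2, 3]")

print(hostname, first_tag, owner, numbers)
```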
The data I handle every day in my day job can have 125 or more elements to it. At any given moment I probably care about five of them, but which five I care about varies, depending on whether I’m talking to system administrators, security analysts, IT managers, or data scientists that day. Or maybe I should say that hour, because I may talk to all of them in a single day.
Handling that data as JSON makes it easy to extract the data I need and only the data I need, and it also gives me options for sorting based on the data I care about the most.
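As a rough illustration of pulling out only the fields you care about and sorting on one of them, here is a sketch. The findings and their fields are invented; real data would have far more elements per record.

```python
import json

# Hypothetical findings; real records would carry many more fields.
raw = """[
  {"hostname": "web01", "severity": 7.5, "os": "Linux", "owner": "ops"},
  {"hostname": "db01", "severity": 9.8, "os": "Linux", "owner": "dba"},
  {"hostname": "app01", "severity": 4.3, "os": "Windows", "owner": "dev"}
]"""

findings = json.loads(raw)

# Keep only the fields this particular audience cares about.
wanted = ("hostname", "severity")
trimmed = [{key: finding[key] for key in wanted} for finding in findings]

# Sort so the most severe finding comes first.
trimmed.sort(key=lambda finding: finding["severity"], reverse=True)

for finding in trimmed:
    print(finding["hostname"], finding["severity"])
```

Changing the wanted tuple is all it takes to produce a different view for a different audience.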
What does json.dumps do?
As useful as JSON is, it’s not always ideal for searching, and it’s certainly not always readable. It’s designed to be human-readable, but if you mash it all together in a single line, it’s easy to get lost in it.
The way to remember what json.dumps does is to remember the phrase “dump JSON to a string.”
The first time I encountered json.dumps was when I needed to pretty-print JSON to make it possible to read and decode. There are lots of ways to pretty-print JSON, but you probably already have the json library imported, so json.dumps can do double duty without the overhead of importing another library.
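A minimal pretty-printing sketch, using the indent and sort_keys parameters of json.dumps. The one-line input is a made-up example.

```python
import json

# Made-up JSON mashed onto a single line, the way it often arrives.
compact = '{"hostname":"web01","tags":["prod","pci"],"asset":{"os":"Linux","ip":"10.0.0.5"}}'

# Parse it, then dump it back out with indentation and sorted keys.
pretty = json.dumps(json.loads(compact), indent=2, sort_keys=True)

print(pretty)
```

Sorting the keys is optional, but it makes two dumps of similar records easier to compare by eye.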
But json.dumps is also helpful for dealing with deeply nested JSON. My career changed forever when I learned I could load my vulnerability data into Pandas for analysis. Pandas lets me do everything I can do in Excel, but without that pesky 1.04 million row limit. So I can deal with huge datasets quickly, then cut them down to fit in Excel and write out Excel files for viewing and further analysis. Complex analysis that used to take me weeks to do in Excel takes mere minutes in Pandas. It’s amazing.
One time when strings beat JSON
The problem I run into is that some of my JSON data doesn’t translate neatly into a two-dimensional Pandas dataframe. Tag data is the most frequent example, since my tools don’t have any practical limit on how many tags they may assign to a system. When I load that data into Pandas, I end up with a column full of JSON, which doesn’t lend itself well to the kind of searches I do.
CVEs are another, since there’s usually not a 1:1 relationship between scanner signatures, vendor patches, and CVEs. One signature or one patch can be associated with multiple CVEs.
But json.dumps solves that problem neatly. It converts that JSON into a mess of text, but it renders that column searchable. I don’t care what that column looks like. I just care if I can find the tag or the CVE I care about in it. The json.dumps solution is a hacky solution to this problem, but it’s fast and it works.
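Here is a sketch of that trick in plain Python, without Pandas, so it runs anywhere. The rows, the tag layout, and the helper name to_searchable are all hypothetical.

```python
import json

# Hypothetical rows; each system can carry any number of tags.
rows = [
    {"hostname": "web01", "tags": [{"key": "env", "value": "prod"},
                                   {"key": "team", "value": "ops"}]},
    {"hostname": "db01", "tags": [{"key": "env", "value": "dev"}]},
    {"hostname": "app01", "tags": []},
]


def to_searchable(value):
    """Flatten a nested value into one JSON string so it can be searched as text."""
    return json.dumps(value, sort_keys=True)


# Add a text version of the nested column alongside the original.
for row in rows:
    row["tags_text"] = to_searchable(row["tags"])

# A plain substring test is now enough to find a tag value.
# Including the quotes keeps "prod" from matching a longer word like "production".
matches = [row["hostname"] for row in rows if '"prod"' in row["tags_text"]]

print(matches)
```

With a DataFrame, the same helper could be applied to a column with something like df["tags"].apply(to_searchable), and the resulting text column searched with .str.contains(). The column name here is an assumption, not something from my actual data.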
json.loads vs json.dumps: In conclusion
Programming in Python is a very useful skill, even if you’re not a full-time software developer. One reason is that dealing with APIs comes so naturally to it, especially dealing with JSON inputs and outputs. The functions json.loads and json.dumps are two reasons for that. These two functions mean I don’t face an either/or choice when it comes to handling JSON. When it’s useful to break the rules and treat JSON as unstructured text, I can do that. When it’s useful to treat it as highly structured data, I can do that too.
My computer science professors from college would probably say I’m abusing JSON by doing things this way. But hackers don’t follow the rules and that’s who I have to keep pace with. Abusing json.dumps makes dealing with sometimes-unpredictable data much easier.