What is fuzzing? Fuzz testing, or fuzzing, is a testing technique used heavily in computer security. As the name suggests, it’s the practice of sending messed-up data to a system to see how it behaves. A good computer system should handle fuzzing gracefully. As you might guess, not all do.
When a computer receives data it doesn’t expect, it may malfunction in unpredictable ways. Fuzzing attempts to find those malfunctions.
A super-simple example of fuzzing command-line options
To understand fuzzing, it helps to understand a simple example. The whole idea is to feed a computer something it’s not expecting. When a computer expects to see a number, what does it do when you give it a letter? Or how about a color? Bad guys do fuzzing to see if they can get the computer to act in a way that benefits them. Good guys do it to try to find those problems before the bad guys do.
A really good example surfaced in October 2019. A standard tool shipped with most Linux operating systems had a serious bug. The tool was called sudo. Sudo lets a user run a command as another user. It’s not something ordinary computer users necessarily do very often, but system administrators have to do it a lot.
The bug in sudo
Sudo expects you to give it a user number or name. If I’m logged in as dave and want to run a command as dan, I can use the command sudo -u dan followed by the command I want to run as dan. If I know dan’s user number, I can use that instead by putting a # in front of it. No big deal so far. But in Unix and Linux, there’s an administrative user with a user number of 0, normally named root. The root user has superpowers. You may or may not be allowed to run commands as root.
The bug, tracked as CVE-2019-14287, worked like this. If you were allowed to run commands as other users but not as root, you could specify a user number of -1 or 4294967295, and sudo ended up treating that as 0 and let you run commands as root anyway. It’s a perfect example of a program not handling unexpected input correctly. When you enter a negative number or an absurdly large number, the program should give an error and halt. It does now.
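To find bugs like this, it helps to understand how computers store numbers. The number 4294967295 wasn’t random. It’s the largest unsigned number you can represent in 32 bits, and it has the same bit pattern as -1 when those 32 bits are read as a signed number. When zero is disallowed, the goal is to find a number that will wrap around and become zero, or that the program will otherwise mistake for it. Numbers like that are good candidates.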
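To see why those two values are interchangeable, here’s a small Python sketch of 32-bit arithmetic. This isn’t sudo’s code. The helper name to_uint32 is made up for illustration, and all it does is cut a number down to 32 bits the way a fixed-width unsigned integer would.

```python
MASK_32 = 0xFFFFFFFF  # 32 one-bits: the largest unsigned 32-bit value


def to_uint32(value: int) -> int:
    """Keep only the low 32 bits, like storing value in an unsigned 32-bit integer."""
    return value & MASK_32


print(MASK_32)                    # 4294967295
print(to_uint32(-1))              # 4294967295: -1 and 4294967295 share a bit pattern
print(to_uint32(4294967295 + 1))  # 0: one past the maximum wraps around to zero
```

Python’s own integers never overflow, which is why the masking has to be done by hand here. In a language like C, the wraparound happens silently.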
Taking it to another level
Fuzzing seeks to find these kinds of problems, especially when they’re buried really deeply. It’s one thing to find a bug like this in a program that reads user input from the keyboard. But how do you find it in a web application, or a database, or an application program like Microsoft Word?
That’s where fuzzing comes in, performing this same kind of test, but at a deeper level, and ideally, in an automated fashion.
What does fuzzing look like?
To do fuzz testing, you just take valid computer data and change it to be subtly invalid. Or maybe blatantly. But you’ll probably start with subtly invalid data and make it more and more blatant to see how well a program responds to it.
For example, you might locate a number in a stream of data, then see what happens if you make that number negative. You can also try certain numbers that have predictable behavior. Numbers like 255, 32767, 65535, and 4294967295 are interesting because each one is the largest value that fits in a common fixed-size integer type (8, 16, or 32 bits). Add one to them and, if the program doesn’t check, the result can wrap around to zero or to a negative number. Fuzz testing ought to try those types of numbers, and those types of numbers plus one, and look for bad behavior.
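Here’s one way you might generate a list of those interesting numbers rather than typing them out. It’s only a sketch. The function name boundary_values and the choice of 8, 16, and 32 bits are my assumptions, so add 64 bits or anything else your target is likely to use.

```python
def boundary_values(bit_widths=(8, 16, 32)):
    """Return integers at and around the limits of fixed-width integer types."""
    candidates = {-1, 0, 1}
    for bits in bit_widths:
        unsigned_max = 2**bits - 1       # 255, 65535, 4294967295
        signed_max = 2**(bits - 1) - 1   # 127, 32767, 2147483647
        signed_min = -(2**(bits - 1))    # -128, -32768, -2147483648
        for limit in (unsigned_max, signed_max, signed_min):
            # Try the limit itself plus the values on either side of it.
            candidates.update({limit - 1, limit, limit + 1})
    return sorted(candidates)


values = boundary_values()
print(values)
```

The list includes every number mentioned above, along with each one plus one and minus one.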
But that’s not all you can try. What about sending letters when the computer expects a number? Or pictures when it expects music? Maybe you send it valid data but in a different format, like sending dates in MM/DD/YYYY format instead of DD/MM/YYYY format. Or maybe you go all Y2K on it and send it a two-digit year and see what it decides to do with it. If I tell a banking application that my mortgage started in “99,” the implications are disastrous if it decides the first two digits are 20.
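The date problems are easy to show with Python’s standard datetime module. This is just an illustration of how one string can mean two things, and the sample date is made up.

```python
from datetime import datetime

ambiguous = "03/04/2020"
as_us = datetime.strptime(ambiguous, "%m/%d/%Y")  # read as March 4, 2020
as_eu = datetime.strptime(ambiguous, "%d/%m/%Y")  # read as 3 April 2020
print(as_us.date(), as_eu.date())  # 2020-03-04 2020-04-03

# Two-digit years: Python's %y follows the POSIX convention, so
# 69-99 become 1969-1999 and 00-68 become 2000-2068.
print(datetime.strptime("99", "%y").year)  # 1999
print(datetime.strptime("68", "%y").year)  # 2068
```

Neither call raises an error on the ambiguous date, which is the point. The program happily accepts the data and quietly gets a different answer. Other software may pick a different cutoff for two-digit years, so that’s worth probing too.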
You start out with a stream of data, all formatted the way the program expects, but with some gibberish values in it. Then you increase the gibberish. And once you’re satisfied you’ve tested enough conditions with the data itself, you might start messing with the formatting too.
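Those two stages might look something like this in Python. Everything here is hypothetical: the record, the function names fuzz_values and fuzz_format, and the list of odd values are all just for illustration. The first function keeps the data well-formed and swaps in a strange value. The second one starts corrupting the formatting itself, and you can turn the rate up as you go.

```python
import json
import random


def fuzz_values(record, rng):
    """Keep the structure valid but replace one field with an unexpected value."""
    odd_values = [-1, 4294967295, "", "A" * 1000, None, 3.5, [], "blue"]
    fuzzed = dict(record)
    key = rng.choice(sorted(fuzzed))
    fuzzed[key] = rng.choice(odd_values)
    return json.dumps(fuzzed)


def fuzz_format(text, rate, rng):
    """Overwrite roughly `rate` of the characters with random printable ones."""
    chars = list(text)
    for i in range(len(chars)):
        if rng.random() < rate:
            chars[i] = chr(rng.randrange(32, 127))
    return "".join(chars)


rng = random.Random(1234)  # fixed seed so an interesting input can be reproduced
valid = {"user": "dan", "uid": 1001}  # hypothetical well-formed record

print(fuzz_values(valid, rng))  # still valid JSON, but one value is odd
for rate in (0.05, 0.25, 1.0):  # increase the gibberish step by step
    print(rate, fuzz_format(json.dumps(valid), rate, rng))
```

Seeding the random number generator matters more than it looks. When something finally breaks, you want to be able to generate the exact same input again.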
Automating the tests
It’s a bit tedious to generate all of this data by hand, so ideally, you have a program that tries lots of different fuzzing combinations to generate the data rapidly and, hopefully, send it as well. If you don’t have software that generates the fuzzed data for you, you’ll probably need to write it. But it’s just a matter of generating correctly formatted data so the system will accept it, fuzzing the parameters, then sending it to the software you’re testing and logging the results. That part doesn’t usually require an ace software developer. It actually wouldn’t be a bad intermediate-level project to put together in Python.
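To give a feel for the size of that project, here’s a bare-bones sketch of the generate, send, and log loop. The target is a toy function I made up, toy_target, with a bug planted in it on purpose. In real life you’d replace it with whatever sends data to the program you’re testing.

```python
def toy_target(uid_text: str) -> int:
    """Hypothetical program under test; returns the user number it would run as."""
    uid = int(uid_text)  # raises ValueError for non-numeric input
    if uid == 0:
        raise PermissionError("running as root is not allowed")
    # Deliberate bug: the value is cut down to 32 bits *after* the check above.
    return uid % 2**32


def run_fuzzer(target, inputs):
    """Send every input to the target and log what came back."""
    log = []
    for text in inputs:
        try:
            outcome = ("accepted", target(text))
        except (ValueError, PermissionError) as exc:
            outcome = ("rejected", type(exc).__name__)  # a clean, expected refusal
        except Exception as exc:  # anything else counts as a crash worth a look
            outcome = ("crashed", repr(exc))
        log.append((text, outcome))
    return log


inputs = ["1001", "0", "-1", "4294967295", "4294967296", "abc", "", "blue"]
log = run_fuzzer(toy_target, inputs)
for text, outcome in log:
    print(repr(text), outcome)

# Flag any input that was accepted but ended up running as user 0 (root).
findings = [text for text, outcome in log if outcome == ("accepted", 0)]
print("suspicious inputs:", findings)  # suspicious inputs: ['4294967296']
```

The toy target rejects a plain 0 and rejects letters, so it looks safe at first glance. It takes a number one past the 32-bit maximum to slip through the check and come out the other side as 0.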
In some cases there are ready-built programs that will create the fuzzed data for you, and all you have to do is send it and log the results. Typically a simple script will do for that part.
What to test
Basically anything that does any kind of I/O is a candidate for fuzz testing. That means file formats, any open ports, and in the case of programs with user interaction, the elements of the UI, the command-line options, and file imports and exports. For a web app, you’ll want to test the URL handling, all user-generated content, the API, remote procedure calls, and anything else along these lines.
Fuzzing tends to find fairly simple bugs, but don’t pooh-pooh them. Of course everyone assumed feeding the maximum-sized 32-bit integer to sudo wouldn’t do anything interesting, until someone tried it. Everyone assumed someone else had already thought of that. No one thought it was a minor thing after it got disclosed.
Every bug seems obvious after someone discovers it. The key is to be the first, and not assume everyone else already thought of that. Other than that, all it takes is an inquisitive mind with some creativity.