The big Fuzz

In theory, theory and practice are the same; in practice they are not. While practice can bite fairly large pieces out of theory's proverbial butt, that does not mean theory is simply a vehicle for academics to publish more papers. Computer security practitioners often suffer from a lack of one of the two while blindly worshipping the other. A good example is the lack of fundamental discussion around the topic of fuzzing, while at the same time the tools are massively hyped.

For those who have successfully hidden from the hype so far: the term fuzzing is, well, not defined. Everyone talks about it, but it means different things to different people. In general, it means throwing semi-valid data against a system to automate security testing. Many of today's issues in input parsers can be uncovered by constantly throwing data at them and waiting for them to crash at some point. Supposedly, many people find their 0day bugs that way. The result is that more and more fuzzing tools appear, talks are held at conferences, and people start to think that their products are secure if they just sit long enough in the line of fire of a fuzzer before being shipped.
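To make "throwing semi-valid data" concrete, here is a minimal sketch of the idea in Python; run_target() is a hypothetical stand-in for whatever feeds one input to the parser under test and reports a crash:

    import random

    def mutate(sample, flips=8):
        # Take a valid sample input and corrupt a few random bytes,
        # producing the "semi-valid" data fuzzing lives off.
        buf = bytearray(sample)
        for _ in range(flips):
            buf[random.randrange(len(buf))] = random.randrange(256)
        return bytes(buf)

    def fuzz(sample, run_target, iterations=100000):
        # run_target() is hypothetical: it feeds one input to the
        # target and returns True if the target crashed.
        for i in range(iterations):
            candidate = mutate(sample)
            if run_target(candidate):
                print("crash after %d iterations" % i)
                return candidate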

I was recently very surprised how many vulnerabilities one (me) could find by throwing a lot of junk against the target and just waiting for it to crash. I have to admit that I first followed a different, more conventional approach: reading the code. What's also important is that my case was extremely well suited for fuzzing attacks, which I admittedly didn't see right away.

In theory, this is called boundary value testing. Boundary testing, in contrast to fuzzing, is a well-defined term. If a given function F is implemented in a program and the function has two parameters x and y, these two have known or unknown boundaries a < x < b and c < y < d. What boundary testing does is test the function F with values of x and y close to or equal to a, b, c and d. In the form that should be used for security related testing, it is called "Robust Worst Case Testing", at least that's what Paul C. Jorgensen calls it in "Software Testing: A Craftsman's Approach". Here, you would test x = a-1, x = a, x = a+1, x = b-1, x = b, x = b+1 and, for each of these, y = c-1, y = c, y = c+1, y = d-1, y = d, y = d+1. You can easily see that this is subject to a combinatorial explosion and will take a lot of time.
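To make the explosion concrete, here is a small sketch (not from the book) that generates exactly the test values described above for an arbitrary number of parameters; the bounds are assumed to be known here, which, as discussed below, they usually are not:

    from itertools import product

    def robust_worst_case(bounds):
        # bounds: one (a, b) pair per parameter, for a < x < b.
        per_param = [(a - 1, a, a + 1, b - 1, b, b + 1) for a, b in bounds]
        # Cartesian product: 6 ** len(bounds) test cases, i.e. two
        # parameters give 36, ten already give 60,466,176.
        return product(*per_param)

    for x, y in robust_worst_case([(0, 255), (0, 65535)]):
        pass  # call F(x, y) here and watch for crashes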

Another interesting aspect of boundary testing is its limitations. Boundary testing only works well on independent, bounded, physical quantities. Mind you, we are talking about independent values representing physical quantities such as offset values in a file format (physical = in the physical file, however physical a file is by itself). If the variable you are testing has no clear boundaries in the sense of a < x, it's almost impossible to use boundary testing effectively. Jorgensen has a nice example: testing a calendar function and its February and leap year problems. Boundary testing has basically no sense of the nature of those test values or their semantic meaning in the current context, which also means that if there is another value referring to the one you are testing, it will not be taken into account and might (or most likely will) invalidate your test results.
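The calendar example shows the dependency problem in a few lines: the upper boundary of the "day" parameter is not a constant, it depends on the month and, for February, on the year, so the parameters cannot be boundary-tested independently:

    import calendar

    # The valid upper bound of "day" depends on the other parameters.
    print(calendar.monthrange(2023, 2)[1])  # 28
    print(calendar.monthrange(2024, 2)[1])  # 29 -- leap year
    print(calendar.monthrange(2024, 4)[1])  # 30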

So how about the practice? It's easy to see that implementing a good boundary testing tool is fairly hard. None of the tools I have seen so far implements fuzzing as Robust Worst Case Testing. Given the observation above about the combinatorial explosion, it is readily visible why, since none of our protocols or file formats work with only two values. Additionally, security testers almost never have a clear understanding of what the boundaries for a given parameter x are, or, to stay with the notation used above, a and b for a < x < b are unknown. Therefore, they cannot implement boundary testing without reading the entire target code to understand the boundaries. But that would be a different testing approach and would wash all the sexiness of a quick-and-dirty hacker tool from fuzzing, making it a test case engine applied after a source code audit.

So what is it these tools do? So far, the tools I have seen use what could be considered experience-derived boundaries for testing. 32-bit integer values, for example, are mostly tested against 0x00000000, 0xFFFFFFFF, 0x7FFFFFFF and 0x80000000, which are the boundaries of the entire unsigned 32-bit integer range and of its signed representation. Other test values are derived from vulnerabilities of the past, like web servers crashing on long header fields. Most fuzzers concentrate on automatically testing "what's usually broken".
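As a sketch of that style, such a catalogue looks less like boundary mathematics and more like a list of known trouble-makers (the values below are the usual suspects, not any particular tool's list):

    # Experience-derived values for a 32-bit integer field: the
    # unsigned extremes plus the points where the signed
    # interpretation flips sign.
    INT32_SUSPECTS = [0x00000000, 0xFFFFFFFF, 0x7FFFFFFF, 0x80000000]

    # Experience-derived string tests, i.e. "what's usually broken":
    # overlong fields, format strings, path traversal.
    STRING_SUSPECTS = [b"A" * 1024, b"A" * 65536, b"%n" * 16, b"../" * 64]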

This leaves the question of why fuzzing, given all the theory above, is such a successful attack strategy in practice. The short answer is: it shows nicely how bad software quality is in general when a small subset of a very limited testing method uncovers so many bugs. The longer answer is more complicated, and I don't understand all aspects of it so far. Part of the story is that some applications are better suited for fuzzing, namely the ones that operate on independent sets of data with physical values (surprise!). Examples include packet parsers for connection-less protocols without any checksums. If the parser has to take the data in the packet at face value, since it doesn't have any other choice, fuzzing can be very successful. Think HTTP and ISAKMP.
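For HTTP, where a parser has to take each header field at face value, one round of the "long header field" test from above can be as short as this sketch (host, port and lengths are placeholders):

    import socket

    def fuzz_http_header(host, port, length):
        # One request with an absurdly long header value; an empty or
        # reset reply hints that the parser choked on it.
        req = (b"GET / HTTP/1.0\r\n"
               b"User-Agent: " + b"A" * length + b"\r\n\r\n")
        with socket.create_connection((host, port), timeout=5) as s:
            s.sendall(req)
            return s.recv(1024)

    for n in (1024, 65536, 1048576):
        try:
            fuzz_http_header("192.0.2.1", 80, n)  # placeholder target
        except OSError as e:
            print("length %d: %s" % (n, e))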

What's not clear so far: does the success or failure of fuzzing against a specific product tell you anything about the security or quality of the product in question? Or does it just show the limitations of the method and of the actual fuzzer used? What is clear is that fuzzing is by no means an adequate product testing tool for vendors. Even if I get a good answer to the questions above, just remember February 29.