Fuzzing at Mozilla

Jesse Ruderman

Fuzzing

Give random input to Firefox
See what goes wrong

Detecting bugs

Crashes

Hangs

Assertion failures

Leaks

Valgrind warnings

If Firefox crashes as a result of some input, it's a bug. Valgrind is a memory debugger; it will tell you about things that could have crashed or could result in undefined behavior. Assertions document what the developer believes to be true, and allows finding bugs before they lead to crashes.

These are the kinds of bugs you can find without knowing what output is expected, without being a UI designer or standards expert.

JavaScript engine fuzzer

(function() {
  var x;
  eval("for (x in (gc)()) for each(e in [0]) { print }")
})()

triggers

Assertion failure: !JSVAL_IS_PRIMITIVE(regs.sp[-2]), at ../jsops.cpp:489

JavaScript engine fuzzer

makeExpression randomly returns one of:
- makeExpression() + rndElt(binaryOps) + makeExpression()
- makeLValue() + " = " + makeExpression()
- makeID()
- ...

jsfunfuzz is 4000 lines of randomly-recursive JavaScript. The way it generates input (functions) is pretty boring: there are functions like "makeStatement" and "makeExpression", and each picks a template at random and fills it in with calls to other make* functions.

JavaScript: Detecting bugs

Crashes

~~Hangs~~

Assertion failures

Leaks

Valgrind warnings

It can find all the normal "easy to detect" kinds of bugs. But it accidentally puts the engine into an infinite loop or exponential blowup a lot, so I don't really bother with hangs. But also...

JavaScript: Detecting bugs

What's more interesting is what kinds of bugs it can find. It can find assertions and crashes like any other fuzzer, but it can also find several types of "correctness" bugs. The first way of finding correctness bugs is simple but temporal.

Crashes

~~Hangs~~

Assertion failures

Leaks

Valgrind warnings

Cross-engine consistency

Decompiler consistency

JavaScript: Comparing engines

var v = 0; 
for each (var a in [0, {}, {}, {}]) {
  v = v >>> 0;
  for each (var b in [{}, {}, new String(''), 42, new String(''), {}, 42]) { 
  }
}
print(v);

Interpreter: 0
Tracing JIT: -6.3 ⋅ 10⁶⁶

When you refactor, or introduce a new execution mode, you can feed fuzz inputs into both engines to ensure they give the same answers. Here I was trying to find bugs in the tracing JIT, which does special things when variables change type, so I instructed the fuzzer to make code where variables change type.

Aside: Translation party

The quick brown fox jumped over the lazy dog.
速い茶色のキツネは、怠け者の犬を飛び越えた。
The quick brown fox jumped over the lazy dog.

There's this site called Translation Party where you can see what happens when Google Translate tries to take a sentence to Japanese and back. With some inputs it detects convergence right away.

Aside: Translation party

Isn't it ironic?
それは皮肉なことではないか。
Is it not ironic.
それは皮肉なことではありません。
Ironically, it is not.
皮肉なことに、そうではありません。
Ironically, it is not.

But with other inputs, the translator has some trouble. "Isn't it ironic" became "Is it not ironic". Ok, that's not too bad. But then it turned into "Ironically, it is not", which has a very different meaning. Maybe it's a comment about the song.

I don't know Japanese, so I can't tell you exactly where it went wrong. But I think it's safe to say something did go wrong.

JavaScript: Decompilation party

function() { return x*2 }
name "x", int8 2, mul, return, stop
function() { return x * 2; }
name "x", int8 2, mul, return, stop
function() { return x * 2; }

So here's the same idea applied to JavaScript. When SpiderMonkey prints out a function, it's actually decompiling bytecode. In this case, the compiler turned my function into 5 bytecodes, and the decompiler turned it back into exactly the same text I had entered.

Now, I can read bytecode about as well as I can read Japanese, but seeing the round-trip work gives me some confidence that both parts worked correctly.

JavaScript: Decompilation party

function() { new (new C)(0); }
name "C", null, new 0, null, zero, new 1, pop, stop
function() { new new C(0); }
name "C", null, zero, new 1, null, new 0, pop, stop
function() { new new C(0); }

Something went wrong here. In fact, the first one passes undefined to C, and the second one passes 0. The decompiler screwed up by not including enough parentheses.

JavaScript engine fuzzer

922 bugs
- 107 security holes
The JavaScript engine team loves us

Seems like a lot of bugs, but it was over a period of 4 years. Some were regressions that were only in trunk for a few days. On the other hand, some were 10 years old.

The JavaScript engine team loves getting bugs from the fuzzer. Because otherwise they'll hit the bug in some complicated web page, all tangled up in application logic and jQuery.

JavaScript engine fuzzer

DOM fuzzer

DOM fuzzer: mutation modules

Move nodes
Move attributes
Mutation: easy to implement
Effectiveness depends on starting pages
Tests layout engine's handling of changes, not just weird DOMs

The first part of the DOM fuzzer just picks two nodes at random in a document and performs an appendChild or insertBefore call. (In other words, it picks a node to move and a place to put it.) It quickly turns any page into a DOM soup, so I called it Stir DOM.

Very easy to implement and reasonably effective. Great bang-for-buck, 300 bugs out of maybe 30 lines of JS. This is how I got started and it's a good place to start fuzzing a new feature.

After running for a while you have a "DOM soup" containing the ingredients of the original page but all mixed together. The effectiveness depends on starting points. I used to use real web pages, but now I use crashtest and reftest files from our own regression test suite. The starting points can be really good by choosing features that go well together (tables; image maps) that would be hard to create randomly.

You're testing not only the final rendering but also how DOM and layout code react to dynamic changes. This code is harder to get right, more performance-sensitive, and has more room for dangling-pointer bugs.

DOM fuzzer: Generator modules

HTML
HTML tables
XUL
SVG
MathML

Lists of elements, attributes, attribute values. Requires reading specs or implementations or documentation, or following bugs and commits, to compile those lists. Great at creating weird combinations of elements, but not so great at building meaningful combinations. But we have Stir DOM and the starting points for that.

This information is used not only for DOM but also for setting innerHTML. Without these modules, rare elements would get little testing, especially with other rare elements.

DOM fuzzer: other modules

Do stuff in event handlers
Styles
Stylesheets
Text
Editor, range, selection
Introspection and function calls
Privileged function calls

There are also a bunch of other modules. I don't have time to talk about them all, but I'd like to point out that mutation events are evil.

DOM fuzzer example


嵠

̇
]]>

DOM fuzzer example

]]>

DOM fuzzer: Detecting bugs

Crashes

Hangs

Assertion failures

Leaks

Valgrind warnings

DOM fuzzer: Dealing with known bugs

Crashes	Ignoring 14 crash signatures
Hangs	Ignoring hangs in the presence of 16 web features
Assertion failures	Ignoring 408 assertions (272 bugs)
Leaks	Ignoring 11 leak patterns
Valgrind warnings	Suppressing Valgrind warnings for 17 bugs

Leaks are the trickiest. I have a hierarchy, with the top occupied by nsDocShell, nsDocument, and nsGlobalWindow. Unfortunately, there are bugs for all of these, so at the present I will only notice small leaks.

DOM fuzzer

1756 bugs
- 313 security holes

DOM fuzzer

Automation

“This seems tedious. How can I automate it?”

2005	Ad-hoc layout testing	Fuzzing
2006	Reproducing fuzz bugs	Record fuzz actions
2006	Reducing fuzz testcases	Disentangle fuzz actions; automate line-by-line reduction

One of my favorite aspects of being a programmer is that I can automate any task that becomes tedious. When ad-hoc testing starts feeling tedious, maybe it's time to automate it -- with a regression test, an exhaustive test, or a fuzzer.

Unfortunately, fuzzing has a tendency to give you gigantic testcases unsuitable for bug reports. If a fuzzer finds lots of bugs, it's worth investing in automating the pieces around it.

Automated line-by-line reduction

“Lithium” automates the process of reducing a 3000-line testcase.
- Which chunks of 2048 lines can be removed?
- Which chunks of 1024 lines can be removed?
- Which chunks of 512 lines can be removed?
- ...
- Which individual lines can be removed?

Testcase reduction is a lot like trying to do a bunch of simultaneous binary searches, except you don't know how many things you're looking for. So it shouldn't be too surprising that you can do it in time proportional to the size of the reduced testcase, times the log of the size of the original testcase.
Sometimes slightly faster thanks to locality, sometimes slightly slower due to imperfect assumptions. Terrible in situations where it can introduce fatal syntax errors, or when the bug comes out as intermittent.
In practice, it takes maybe fifteen minutes for it to reduce a testcase as much as it can, and another fifteen minutes for me to clean it up some more.
This concept comes from the "Delta debugging" group at some university in Germany. There's a similar program called "Delta" that the gcc project uses to reduce testcases people put in their bug database.
Lithium is a standalone Python tool; you can use it to reduce anything linear against any criterion you want.

More automation

“This seems tedious. How can I automate it?”

2007	Reducing starting parts	serializeDOMAsScript
2007	Creating starting points	Use crashtests
2008	Adding crashtests	Crashtestify extension
2010	Remembering to remove "ignores"	expect_bug.py
2010	Keeping fuzz boxes chugging	Tinderbox-like behavior

More automation

“This seems tedious. How can I automate it?”

It can be worth automating something even if the automation only works half the time. Automate what you can and free up your mind for the interesting cases.

Now, it is possible to automate too much. If it's an easy thing you're only going to do twice, it might not be worth it. Or if you try to automate the process of automating tedious tasks. It's really hard, but even worse, if you succeed, you might have just enough AI to bring about the technological singularity.

Mind the technological singularity

Be careful.

How to fuzz

How to fuzz: pick your target

Bugs are likely
- Complex data structures
- Complex invariants
- History of bugginess
Other types of testing are difficult
- Combinatorial explosion of interacting inputs
Security bugs are possible
- Untrusted input
- Pointer-heavy C++ code

Pick a component to fuzz, and try to fuzz it directly. If your component has too much complexity for ad-hoc testing or normal regression test suites, because there are too many combinations that could interact. Can't rely on the web to find bugs, or you're very sad when the web finds bugs.

How to fuzz: types of input

Completely random (mostly invalid)
Valid
Almost valid

"Almost valid" or "Mostly valid" is where you want to aim, so you get into deep code paths but also the error cases off of those paths.

How to fuzz: where to start

Identify the component's inputs
Mutation is simpler than generation
- Find a test suite, mutate files from it
Start hands-on (minimal automation)
- Reproducibility
- automation.py

Identifying a component's inputs can be subtle. For the layout module, it's largely changes to the DOM that form input, but also user zooming. For the regular expression engine, the input is regexps AND text to match against.

Other ways to help

Fix bugs on my ignore lists
Add regression tests that can be used as fuzzing starting points, such as reftests and crashtests
Add assertions to document your invariants and assumptions
Tell me what my fuzzers are missing

Mozilla Summit - July 2010 - Fuzzing

Fuzzing at Mozilla

Jesse Ruderman

Fuzzing

Fuzzing

Detecting bugs

JavaScript engine fuzzer

JavaScript engine fuzzer

JavaScript engine fuzzer

JavaScript: Detecting bugs

JavaScript: Detecting bugs

JavaScript: Comparing engines

Aside: Translation party

Aside: Translation party

JavaScript: Decompilation party

JavaScript: Decompilation party

JavaScript engine fuzzer

JavaScript engine fuzzer

DOM fuzzer

DOM fuzzer: mutation modules

DOM fuzzer: Generator modules

DOM fuzzer: other modules

DOM fuzzer example

DOM fuzzer example

DOM fuzzer: Detecting bugs

DOM fuzzer: Dealing with known bugs

DOM fuzzer

DOM fuzzer

Automation

Automated line-by-line reduction

More automation

More automation

Mind the technological singularity

How to fuzz

How to fuzz: pick your target

How to fuzz: types of input

How to fuzz: where to start

Other ways to help

End