Archive for the 'Fuzzing' Category

Releasing jsfunfuzz and DOMFuzz

Tuesday, July 28th, 2015

Today I'm releasing two fuzzers: jsfunfuzz, which tests JavaScript engines, and DOMFuzz, which tests layout and DOM APIs.

Over the last 11 years, these fuzzers have found 6450 Firefox bugs, including 790 bugs that were rated as security-critical.

I had to keep these fuzzers private for a long time because of the frequency with which they found security holes in Firefox. But three things have changed that have tipped the balance toward openness.

First, each area of Firefox has been through many fuzz-fix cycles. So now I'm mostly finding regressions in the Nightly channel, and the severe ones are fixed well before they reach most Firefox users. Second, modern Firefox is much less fragile, thanks to architectural changes to areas that once oozed with fuzz bugs. Third, other security researchers have noticed my success and demonstrated that they can write similarly powerful fuzzers.

My fuzzers are no longer unique in their ability to find security bugs, but they are unusual in their ability to churn out reliable, reduced testcases. Each fuzzer alternates between randomly building a JS string and then evaling it. This construction makes it possible to make a reproduction file from the same generated strings. Furthermore, most DOMFuzz modules are designed so their functions will have the same effect even if other parts of the testcase are removed. As a result, a simple testcase reduction tool can reduce most testcases from 3000 lines to 3-10 lines, and I can usually finish reducing testcases in less than 15 minutes.

The ease of getting reduced testcases lets me afford to report less severe bugs. Occasionally, one of these turns out to be a security bug in disguise. But most importantly, these bug reports help me establish positive relationships with Firefox developers, by frequently saving them time.

A JavaScript engine developer can easily spend a day trying to figure out why a web site doesn't work in Firefox. If instead I can give them a simple testcase that shows an incorrect result with a new JS optimization enabled, they can quickly find the source of the bug and fix it. Similarly, they much prefer reliable assertion testcases over bug reports saying "sometimes, Google Maps crashes after a while".

As a result, instead of being hostile to fuzzing, Firefox developers actively help me fuzz their code. They've added numerous assertions to their code, allowing fuzzers to notice as soon as the smallest thing goes wrong. They've fixed most of the bugs that impede fuzzing progress. And several have suggested new ways to test their code, even (especially) ways that scare them.

Developers working on the JavaScript engine have been especially helpful. First, they ensured I could test their code directly, apart from the rest of the browser. They already had a JavaScript shell for running regression tests, and they added a --fuzzing-safe option to disable the more dangerous testing functions.

The JS team also created a large set of testing functions to let me control things that would normally be based on heuristics. Fuzzers can now choose when garbage collection happens and even how much. They can make expensive JITs kick in after 2 loop iterations rather than 100. Fuzzers can even simulate out-of-memory conditions. All of these things make it possible to create small, reliable testcases for nasty classes of bugs.

Finally, the JS team has supported differential testing, a form of fuzzing where output is checked for correctness against some oracle. In this case, the oracle is the same JavaScript engine with most of its optimizations disabled. By fixing inconsistencies quickly and supporting --enable-more-deterministic, they've ensured that differential testing doesn't get stuck finding the same problems repeatedly.

Andreas Gal, a developer working on Firefox's JavaScript engine, once commented on Bugzilla: 'From this day forward, I shall never write a JIT again without Jesse.'

Please join us on IRC, or just dive in and contribute! Your suggestions and patches can have a large impact: fuzzer modules often act together to find complex interactions within the browser. For example, bug 893333 was found by my designMode module interacting with a <table> module contributed by a Firefox developer, Mats Palmgren. Likewise, bug 1158427 was found by Christoph Diehl's WebAudio module combined with my reflection-based API-discovery modules.

To the next 6450 browser bug fixes!

Fuzzers love assertions

Monday, February 3rd, 2014

Fuzzers make things go wrong.
Assertions make sure we find out.

Assertions can improve code quality in many ways, but they truly shine when combined with fuzzing. Fuzzing is normally limited to finding obvious symptoms like crashes, because it's rare to be able to tell correct behavior from incorrect behavior when the input is generated randomly. Assertions expand the scope of fuzzing to include everything they check.

Assertions can even help find crash bugs: some bugs are relatively easy for fuzzers to trigger, but only lead to crashes when additional conditions are met. A well-placed assertion can let us know every time we trigger the bug.

Fuzzing JS and DOM has found about 4000 assertion bugs, including about 300 security bugs.

Asserting safe use of generic data structures

Assertions in widely-used data structures can find bugs in many callers.

  • Array indices must be within bounds. This simple precondition assert in nsTArray has caught about 90 bugs.
  • Hash tables must not be modified during enumeration. If the modification happened to resize the hash table, it would leave stack pointers dangling. This PLDHashTable assertion has caught over 50 bugs.
  • Cached values should not be out of date. When a cache's get method takes a key and a closure for computing values in the case of a cache miss, debug builds can check whether the cached values are still correct. This is effectively a form of differential testing that notices bugs in cache-invalidation logic.

Asserting module invariants

When an entire module must maintain an invariant, a single assertion can catch dozens of bugs.

Making the frame arena safer

Gecko's CSS box objects, called "frames", are created and destroyed manually. They are allocated within an arena to reduce malloc overhead and fragmentation. The arena also made it possible to reduce the risk associated with manual memory management. A combination of assertions (in debug builds) and runtime mitigations (in all builds) mitigates dangling pointer bugs that involve frames.

Requests for Gecko developers

Please add assertions, especially when:

  • A bug would be a security hole
  • Crashing is not guaranteed
  • Many callers must fulfill a precondition
  • Complex, extensive code must maintain an invariant

Also consider:

Fuzzing for consistent rendering

Saturday, March 3rd, 2012

My DOM fuzzer can now find bugs where the layout of a DOM tree depends on its history.

In this example, forcing a re-layout swapped a “1” and “3” on the screen. My fuzzer didn’t know which rendering was correct, but it could tell that Firefox was being inconsistent.

Initial DOM tree
  • DIV
    • ت
    • SPAN
      • 1
      • SPAN
      • 3
31ت
Random change:
remove the inner span
  • DIV
    • ت
    • SPAN
      • 1
      • 3
31ت
Force re-layout
  • DIV
    • ت
    • SPAN
      • 1
      • 3
13ت

Gecko developer Simon Montagu quickly determined that 13ت is the correct rendering and attached a patch. Later, when a user reported that the bug affected Persian comments on Facebook, we were able to backport Simon’s fix to Firefox 11.

How it works

The fuzzer starts by making random dynamic changes to a page. Then it compares two snapshots: one taken immediately after the dynamic changes, and another taken after also forcing a relayout.

To force a relayout, it removes the root from the document and then adds it back:

  var r = document.documentElement; 
  document.removeChild(r);
  document.appendChild(r);

Like reftest, it uses drawWindow() to take snapshots and compareCanvases() to compare them.

In theory, I could also look for bugs where dynamic changes do not repaint enough of the window. But I've been told that testing for painting invalidation bugs is tricky, so I'll wait until most of the layout bugs are fixed.

Exceptions

Since the testcases are random, I have to be heavy-handed in ignoring known bugs. If I file a rendering bug where the weirdest part of the testcase is floats, I'll have the fuzzer ignore inconsistent rendering in testcases with floats until the bug is fixed.

The current list of exceptions is fairly large and includes key web technologies:

Fuzzing in the pool

Tuesday, November 23rd, 2010

In mid-2009, John O'Duinn offered to let my DOM fuzzer run on the same pool of machines as Firefox regression tests. I'd have an average of 20 computers running my fuzzer across a range of operating systems, and I wouldn't have to maintain the computers. All I had to do was tweak my script to play nicely with the scheduler, and not destroy the machines.

Playing nicely with the scheduler

Counter-intuitively, to maximize the amount of fuzzing, I had to minimize the duration of each fuzz job. The scheduler tries to avoid delays in the regression test jobs so developers don't go insane watching the tree. A low-priority job will be allowed to start much more often if it only takes 30 minutes.

Being limited to 30 minutes means the fuzz jobs don't have time to compile Firefox. Instead, fuzz jobs have to download Tinderbox builds like the regression test jobs do. I fixed several bugs in mozilla-central to make Tinderbox builds work for fuzzing.

I also modified the testcase reducer to split its work into 30-minute jobs. If the fuzzer finds a bug and the reducer takes longer than 30 minutes, it uploads the partially-reduced testcase, along with the reduction algorithm's state, for a subsequent job to continue reducing. To avoid race conditions between uploading and downloading, I use "ssh mv" synchronization.

Not destroying the test slaves

I wasn't trying to fill up the disks on the test slaves, really!

Early versions of my script filled up /tmp. I had incorrectly assumed that /tmp would be cleared on each reboot. Luckily, Nagios caught this before it caused serious damage.

Due to a security bug in some debug builds of Firefox, the fuzzer created randomly-named files in the home directory. This security bug has been fixed, but I'm afraid RelEng will be finding files named "undefined" and "[Object HTMLBodyElement]" for a while.

By restarting Firefox frequently, fuzzing accelerated the creation of gigantic console.log files on the slaves. We're trying to figure out whether to make debug-Firefox not create these files or make BuildBot delete them.

Results so far

Running in the test pool gets me a variety of operating systems. The fuzzer currently runs on Mac32 (10.5), Mac64 (10.6), Linux32, Linux64, and Win32. This allowed me to find a 64-bit-only bug and a Linux-only bug in October. Previously, I had mostly been testing on Mac.

The extra computational power also makes a difference. I can find regressions more quickly (which developers appreciate) and find harder-to-trigger bugs (which developers don't appreciate quite as much). I also get faster results when I change the fuzzer, such as the two convoluted testcases I got shortly after I added document.write fuzzing.

Unexpectedly, getting quick results from fuzzer changes makes me more inclined to tweak and improve to the fuzzer. I know that the change will still be fresh in my mind when I learn about its effects. This may turn out to be the most important win.

With cross-platform testing and the boost to agility, I suddenly feel a lot closer to being able to share and release the fuzzer.

How my DOM fuzzer ignores known bugs

Sunday, November 21st, 2010

When my DOM fuzzer finds a new bug, I want it to make a reduced testcase and notify me so I can file a bug report. To keep it from wasting time finding duplicates of known bugs, I maintain several ignore lists:

Some bugs are harder to distinguish based on output. In those cases, I use suppressions based on the fuzzer-generated input to Firefox:

Fixing any bug on those lists improves the fuzzer's ability to find additional bugs. But I'd like to point out a few that I'd especially like fixed:

In rare cases, I'll temporarily tell the fuzzer to skip a feature entirely:

Several bugs interfere with my ability to distinguish bugs. Luckily, they're all platform-specific, so they don't prevent me from finding cross-platform bugs.

  • Bug 610311 makes it difficult to distinguish crashes on Linux, so I ignore crashes there.
  • Bug 612093 makes it difficult to distinguish PR_Asserts and abnormal exits on Windows. (It's fixed in NSPR and needs to be merged to mozilla-central.)
  • Bug 507876 makes it difficult to distinguish too-much-recursion crashes on Mac. (But I don't currently know of any, so I'm not ignoring them at the moment!)

GCC correctness fuzzing

Wednesday, November 3rd, 2010

In 2008 I wrote about generating random JavaScript to find differences between optimization modes and differences between JavaScript engines (rough list of bugs).

How do you do this kind of testing on a language like C where the behavior of many programs is undefined per spec? John Regehr explains how in his talk Exposing Difficult Compilers Bugs With Random Testing at GCC Summit 2010.

Fuzzing talk at the Mozilla Summit

Wednesday, July 14th, 2010

At the 2010 Mozilla Summit, I talked about my JavaScript engine and DOM fuzzers, which have each found many hundreds of bugs. I also talked about the automations that keep me sane when I fuzz these complex components.

My slides are in the S5 web-based presentation format. You can click the Ø button to view the presentation in "handout mode" and see what I planned to say while each slide was up.

I shared a presentation slot with Mozilla contractor Paul Nickerson, who has a separate slide deck. He wisely saved the best part of his talk for the end: a demo of his font fuzzer causing Windows 7 to blue-screen.

CSS grammar fuzzer

Monday, March 16th, 2009

I wrote a CSS grammar fuzzer to test Gecko's CSS parser. This fuzzer's tricks:

Declarative context-free grammar. This makes it easy to add new CSS features to the fuzzer, or even use it to test grammars other than CSS. Each symbol can be a star, concatenation, or list of alternatives. Unlike a parser, a fuzzer has to make decisions about what to create, so alternatives can be given weights and stars have a suggested number of repetitions. This alone was enough to find at least one bug in Gecko.

Breaking rules. Like any good fuzzer, it doesn't always follow the given context-free grammar. Sometimes it does weird stuff, such as inserting a random symbol, to throw the parser off. I was surprised that this only found one additional bug in Gecko. Perhaps this reflects the comprehensive error handling requirements of the CSS specifications and the corresponding test suites.

Grammatical recursion. When the fuzzer notices that a symbol is the same as an ancestor, it can repeat the parts of the final string between the two symbols. This is effective at finding bugs where large input can cause a recursive algorithm to run out of stack space and crash. This found four grammar-recursion crashes in Gecko.

CSS serialization. The fuzzer makes sure that any text that comes out of the browser's stylesheet serializer survives another trip through the parser and serializer. This is the same trick jsfunfuzz uses to test the JavaScript engine decompiler. This helped fine four incorrect-serialization bugs in Gecko.

None of these Gecko bugs seem to be security holes. I shared the fuzzer with other browser vendors privately for over a month, and nobody asked me to delay the release, so I believe it didn't find security holes in other browsers either. But I think this has more to do with CSS parsing being fairly simple and self-contained than any weakness in the fuzzer.

After CanSecWest, I'm going to try using this fuzzer to generate JavaScript expressions that crash parsers that use recursion incautiously. Unless someone beats me to it, of course ;) (jsfunfuzz already creates many types of weird JavaScript, but can't look for grammatical recursion opportunities easily because it is written in a functional style rather than a declarative style.)

This fuzzer is written in JavaScript and is MIT licensed. I'd love to hear what other people manage to do with it. Get it here.