Crashes |
Hangs |
Assertion failures |
Leaks |
Valgrind warnings |
If Firefox crashes as a result of some input, it's a bug. Valgrind is a memory debugger; it will tell you about things that could have crashed or could result in undefined behavior. Assertions document what the developer believes to be true, and allow finding bugs before they lead to crashes.
These are the kinds of bugs you can find without knowing what output is expected, without being a UI designer or standards expert.
(function() { var x; eval("for (x in (gc)()) for each(e in [0]) { print }") })()
triggers
Assertion failure: !JSVAL_IS_PRIMITIVE(regs.sp[-2]), at ../jsops.cpp:489
jsfunfuzz is 4000 lines of randomly-recursive JavaScript. The way it generates input (functions) is pretty boring: there are functions like "makeStatement" and "makeExpression", and each picks a template at random and fills it in with calls to other make* functions.
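Here's a minimal sketch of that shape. The templates below are made-up examples, not jsfunfuzz's real ones, and print is the SpiderMonkey shell's output function.

// A tiny jsfunfuzz-style generator: each make* function picks a template
// at random and fills it in with recursive calls.  The depth parameter
// caps the random recursion so generation terminates.
function rnd(n) { return Math.floor(Math.random() * n); }

function makeExpression(d) {
  if (d <= 0) return ["x", "0", "[]", "({})"][rnd(4)];
  switch (rnd(3)) {
    case 0:  return makeExpression(d - 1) + " + " + makeExpression(d - 1);
    case 1:  return "(" + makeExpression(d - 1) + ")";
    default: return "f(" + makeExpression(d - 1) + ")";
  }
}

function makeStatement(d) {
  if (d <= 0) return makeExpression(d) + ";";
  switch (rnd(3)) {
    case 0:  return "if (" + makeExpression(d - 1) + ") " + makeStatement(d - 1);
    case 1:  return "for (var i = 0; i < 2; ++i) " + makeStatement(d - 1);
    default: return "{ " + makeStatement(d - 1) + " " + makeStatement(d - 1) + " }";
  }
}

print(makeStatement(4));  // one random input for the engine under test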
Crashes |
Assertion failures |
Leaks |
Valgrind warnings |
It can find all the normal, easy-to-detect kinds of bugs. It accidentally puts the engine into an infinite loop or exponential blowup so often, though, that I don't bother watching for hangs. But also...
Crashes |
Assertion failures |
Leaks |
Valgrind warnings |
Cross-engine consistency |
Decompiler consistency |
var v = 0; for each (var a in [0, {}, {}, {}]) { v = v >>> 0; for each (var b in [{}, {}, new String(''), 42, new String(''), {}, 42]) { } } print(v);
When you refactor, or introduce a new execution mode, you can feed fuzz inputs into both engines to ensure they give the same answers. Here I was trying to find bugs in the tracing JIT, which does special things when variables change type, so I instructed the fuzzer to make code where variables change type.
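The harness side can be a few lines too. Here's a hypothetical Node sketch; the shell paths and the -j flag are placeholders for whatever two engines or modes you're comparing.

// Cross-engine consistency: run one fuzz-generated test file through two
// engine configurations and report any difference in output.
const { execFileSync } = require("child_process");

// Placeholder paths and flags -- substitute the engines or modes under test.
const configs = [
  { name: "interp", bin: "/path/to/js", args: [] },
  { name: "jit",    bin: "/path/to/js", args: ["-j"] }
];

function run(cfg, testFile) {
  try {
    return execFileSync(cfg.bin, cfg.args.concat(testFile), { encoding: "utf8" });
  } catch (e) {
    return "EXIT " + e.status + "\n" + (e.stdout || "");
  }
}

const testFile = process.argv[2];
const outputs = configs.map(function (cfg) { return run(cfg, testFile); });
if (outputs[0] !== outputs[1]) {
  console.log("MISMATCH on " + testFile);
  configs.forEach(function (cfg, i) {
    console.log(cfg.name + ":\n" + outputs[i]);
  });
}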
There's this site called Translation Party where you can see what happens when Google Translate tries to take a sentence to Japanese and back. With some inputs it detects convergence right away.
But with other inputs, the translator has some trouble. "Isn't it ironic" became "Is it not ironic". OK, that's not too bad. But then it turned into "Ironically, it is not", which has a very different meaning. Maybe it's a comment about the song.
I don't know Japanese, so I can't tell you exactly where it went wrong. But I think it's safe to say something did go wrong.
So here's the same idea applied to JavaScript. When SpiderMonkey prints out a function, it's actually decompiling bytecode. In this case, the compiler turned my function into 5 bytecodes, and the decompiler turned it back into exactly the same text I had entered.
Now, I can read bytecode about as well as I can read Japanese, but seeing the round-trip work gives me some confidence that both parts worked correctly.
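The check itself is short. Here's a sketch of a fixed-point test; it assumes Function.prototype.toString goes through the decompiler, as it did in SpiderMonkey at the time.

// Decompiler consistency: compile, decompile, recompile, decompile again,
// and complain if the two decompilations differ.
function checkRoundTrip(source) {
  var f, g;
  try { f = eval("(" + source + ")"); } catch (e) { return; }  // skip invalid input
  var once = "" + f;               // source -> bytecode -> decompiled source
  try { g = eval("(" + once + ")"); } catch (e) {
    print("BUG: decompiled source does not compile:\n" + once);
    return;
  }
  var twice = "" + g;
  if (once !== twice)
    print("BUG: decompilation is not a fixed point:\n" + once + "\n" + twice);
}

checkRoundTrip("function () { return 5; }");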
Something went wrong here. In fact, the first one passes undefined to C, and the second one passes 0. The decompiler screwed up by not including enough parentheses.
Seems like a lot of bugs, but it was over a period of 4 years. Some were regressions that were only in trunk for a few days. On the other hand, some were 10 years old.
The JavaScript engine team loves getting bugs from the fuzzer, because the alternative is hitting the same bug in some complicated web page, all tangled up in application logic and jQuery.
The first part of the DOM fuzzer just picks two nodes at random in a document and performs an appendChild or insertBefore call. (In other words, it picks a node to move and a place to put it.) It quickly turns any page into a DOM soup, so I called it Stir DOM.
Very easy to implement and reasonably effective. Great bang-for-buck, 300 bugs out of maybe 30 lines of JS. This is how I got started and it's a good place to start fuzzing a new feature.
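Here's a sketch of the idea: not the original 30 lines, but the same shape.

// Stir DOM: pick a random node to move and a random place to put it,
// then let appendChild / insertBefore do the rest.
function rnd(n) { return Math.floor(Math.random() * n); }

function allNodes() {
  var walker = document.createTreeWalker(document, NodeFilter.SHOW_ALL, null);
  var nodes = [];
  for (var n = walker.nextNode(); n; n = walker.nextNode())
    nodes.push(n);
  return nodes;
}

function stir() {
  var nodes = allNodes();
  var mover = nodes[rnd(nodes.length)];
  var target = nodes[rnd(nodes.length)];
  try {
    if (rnd(2) === 0 || !target.hasChildNodes())
      target.appendChild(mover);
    else
      target.insertBefore(mover, target.childNodes[rnd(target.childNodes.length)]);
  } catch (e) {
    // Hierarchy errors and the like are expected; ignore them.
  }
}

setInterval(stir, 50);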
After running for a while you have a "DOM soup" containing the ingredients of the original page, all mixed together. The effectiveness depends on the starting points. I used to use real web pages, but now I use crashtest and reftest files from our own regression test suite. A starting point can be made really good by combining features that go well together (tables, image maps) in ways that would be hard to create randomly.
You're testing not only the final rendering but also how DOM and layout code react to dynamic changes. This code is harder to get right, more performance-sensitive, and has more room for dangling-pointer bugs.
Lists of elements, attributes, and attribute values. Compiling those lists requires reading specs, implementations, or documentation, or following bugs and commits. These modules are great at creating weird combinations of elements, but not so great at building meaningful combinations; Stir DOM and its starting points cover that.
The same information is used not only for DOM calls but also for setting innerHTML. Without these modules, rare elements would get little testing, especially in combination with other rare elements.
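Here's a sketch of a list-driven module; the tables below are tiny stand-ins for the real ones compiled from specs.

// List-driven fuzzing: pick a random element on the page, then set a
// random attribute to a random value, or insert a random new element.
var elements = ["div", "table", "area", "iframe", "marquee"];
var attributes = {
  "class": ["a", "b"],
  contenteditable: ["true", "false"],
  dir: ["ltr", "rtl"],
  width: ["0", "1%", "10000"]
};

function rndItem(a) { return a[Math.floor(Math.random() * a.length)]; }

function randomElement() {
  var all = document.getElementsByTagName("*");
  return all[Math.floor(Math.random() * all.length)];
}

function tweakAttribute() {
  var name = rndItem(Object.keys(attributes));
  randomElement().setAttribute(name, rndItem(attributes[name]));
}

function insertRandomElement() {
  randomElement().appendChild(document.createElement(rndItem(elements)));
}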
There are also a bunch of other modules. I don't have time to talk about them all, but I'd like to point out that mutation events are evil.