War on Orange update
Clint Talbert organized a meeting today on the topic of the intermittent failures. It was well-attended by members of the Automation, Metrics, and Platform teams, but we forgot to invite the Firefox front-end team.
There was some discussion of culture and policy around intermittence. For example, David Baron promoted the idea of estimating regression ranges for intermittent failures, and backing out patches suspected of causing the failures. But most of the meeting focused on metrics and tools.
Joel Maher demonstrated Orange Factor, which calculates the average number of intermittent failures per push. It shows that the average number of oranges dropped from 5.5 in August to 4.5 in September.
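As a rough illustration of the metric (not Orange Factor's actual code, which I haven't shown here), the calculation is a mean over pushes. The function name and the push data below are made up for the example.

```python
def orange_factor(failures_per_push):
    """Average number of intermittent failures per push.

    failures_per_push maps a push id to the count of intermittent
    failures observed for that push (hypothetical input format).
    """
    if not failures_per_push:
        return 0.0
    return sum(failures_per_push.values()) / len(failures_per_push)


# Invented sample data: four pushes with 6, 4, 5, and 3 oranges.
pushes = {"push-a": 6, "push-b": 4, "push-c": 5, "push-d": 3}
print(orange_factor(pushes))
```

Note that pushes with zero failures still have to be counted in the denominator, or the average will be inflated.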
Daniel Einspanjer is designing a database for storing information about Tinderbox failures. He wants to know the kinds of queries we will run so he can make the database efficient for common queries. Jeff Hammel, Jonathan Griffin, and Joel Maher will be working on a new dashboard with him.
Two key points were raised about the database. The first is that people querying "by date" are usually interested in the time of the push, not the time the test suite started running. There was some discussion of whether we need to take the branchiness of the commit DAG into account, or whether we can stick with the linearity of pushes to each central repository.
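To make the "by date" point concrete, here is a minimal sketch using an in-memory SQLite table. The table name, column names, and rows are all assumptions for illustration and say nothing about the schema Daniel will actually design. The idea is only that the push time is stored and indexed separately from the time the run started.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per test-suite run.
conn.execute(
    """
    CREATE TABLE test_runs (
        push_id        TEXT,
        push_time      TEXT,    -- ISO 8601 time of the push
        run_start_time TEXT,    -- ISO 8601 time the suite started
        suite          TEXT,
        failed         INTEGER  -- 1 if the run had a failure
    )
    """
)
# Index the column people actually filter on.
conn.execute("CREATE INDEX idx_push_time ON test_runs (push_time)")

# Invented rows: push-a landed late on Sept 30 but ran on Oct 1.
conn.executemany(
    "INSERT INTO test_runs VALUES (?, ?, ?, ?, ?)",
    [
        ("push-a", "2010-09-30T23:50:00", "2010-10-01T00:20:00", "mochitest", 1),
        ("push-b", "2010-10-01T09:00:00", "2010-10-01T09:30:00", "mochitest", 0),
    ],
)


def failures_pushed_between(start, end):
    """Push ids with a failing run whose push time is in [start, end)."""
    cur = conn.execute(
        "SELECT push_id FROM test_runs "
        "WHERE failed = 1 AND push_time >= ? AND push_time < ? "
        "ORDER BY push_time",
        (start, end),
    )
    return [row[0] for row in cur]


print(failures_pushed_between("2010-09-30", "2010-10-01"))
```

Filtering on run_start_time instead would attribute the failure to October 1st, a day after the push that it belongs to.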
The second key point is that we don't consistently have one test result per test and push. We might have skipped the test suite because the infrastructure was overloaded, or because someone else pushed right away. Another failure (intermittent or not) might have broken the build or made an earlier test in the suite cause a crash. Contrariwise, we might have run a test suite multiple times for a single push in order to help track down intermittent failures! The database needs to capture this information in order to estimate regression ranges and failure frequencies accurately.
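Here is a small sketch of why the missing and repeated runs matter. It assumes a made-up record with a push index (position in the linear push order), a flag for whether the test actually ran, and a flag for whether it failed. None of these names come from the real database.

```python
from collections import namedtuple

# Hypothetical record: one per scheduled run of a test on a push.
Run = namedtuple("Run", ["push_index", "ran_test", "failed"])


def failure_frequency(runs):
    """Failures divided by runs in which the test actually executed."""
    counted = [r for r in runs if r.ran_test]
    if not counted:
        return None  # no data, which is different from a rate of zero
    return sum(r.failed for r in counted) / len(counted)


def regression_range(runs):
    """(last push where the test ran before any failure, first failing push)."""
    ran = [r for r in runs if r.ran_test]
    failing = [r.push_index for r in ran if r.failed]
    if not failing:
        return None
    first_fail = min(failing)
    earlier = [r.push_index for r in ran if r.push_index < first_fail]
    last_clean = max(earlier) if earlier else None
    return (last_clean, first_fail)


# Invented history for one test.
runs = [
    Run(1, True, False),   # push 1, first run passed
    Run(1, True, False),   # push 1, retriggered, passed again
    Run(2, False, False),  # push 2, suite skipped (infrastructure overloaded)
    Run(3, False, False),  # push 3, earlier test crashed the suite
    Run(4, True, True),    # push 4, failed
    Run(4, True, False),   # push 4, retriggered, passed
]
print(failure_frequency(runs))
print(regression_range(runs))
```

Pushes 2 and 3 never ran the test, so they stay inside the suspect range along with push 4. And since the failure is intermittent, two green runs on push 1 only make it less likely, not certain, that the bug was absent there.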
We also discussed the sources of existing data about failures: Tinderbox logs, "star" comments attached to the logs, and bug comments created by TBPLbot (example) when a bug number in a "star" comment matches a bug number that had been suggested based on its summary. Each source of data has its own types of noise and gaps.
October 3rd, 2010 at 3:22 pm
It may be worth checking out bbchop (http://github.com/Ealdwulf/bbchop), a tool for tracking down intermittent bugs.
October 26th, 2010 at 12:32 pm
[…] have silently put up a tool called Orange Factor early last month as part of the War On Orange (WOO) project. Over the last few weeks I have been iterating on this […]