Archive for the 'Rapid release' Category

On the Isle of Rapidity

Saturday, August 27th, 2011

Not all of our neighbors followed us. Some asked — demanded? — that we send back supplies.

We acknowledged their request, but our immediate task was to explore this Isle of Rapidity. What surprises would we discover? What surprises would discover us?

To survive in this strange land, we would have to befriend new neighbors. Living for so long atop Mount Annum, we had almost forgotten how to introduce ourselves.

But we had brought much to share. We had barely opened our packs when the wind seemed to whisper:

Here, gifts arrive almost before you send them.

Maybe it wouldn’t be so hard to make friends here.

And there was something inexplicably familiar about this island. Was it the scent of the flowers? The rhythmic waves in the distance? The chattering of wildlife, almost a chorus?

Here, a gift to your neighbor is equally a gift to yourself.

We felt a sudden shift in perception: the Isle of Rapidity was home.

Venturing from Mount Annum

Saturday, August 27th, 2011

Our friends thought us mad.

We had thrived atop Mount Annum. It was the highest peak as far as the eye could see.

Once, we had sprained an ankle on the Foothills of Many Betas in the west. We remembered miserable visits to the Bog of Eternal Driver Approval in the east. Every time, we had quickly returned home.

But there we were, on the summit of Mount Annum, climbing into a cannon aimed at 4°N, 6°W.

So far away, and yet oddly specific. We lit the fuse and plugged our ears.

We flew over the stifling Bog of Eternal Driver Approval. We flew over the perilous Sea of Recklessness.

We landed, as we hoped, on the uncharted Isle of Rapidity. The landing was unexpectedly soft.

We’ve shaken off the dust and gunpowder. We’ve begun to tend our wounds. We’re excited about the upcoming climb.

Rapid releases and crashes

Saturday, August 27th, 2011

In the months before Firefox's first rapid release, one concern echoed throughout engineering: crashes.

We had always relied on long stabilization periods to get crash rates down. Firefox 4 would be our last high-stability release. We hoped improvements on other aspects of quality would outweigh the decreased stability.

But then something surprising happened. We released Firefox 5, and Firefox didn’t get crashier.

VersionCrashes per 100 active daily users
3.6.201.8
4.0.11.6
5.0.11.4
6.01.6

KaiRo’s explanation parallels what I’ve seen helping with MemShrink:

  • The channel cascade gives each release 12 weeks of pure stabilization.
  • The channel audiences help by comparing alphas to alphas.
  • The short cycles enable backouts and reduce the desire to land half-baked features.

“Rapid release” doesn't mean building Firefox the way we always have, x times faster. It’s a new process that fits together in beautiful yet fragile ways.

Improving intranet compatibility

Thursday, August 25th, 2011

Some organizations are reluctant to keep their browsers up-to-date because they worry that internal websites might not be compatible.

Organization-internal sites can have unusual compatibility constraints. Many have small numbers of users, yet are highly sensitive to downtime. Some were developed with the assumption that the web would always be as static as it was in 2003.

Rapid releases help in some ways: fewer things change at a time; we can deprecate APIs before removing them; and the permanent Aurora and Beta audiences help test each new release consistently.

But frequent releases make manual testing impractical. (Let's pretend for now that the roughly-monthly security "dot releases" never broke anything.)

As with the problem of extension compatibility, overlapping releases could be part of a solution. But we should start by thinking about ways to attack compatibility problems directly.

Automated testing

A tool could scan for the use of deprecated web features, such as the “-moz-” prefixed version of border-radius. This tool, similar in spirit to the AMO validator, could be run on the website's source code or on the streams received by Firefox.

There is already a compatibility detector for HTML issues, but my intuition is that CSS and DOM compatibility problems are more common.

User testing

Not all visitors have the technical skills and motivation to report issues they encounter. In some organizations, bureaucracy can stifle communication between visitors and developers. Automating error reports could help.

It would be cool if a Firefox extension could report warnings and errors from internal sites to a central server.

Depending on the privacy findings from this extension, it could become an onerror++ API available to all web sites, similar to the Web Performance APIs. This seems more sensible than adding API-specific error reporting.

Sample contracts

We could suggest things to put in contracts for outsourced intranet development.

Often, the best solution is to align incentives. That could take the form of specifying that the developers are responsible for maintenance costs for a specified length of time.

When that isn't practical, I'd suggest specifying requirements such as:

Channel management

There is a roughly exponential distribution of home users between the Nightly, Aurora, Beta, and Release channels. This helps Mozilla and public web sites fix incompatibilities before they affect large numbers of users.

Large organizations should strive for a similar channel distribution so that internal websites benefit in the same way. It might make sense for Mozilla to provide tools to help, perhaps based on our own update service or Build Your Own Browser.

Counterintuitively, the best strategy for security-conscious organizations may be to standardize on the Beta channel, with the option to downgrade to Release in an emergency. This isn't as crazy as it sounds. Today's betas are as stable as yesteryear's release candidates, thanks to the Aurora audience and the discipline made possible by the 6-week cadence. And since Beta gets security fixes sooner, they are safer in some ways.

The loss of release overlap takes away some options from IT admins and intranet developers, but rapid releases also make possible new strategies that could be better in the long term.

Secure and compatible

Thursday, August 25th, 2011

Previously, I discussed some of the ways Firefox's new rapid release process improves its security. But improving Firefox's security only helps users who actually update, and some people have expressed concern that rapid releases could make it more difficult for users to keep up.

There is some consensus on how to make the update process smoother and how to reduce the number of regressions that reach users. But incompatibilities pose a tougher problem.

We don't want users to have to choose between insecure and incompatible.

There are three ways to avoid facing Firefox users with this dilemma:

  • Prevent incompatibilities from arising in the first place.
  • Hasten the discovery of incompatibilities, and fix them quickly.
  • Overlap releases by delaying the end-of-life for the old version.

Overlapping releases

For many years, Mozilla tried to prevent this dilemma by providing overlap between releases. For example, Firefox 2 was supported for six months after Firefox 3.

We observed two security-related patterns that made us reluctant to continue providing overlapping releases:

First, some fixes were never backported, so users on the "old but supported" version were not as safe as they believed. Sometimes backports didn't happen because security patches required additional backports of architectural changes. Sometimes we were scared to backport large patches because we did not have large testing audiences for old branches. Some backports didn't happen because they affected web or add-on compatibility. And sometimes backports didn't happen simply because developers aren't focused on 2-year-old code. Providing old versions gave users a false sense of security.

Second, a feedback cycle developed between users lagging behind and add-ons lagging behind. Many stayed on the old version until the end-of-life date, and then encountered surprises when they were finally forced to update. Providing old versions did not actually shield users from urgent update dilemmas.

I feel we can only seriously consider returning to overlapping releases if we can first overcome these two problems.

Improving add-on compatibility

Everything we do to improve add-on compatibility helps to break the lag cycle.

The most ambitious compatibility project is Jetpack, which aims to create a more stable API for simple add-ons. Jetpack also has a permission model that promises to reduce the load on add-on reviewers, especially around release time.

Add-ons hosted on AMO now have their maxVersion bumped automatically based on source code scans. Authors of non-AMO-hosted add-ons can use the validator by itself or run the validator locally.

This month, Fligtar announced a plan of assuming compatibility. Under this proposal, only add-ons with obvious, fatal incompatibilities (such as binary XPCOM components compiled against the wrong version) will be required to bump maxVersion.

With assumed compatibility, we will be able to make more interesting use of crowdsourced compatibility data from early adopters. Meanwhile, the successful Firefox Feedback program may expand to cover add-ons.

Breaking the add-on lag cycle

Even with improvements to add-on compatibility, some add-ons will be updated late or abandoned. In these cases, we should seek to minimize the impact on users.

In Aurora 8, opt-in for third-party add-ons allows users to shed unwanted add-ons that had been holding them back.

When users have known-incompatible add-ons, Firefox should probabilistically delay updates for a few days. Many add-ons that are incompatible on the day of the release quickly become compatible. Users of those add-ons don't need to suffer through confusing UI.

But after a week, when it is likely that the add-on has been abandoned, Firefox should disable the add-on in order to update itself. It would be dishonest to ask users to choose between security and functionality once we know the latter is temporary.

Once we've solved the largest compatibility problems, we can have a more reasonable discussion about the benefits and opportunity costs of maintaining overlapping releases.

Rapid releases and security

Thursday, August 25th, 2011

Several people have asked me whether Mozilla's move to rapid releases has helped or hurt Firefox's security.

I think the new cadence has helped security overall, but it is interesting to look at both the ways it has helped and the ways it has hurt.

Security of new features

With release trains every 6 weeks, developers feel less pressure to rush half-baked features in just before each freeze.

Combined with Curtis's leadership, the rapid release cycle has made it possible for security reviews to happen earlier and more consistently. When we need in-depth reviews or meetings, we aren't overwhelmed by needing twenty in one month.

The rapid release process also necessitated new types of coordination, such as the use of team roadmaps and feature pages. The security team is able to take advantage of the new planning process to track which features might need review, even if the developers don't come to the security team and ask.

Security reviews are also more effective now. When we feel a new feature should be held back until a security concern is fixed, we create less controversy when the delay is only 6 weeks.

Security improvements and backports

Many security improvements require architectural changes that are difficult to backport to old versions. For example:

  • Firefox 3.6 added frame poisoning, making it impossible to exploit many crashes in layout code. Before frame poisoning, the layout module was one of the largest sources of security holes.
  • Firefox 4 fixed the long-standing :visited privacy hole. Mozilla was the first to create a patch, but due to Firefox's long development cycle, several other browsers shipped fixes first.
  • Firefox 6 introduced WeakMap, making it possible for extensions to store document-specific data without leaking memory and without allowing sites to see or corrupt the information. Extension authors might not be comfortable using WeakMap until most users are on Firefox 6 or higher.

Contrary to the hopes of some Linux distros and IT departments, it is no longer possible to backport "security fixes only" and have a browser that is safe, stable, and compatible.

We're constantly adding security features and mitigations for memory safety bugs. We're constantly reducing Firefox's attack surface by rewriting components from C++ into JavaScript (and soon Rust).

Disclosure windows

One area where rapid releases may hurt is the secrecy of security patches.

We keep security bug reports private until we've shipped a fix, but we check security patches into a public repository. Checking in patches allows us to test the fixes well, but opens the possibility of an attacker reverse-engineering the bug.

We check security patches into a public repository for two reasons:

  • We cannot rely entirely on in-house testing. Because of the variety of websites and extensions, our experience is that some regressions introduced by security patches are only found by volunteer testers.
  • We cannot ship public binaries, even just for testing, based on private security patches. This would violate both the spirit and letter of some open-source licenses. It also wouldn't be effective for secrecy: attackers have been using binary patch analysis to attack unpatched Microsoft software. (And that's without access to the to the old source!)

But we can shorten and combine the windows between patch disclosure and release. Instead of landing security patches on mozilla-central and letting them reach users in the normal 12-18 weeks, we can:

  • Land security fixes to all active branches at once. This may be safer now that the oldest active branch is at most 18 weeks old compared to mozilla-central. (In contrast, the Firefox 3.6.x series is based on a branch cut 2009-08-13, making it 24 months old.) But even 18 weeks is enough to create some regression risk, and it is not clear which audiences would test fixes on each branch.
  • Accelerate the channel flow for security bugs. For example, hold security fixes in a private repository until 3 weeks before each release. Then, give them a week on nightly, a week on aurora, and a week on beta. The same channel audiences that make rapid release possible would test security fixes, just for a shorter period of time. Something like this already happens on an ad-hoc basis in many bugs; formalizing the process could make it smoother.

There are significant security wins with rapid releases, and a few new challenges. Overall, it's a much better place to be than where we were a year ago.

(I posted followups about add-on compatibility and intranet compatibility.)