Performance Tip of the Week #87: Two-way doors

Originally posted as Fast TotW #87 on October 16, 2024

Updated 2025-05-12

Jeff Bezos divides decisions into “one-way doors” (ones that are hard to reverse) and “two-way doors” (those that are easy to reverse). Different optimizations fall on each side of this divide. In this episode, we discuss patterns common to two-way doors that let us reduce risk without exhaustively analyzing the situation. Good decisions endure, while missteps can be corrected along the way.

Assessing reversibility

As we explore a new optimization idea, we want to prioritize blockers to landing and ignore (for now) less important details. This is easier said than done, since we need to figure out which subproblems are actually on our critical path and which can be ignored.

Outside of major technical blockers (does a proposed optimization work at all, or to a sufficient degree to be meaningful?), an important consideration is which decisions we need to make upfront because they are hard to change later. We approach this with a strong prior that most software decisions are easy to reverse, but discuss the hallmarks of those that are harder to undo.

Feature flags

Feature flags are a common technique for gating new functionality so that it can be developed and gradually tested before being rolled out. If an issue is recognized with a release, the flag update can be rolled back and the system restored to its previous known-good state.

Broadly speaking, fine-grained flag changes are easy to undo and thus two-way doors. The system was working before the change, so if a problem arises, we can roll back and return to the drawing board to remedy it.

Flags aren’t all equally important: A flag to launch (or turn down) a major product feature is far more weighty than a flag that controls the size of a cache as part of an optimization. Flags gating major product features, however, undergo far more consideration as part of the launch process.
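To make the rollback property concrete, here is a minimal sketch of a flag store that remembers the previous value of each flag. `FlagRegistry` and the `cache_size_entries` flag name are hypothetical names chosen for illustration; they are not the API of any real flags library, and a production system would also handle propagation, auditing, and concurrency.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical in-process flag store; the class and flag names are
// illustrative and do not correspond to a real flags library.
class FlagRegistry {
 public:
  // Registers a flag with its default value.
  void Define(const std::string& name, std::size_t default_value) {
    current_[name] = default_value;
  }

  // Applies a new value, remembering the old one so it can be restored.
  void Update(const std::string& name, std::size_t value) {
    assert(current_.count(name) == 1 && "flag must be defined first");
    previous_[name] = current_[name];
    current_[name] = value;
  }

  // Restores the value that was in effect before the last Update().
  // Returns false if there is nothing to roll back.
  bool Rollback(const std::string& name) {
    auto it = previous_.find(name);
    if (it == previous_.end()) return false;
    current_[name] = it->second;
    previous_.erase(it);
    return true;
  }

  std::size_t Get(const std::string& name) const { return current_.at(name); }

 private:
  std::map<std::string, std::size_t> current_;
  std::map<std::string, std::size_t> previous_;
};
```

Because `Update` keeps the prior value, growing a cache from 1024 to 4096 entries and then calling `Rollback("cache_size_entries")` returns the system to the configuration that was known to work.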

APIs

Unlike flags, where we can make a central decision to roll forward or roll back a particular setting and even adopt application-specific values (at least temporarily), new API surfaces can prove to be more of a one-way door at times.

While a new library is being developed, a build visibility rule might help constrain adoption to a manageable set of users. This can provide invaluable feedback on the ergonomics of the library and further battle-testing while leaving things in a tractable enough state to “undo” later and migrate away from it.

For optimization projects, we might consider how impact scales with adoption. For example, adopting RCU might get most of its benefits from tackling a handful of the most contended data structures, but a vocabulary type like SwissMap might need thousands of usages to get meaningful traction. For SwissMap, nearly drop-in API compatibility makes a hypothetical rollback possible, whether by changing usage or by making the implementation a wrapper for std::unordered_map.
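The wrapper option can be sketched as follows. `FastMap` and `CountWords` are hypothetical names, and the sketch uses only the standard library: it shows the rolled-back state, in which a vocabulary type with a small, map-like API simply delegates to `std::unordered_map`. An optimized hash table with a compatible API would take the place of `impl_` without changes at call sites.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical vocabulary type. Call sites depend only on this small API,
// so the container behind it can change without touching them.
template <typename K, typename V>
class FastMap {
 public:
  V& operator[](const K& key) { return impl_[key]; }
  bool contains(const K& key) const { return impl_.find(key) != impl_.end(); }
  std::size_t erase(const K& key) { return impl_.erase(key); }
  std::size_t size() const { return impl_.size(); }

 private:
  // Rolled-back state: delegate to the standard container. An optimized
  // hash table with a compatible API would be swapped in here.
  std::unordered_map<K, V> impl_;
};

// A call site that is unaffected by which implementation is active.
inline FastMap<std::string, int> CountWords(
    const std::vector<std::string>& words) {
  FastMap<std::string, int> counts;
  for (const std::string& word : words) ++counts[word];
  return counts;
}

// Example usage.
inline void CountWordsExample() {
  FastMap<std::string, int> counts = CountWords({"map", "set", "map"});
  assert(counts.size() == 2);
  assert(counts["map"] == 2);
}
```

The narrower the API that call sites rely on, the cheaper this kind of swap becomes; behavior that leaks through the interface, such as iteration order or pointer stability, is what tends to make a rollback harder in practice.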

At the other end of the spectrum are new programming paradigms. While it might be possible to decompose a simple API in terms of other libraries, moving back and forth between coroutines and other asynchronous primitives might be challenging at the very least.

Similar considerations apply to releasing open source libraries: Having internal experience first, where it’s possible to talk to every user and update them as needed, can provide needed confidence that rough edges have been sanded down. Once released, compatibility guarantees might make it more challenging to make substantial changes without breaking existing users.

Data at rest versus data in flight

Data formats that only live “in flight,” for example, during an active RPC, face far different reversibility considerations than data that lives at rest, for example, if stored to disk.

Protocol buffers serve both of these roles by joining an in-memory representation to a wire format.

If we change an implementation detail of the in-memory format, whether by optimizing our parsing routines or changing how we lay out fields, we can improve efficiency without long-term ramifications. The data is “in flight” and the layout doesn’t have to be consistent or compatible with future (or past) binaries.

When we look at the wire format, some data will be serialized, sent over the network, and immediately decoded by another server. If we were to introduce a new wire format, we might consider negotiating this new version and using it only if both ends of the connection support it. This transition period of limiting ourselves to “in flight” data gives us a series of breadcrumbs to follow for undoing it: We can stop negotiating the new format and phase it out if we need to.
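A negotiation step like the one described above might look like the following sketch. The format identifiers and the `NegotiateFormat` and `AdvertisedFormats` helpers are assumptions made for illustration; they are not part of Protocol Buffers or any RPC library.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical format identifiers; every binary is assumed to speak the
// legacy format.
constexpr std::uint32_t kLegacyFormat = 1;
constexpr std::uint32_t kNewFormat = 2;

// Picks the newest format that both peers advertise, falling back to the
// legacy format when there is no newer one in common.
inline std::uint32_t NegotiateFormat(const std::set<std::uint32_t>& local,
                                     const std::set<std::uint32_t>& remote) {
  assert(local.count(kLegacyFormat) == 1 && remote.count(kLegacyFormat) == 1);
  // std::set iterates in ascending order, so walk from the newest down.
  for (auto it = local.rbegin(); it != local.rend(); ++it) {
    if (remote.count(*it) == 1) return *it;
  }
  return kLegacyFormat;
}

// Undoing the rollout: stop advertising the new format and every
// connection falls back to the legacy one.
inline std::set<std::uint32_t> AdvertisedFormats(bool offer_new_format) {
  std::set<std::uint32_t> formats = {kLegacyFormat};
  if (offer_new_format) formats.insert(kNewFormat);
  return formats;
}
```

Since the new format is used only when both sides opt in, flipping `offer_new_format` back to false on either side is enough to retire it for data in flight.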

Once this data is persisted to disk where it becomes “at rest” data, we face a different set of considerations: We need to be able to read that data for as long as it is useful. Practically speaking, this might be “forever” if we cannot centrally transcode formats.
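The asymmetry between writing and reading can be illustrated with a toy sketch. The one-byte version prefix, the `EncodeRecord` and `DecodeRecord` helpers, and the "reversed bytes" stand-in for a second format are all assumptions made for illustration. The point is only that the writer's choice of version can be rolled back at any time, while the reader has to keep every decoder for as long as old records exist.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical on-disk layout: one version byte followed by the payload.
constexpr char kFormatV1 = 1;
constexpr char kFormatV2 = 2;

// Writer: which version to emit is a choice that can be rolled back.
inline std::string EncodeRecord(const std::string& payload, char version) {
  assert(version == kFormatV1 || version == kFormatV2);
  std::string out(1, version);
  if (version == kFormatV1) {
    out += payload;  // v1: payload stored verbatim
  } else {
    out.append(payload.rbegin(), payload.rend());  // toy v2: reversed bytes
  }
  return out;
}

// Reader: must understand every version that was ever written to disk,
// even after the writer has gone back to an older one.
inline std::optional<std::string> DecodeRecord(const std::string& bytes) {
  if (bytes.empty()) return std::nullopt;
  switch (bytes[0]) {
    case kFormatV1:
      return bytes.substr(1);
    case kFormatV2:
      // Reverse everything except the leading version byte.
      return std::string(bytes.rbegin(), bytes.rend() - 1);
    default:
      return std::nullopt;  // unknown version
  }
}
```

Rolling the writer back to `kFormatV1` stops new v2 records from appearing, but the `kFormatV2` branch in `DecodeRecord` cannot be deleted until every v2 record has been rewritten or has expired.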

Experimentation

In their ideal lifecycle, feature flags aid the rollout of features and then are removed when we get to complete adoption. One challenge might be where a new feature introduces drastically different tradeoffs between applications, for example, one that saves CPU but causes some applications to use more RAM. While it might be best to avoid this altogether or to self-tune, these still might be desirable optimizations.

Experiments can help us dip our toe into the waters of a change while still allowing us to centrally roll things back. By making the change over a small slice of each application, we can avoid “getting stuck” in a halfway state between some users having the new feature and others not. Even if the change is very beneficial for some users who might be hesitant to see it rolled back, the narrow slice minimizes how sticky it might be. This can allow us to fine-tune things to sandblast away the roughest parts of the tradeoff.
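One way to carve out a small slice of each application is to hash a stable task identifier and compare it against a centrally controlled threshold. The sketch below assumes hypothetical `application` and `task_id` strings and a slice size expressed in thousandths; FNV-1a is used only because it is a simple hash that gives the same answer on every machine, not because the source prescribes it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// FNV-1a, used here only as a simple hash that is stable across binaries.
inline std::uint64_t StableHash(const std::string& s) {
  std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 0x100000001b3ULL;  // FNV-1a 64-bit prime
  }
  return h;
}

// Returns true if this task is in the experiment arm. `permille` is the
// slice size in thousandths; setting it to 0 centrally rolls the change
// back for every application at once.
inline bool InExperiment(const std::string& application,
                         const std::string& task_id, std::uint32_t permille) {
  assert(permille <= 1000);
  return StableHash(application + "/" + task_id) % 1000 < permille;
}
```

Because membership depends only on the hash and the threshold, raising `permille` keeps every task that was already in the experiment and adds new ones, and lowering it shrinks the slice without reshuffling.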

Dark and counterfactual launches

A distinct way of running experimental functionality in production is to enable it in parallel with the status quo as a dark launch. For example, to consider different compression algorithms, we can compress a small fraction of the data using experimental settings and then discard the result. By comparing this counterfactual data with the output of the still-enabled primary compression algorithm, we can choose the best settings for the given application. This approach isn’t always applicable (for example, we can’t use it for decompression), but when it works, it is a useful tool in higher-risk scenarios such as experimenting with data-at-rest.
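The compression example can be sketched as a wrapper that always returns the primary output and, for a sampled subset of calls, also runs the candidate and records only its size. `DarkLaunchCompressor`, `DarkLaunchStats`, and the two toy "algorithms" (a run-length encoder and a pass-through) are hypothetical stand-ins for real compressors and settings.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

using Compressor = std::function<std::string(const std::string&)>;

// Toy run-length encoder standing in for an experimental algorithm.
inline std::string RunLengthEncode(const std::string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size();) {
    std::size_t j = i;
    while (j < in.size() && in[j] == in[i] && j - i < 255) ++j;
    out.push_back(static_cast<char>(j - i));  // run length, at most 255
    out.push_back(in[i]);                     // repeated byte
    i = j;
  }
  return out;
}

// Stand-in for the status quo: stores bytes unmodified.
inline std::string StoreUncompressed(const std::string& in) { return in; }

struct DarkLaunchStats {
  std::size_t samples = 0;
  std::size_t primary_bytes = 0;
  std::size_t candidate_bytes = 0;
};

class DarkLaunchCompressor {
 public:
  DarkLaunchCompressor(Compressor primary, Compressor candidate,
                       std::size_t sample_every)
      : primary_(std::move(primary)),
        candidate_(std::move(candidate)),
        sample_every_(sample_every) {
    assert(sample_every_ > 0);
  }

  // Always returns the primary output; callers never see candidate bytes.
  std::string Compress(const std::string& input) {
    std::string result = primary_(input);
    if (++calls_ % sample_every_ == 0) {
      // Counterfactual: compress the same input, keep only the size.
      stats_.candidate_bytes += candidate_(input).size();
      stats_.primary_bytes += result.size();
      ++stats_.samples;
    }
    return result;
  }

  const DarkLaunchStats& stats() const { return stats_; }

 private:
  Compressor primary_;
  Compressor candidate_;
  std::size_t sample_every_;
  std::size_t calls_ = 0;
  DarkLaunchStats stats_;
};
```

Nothing produced by the candidate is ever stored or returned, so turning the dark launch off leaves no data behind to migrate; the only cost while it runs is the extra CPU spent on the sampled inputs.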

Handling bugs and regressions

Bugs, regressions, and outages are never fun, but they are inevitable as the system changes, whether intentionally (code updates) or unintentionally (workloads shift). The goal isn’t to get to zero risk, but to manage it. The possibility of a bug does not make the door one-way.

Regardless of the type of change we’re making, we still need to do testing, progressive rollouts, and monitoring to look for issues. When we are trying to decide how to proceed, we should consider how additional flight miles can give us new information to make progress, since no amount of analysis will ever completely derisk things.

Conclusion

Thinking about how reversible decisions are can help avoid analysis paralysis. Many software decisions are easily reversed, allowing us to mitigate the risk of regressions from changes and to shift our focus onto the more consequential decisions.
