Performance Tip of the Week #52: Configuration knobs considered harmful

Originally posted as Fast TotW #52 on September 30, 2021

By Chris Kennelly

Updated 2023-09-30


Flags, options, and other mechanisms to override default behaviors are useful during a migration or as a short-term mechanism to address an unusual need. In the long term they go stale (not providing real benefit to users), are almost always haunted (in the haunted graveyard sense), and prevent centralized consistency/optimization efforts. In this episode, we discuss the tradeoffs in technical debt and optimization velocity for adding configurability.

The ideal flag lifecycle

When developing a new feature, it’s straightforward and often recommended to guard it behind a flag. This approach of using feature flags makes it possible to decouple pushing the code changes to production from turning on a new feature, which might have undiscovered correctness bugs or different resource requirements.

For a commonly-used library, flags also allow early opt-ins from users. When the default is changed, the flag also provides an escape hatch to revert to the old behavior.

For example, this was employed successfully for the rollout of TCMalloc’s Huge Page Aware Allocator optimization: many applications opted-in early, but even with extensive testing, a few applications saw changes in their resource requirements. These could be opted-out while deeper investigation occurred without rolling back the efficiency gains seen by most other users of TCMalloc.

These experiences suggest flags are an unalloyed good in theory, but practice is wholly different. Whether flags are considered good or not is dependent on what percentage of users will use the feature:

  • If the number of users of a flag is always expected to be small, its existence hampers future evolution.
  • If that number is mid-ranged, this can be a well-justified level of complexity but it can be challenging to set the flags optimally. Some teams have observed that, often, only the authors of features have the necessary context to set the flag appropriately - either to know when to set it at all or to which value it should actually be set.
  • If that number is near 100% – then probably we’re transitioning to a new default and the flag exists to provide an opt-out - this can be a good use of the flag. Nonetheless, it is important to clean that flag up after the rollout is complete so it doesn’t linger indefinitely. Without the cleanup, this becomes technical debt that hinders future changes or becomes a “standard knob with a weird name.”

Flags failing to convey intent

The units for many flags are entirely opaque and often have second or third order effects that may not be immediately intuitive.

In his 2021 CppCon talk, Titus Winters makes a real-world note of this phenomenon: The “popcorn button” of microwaves should not be used for microwave popcorn, as the button does not align with the settings required.

Moving to Google’s C++ codebase, SwissMap, Abseil’s high performance hashtables, does not provide an implementation of the flag max_load_factor. The low utility of max_load_factor was uncovered during the migration to SwissMap. Even worse, in many of the situations where max_load_factor was set, it was set incorrectly.

Even when the role of max_load_factor was correctly understood, its value was often misconfigured to achieve a desired goal. While max_load_factor(0.25) might convey an intent to “trade RAM for speed,” such a setting can make CPU performance worse while simultaneously using more RAM, defeating the intent of its user.

In other situations, different implementations can be API-compatible, but their behaviors do not transfer effectively between implementations. Open addressing hashtables have typical load factors <1, while chained hashtables have load factors typically ≥1. Changing between these implementations would cause the max_load_factor to have a surprisingly different effect.

This experience led the SwissMap authors to make max_load_factor a no-op, providing it only for API compatibility.

Stale configuration parameters

Tuning a configuration is another optimization that does not age well.

For flags defined in commonly used libraries, the defaults themselves have probably evolved: a feature was launched or an optimization landed. The nature of Google’s production configuration languages often means that once a service has hard-coded a flag’s value, it takes precedence over the default. This was the whole reason for choosing a non-default value in the first place; but with the codebase evolving at a high rate, it’s easy to overlook that the underlying infrastructure has improved and that overriding value now is worse than the default.

The key action here is to use customized flags lightly and regularly reconsider their use. When designing new options, prefer good defaults or make parameters self-tune if possible. Self-tuning may come in the form of adapting automatically to workloads, rather than requiring careful tuning through flags.

Reduced long-term velocity

Titus Winters notes that “If 99% of your users understand an API’s behavior through the lens of the default setting, the 1% of users that change that setting are at risk: APIs built at a higher level have a good chance of assuming the default behavior, leaving your 1% semi-supported.”

Configurability can be a great short-term boon; but long-term, configurability is a double edged sword. Options increase the state-space that has to be considered with every future change, making it more difficult to reason about, test, and successfully land new features in production. Beyond just optimizing costs, this complexity also hampers achieving better business objectives: Extra complexity that delays an improvement to product experiences is a non-obvious externality.

For example, TCMalloc has a number of tuning options and customization points, but ultimately, several optimizations came from sanding away extra configuration complexity. The rarely used malloc hooks API required careful structuring of TCMalloc’s fast path to allow users who didn’t use hooks–most users–to not pay for their possible presence. In another case, removing the sbrk allocator allowed TCMalloc to structure its virtual address space carefully, enabling several enhancements.

Beyond knobs

While this discussion has largely focused on knobs and tunables, APIs and libraries have the same challenges.

An existing library, X, might be inadequate or insufficiently expressive, which can motivate building a “better” alternative, Y, along some dimensions. Realizing the benefit of using Y is dependent on users both discovering Y and picking between X and Y correctly–and in the case of a long-lived code base, keeping that choice optimal over time.

For some uses, this strategy is infeasible. my::super_fast_string will probably never replace std::string because the latter is so entrenched and the impedence mismatch of living in an independent string ecosystem exceeds the benefits. Multiple vocabulary types suffer from impedance mismatch–costly interconversions can overwhelm the overall benefits. The costs of migrating the world also need to be considered upfront. Without active migration, we end up with two things.

There are times where a new library or API is truly needed – SwissMap needed to break stability guarantees provided by std::unordered_map on an instance-by-instance basis to avoid waiting for every problematic usage to be fixed. In that case, however, the performance benefits it provided were only realized by active migration. Being able to aim for a complete migration eases maintenance and educational burdens as well. A compelling performance case simplified to “just use SwissMap” avoids the need for painstaking benchmarking with every use where the optimal choice could get out of date.

Best practices

When adding new customization points, consider how they’ll evolve over the long-term.

  • When using flags to gate new features that will be enabled by default, make a plan for removing any opt-outs so the flag itself can be removed, rather than end up as technical debt.
  • Flags are a powerful tool for tuning and optimization, but the author of a customization point has the most context for how to use it effectively. Choosing good defaults or making features self-tune is often better for the codebase as a whole.

    Discoverability, let alone optimal selection, is challenging.

Subscribe to the Abseil Blog