Originally posted as Fast TotW #93 on May 27, 2025
Updated 2025-06-03
Quicklink: abseil.io/fast/93
Techniques like presubmits are essential tools for effective software engineering. For performance optimization, the freedom to change behaviors that are observable but unspecified is a large source of opportunities. In this episode, we discuss strategies for leaning on automation and additional tools to make it easier to evolve software.
Use technical means, rather than (eventually fallible) humans, to prevent problems. Humans can be a valuable line of defense for unknown unknowns, but they cannot be the only line of defense for everything. While human-executed checklists are great, they must be kept short, which makes them unsuitable for encoding deep knowledge about a problem domain. It can be tempting to add to the checklist after every incident and never subtract from it, but this leads to inevitable toil or skipped steps. Robots never get tired.
When we introduced `GoodFastHash` as a faster alternative to `std::hash`, it was intended that we could change the algorithm from time to time. This property was documented in a comment. Over time, we accreted code depending on its existing behavior, ranging from brittle tests relying on the stable iteration order of `std::unordered_map` paired with `GoodFastHash` to production designs that relied on it for stable cache keys. This stymied performance and hash quality improvements to `GoodFastHash`.
Automated testing is a well-known software technique, but it can be worth bringing automation to bear in new ways on new problems to shift left. Consider an example outside of performance: many outages have been caused by large, unanticipated changes in configurations. For example, a 100.0% reduction in configuration size is probably a bug (and an outage waiting to happen). An empty configuration file can be detected mechanically, leaving humans to worry about the much more subtle gotchas. Presubmit checks that detect and block large diffs without an explicit exception can stop many issues in their tracks.
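As an illustration, a check along these lines is mechanical to write. The sketch below is hypothetical; the function names and the 50% threshold are illustrative, not a real presubmit API:

```cpp
#include <cstddef>
#include <string>

// Result of a hypothetical presubmit check.
struct ConfigCheckResult {
  bool ok;
  std::string message;
};

// Flag suspiciously large shrinkage in a configuration file.
ConfigCheckResult CheckConfigSize(const std::string& old_config,
                                  const std::string& new_config) {
  if (new_config.empty() && !old_config.empty()) {
    return {false, "New configuration is empty; this is almost always a bug."};
  }
  // Block diffs that remove more than half of the configuration unless the
  // change carries an explicit exception (e.g., a tag in its description).
  if (new_config.size() * 2 < old_config.size()) {
    return {false,
            "Configuration shrank by more than 50%; add an explicit "
            "exception if this is intentional."};
  }
  return {true, ""};
}
```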
We can apply automation towards keeping code flexible over the long run, enabling us to make it more efficient while simultaneously ensuring reliability.
As the complexity of our software stack grows, tending to every corner reliably and manually is challenging. An API might promise one thing, but the software built on top of it can come to rely on implementation details from much farther down the stack.
The very first C++ TotW was for `string_view`, which can improve performance by avoiding string copies. This makes it attractive for regular usage, but it is bug-prone due to dangling references.
Reference-like types like `absl::string_view` and `absl::Span` allow us to write code agnostic to the underlying data storage. A string could be backed by constant data in a global or a heap-allocated copy. A container of N elements might be stored as an `absl::InlinedVector`, removing a heap indirection in the common case.
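For illustration, here is a minimal sketch of storage-agnostic code using these types (the function names are ours, not from any particular codebase):

```cpp
#include <string>

#include "absl/container/inlined_vector.h"
#include "absl/strings/string_view.h"
#include "absl/types/span.h"

// The callee neither knows nor cares where the bytes live.
bool StartsWithSlash(absl::string_view path) {
  return !path.empty() && path.front() == '/';
}

int Sum(absl::Span<const int> values) {
  int total = 0;
  for (int v : values) total += v;
  return total;
}

void Demo() {
  // Backed by constant data in the binary's read-only segment.
  StartsWithSlash("/etc/config");

  // Backed by a heap-allocated std::string.
  std::string owned = "/tmp/scratch";
  StartsWithSlash(owned);

  // Up to four elements live inline, with no heap indirection.
  absl::InlinedVector<int, 4> small = {1, 2, 3};
  Sum(small);
}
```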
As with other references, these types come with downsides: the usage of the reference might not coincide with the lifetime of the underlying storage, leading to use-after-free/use-after-return bugs. While sanitizers can detect these bugs, they can have false negatives depending on test coverage. Fortunately, source code annotations like `ABSL_ATTRIBUTE_LIFETIME_BOUND` can find many of them at compile time. These annotations allow us to shift left on finding bugs, allowing the engineer actively writing code to fix the issue right away, rather than waiting for a production outage with its commensurately higher costs.
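A minimal sketch of the annotation in action (the function names are illustrative):

```cpp
#include <string>

#include "absl/base/attributes.h"
#include "absl/strings/string_view.h"

// The annotation tells the compiler that the returned view points into `s`.
absl::string_view FirstLine(
    const std::string& s ABSL_ATTRIBUTE_LIFETIME_BOUND) {
  return absl::string_view(s).substr(0, s.find('\n'));
}

std::string MakeGreeting() { return "hello\nworld"; }

void Demo() {
  // OK: `owned` outlives the view.
  std::string owned = MakeGreeting();
  absl::string_view ok = FirstLine(owned);
  (void)ok;

  // With the annotation, Clang warns at compile time: the temporary returned
  // by MakeGreeting() dies at the end of this statement, leaving `dangling`
  // pointing at freed memory.
  absl::string_view dangling = FirstLine(MakeGreeting());
  (void)dangling;
}
```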
Over time, we’ve developed several techniques for warding off Hyrum’s Law. While there are any number of implementation details we can obfuscate, the following techniques have proven useful:
Memory addresses for low-cost entropy: Taking the address of a global is extremely efficient (a `rip`-relative `lea` on x86) and powers `absl::Hash`'s randomization. Having learned from the lessons of `GoodFastHash`, `absl::Hash` deliberately used a random number to seed its hash computation. This ensured tests were more robust because they could not rely on fragile implementation details. Consequently, it has been possible to land several optimizations to replace its implementation wholesale without being blocked by brittleness.
This introduces just enough entropy from ASLR and linker orderings to make it hard to rely on its values across tests and production tasks. Similarly, heap allocations can provide instance-specific entropy. We use this in SwissMap to ensure different table instances have different iteration orders.
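A minimal sketch of the idea, with hypothetical helper names (this is not `absl::Hash`'s actual implementation):

```cpp
#include <cstddef>
#include <cstdint>

namespace {
// Anchor whose address varies run-to-run with ASLR and build-to-build with
// linker ordering.
const char kSeedAnchor = 0;
}  // namespace

// Per-process seed: compiles to a single rip-relative lea on x86-64.
inline std::size_t PerProcessSeed() {
  return reinterpret_cast<std::uintptr_t>(&kSeedAnchor);
}

// Per-instance seed: two containers at different heap addresses get
// different iteration orders.
inline std::size_t PerInstanceSeed(const void* self) {
  return reinterpret_cast<std::uintptr_t>(self);
}

// Mix the seeds into an otherwise deterministic hash value.
inline std::size_t SeededHash(std::size_t h, const void* self) {
  return h ^ PerProcessSeed() ^ PerInstanceSeed(self);
}
```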
Extra checks in debug builds / sampling: We want to maintain implementation freedom to ensure we can improve performance in the future. However, expensive checks would reduce performance if done in optimized builds, defeating part of the purpose. Instead, we can put most of our checks in debug builds.
Sized delete from C++14 allows our deallocation path to avoid a costly radix-tree walk, but incorrect sizes would lead to data corruption. In debug builds, TCMalloc checks the provided size against its internal metadata ground truth. TCMalloc already samples allocations periodically to drive heap profiling, allowing it to run extra checks on these sampled objects and proactively find further issues in production.
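A minimal sketch of the debug-build cross-check, using glibc's `malloc_usable_size()` as a stand-in for the allocator's internal metadata (this is not TCMalloc's actual code):

```cpp
#include <cassert>
#include <cstddef>
#include <new>

#include <malloc.h>  // malloc_usable_size (glibc)

// In debug builds, cross-check the caller-provided size against ground
// truth; in optimized builds (NDEBUG) the assert compiles away and `size`
// lets the fast path skip the radix-tree lookup entirely.
void SizedDeallocate(void* ptr, std::size_t size) {
  // A wrong size would otherwise silently corrupt allocator state.
  assert(size <= malloc_usable_size(ptr));
  ::operator delete(ptr, size);
}
```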
Counterfactual checks: In sanitizer builds, we check that SwissMap’s iterators remain valid after any potential invalidations. Even if a particular insert doesn’t cause a rehash but could have caused one, the iterator is considered invalid. This allows us to prevent dependencies on SwissMap’s growth rate, iterator invalidation behaviors, and so forth, even if the present implementation used in day-to-day builds overdelivers against its guarantees.
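A minimal sketch of a counterfactual check using a generation counter (illustrative only, not SwissMap's actual mechanism):

```cpp
#include <cstddef>

#include "absl/base/config.h"

class Table {
 public:
  struct Iterator {
    const Table* table;
    std::size_t generation;
    // An iterator is valid only if no potentially invalidating operation
    // has happened since it was created.
    bool IsValid() const { return generation == table->generation_; }
  };

  Iterator begin() const { return {this, generation_}; }

  void Insert(int value) {
#ifdef ABSL_HAVE_ADDRESS_SANITIZER
    // Counterfactual: some conforming implementation might rehash on *any*
    // insert, so treat every insert as invalidating, not just those that
    // actually grow the table.
    ++generation_;
#endif
    // ... actual insertion; a real table rehashes only when the load
    // factor demands it ...
    (void)value;
  }

 private:
  std::size_t generation_ = 0;
};
```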
Enabling these defenses widely means we get benefits from everyone’s tests. These approaches ensure that code does not depend on characteristics that are sensitive to implementation details, so we can change implementations quickly without breaking code.
Tools like Clang-Tidy can spot problematic patterns and flag them at code review time. This can shift left on preventing problems, rather than waiting for a failed runtime check in production.
For example, some protobuf optimizations are sensitive to misuse of `const_cast`. This misuse is partly stopped by placing default instances in global, read-only storage, causing the program to crash if the instance is mutated. Since it isn't ideal to crash on potentially under-tested code paths and because this technique doesn't cover all types of misuse, a Clang-Tidy check can flag `const_cast` where the type is a protobuf instance.
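For illustration, here is the kind of pattern such a check would flag, using a hypothetical stand-in for a generated message type:

```cpp
// Stand-in for a generated protobuf message type (hypothetical; real code
// would include the generated *.pb.h header).
class Request {
 public:
  void set_deadline_ms(int ms) { deadline_ms_ = ms; }

 private:
  int deadline_ms_ = 0;
};

// The pattern such a Clang-Tidy check flags: casting away const on a
// protobuf. If `req` aliases the default instance in read-only storage,
// the write crashes; elsewhere it silently violates optimizer assumptions.
void HandleRequest(const Request& req) {
  auto& mutable_req = const_cast<Request&>(req);  // flagged by the check
  mutable_req.set_deadline_ms(1000);              // potential crash
}
```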
Matt Kulukundis popularized the phrase “ratchet-and-pawl” to describe incremental migrations that prevent backsliding. This allows us to make progress where we can, secure in the knowledge that the problem will not get worse.
During the SwissMap migration, it was infeasible to randomize all existing unordered containers due to failing tests and actual production dependencies on the current behavior. Individual instances could be migrated where tests passed (or were made to pass). As the migration progressed, visibility allowlists on lesser-used containers (`dense_hash_map`) and Clang-Tidy on more common ones (`std::unordered_map`) reduced new usages of the legacy containers. An allowlist with a thousand entries might seem inelegant, but it's a powerful tool for preventing backsliding. The list can be ratcheted down over time by hand or by automated cleanups as uses are removed.
Preserving implementation freedom carries costs, whether performance overheads from the defenses themselves, opportunity costs, or simply worse ergonomics.
With hashtables, changing hash algorithms, initial sizes, and growth rates had come up time and time again, stymied by brittle tests, prior to the introduction of SwissMap and Abseil Hash. SwissMap using its heap allocation's address as an entropy source makes copies more expensive for primitive types: rather than simply `memcpy`'ing the raw data, we need to rehash our keys. Making the order more deterministic could let us improve copy performance, but the benefits of randomization outweigh these small costs.
As useful as it is to maintain as much implementation flexibility as we can, it is important to focus on the implementation details most likely to change. Just because we can obscure an implementation detail doesn't necessarily mean it's worth the runtime and engineering cost if the detail is unlikely to ever change, or if changing it would bring little value.
For large, performance-critical optimizations, we may want to carefully test that the desired performance characteristics remain, even where the code would be functionally correct with or without the optimization.
If we like an optimization, we should put a test on it. Automation prevents regressions upfront, rather than waiting for after-the-fact debugging.
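For example, one way to pin a performance characteristic, rather than a functional behavior, is to count heap allocations in a test. A minimal sketch, assuming a replaced global `operator new` (simplified; a production version would also throw `std::bad_alloc` on failure):

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

#include "absl/container/inlined_vector.h"
#include "gtest/gtest.h"

// Count global heap allocations so the test can pin the guarantee we rely
// on: small InlinedVectors never touch the heap.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
  ++g_allocations;
  return std::malloc(size);
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

TEST(InlinedVectorTest, SmallSizesDoNotAllocate) {
  const std::size_t before = g_allocations;
  absl::InlinedVector<int, 4> v = {1, 2, 3, 4};
  EXPECT_EQ(g_allocations, before);  // fails if inline storage goes away
}
```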
Shifting problem-finding away from humans and onto automated mechanisms lets us focus on the bigger picture and improve our velocity. When things do break, we save human time in the long run because we can spend less effort figuring out what caused a regression.