Tip of the Week #147: Use Exhaustive switch Statements Responsibly

Originally published as TotW/147 on 2018-04-25

by Jim Newsome ([email protected])

Introduction

Using the -Werror compiler flag, a switch statement over a value of an enum type without a default label will fail to compile if any enumerator of the enum doesn’t have a corresponding case. This is sometimes called an exhaustive or defaultless switch statement.

An exhaustive switch statement is an excellent construct for ensuring at compile time that every enumerator of a given enum is explicitly handled. However, we must ensure that we handle the fall-through case when the variable (legally!) has a non-enumerator value, and one of:

  1. The owner of the enum guarantees no new enumerators will be added,
  2. The owner of the enum is willing and able to fix our code when new enumerators are added (e.g. the enum definition is part of the same project),
  3. The owner of the enum will not be blocked by breaking our build (e.g. their code is in a separate source control repository), and we’re willing to be forced to update our switch statements when updating to the latest version of the enum owner’s code.

An Initial Attempt

Suppose we are writing a function that maps each enumerator of an enum to a std::string. We decide to use an exhaustive switch statement to ensure we didn’t forget to handle any of the enumerators:

std::string AnEnumToString(AnEnum an_enum) {
  switch (an_enum) {
    case kFoo:
      return "kFoo";
    case kBar:
      return "kBar";
    case kBaz:
      return "kBaz";
  }
}

Assuming that AnEnum indeed has only those three enumerators, this code will compile, and will seem to have the desired effect. However, there are two important issues that must be accounted for.

Enums with Non-Enumerator Values

In C++, enums are permitted to have values other than the explicit enumerators. All enums can legally take on at least all of the values representable by an integral type with just enough bits to represent every enumerator, and enums with a fixed underlying type (e.g. those declared with enum class) can take on any value representable by that type. This is sometimes intentionally leveraged to use an enum as a bitfield or to represent enumerators that didn’t exist when we compiled our code (as in proto 3).

So what happens in our code if an_enum isn’t one of the handled enumerator types?

In general when a switch statement doesn’t have a case matching the switch condition and doesn’t have a default case, execution falls through past the whole switch statement. This can lead to surprising behavior; in our example it leads to undefined behavior. After execution falls through the switch statement, it reaches the end of the function without returning a value, which is undefined behavior for a function with a non-void return type.

We can address this issue by explicitly handling the case where execution falls through the switch statement. This ensures we always get defined and predictable behavior at run time, while continuing to benefit from the compile-time check that all enumerators are explicitly handled.

In our example, we’ll log a warning and return a sentinel value. Another reasonable alternative, especially if we’re convinced that the function (currently) can’t receive a non-enumerator value, would be to immediately crash with a debuggable error message and stack trace, e.g. using LOG(FATAL).

std::string AnEnumToString(AnEnum an_enum) {
  switch (an_enum) {
    case kFoo:
      return "kFoo";
    case kBar:
      return "kBar";
    case kBaz:
      return "kBaz";
  }
  LOG(ERROR) << "Unexpected value for AnEnum: " << an_enum;
  return kUnknownAnEnumString;
}

We’ve now ensured that something reasonable happens for any possible value of an_enum, but there’s still potentially a problem.

What Happens When a New Enumerator Is Added?

Suppose someone later wants to add a new enumerator to AnEnum. Doing so causes AnEnumToString to no longer compile. Whether that’s a bug or a feature depends on who owns AnEnum and what guarantees they provide.

If AnEnum is part of the same project as AnEnumToString, then the engineer adding a new enumerator is likely to be blocked from submitting their change before fixing AnEnumToString due to compilation errors. They are also reasonably likely to be willing and able to do so. In this case our use of an exhaustive switch statement is a win: it successfully ensured that the switch statement is updated appropriately, and everyone is happy.

Similarly, if AnEnum is part of a different project in a different repository, then the breakage won’t surface until the engineers on our project try to update to a newer version of that code. If we expect that those engineers will be willing and able to fix the switch statement, then all is well.

However, if AnEnum is owned by a different project in the same repository the situation is a bit more precarious. A change to AnEnum might cause our code to greak at head, and the engineer making the change might not be willing or able to fix it for us. Indeed, if there were many similar exhaustive switch statements over AnEnum, it’d be extremely challenging for them to fix all such usages.

For these reasons, it’s best to use exhaustive switch statements only on enum types that either we own, or whose owner has explicitly guaranteed that no new enumerators will be added.

In our example, let’s suppose that AnEnum is owned by a different project, but the documentation promises that no new enumerators will be added. Let’s add a comment so that future readers understand our reasoning.

std::string AnEnumToString(AnEnum an_enum) {
  switch (an_enum) {
    case kFoo:
      return "kFoo";
    case kBar:
      return "kBar";
    case kBaz:
      return "kBaz";
    // No default. The API of AnEnum guarantees no new enumerators will be
    // added.
  }
  LOG(ERROR) << "Unexpected value for AnEnum: " << an_enum;
  return kUnknownAnEnumString;
}

Conclusions

Exhaustive switch statements can be an excellent tool for ensuring that all enumerators are explicitly handled, provided that we:

  • Explicitly handle the case where the enum has a non-enumerator value, falling through the entire switch statement. In particular if the enclosing function has a return value, we must ensure that the function still either returns a value or crashes in a well-defined and debuggable way.
  • Ensure that one of:
    • The owner of the enum type either guarantees no new enumerators will be added,
    • The owner of the enum is willing and able to fix our code when new enumerators are aded,
    • If our code uses exhaustive switch statements and is broken due to an enumerator being added, the owner of the enum is not blocked by this breakage.

When making an enum type available to other projects, we should either:

  • Explicitly guarantee that no new enumerators will be added, so that users can take advantage of exhaustive switch statements.
  • Explicitly reserve the right to add new enumerators without notice, to discourage consumers from writing exhaustive switch statements. One idiomatic way of doing so is to add a sentinel enumerator clearly not meant to be used in API consumers’ exhaustive switch statements; e.g. kNotForUseWithExhaustiveSwitchStatements.

FAQ

Why does the compiler allow omitting a return statement after an exhaustive switch?

Omitting a final return can be safe if additional steps are taken to ensure that the enum variable can only be one of its enumerators. It’s often better in such cases to still defensively add a final return or LOG(FATAL), but enough legacy code exists without those that google3’s default compiler flags allow code to compile without them.

The enum I’m switching on already has exhaustive switch statements all over the place. Since the owners are already effectively prevented from adding new enumerators, won’t adding my own exhaustive switch statement be harmless?

It’s usually better to get an explicit policy from the owner before further increasing their maintenance burden.

What about protobuf enums?

For authoritative guidance, see the protobuf documentation.

Exhaustive switch statements on proto3 enum types are not recommended. The parser doesn’t guarantee that enum fields will have enumerator values. Additionally, it’s not possible to write an exhaustive switch statement over proto3 enum types without referencing special sentinel enumerators that should be considered internal implementation details of the protobuf tools.

Exhaustive switch statements on proto2 enum types that you own (or whose owners guarantee will never be moved to proto3 and will never have new enumerators added) are safe and recommended by the protobuf team. The protobuf parser guarantees that enum fields will be assigned a compile-time enumerator, though care should still be taken if the enum value isn’t guaranteed to have come from the parser (e.g. if it’s part of a proto object received as a function parameter).

What about scoped enumerations (enum class)?

Everything in this tip applies to all enumeration types in C++ at time of writing (i.e. through at least C++20).

References


Subscribe to the Abseil Blog