Tip of the Week #165: `if` and `switch` statements with initializers

Originally posted as TotW #165 on August 17, 2019

Updated 2020-01-17

Unless you use conditional control flow, you can stop reading now.

A new syntax

C++17 allows if and switch statements to include an initializer:

if (init; cond) { /* ... */ }
switch (init; cond) { /* ... */ }

This syntax lets you keep the scope of variables as tight as possible:

if (auto it = m.find("key"); it != m.end()) {
  return it->second;
} else {
  return absl::NotFoundError("Entry not found");
}

The semantics of the initializer are exactly as in the for statement; details below.

When this is useful

One of the most important ways to manage complexity is to break complex systems down into non-interacting, local parts that can be understood in isolation and ignored in their entirety. In C++, the presence of variables increases complexity, and scopes allow us to limit the extent of this complexity: the less a variable is in scope, the less often a reader needs to remember that the variable exists.

When demanding reader attention, it is thus valuable to limit the scopes of variables to where they are actually needed. The new syntax offers one new tool for this. Contrast then this new syntax with the alternative code one would have written prior to C++17: Either we keep the scopes tight, and thus need to write additional braces:

{
  auto it = m.find("key");
  if (it != m.end()) {
    return it->second;
  } else {
    return absl::NotFoundError("Entry not found");
  }
}

Or, as seems to be the more typical solution, we do not keep the scopes tight and just “leak” the variables:

auto it = m.find("key");
if (it != m.end()) {
  return it->second;
} else {
  return absl::NotFoundError("Entry not found");
}

By contrast, the new style is self-contained: It is not possible to move the if statement without also moving the variable and its scope. The local meaning of the variable remains unchanged as code is moved around or copy-pasted. With the previous styles, code movement could accidentally change the scope of the variable (if the outer braces are not copied), or its meaning (if the variable itself is not copied and a variable with that name is in scope), or introduce a name clash.

The complexity considerations lead to the common adage that variable name length should match the variable scope’s size; that is, variables that are in scope for longer should have longer names (since they need to make sense to a reader that has long moved on). Conversely, smaller scopes permit shorter names. When variable names are leaked (as above), we see regrettable patterns emerge such as: multiple variables it1, it2, … become necessary to avoid clashes; variables are reassigned (auto it = m1.find(/* ... */); it = m2.find(/* ... */); or variables get intrusively long names (auto database_index_iter = m.find(/* ... */)).

Details, scopes, declarative regions

The new, optional initializer in if and switch statements works exactly like the initializer in a for statement. (The latter is essentially a while statement with initializer.) That is, the syntax-with-initializer is mostly just syntactic sugar around the following rewrites:

Sugared form	Rewritten as
`if (init; cond) BODY`	`{ init; if (cond) BODY }`
`switch (init; cond) BODY`	`{ init; switch (cond) BODY }`
`for (init; cond; incr) BODY`	`{ init; while (cond) { BODY; incr; } }`

Importantly, the names declared in the initializer are in scope of a potential else arm of an if statement.

There is one difference, though: In the sugared form, the initializer is in the same scope as the condition and body (of both the if and the else arm), rather than in a separate, larger scope. This means that variable names must be unique across all these parts, though they may shadow earlier declarations. The following examples illustrate the various disallowed redeclarations and allowed shadowing declarations:

int w;

if (int x, y, z; int y = g()) {   // error: y redeclared, first declared in initializer
  int x;                          // error: x redeclared, first declared in initializer
  int w;                          // OK, shadows outer variable
  {
    int x, y;                     // OK, shadowing in nested scope is allowed
  }
} else {
  int z;                          // error: z redeclared, first declared in initializer
}

if (int w; int q = g()) {         // declaration of "w" OK, shadows outer variable
  int q;                          // error: q redeclared, first declared in condition
  int w;                          // error: w redeclared, first declared in initializer
}

Interaction with structured bindings

C++17 also introduces structured bindings, a mechanism to assign names to the elements of a “destructurable” value (such as a tuple, an array, or a simple struct): auto [iter, ins] = m.insert(/* ... */);

That feature plays nicely with the new initializer in the if statement:

if (auto [iter, ins] = m.try_emplace(key, data); ins) {
  use(iter->second);
} else {
  LOG(ERROR) << "Key '" << key << "' already exists.";
}

Another example comes from using C++17’s new node handles that allow true moving of elements between maps or sets without copying. This feature defines an insert-return-type that is destructurable and that results from inserting a node handle:

if (auto [iter, ins, node] = m2.insert(m1.extract(k)); ins) {
  LOG(INFO) << "Element with key '" << k << "' transferred successfully";
} else if (!node) {
  LOG(ERROR) << "Key '" << k << "' does not exist in first map.";
} else {
  LOG(ERROR) << "Key '" << k << "' already in m2; m2 unchanged; m1 changed.";
}

Conclusion

Use the new if (init; cond) and switch (init; cond) syntax when you need a new variable for use within the if or switch statement that is not needed outside of it. This simplifies the ambient code. Moreover, since the variable’s scope is now small, its name can be shorter, too.