Tip of the Week #140: Constants: Safe Idioms

Originally posted as TotW #140 on December 8, 2017

By Matt Armstrong

Updated 2020-05-06

Quicklink: abseil.io/tips/140

What are best ways to express constants in C++? You probably have an idea of what the word means in English, but it is easy to express the concept incorrectly in code. Here we’ll first define a few key concepts, then get to a list of safe techniques. For the curious, we then go into more detail about what can go wrong, and describe a C++17 language feature that makes expressing constants easier.

There is no formal definition of a “C++ constant” so let’s propose an informal one.

  1. The value: A value never changes; five is always five. When we want to express a constant, we need a value, but only one.
  2. The object: At each point in time an object has a value. C++ places a strong emphasis on mutable objects, but mutation of constants is disallowed.
  3. The name: Named constants are more useful than bare literal constants. Both variables and functions can evaluate to constant objects.

Putting that all together, let’s call a constant a variable or function that always evaluates to the same value. There are a few more key concepts.

  1. Safe Initialization: Many times constants are expressed as values in static storage, which must be safely initialized. For more on that, see the C++ Style Guide.
  2. Linkage: Linkage has to do with how many instances (or “copies”) of a named object there are in a program. It is usually best for a constant with one name to refer to a single object within the program. For global or namespace-scoped variables this requires something called external linkage (you can read more about linkage here).
  3. Compile-time evaluation: Sometimes the compiler can do a better job optimizing code if a constant’s value is known at compile time. This benefit can sometimes justify defining the values of constants in header files, despite the additional complexity.

When we say we’re “adding a constant” we’re actually declaring an API and defining its implementation in such a way that satisfies most or all of the above criteria. The language doesn’t dictate how we do this, and some ways are better than others. Often the simplest approach is declaring a const or constexpr variable, marked as inline if it’s in a header file. Another approach is returning a value from a function, which is more flexible. We’ll cover examples of both approaches.

A note on const: it isn’t enough. A const object is read-only but this does not imply that it is immutable nor does it imply that the value is always the same. The language provides ways to mutate values we think of as const, such as the mutable keyword and const_cast. But even straightforward code can demonstrate the point:

void f(const std::string& s) {
  const int size = s.size();
  std::cout << size << '\n';
}

f("");  // Prints 0
f("foo");  // Prints 3

In the above code size is a const variable, yet it holds multiple values as the program runs. It is not a constant.

Constants in Header Files

All of the idioms in this section are robust and recommendable.

An inline constexpr Variable

From C++17 variables can be marked as inline, ensuring that there is only a single copy of the variable. When used with constexpr to ensure safe initialization and destruction this gives another way to define a constant whose value is accessible at compile time. See Tip #168 for more information.

// in foo.h
inline constexpr int kMyNumber = 42;
inline constexpr absl::string_view kMyString = "Hello";

An extern const Variable

// Declared in foo.h
extern const int kMyNumber;
extern const char kMyString[];
extern const absl::string_view kMyStringView;

The above example declares one instance of each object. The extern keyword ensures external linkage, while the const keyword helps prevent accidental mutation of the value. This is a fine way to go, though it does mean the compiler can’t “see” the constant values. This limits their utility somewhat, but not in ways that matter for typical use cases. It also requires defining the variables in the associated .cc file.

// Defined in foo.cc
constexpr int kMyNumber = 42;
constexpr char kMyString[] = "Hello";
constexpr absl::string_view kMyStringView = "Hello";

The constexpr keyword ensures each variable is a constant, is compile-time initialized, and has a trivial destructor. This is a convenient way to ensure it meets the style guide rules for globals.

You should define the variables in the .cc file with constexpr, unless you need to support an old toolchain.

NOTE: absl::string_view is a good way to declare a string constant. The type has a constexpr constructor and a trivial destructor, so it is safe to declare instances of it as global variables. Because a string_view knows its length, using them does not require a runtime call to strlen().

A constexpr Function

A constexpr function that takes no arguments will always return the same value, so it functions as a constant, and can often be used to initialize other constants at compile time. Because all constexpr functions are implicitly inline, there are no linkage concerns. The primary disadvantage of this approach is the limitations placed on the code in constexpr functions. Secondarily, constexpr is a non-trivial aspect of the API contract, which has real consequences .

// in foo.h
constexpr int MyNumber() { return 42; }

An Ordinary Function

When a constexpr function isn’t desirable or possible, an ordinary function may be an option. The function in the following example can’t be constexpr because it has a static variable:

inline absl::string_view MyString() {
  static constexpr char kHello[] = "Hello";
  return kHello;
}

NOTE: make sure you use static constexpr specifiers when returning array data, such as char[] strings, absl::string_view, absl::Span, etc, to avoid subtle bugs.

A static Class Member

Static members of a class are a good option, assuming you are already working with a class. These always have external linkage.

// Declared in foo.h
class Foo {
 public:
  static constexpr int kMyNumber = 42;
  static constexpr char kMyHello[] = "Hello";
};

Prior to C++17 it was necessary to also provide definitions for these static data members in a .cc file, but for data members that are both static and constexpr these are now unnecessary (and deprecated).

// Defined in foo.cc, prior to C++17.
constexpr int Foo::kMyNumber;
constexpr char Foo::kMyHello[];

It isn’t worth introducing a class just to act as a scope for a bunch of constants. Use one of the other techniques instead.

Discouraged Alternatives

#define WHATEVER_VALUE 42

Using the preprocessor is rarely justified, see the style guide.

enum : int { kMyNumber = 42 };

The enum technique used above can be justified in some circumstances. It produces a constant kMyNumber that cannot cause the problems talked about in this tip. But the alternatives already listed will be more familiar to most people, and so are generally preferred. Use an enum when it makes sense in its own right (for examples, see Tip #86 “Enumerating with Class”).

Approaches that Work in Source Files

All of the approaches described above also work within a single .cc file, but may be unnecessarily complex. Because constants declared within a source file are visible only inside that file by default (see internal linkage rules), simpler approaches, such as defining constexpr variables, often work:

// within a .cc file!
constexpr int kBufferSize = 42;
constexpr char kBufferName[] = "example";
constexpr absl::string_view kOtherBufferName = "other example";

The above is fine in a .cc file but not a header file (see caveat). Read that again and commit it to memory. I’ll explain why soon. Long story short: define variables constexpr in .cc files or declare them extern const in header files.

Within a Header File, Beware!

Unless you take care to use idioms explained above, const and constexpr objects are likely to be different objects in each translation unit.

This implies:

  1. Bugs: any code that uses the address of a constant is subject to bugs and even the dreaded “undefined behavior”.
  2. Bloat: each translation unit including your header gets its own copy of the thing. Not such a big deal for simple things like the primitive numeric types. Not so great for strings and bigger data structures.

When at namespace scope (i.e. not in a function or in a class), both const and constexpr objects implicitly have internal linkage (the same linkage used for unnamed-namespace variables and static variables not in a function or in a class). The C++ standard guarantees that every translation unit that uses or references the object gets a different “copy” or “instantiation” of the object, each at a different address.

Within a class, you must additionally declare these objects as static, or they will be unchangeable instance variables, rather than unchangeable class variables shared among every instance of the class.

Likewise, within a function, you must declare these objects as static, or they will take up space on the stack and be constructed every time the function is called.

An Example Bug

So, is this a real risk? Consider:

// Declared in do_something.h
constexpr char kSpecial[] = "special";

// Does something.  Pass kSpecial and it will do something special.
void DoSomething(const char* value);
// Defined in do_something.cc
void DoSomething(const char* value) {
  // Treat pointer equality to kSpecial as a sentinel.
  if (value == kSpecial) {
    // do something special
  } else {
    // do something boring
  }
}

Notice that this code compares the address of the first char in kSpecial to value as a kind of magic value for the function. You sometimes see code do this in an effort to short circuit a full string comparison.

This causes a subtle bug. The kSpecial array is constexpr which implies that it is static (with “internal” linkage). Even though we think of kSpecial as “a constant” – it really isn’t – it’s a whole family of constants, one per translation unit! Calls to DoSomething(kSpecial) look like they should do the same thing, but the function takes different code paths depending on where the call occurs.

Any code using a constant array defined in a header file, or code that takes the address of a constant defined in a header file, suffices for this kind of bug. This class of bug is usually seen with string constants, because they are the most common reason to define arrays in header files.

An Example of Undefined Behavior

Just tweak the above example, and move DoSomething into the header file as an inline function. Bingo: now we’ve got undefined behavior, or UB. The language requires all inline functions to be defined exactly the same way in every translation unit (source file) – this is part of the language’s “One Definition Rule.” This particular DoSomething implementation references a static variable, so every translation unit actually defines DoSomething differently, hence the undefined behavior.

Unrelated changes to program code and compilers can change inlining decisions, which can cause undefined behavior like this to change from benign behavior to bug.

Does this Cause Problems in Practice?

Yes. In one real-life bug we’ve encountered, the compiler was able to determine that in a particular translation unit (source file), a large static const array defined in a header file was only partially used. Rather than emit the entire array, it optimized away the parts it knew weren’t used. One of the ways the array was partially used was through an inline function declared in a header.

The trouble is, the array was used by other translation units in such a way that the static const array was fully used. For those translation units, the compiler generated a version of the inline function that used the full array.

Then the linker came along. The linker assumed that all instances of the inline function were the same, because the One Definition Rule said they had to be. And it discarded all but one copy of the function - and that was the copy with the partially-optimized array.

This kind of bug is possible when code uses a variable in a way that requires its address to be known. The technical term for this is “ODR used”. It is difficult to prevent ODR use of variables in modern C++ programs, particularly if those values are passed to template functions (as was the case in the above example).

These bugs do happen and are not easily caught in tests or code review. It pays to stick to safe idioms when defining constants.

Other Common Mistakes

Mistake #1: the Non-Constant Constant

Seen most often with pointers:

const char* kStr = ...;
const Thing* kFoo = ...;

The kFoo above is a pointer to const, but the pointer itself is not a constant. You can assign to it, set it null, etc.

// Corrected.
const Thing* const kFoo = ...;
// This works too.
constexpr const Thing* kFoo = ...;

Mistake #2: the Non-Constant MyString()

Consider this code:

inline absl::string_view MyString() {
  return "Hello";  // may return a different value with every call
}

The address of a string literal constant is allowed to change every time it is evaluated1, so the above is subtly wrong because it returns a string_view that might have a different .data() value for each call. While in many cases this won’t be a problem, it can lead to the bug described above.

Making the MyString() constexpr does not fix the issue, because the language standard does not say it does2. One way to look at this is that a constexpr function is just an inline function that is allowed to execute at compile time when initializing constant values. At run time it is not different from an inline function.

constexpr absl::string_view MyString() {
  return "Hello";  // may return a different value with every call
}

To avoid the bug, use a static constexpr variable in a function instead:

inline absl::string_view MyString() {
  static constexpr char kHello[] = "Hello";
  return kHello;
}

Rule of thumb: if your “constant” is an array type, store it in a function local static before returning it. This fixes its address.

Mistake #3: Non-Portable Code

For the extern const variables declared in header files the following approach to defining their values is valid according to the standard C++, and is generally preferable to C++20’s constinit (or the older ABSL_CONST_INIT), but runs afoul of a bug with at least one common compiler:

// Defined in foo.cc -- valid C++, but not supported by MSVC 19 by default.
constexpr absl::string_view kOtherBufferName = "other example";

Unfortunately, MSVC++19 incorrectly gives a C2370 error for this code unless the /Zc:externConstexpr option is used. If code needs to compile with MSVC++19 and cannot rely on /Zc:externConstexpr, as a workaround you can provide its value to other files through functions instead of as a global variable.

Mistake #4: Improperly Initialized Constants

The style guide has some detailed rules intended to keep us safe from common problems related to run-time initialization of static and global variables. The root issue arises when the initialization of global variable X references another global variable Y. How can we be sure Y itself doesn’t somehow depend on the value of X? Cyclic initialization dependencies can easily happen with global variables, especially with those we think of as constants.

This is a pretty thorny area of the language in its own right. The style guide is an authoritative reference.

Consider the above links required reading. With a focus on initialization of constants, the phases of initialization can be explained as:

  1. Zero initialization. This is what initializes otherwise uninitialized static variables to the “zero” value for the type (e.g. 0, 0.0, '\0', null, etc.).

    const int kZero;  // this will be zero-initialized to 0
    const int kLotsOfZeroes[5000];  // so will all of these
    

    Note that relying on zero initialization is fairly popular in C code but is fairly rare and niche in C++. It is generally clearer to assign variables explicit values, even if the value is zero, which brings us to…

  2. Constant initialization.

    const int kZero = 0;  // this will be constant-initialized to 0
    const int kOne = 1;   // this will be constant-initialized to 1
    

    Both “constant initialization” and “zero initialization” are called “static initialization” in the C++ language standard. Both are always safe.

  3. Dynamic initialization.

    // This will be dynamically initialized at run-time to
    // whatever ArbitraryFunction returns.
    const int kArbitrary = ArbitraryFunction();
    

    Dynamic initialization is where most problems happen. The style guide explains why at https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables.

    Note that documents like the Google C++ style guide have historically included dynamic initialization in the broad category of “static initialization.” The word “static” applies to a few different concepts in C++, which can be confusing. “Static initialization” can mean “initialization of static variables,” which can include run-time computation (dynamic initialization). The language standard uses the term “static initialization” in a different, narrower, sense: initialization that is done statically or at compile-time.

Initialization Cheat Sheet

Here is a super-quick constant initialization cheat sheet (not in header files):

  1. constexpr guarantees safe constant initialization as well as safe (trivial) destruction. Any constexpr variable is entirely fine when defined in a .cc file, but is problematic in header files for reasons explained earlier.
  2. constinit (ABSL_CONST_INIT prior to C++20) guarantees safe constant initialization. Unlike constexpr, it does not actually make the variable const, nor does it ensure the destructor is trivial, so care must still be taken when declaring static variables with it. See again https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables.
  3. Otherwise, you’re most likely best off using a static variable within a function and returning it. See the “ordinary function” example shown earlier.
  • https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables
  • http://en.cppreference.com/w/cpp/language/constexpr
  • http://en.cppreference.com/w/cpp/language/inline
  • http://en.cppreference.com/w/cpp/language/storage_duration (linkage rules)
  • http://en.cppreference.com/w/cpp/language/ub (Undefined Behavior)

Conclusion

The inline variable from C++17 can’t come soon enough. Until then all we can do is use the safe idioms that steer us clear of the rough edges.

  1. We conclude that string literals are not required to evaluate to the same object from the following language in [lex.string] in the C++17 language standard. Equivalent language is also present in C++11 and C++14. 

  2. There is no language in [lex.string] describing different behavior in a constexpr context. 


Subscribe to the Abseil Blog