Tip of the Week #148: Overload Sets

Originally posted as TotW #148 on May 3, 2018

Updated 2020-04-06

One of the effects of living with electric information is that we live habitually in a state of information overload. There’s always more than you can cope with. –Marshall McLuhan

In my opinion, one of the most powerful and insightful sentences in the C++ style guide is this: “Use overloaded functions (including constructors) only if a reader looking at a call site can get a good idea of what is happening without having to first figure out exactly which overload is being called.”

On the surface, this is a pretty straightforward rule: overload only when it won’t cause confusion to a reader. However, the ramifications of this are actually fairly significant and touch on some interesting issues in modern API design. First let’s define the term “overload set”, and then let’s look at some examples.

What is an Overload Set?

Informally, an overload set is a set of functions with the same name that differ in the number, type and/or qualifiers of their parameters. (See Overload Resolution for all the gory details.) You may not overload on the return type of a function: the compiler must be able to tell which member of the overload set to call from the arguments at the call site alone, regardless of return type.

int Add(int a, int b);
int Add(int a, int b, int c);  // Number of parameters may vary

// Return type may vary, as long as the selected overload is uniquely
// identifiable from only its parameters (types and counts).
float Add(float a, float b);

// But two functions with the same parameter signature can't form a proper
// overload set if they differ only in return type; the following won't compile
// with the above overloads.
int Add(float a, float b);    // BAD - can't overload on return type
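To see the valid overloads in action, here is a small compilable sketch. The function bodies are assumptions added for illustration (the declarations above have none), and the static_assert lines simply record which overload the compiler selects for each call.

```cpp
#include <type_traits>

// Bodies are illustrative assumptions; only the signatures come from the text.
int Add(int a, int b) { return a + b; }
int Add(int a, int b, int c) { return a + b + c; }
float Add(float a, float b) { return a + b; }

// The argument types alone decide which overload is selected.
static_assert(std::is_same<decltype(Add(1, 2)), int>::value,
              "two ints select Add(int, int)");
static_assert(std::is_same<decltype(Add(1.5f, 2.5f)), float>::value,
              "two floats select Add(float, float)");

// A mixed call such as Add(1, 2.5f) would not compile: each two-parameter
// overload needs exactly one conversion, so neither is a better match.
```

Note that the return type never participates in the choice; it is only a consequence of whichever overload the arguments select.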

String-ish Parameters

Thinking back on my earliest experiences with C++ at Google, I’m almost positive that the first overloads I encountered were of the form:

void Process(const char*);
// Declared after the const char* overload so that this call resolves to it.
void Process(const std::string& s) { Process(s.c_str()); }

The wonderful thing about overloads of this form is that they meet the letter and the spirit of the rule, in a very obvious fashion. There is no behavioral difference here: in both cases, we’re accepting some form of string-ish data, and the inline forwarding function makes it perfectly clear that the behavior of every member of the overload set is identical.

That turns out to be critical, and easy to overlook, since the Google C++ style guide doesn’t phrase it explicitly: if the documented behavior of the members of an overload set varies, then a user implicitly has to know which function is actually being called. The only way to ensure that they have a “good idea of what is happening” without figuring out which overload is being called is if the semantics of each entry in the overload set are identical.
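As a rough illustration of "identical semantics", consider the sketch below. CountVowels is a hypothetical function invented for this example; the point is only that the std::string overload forwards to the const char* one, so a caller gets the same answer whichever overload is selected.

```cpp
#include <cstddef>
#include <string>

// Hypothetical example function: counts lowercase ASCII vowels in a C string.
std::size_t CountVowels(const char* s) {
  std::size_t count = 0;
  for (; *s != '\0'; ++s) {
    switch (*s) {
      case 'a': case 'e': case 'i': case 'o': case 'u':
        ++count;
        break;
      default:
        break;
    }
  }
  return count;
}

// Forwards to the overload above, so both entries share one implementation
// (assuming the string contains no embedded NUL characters).
std::size_t CountVowels(const std::string& s) { return CountVowels(s.c_str()); }
```

A single comment describing "counts the vowels in the given string" documents the whole set; nothing needs to be said per overload.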

So, the string-ish examples above work because they have identical semantics. Borrowing an example from the C++ Core Guidelines, we would not want to see something like:

// remove obstacle from garage exit lane
void open(Gate& g);

// open file
void open(const char* name, const char* mode = "r");

Hopefully, namespace differences suffice to keep functions like these from actually forming an overload set. Fundamentally, this would be a bad design because APIs should be understood and documented at the level of the overload set, not the individual functions that make it up.
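One way this separation might look is sketched below. The namespaces garage and files, the Gate type, and both function bodies are assumptions made up for illustration; in particular, files::Open only returns a description of the request rather than touching the filesystem.

```cpp
#include <string>

namespace garage {
// Hypothetical type standing in for the gate in the example above.
struct Gate {
  bool is_open = false;
};

// Removes the obstacle from the garage exit lane.
inline void Open(Gate& g) { g.is_open = true; }
}  // namespace garage

namespace files {
// Illustrative stand-in: describes the request instead of opening a file.
inline std::string Open(const char* name, const char* mode = "r") {
  return std::string(name) + ":" + mode;
}
}  // namespace files
```

With qualified calls such as garage::Open(gate) and files::Open("log.txt"), a reader sees two unrelated APIs, each documented on its own, rather than one overload set with two meanings.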

StrCat

StrCat() is one of the most common Abseil examples for demonstrating that overload sets are often the right granularity for API design. Over the years, StrCat() has changed the number of parameters it accepts, and the form that it uses to express that parameter count. Years ago, StrCat() was a set of functions with varying arity. Now, it is conceptually a variadic template function … although for optimization reasons small-count arities are still provided as overloads. It has never actually been a single function - we just treat it conceptually as one entity.

This is a good use of overload sets - it would be annoying and redundant to encode the parameter count in the function name, and conceptually it doesn’t matter how many things are passed to StrCat() - it will always be the “convert to string and concatenate” tool.
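To make the "one conceptual entity" idea concrete, here is a minimal sketch of a StrCat-like API. Concat is a hypothetical name, and this is not how Abseil implements StrCat; it only shows a variadic template and a small-arity overload coexisting in one overload set with the same meaning. It assumes C++17 for the fold expression.

```cpp
#include <sstream>
#include <string>

// Hypothetical small-arity overload: a single string needs no formatting.
inline std::string Concat(const std::string& a) { return a; }

// Hypothetical general case: streams every argument, in order, into one string.
template <typename... Args>
std::string Concat(const Args&... args) {
  std::ostringstream out;
  (out << ... << args);  // C++17 binary left fold over operator<<
  return out.str();
}
```

A caller writing Concat("x = ", 42) or Concat(name) does not need to know whether the template or the overload was picked; either way the result is "convert to string and concatenate".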

Parameter Sinks

Another technique that the standard uses and that comes up in a lot of generic code is to overload on const T& and T&& when passing a value that will be stored: a value sink. Consider std::vector::push_back():

void push_back(const T&);
void push_back(T&&);

It’s worth considering the origin of this overload set: when the push_back() API first appeared, it consisted only of push_back(const T&), whose const reference parameter was a cheap (and safe) way to pass the value. With C++11 the push_back(T&&) overload was added as an optimization for cases where the value is a temporary, or the caller has promised not to interfere with the parameter by writing out std::move(). Even though the moved-from object may be left in a different state, these still provide the same semantics for the user of the vector, so we consider them a well-designed overload set.

Put another way, the & and && qualifiers denote whether that overload is available for lvalue or rvalue expressions; if you have a var or var& argument, you will get the & overload, but if you have a temporary or have performed a std::move() on your expression, you will get the && overload. (See Tip #77 for more on move-semantics.)
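The selection rule can be observed with a small sketch. NameList and its counters are hypothetical, added purely so the chosen overload is visible; real sink APIs would not normally expose this.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical container wrapper with a const T& / T&& sink overload pair.
class NameList {
 public:
  // Chosen for lvalue arguments: the caller keeps its string, so we copy.
  void Add(const std::string& name) {
    names_.push_back(name);
    ++copies_;
  }

  // Chosen for temporaries and std::move()'d arguments: we may steal contents.
  void Add(std::string&& name) {
    names_.push_back(std::move(name));
    ++moves_;
  }

  const std::vector<std::string>& names() const { return names_; }
  int copies() const { return copies_; }
  int moves() const { return moves_; }

 private:
  std::vector<std::string> names_;
  int copies_ = 0;  // Instrumentation for the example only.
  int moves_ = 0;
};
```

Passing a named std::string increments copies(), while passing std::string("bob") or std::move(name) increments moves(). In every case the list ends up holding the same value, which is why the pair still reads as one API.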

Interestingly, these overloads are semantically the same as a single method – push_back(T) – but in some cases may be slightly more efficient. Such efficiency mostly matters when the body of the function is cheap compared to the cost of invoking the move constructor for T – possible for containers and generics, but unlikely in many other contexts. We generally recommend that if you need to sink a value (store in an object, mutate a parameter, etc) you just provide the single function accepting T (or const T&) for simplicity and maintainability. Only if you are writing very high-performance generic code is the difference likely to be relevant. See Tip #77 and Tip #117.
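The single-function alternative recommended above might look like the following sketch. Widget and set_name are hypothetical names; the technique is simply to take the parameter by value and move it into place.

```cpp
#include <string>
#include <utility>

// Hypothetical class with a single by-value sink instead of an overload pair.
class Widget {
 public:
  // Lvalue arguments are copied into `name`; rvalue arguments are moved into
  // it. Either way, one more move stores the value in the member.
  void set_name(std::string name) { name_ = std::move(name); }

  const std::string& name() const { return name_; }

 private:
  std::string name_;
};
```

Compared with the const T& and T&& pair, this costs at most one extra move per call, which rarely matters outside very hot generic code, and it leaves only one function to document and maintain.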

Overloaded Accessors

For methods on a class (especially a container or a wrapper), it is sometimes valuable to provide an overload set for accessors. Standard library types provide many great examples here - we’ll consider just vector::operator[] and optional::value().

In the case of vector::operator[], two overloads exist: one const and one non-const, which return a const and a non-const reference, respectively. This matches our definition nicely; a user doesn’t need to know which thing is invoked. The semantics are the same, differing only in constness – if you have a non-const vector you get a non-const reference, and if you have a const vector you get a const reference. Put another way: the overload set is purely forwarding the const-ness of the vector, but the API is consistent – it just gives you the indicated element.
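The same pattern on a user-defined type could be sketched as follows. IntBuffer is a hypothetical wrapper invented for this example; the static_assert lines just confirm that the constness of the object is forwarded to the returned reference.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical fixed-size buffer with a const / non-const accessor pair.
class IntBuffer {
 public:
  explicit IntBuffer(std::size_t n) : data_(n, 0) {}

  // Both overloads mean "give me element i"; only constness differs.
  int& operator[](std::size_t i) { return data_[i]; }
  const int& operator[](std::size_t i) const { return data_[i]; }

 private:
  std::vector<int> data_;
};

static_assert(
    std::is_same<decltype(std::declval<IntBuffer&>()[0]), int&>::value,
    "a non-const buffer yields a non-const reference");
static_assert(
    std::is_same<decltype(std::declval<const IntBuffer&>()[0]),
                 const int&>::value,
    "a const buffer yields a const reference");
```

A caller writes buffer[i] in both cases and never has to think about which overload ran.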

In C++17 we added std::optional<T>, a wrapper for at most one value of an underlying type. Just like vector::operator[], when accessing optional::value() both const and non-const overloads exist. However, optional goes one step further and provides value() overloads based on “value category” (roughly speaking, whether the object is a temporary). So the full pairwise combination of const and value category looks like:

T& value() &;
const T& value() const &;
T&& value() &&;
const T&& value() const &&;

The trailing & and && apply to the implicit *this parameter, in the same way that const-qualifying a method does, and indicate acceptance of lvalue or rvalue arguments as noted in Parameter Sinks above. Importantly, however, you don’t actually need to understand move semantics to understand optional::value() in this case. If you ask for the value out of a temporary, you get the value as if it were a temporary itself. If you ask for the value out of a const ref, you get a const-ref of the value. And so on.
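For readers who want to see the four qualifiers side by side, here is a sketch on a hypothetical Box<T> wrapper. It is not std::optional (there is no empty state and no exception on access); it only mirrors the shape of the value() overload set.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical always-engaged wrapper mirroring optional's value() overloads.
template <typename T>
class Box {
 public:
  explicit Box(T v) : value_(std::move(v)) {}

  T& value() & { return value_; }
  const T& value() const& { return value_; }
  T&& value() && { return std::move(value_); }
  const T&& value() const&& { return std::move(value_); }

 private:
  T value_;
};

using StringBox = Box<std::string>;

// The constness and value category of the Box carry over to the result.
static_assert(std::is_same<decltype(std::declval<StringBox&>().value()),
                           std::string&>::value, "lvalue");
static_assert(std::is_same<decltype(std::declval<const StringBox&>().value()),
                           const std::string&>::value, "const lvalue");
static_assert(std::is_same<decltype(std::declval<StringBox>().value()),
                           std::string&&>::value, "rvalue");
static_assert(std::is_same<decltype(std::declval<const StringBox>().value()),
                           const std::string&&>::value, "const rvalue");
```

In each case the call site reads box.value() and means "the contained value"; the qualifiers only preserve what the caller already had.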

Copy vs. Move

The most important overload sets for types are often their set of constructors, especially the copy and move constructors. Copy and move, done right, form an overload set in all senses of the term: the reader should not need to know which of those overloads is chosen, because the semantics of the newly-constructed object should be the same in either case (assuming both constructors exist). The standard library is becoming more explicit about this: move is assumed to be an optimization of copy, and you should not depend on the particulars of how many moves or copies are made in any given operation.
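A quick way to check this property for a type is sketched below. Roster and CopyAndMoveAgree are hypothetical names; the type relies on compiler-generated copy and move constructors, and the helper compares an object built by copy against one built by move from an equal source.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical value type with implicit copy and move constructors.
struct Roster {
  std::vector<std::string> names;
};

inline bool operator==(const Roster& a, const Roster& b) {
  return a.names == b.names;
}

// Returns true if constructing by copy and by move from equal sources
// produces equal objects, both matching the original.
inline bool CopyAndMoveAgree(const Roster& original) {
  Roster source_for_copy = original;
  Roster source_for_move = original;
  Roster copied(source_for_copy);             // Copy constructor.
  Roster moved(std::move(source_for_move));   // Move constructor.
  return copied == moved && copied == original;
}
```

The state of the moved-from source is deliberately not inspected: the overload set only promises that the newly constructed object is the same either way.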

Conclusion

Overload sets are a simple idea conceptually, but prone to abuse when not well understood - don’t produce overloads where anyone might need to know which function was chosen. But when used well, overload sets provide a powerful conceptual framework for API design. Understanding the subtleties of the style guide’s description of overload sets is well worth your time when thinking about API design.