Tip of the Week #227: Be Careful with Empty Containers and Unsigned Arithmetic

Originally posted as TotW #227 on November 16, 2023

Updated 2024-03-11

Index-Based Loops (Still Have Their Uses)

Modern C++ code uses index-based for loops much less often now that range-based for loops are available, but there are still times when we need to use an index while we iterate. Parallel iteration over multiple containers is one such case; another is when we want to process multiple adjacent elements from a single container. In this tip we’ll look at a pitfall for the second of these.

Plausible Code

Let’s start by looking at some code that might be correct:

for (int64_t i = 0; i < v.size() - 1; ++i) {
  ProcessPair(v[i], v[i+1]);
}

This code wisely takes some care to check for valid indexes before calling ProcessPair(), so why do we say that it only “might” be correct? A careful unit test (or almost any fuzz test) will cover the case where v is empty. If the code surrounding our for loop ensures that the for loop is never reached in that case, all is well. But if we execute our loop with an empty v, C++ makes trouble for us.

Unsigned Types to the Unrescue

Recall that our style guide warns against use of unsigned types in C++. The style guide also says

Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point.

(emphasis added)

Looking carefully, we can see that our example falls afoul of the exact problem discussed in the style guide. When checking whether we have valid v[i] and v[i+1] elements, we are seemingly correct in checking whether i is less than v.size() - 1 given that both elements need to be valid. However, for an empty container v.size() is zero (so far, so good!), but because the type of v.size() is unsigned, when we subtract one from that zero, we don’t get the value -1, but instead we get the maximum value of the given type. Then the check for whether i is less than v.size() - 1 evaluates as true for any small value of i, and so the code will use out-of-bounds indexes for v - yielding undefined behavior.

How Should We Fix This?

Interestingly, if we make the code express its intent a little more directly, our problem goes away.

What do we mean by “express its intent a little more directly”? What is the intent here? The purpose of the loop condition here is to ensure that the indexes i and i + 1 used to index into v are valid.

Given that indexes in C++ are zero-based, we test whether a (non-negative) index i into a container is valid by checking i < v.size(). It would be redundant to check validity of both indexes (though we could do so if we wished): if i + 1 is valid then we know that i is (because i is never negative here). “i + 1 is valid” translates directly into C++ as i + 1 < v.size(). Our original code i < v.size() - 1 does not have such a direct translation as a statement about the validity of an index.

The rewritten code i + 1 < v.size() looks almost the same as i < v.size() - 1, but it is crucially different in that we never subtract, so we avoid the danger of wrapping around to a huge positive value. Did we swap this for a risk of overflowing when we calculate i + 1? Only if i is already the largest value of its type (int64_t) – so we are safe. This difference is sometimes characterized by noting that the common, useful values of int64_t are a long way away from overflowing, whereas with unsigned types such as uint64_t, the very common value 0 is the smallest value of the type, so it’s much easier to unintentionally wrap around.

Fixed Code, Fuzz Free

With this one small change, our now-robust code looks like this:

for (int64_t i = 0; i + 1 < v.size(); ++i) {
  ProcessPair(v[i], v[i+1]);
}

The indexes into v are clearly safe, without a need to look further afield to know whether v might be empty.

Now we can let our fuzzer loose on the fixed code, and feel warm fuzzy feelings that our for loop is bug-free and reviewer-friendly.

Note: This is just one (robust) way to write this loop; there are many others.

Summary

While our fix doesn’t change many bytes of source code, it touches on a number of ideas:

As the style guide says, be wary of arithmetic on unsigned types in C++.
Remember that container.size() yields an unsigned type.
Prefer code where correctness can be verified as locally as possible.
Try to make code correspond as directly as possible to the underlying intent.