Tip of the Week #77: Temporaries, Moves, and Copies

Originally published as totw/77 on 2014-07-09

Updated 2017-10-20

In the ongoing attempt to figure out how to explain to non-language-lawyers how C++11 changed things, we present yet another entry in the series “When are copies made?” This is part of a general attempt to simplify the subtle rules that have surrounded copies in C++ and replace it with a simpler set of rules.

Can You Count to 2?

You can? Awesome. Remember that the “name rule” means that each unique name you can assign to a certain resource affects how many copies of the object are in circulation. (See TotW 55 on Name Counting for a refresher.)

Name Counting, in Brief

If you’re worrying about a copy being created, presumably you’re worried about some line of code in particular. So, look at that point. How many names exist for the data you think is being copied? There are only 3 cases to consider:

Two Names: It’s a Copy

This one is easy: if you’re giving a second name to the same data, it’s a copy.

std::vector<int> foo;
FillAVectorOfIntsByOutputParameterSoNobodyThinksAboutCopies(&foo);
std::vector<int> bar = foo;     // Yep, this is a copy.

std::map<int, string> my_map;
string forty_two = "42";
my_map[5] = forty_two;          // Also a copy: my_map[5] counts as a name.

One Name: It’s a Move

This one is a little surprising: C++11 recognizes that if you can’t refer to a name anymore, you also don’t care about that data anymore. The language had to be careful not to break cases where you were relying on the destructor (say, absl::MutexLock), so return is the easy case to identify.

std::vector<int> GetSomeInts() {
  std::vector<int> ret = {1, 2, 3, 4};
  return ret;
}

// Just a move: either "ret" or "foo" has the data, but never both at once.
std::vector<int> foo = GetSomeInts();

The other way to tell the compiler that you’re done with a name (the “name eraser” from TotW 55) is calling std::move().

std::vector<int> foo = GetSomeInts();
// Not a copy, move allows the compiler to treat foo as a
// temporary, so this is invoking the move constructor for
// std::vector<int>.
// Note that it isn’t the call to std::move that does the moving,
// it’s the constructor. The call to std::move just allows foo to
// be treated as a temporary (rather than as an object with a name).
std::vector<int> bar = std::move(foo);

Zero Names: It’s a Temporary

Temporaries are also special: if you want to avoid copies, avoid providing names to variables.

void OperatesOnVector(const std::vector<int>& v);

// No copies: the values in the vector returned by GetSomeInts()
// will be moved (O(1)) into the temporary constructed between these
// calls and passed by reference into OperatesOnVector().
OperatesOnVector(GetSomeInts());

Beware: Zombies

The above (other than std::move() itself) is hopefully pretty intuitive, it’s just that we all built up weird notions of copies in the years pre-dating C++11. For a language without garbage collection, this type of accounting gives us an excellent mix of performance and clarity. However, it’s not without dangers, and the big one is this: what is left in a value after it has been moved from?

T bar = std::move(foo);
CHECK(foo.empty()); // Is this valid? Maybe, but don’t count on it.

This is one of the major difficulties: what can we say about these leftover values? For most standard library types, such a value is left in a “valid but unspecified state.” Non-standard types usually hold to the same rule. The safe approach is to stay away from these objects: you are allowed to re-assign to them, or let them go out of scope, but don’t make any other assumptions about their state.

Clang-tidy provides some some static-checking to catch use-after move with the bugprone-use-after-move check. However, static-analysis won’t ever be able to catch all of these - be on the lookout. Call these out in code review, and avoid them in your own code. Stay away from the zombies.

Wait, `std::move` Doesn’t Move?

Yeah, one other thing to watch for is that a call to std::move() isn’t actually a move itself, it’s just a cast to an rvalue-reference. It’s only the use of that reference by a move constructor or move assignment that does the work.

std::vector<int> foo = GetSomeInts();
std::move(foo); // Does nothing.
// Invokes std::vector<int>’s move-constructor.
std::vector<int> bar = std::move(foo);

This should almost never happen, and you probably shouldn’t waste a lot of mental storage on it. I really only mention it if the connection between std::move() and a move constructor was confusing you.

Aaaagh! It’s All Complicated! Why!?!

First: it’s really not so bad. Since we have move operations in the majority of our value types (including protobufs), we can do away with all of the discussions of “Is this a copy? Is this efficient?” and just rely on name counting: two names, a copy. Fewer than that: no copy.

Ignoring the issue of copies, value semantics are clearer and simpler to reason about. Consider these two operations:

void Foo(std::vector<string>* paths) {
  ExpandGlob(GenerateGlob(), paths);
}

std::vector<string> Bar() {
  std::vector<string> paths;
  ExpandGlob(GenerateGlob(), &paths);
  return paths;
}

Are these the same? What about if there is existing data in *paths? How can you tell? Value semantics are easier for a reader to reason about than input/output parameters, where you need to think about (and document) what happens to existing data, and potentially whether there is an pointer ownership transfer.

Because of the simpler guarantees about lifetime and usage when dealing with values (instead of pointers), it is easier for the compiler’s optimizers to operate on code in that style. Well-managed value semantics also minimizes hits on the allocator (which is cheap but not free). Once we understand how move semantics help rid us of copies, the compiler’s optimizers can better reason about object types, lifetimes, virtual dispatch, and a host of other issues that help generate more efficient machine code.

Since most utility code is now move-aware, we should stop worrying about copies and pointer semantics, and focus on writing simple easy-to-follow code. Please make sure you understand the new rules: not all legacy interfaces you encounter may be updated to return by value (instead of by output parameter), so there will always be a mix of styles. It’s important that you understand when one is more appropriate than the other.