string_view
operator+
vs. StrCat()
absl::Status
std::bind
absl::optional
and std::unique_ptr
absl::StrFormat()
make_unique
and private
Constructors.bool
explicit
= delete
)switch
Statements Responsibly= delete
AbslHashValue
and Youcontains()
std::optional
parametersif
and switch
statements with initializersinline
Variablesstd::unique_ptr
Must Be MovedAbslStringify()
vector.at()
auto
for Variable DeclarationsOriginally published as totw/77 on 2014-07-09
By Titus Winters ([email protected])
Updated 2017-10-20
Quicklink: abseil.io/tips/77
In the ongoing attempt to figure out how to explain to non-language-lawyers how C++11 changed things, we present yet another entry in the series “When are copies made?” This is part of a general attempt to simplify the subtle rules that have surrounded copies in C++ and replace it with a simpler set of rules.
You can? Awesome. Remember that the “name rule” means that each unique name you can assign to a certain resource affects how many copies of the object are in circulation. (See TotW 55 on Name Counting for a refresher.)
If you’re worrying about a copy being created, presumably you’re worried about some line of code in particular. So, look at that point. How many names exist for the data you think is being copied? There are only 3 cases to consider:
This one is easy: if you’re giving a second name to the same data, it’s a copy.
std::vector<int> foo;
FillAVectorOfIntsByOutputParameterSoNobodyThinksAboutCopies(&foo);
std::vector<int> bar = foo; // Yep, this is a copy.
std::map<int, string> my_map;
string forty_two = "42";
my_map[5] = forty_two; // Also a copy: my_map[5] counts as a name.
This one is a little surprising: C++11 recognizes that if you can’t refer to a
name anymore, you also don’t care about that data anymore. The language had to
be careful not to break cases where you were relying on the destructor (say,
absl::MutexLock
), so return
is the easy case to identify.
std::vector<int> GetSomeInts() {
std::vector<int> ret = {1, 2, 3, 4};
return ret;
}
// Just a move: either "ret" or "foo" has the data, but never both at once.
std::vector<int> foo = GetSomeInts();
The other way to tell the compiler that you’re done with a name (the “name
eraser” from TotW 55) is calling std::move()
.
std::vector<int> foo = GetSomeInts();
// Not a copy, move allows the compiler to treat foo as a
// temporary, so this is invoking the move constructor for
// std::vector<int>.
// Note that it isn’t the call to std::move that does the moving,
// it’s the constructor. The call to std::move just allows foo to
// be treated as a temporary (rather than as an object with a name).
std::vector<int> bar = std::move(foo);
Temporaries are also special: if you want to avoid copies, avoid providing names to variables.
void OperatesOnVector(const std::vector<int>& v);
// No copies: the values in the vector returned by GetSomeInts()
// will be moved (O(1)) into the temporary constructed between these
// calls and passed by reference into OperatesOnVector().
OperatesOnVector(GetSomeInts());
The above (other than std::move()
itself) is hopefully pretty intuitive, it’s
just that we all built up weird notions of copies in the years pre-dating C++11.
For a language without garbage collection, this type of accounting gives us an
excellent mix of performance and clarity. However, it’s not without dangers, and
the big one is this: what is left in a value after it has been moved from?
T bar = std::move(foo);
CHECK(foo.empty()); // Is this valid? Maybe, but don’t count on it.
This is one of the major difficulties: what can we say about these leftover values? For most standard library types, such a value is left in a “valid but unspecified state.” Non-standard types usually hold to the same rule. The safe approach is to stay away from these objects: you are allowed to re-assign to them, or let them go out of scope, but don’t make any other assumptions about their state.
Clang-tidy provides some some static-checking to catch use-after move with the misc-use-after-move check. However, static-analysis won’t ever be able to catch all of these - be on the lookout. Call these out in code review, and avoid them in your own code. Stay away from the zombies.
std::move
Doesn’t Move?Yeah, one other thing to watch for is that a call to std::move()
isn’t
actually a move itself, it’s just a cast to an rvalue-reference. It’s only the
use of that reference by a move constructor or move assignment that does the
work.
std::vector<int> foo = GetSomeInts();
std::move(foo); // Does nothing.
// Invokes std::vector<int>’s move-constructor.
std::vector<int> bar = std::move(foo);
This should almost never happen, and you probably shouldn’t waste a lot of
mental storage on it. I really only mention it if the connection between
std::move()
and a move constructor was confusing you.
First: it’s really not so bad. Since we have move operations in the majority of our value types (including protobufs), we can do away with all of the discussions of “Is this a copy? Is this efficient?” and just rely on name counting: two names, a copy. Fewer than that: no copy.
Ignoring the issue of copies, value semantics are clearer and simpler to reason about. Consider these two operations:
void Foo(std::vector<string>* paths) {
ExpandGlob(GenerateGlob(), paths);
}
std::vector<string> Bar() {
std::vector<string> paths;
ExpandGlob(GenerateGlob(), &paths);
return paths;
}
Are these the same? What about if there is existing data in *paths
? How can
you tell? Value semantics are easier for a reader to reason about than
input/output parameters, where you need to think about (and document) what
happens to existing data, and potentially whether there is an pointer ownership
transfer.
Because of the simpler guarantees about lifetime and usage when dealing with values (instead of pointers), it is easier for the compiler’s optimizers to operate on code in that style. Well-managed value semantics also minimizes hits on the allocator (which is cheap but not free). Once we understand how move semantics help rid us of copies, the compiler’s optimizers can better reason about object types, lifetimes, virtual dispatch, and a host of other issues that help generate more efficient machine code.
Since most utility code is now move-aware, we should stop worrying about copies and pointer semantics, and focus on writing simple easy-to-follow code. Please make sure you understand the new rules: not all legacy interfaces you encounter may be updated to return by value (instead of by output parameter), so there will always be a mix of styles. It’s important that you understand when one is more appropriate than the other.