Tip of the Week #64: Raw String Literals

Originally published as totw/64 on 2013-12-09

By Titus Winters ([email protected])

Updated 2017-10-23

Quicklink: abseil.io/tips/64

"(?:\"(?:\\\\\"|[^\"])*\"|'(?:\\\\'|[^'])*')"; — A cat walking over the keyboard . . . or maybe what the fox says . . . no, actually just a highly- escaped regexp found in real C++ code.

Odds are you’ve had trouble getting your regular expression understood properly in C++ due to escaping issues. Similarly, you’ve probably been annoyed with preserving quotes and newlines when embedding a text version of Protobuf or JSON data into your unittests. When you have to use significant escaping (or worse, multi-layer escaping), code clarity drops precipitously.

Luckily, there’s a new C++11 feature that removes this need for escaping: raw string literals.

The Raw String Literal Format

A raw string literal has the following special syntax:

R"tag(whatever you want to say)tag"

tag is a sequence of up to 16 characters (and an empty tag is both OK and common). The characters after ‘“tag(‘ and before the first following occurrence of ‘)tag”’ are used literally as the contents of the string literal. ‘tag’ may contain any character but parentheses, backslash, and whitespace.

Examine the difference:

const char concert_17_raw[] =
    "id: 17\n"
    "artist: \"Beyonce\"\n"
    "date: \"Wed Oct 10 12:39:54 EDT 2012\"\n"
    "price_usd: 200\n";

versus:

const char concert_17_raw[] = R"(
    id: 17
    artist: "Beyonce"
    date: "Wed Oct 10 12:39:54 EDT 2012"
    price_usd: 200)";

Special Cases

Note that indentation rules, combined with the fact that raw string literals may contain newlines, leave you with an awkward choice on how to indent the first line of your raw string block. Because text protobufs ignore whitespace, this problem can be avoided by throwing in a leading newline (ignored by the parser) in that case, but not all uses of raw strings are so forgiving.

A non-empty tag is useful when the sequence )" happens to appear in your string and therefore can’t act as the closing delimiter:

std::string my_string = R"foo(This contains quoted parens "()")foo";

Conclusion

Raw string literals are certainly not an everyday tool for most of us, but there are times when good use of this new language feature will increase readability. So the next time you’re scratching your head trying to figure out if you need two \\s or four, try raw string literals instead. Your readers will thank you for it, even if regular expressions are still hard:

R"regexp((?:"(?:\\"|[^"])*"|'(?:\\'|[^'])*'))regexp";

Subscribe to the Abseil Blog