The absl/strings
library provides classes and utility functions for
manipulating and comparing strings, converting other types (such as integers)
into strings, or evaluating strings for other usages. Additionally, the
strings
library also contains utility functions for “string-like” classes that
store data within contiguous memory.
This document outlines highlights and general use cases for the strings
library. For more detailed information about specific classes, functions, and
fields, consult source documentation within the particular header file.
Although “strings” are often thought of as a standard type in C++, they are not
a built-in type, but instead are provided in the Standard Library through the
std::string
class. Fundamentally, a string consists of a size, and an array of
char
characters.
absl::string_view
ContainerOftentimes, you need to access string data, but you don’t need to own it, and
you don’t need to modify it. For this reason, Abseil defines an
absl::string_view
class, which points to a contiguous span of characters,
often part or all of another std::string
, double-quoted string literal,
character array, or even another string_view
. A string_view
, as its name
implies, provides a read-only view of its associated string data.
Most C++ code has historically used either the (older) C char*
pointer type or
the C++ std::string
class to hold character data. Methods that wish to consume
data of both types would typically need to provide overloaded implementations if
they wanted to avoid copying data. A string_view
also acts as a wrapper around
APIs that accept both types of character data; methods can simply declare that
they accept absl::string_view
.
// A common use of string_view: Foo() can accept both char* and std::string via
// implicit conversion to string_view.
void Foo(absl::string_view s) { ... }
string_view
objects are very lightweight, so you should always pass them by
value within your methods and functions; don’t pass a const absl::string_view
&
. (Passing absl::string_view
instead of const absl::string_view &
has the
same algorithmic complexity, but because of register allocation and parameter
passing rules, it is generally faster to pass by value in this case.)
As noted above, because the string_view
does not own the underlying data
itself, it should only be used for read-only data. If you need to provide a
string constant to an external API user, for example:
// C++17: A read-only string in a header file.
inline constexpr absl::string_view kGreeting = "hi";
// C++14: A read-only string in a header file. Due to lifetime issues, a
// string_view is usually a poor choice for a return value (see below), but it's
// safe here because the static storage outlives it.
inline absl::string_view GetGreeting() {
static constexpr char kGreeting[] = "hi";
return kGreeting;
}
A string_view
is also suitable for local variables if you know that the
lifetime of the underlying object is longer than the lifetime of your
string_view
variable. However, beware of binding it to a temporary value:
// BAD use of string_view: lifetime problem
absl::string_view sv = obj.ReturnAString();
// GOOD use of string_view: str outlives sv
std::string str = obj.ReturnAString();
absl::string_view sv = str;
Due to lifetime issues, a string_view
is usually a poor choice for a return
value and almost always a poor choice for a data member. If you do use one this
way, it is your responsibility to ensure that the string_view
does not outlive
the object it points to.
A string_view
may represent a whole string or just part of a string. For
example, when splitting a string, std::vector<absl::string_view>
is a natural
data type for the output.
NOTE: For more information about string_view
, see
abseil.io/tips/1.
NOTE: For more information about safe idioms for constants, see abseil.io/tips/140.
absl::StrSplit()
for Splitting StringsThe absl::StrSplit()
function provides an easy way to split strings into
substrings. StrSplit()
takes an input string to be split, a delimiter on which
to split the string (e.g. a comma ,
), and (optionally), a predicate to act as
a filter on whether split elements will be included in the result set.
StrSplit()
also adapts the returned collection to the type specified by the
caller.
Examples:
// Splits the given string on commas. Returns the results in a
// vector of strings. (Data is copied once.)
std::vector<std::string> v = absl::StrSplit("a,b,c", ','); // Can also use ","
// v[0] == "a", v[1] == "b", v[2] == "c"
// Splits the string as in the previous example, except that the results
// are returned as `absl::string_view` objects, avoiding copies. Note that
// because we are storing the results within `absl::string_view` objects, we
// have to ensure that the input string outlives any results.
std::vector<absl::string_view> v = absl::StrSplit("a,b,c", ',');
// v[0] == "a", v[1] == "b", v[2] == "c"
StrSplit()
splits strings using a passed Delimiter object. (See
Delimiters below.) However, in many cases, you can simply pass a
string literal as the delimiter (which will be implicitly converted to an
absl::ByString
delimiter).
Examples:
// By default, empty strings are *included* in the output. See the
// `absl::SkipEmpty()` predicate below to omit them{#stringSplitting}.
std::vector<std::string> v = absl::StrSplit("a,b,,c", ',');
// v[0] == "a", v[1] == "b", v[2] == "", v[3] = "c"
// You can also split an empty string
v = absl::StrSplit("", ',');
// v[0] = ""
// The delimiter need not be a single character
std::vector<std::string> v = absl::StrSplit("aCOMMAbCOMMAc", "COMMA");
// v[0] == "a", v[1] == "b", v[2] == "c"
// You can also use the empty string as the delimiter, which will split
// a string into its constituent characters.
std::vector<std::string> v = absl::StrSplit("abcd", "");
// v[0] == "a", v[1] == "b", v[2] == "c", v[3] = "d"
One of the more useful features of the StrSplit()
API is its ability to adapt
its result set to the desired return type. StrSplit()
returned collections may
contain std::string
, absl::string_view
, or any object that can be explicitly
created from an absl::string_view
. This pattern works for all standard STL
containers including std::vector
, std::list
, std::deque
, std::set
,
std::multiset
, std::map
, and std::multimap
, and even std::pair
, which is
not actually a container.
Examples:
// Stores results in a std::set<std::string>, which also performs de-duplication
// and orders the elements in ascending order.
std::set<std::string> s = absl::StrSplit("b,a,c,a,b", ',');
// s[0] == "a", s[1] == "b", s[2] == "c"
// Stores results in a map. The map implementation assumes that the input
// is provided as a series of key/value pairs. For example, the 0th element
// resulting from the split will be stored as a key to the 1st element. If
// an odd number of elements are resolved, the last element is paired with
// a default-constructed value (e.g., empty string).
std::map<std::string, std::string> m = absl::StrSplit("a,b,c", ',');
// m["a"] == "b", m["c"] == "" // last component value equals ""
// Stores first two split strings as the members in a std::pair. Any split
// strings beyond the first two are omitted because std::pair can hold only two
// elements.
std::pair<std::string, std::string> p = absl::StrSplit("a,b,c", ',');
// p.first = "a", p.second = "b" ; "c" is omitted
The StrSplit()
API provides a number of “Delimiters” for providing special
delimiter behavior. A Delimiter implementation contains a Find()
function that
knows how to find the first occurrence of itself in a given absl::string_view
.
Models of the Delimiter concept represent specific kinds of delimiters, such as
single characters, substrings, or even regular expressions.
The following Delimiter abstractions are provided as part of the StrSplit()
API:
absl::ByString()
(default for std::string
arguments)absl::ByChar()
(default for a char
argument)absl::ByAnyChar()
(for mixing delimiters)absl::ByLength()
(for applying a delimiter a set number of times)absl::MaxSplits()
(for splitting a specific number of times)Examples:
// Because a `string` literal is converted to an `absl::ByString`, the following
// two splits are equivalent.
std::vector<std::string> v = absl::StrSplit("a,b,c", ",");
std::vector<std::string> v = absl::StrSplit("a,b,c", absl::ByString(","));
// v[0] == "a", v[1] == "b", v[2] == "c"
// Because a `char` literal is converted to an `absl::ByChar`, the following two
// splits are equivalent.
std::vector<std::string> v = absl::StrSplit("a,b,c", ',');
// v[0] == "a", v[1] == "b", v[2] == "c"
std::vector<std::string> v = absl::StrSplit("a,b,c", absl::ByChar(','));
// v[0] == "a", v[1] == "b", v[2] == "c"
// Splits on any of the given characters ("," or ";")
vector<std::string> v = absl::StrSplit("a,b;c", absl::ByAnyChar(",;"));
// v[0] == "a", v[1] == "b", v[2] == "c"
// Uses the `absl::MaxSplits` delimiter to limit the number of matches a
// delimiter can have. In this case, the delimiter of a literal comma is limited
// to matching at most one time. The last element in the returned collection
// will contain all unsplit pieces, which may contain instances of the
// delimiter.
std::vector<std::string> v = absl::StrSplit("a,b,c", absl::MaxSplits(',', 1));
// v[0] == "a", v[1] == "b,c"
// Splits into equal-length substrings.
std::vector<std::string> v = absl::StrSplit("12345", absl::ByLength(2));
// v[0] == "12", v[1] == "34", v[2] == "5"
Predicates can filter the results of a StrSplit()
operation by determining
whether or not a resultant element is included in the result set. A filtering
predicate may be passed as an optional third argument to the StrSplit()
function.
The predicates must be unary functions (or function objects such as
lambdas) that take a single
absl::string_view
argument and return a bool indicating whether the argument
should be included (true
) or excluded (false
).
One example where using predicates is useful: filtering out empty substrings. By
default, empty substrings may be returned by StrSplit()
as separate elements
in the result set, which is similar to the way split functions work in other
programming languages.
// Empty strings *are* included in the returned collection.
std::vector<std::string> v = absl::StrSplit(",a,,b,", ',');
// v[0] == "", v[1] == "a", v[2] == "", v[3] = "b", v[4] = ""
These empty strings can be filtered out of the result set by simply passing the
provided SkipEmpty()
predicate as a third argument to the StrSplit()
function. SkipEmpty()
does not consider a string containing all whitespace to
be empty. For that behavior use the SkipWhitespace()
predicate.
Examples:
// Uses absl::SkipEmpty() to omit empty strings. Strings containing whitespace
// are not empty and are therefore not skipped.
std::vector<std::string> v = absl::StrSplit(",a, ,b,", ',', absl::SkipEmpty());
// v[0] == "a", v[1] == " ", v[2] == "b"
// Uses absl::SkipWhitespace() to skip all strings that are either empty or
// contain only whitespace.
std::vector<std::string> v = absl::StrSplit(",a, ,b,", ',',
absl::SkipWhitespace());
// v[0] == "a", v[1] == "b"
// Passes a lambda as the predicate to keep only the lines that don't start
// with a `#`.
std::vector<std::string> non_comment_lines = absl::StrSplit(
file_content, '\n',
[](absl::string_view line) { return !absl::StartsWith(line, "#"); });
absl::StrCat()
and absl::StrAppend()
for String ConcatenationMost documentation on the usage of C++ strings mention that unlike other languages, strings in C++ are mutable; however, modifying a string can be expensive, as strings often contain a large amount of data, and many patterns involve the creation of temporary copies, which may involve significant overhead. Always look for ways to reduce creation of such temporaries.
For example, the following code is inefficient:
// Inefficient code
std::string s1 = "A string";
s1 = s1 + " another string";
The assignment operator above creates a temporary string, copies s1
into that
temporary string, concatenates that temporary string, and then assigns it back
to s1
. Instead use the optimized +=
operator for such concatenation:
// Efficient code
s1 += " another string";
Good compilers may be able to optimize the preceding inefficient code. However, operations that involve more than one concatenation cannot normally avoid temporaries:
// Inefficient code
std::string s1 = "A string";
std::string another = " and another string";
s1 += " and some other string" + another;
For that reason, Abseil provides the absl::StrCat()
and absl::StrAppend()
functions for efficiently concatenating and appending strings. absl::StrCat()
and absl::StrAppend()
are often more efficient than operators such as +=
,
since they don’t require the creation of temporary std::string
objects, and
their memory is preallocated during string construction.
// Inefficient code
std::string s1 = "A string";
std::string another = " and another string";
s1 += " and some other string" + another;
// Efficient code
std::string s1 = "A string";
std::string another = " and another string";
absl::StrAppend(&s1, " and some other string", another);
For this reason, you should get in the habit of preferring absl::StrCat()
or
absl::StrAppend()
over using the concatenation operators.
absl::StrCat()
absl::StrCat()
merges an arbitrary number of strings or numbers into one
string, and is designed to be the fastest possible way to construct a string out
of a mix of raw C strings, absl::string_view
elements, std::string
value,
and boolean and numeric values. StrCat()
is generally more efficient on string
concatenations involving more than one binary operator, such as a + b + c
or
a += b + c
, since they avoid the creation of temporary string objects during
string construction.
// absl::StrCat() can merge an arbitrary number of strings
std::string s1;
s1 = absl::StrCat("A string ", " another string", "yet another string");
// StrCat() also can mix types, including std::string, string_view, literals,
// and more.
std::string s1;
std::string s2 = "Foo";
absl::string_view sv1 = MyFunction();
s1 = absl::StrCat(s2, sv1, "a literal");
StrCat()
provides automatic formatting for the following types:
std::string
absl::string_view
absl::Hex()
conversion functionFloating point values are converted to a string using the same format used by STL’s std::basic_ostream::operator«, namely 6 digits of precision, using “e” format when the magnitude is less than 0.001 or greater than or equal to 1e+6.
You can convert to hexadecimal output rather than decimal output using the
absl::Hex
type. To do so, pass Hex(my_int)
as a parameter to StrCat()
or
StrAppend()
. You may specify a minimum hex field width using an
absl::PadSpec
enum, so the equivalent of StringPrintf("%04x", my_int)
is
absl::StrCat(absl::Hex(my_int, absl::kZeroPad4))
.
absl::StrAppend()
For clarity and performance, don’t use absl::StrCat()
when appending to a
string. Use absl::StrAppend()
instead. In particular, avoid using any of these
(anti-)patterns:
str.append(absl::StrCat(...))
str += absl::StrCat(...)
str = absl::StrCat(str, ...)
absl::StrJoin()
for Joining Elements within a StringAlthough similar to absl::StrCat()
in some similar use cases,
absl::StrJoin()
provides a more robust utility for joining a range of
elements, defining separator strings, and formatting the result as a string.
Ranges are specified by passing a container with std::begin()
and std::end()
iterators, container-specific begin()
and end()
iterators, a
brace-initialized std::initializer_list
, or a std::tuple
of heterogeneous
objects. The separator string is specified as an absl::string_view
.
Because the default formatter uses the absl::AlphaNum
class,
absl::StrJoin()
, like absl::StrCat()
, will work out-of-the-box on
collections of strings, ints, floats, doubles, etc.
std::vector<std::string> v = {"foo", "bar", "baz"};
std::string s = absl::StrJoin(v, "-");
// Produces the string "foo-bar-baz"
// Joins the values in the given `std::initializer_list<>` specified using
// brace initialization. This pattern also works with an initializer_list
// of ints or `absl::string_view` -- any `AlphaNum`-compatible type.
std::string s = absl::StrJoin({"foo", "bar", "baz"}, "-");
// Produces the string "foo-bar-baz"
// Joins a collection of ints. This pattern also works with floats,
// doubles, int64s -- any `absl::StrCat()`-compatible type.
std::vector<int> v = {1, 2, 3, -4};
std::string s = absl::StrJoin(v, "-");
// Produces the string "1-2-3--4"
// Joins a collection of pointer-to-int. By default, pointers are
// dereferenced and the pointee is formatted using the default format for
// that type; such dereferencing occurs for all levels of indirection, so
// this pattern works just as well for `std::vector<int**>` as for
// `std::vector<int*>`.
int x = 1, y = 2, z = 3;
std::vector<int*> v = {&x, &y, &z};
std::string s = absl::StrJoin(v, "-");
// Produces the string "1-2-3"
// Dereferencing of `std::unique_ptr<>` is also supported:
std::vector<std::unique_ptr<int>> v
v.push_back(absl::make_unique<int>(1));
v.push_back(absl::make_unique<int>(2));
v.push_back(absl::make_unique<int>(3));
std::string s = absl::StrJoin(v, "-");
// Produces the string "1-2-3"
// Joins a `std::map`, with each key-value pair separated by an equals
// sign. This pattern would also work with, say, a
// `std::vector<std::pair<>>`.
std::map<std::string, int> m = {{"a", 1}, {"b", 2}, {"c", 3}};
std::string s = absl::StrJoin(m, ",", absl::PairFormatter("="));
// Produces the string "a=1,b=2,c=3"
absl::StrJoin()
uses “Formatters” to format the elements to be joined (and
defaults to an AlphaNumFormatter()
if no formatter is specified. A Formatter
is a function object that is responsible for formatting its argument as a string
and appending it to a given output string. Formatters may be implemented as
function objects, lambdas, or normal functions. You may provide your own
Formatter to enable absl::StrJoin()
to work with arbitrary types.
The following is an example of a custom Formatter that simply uses
std::to_string()
to format an integer as a string:
struct MyFormatter {
void operator()(std::string* out, int i) const {
out->append(std::to_string(i));
}
};
You would use the above formatter by passing an instance of it as the final
argument to absl::StrJoin()
:
std::vector<int> v = {1, 2, 3, 4};
std::string s = absl::StrJoin(v, "-", MyFormatter());
// Produces the string "1-2-3-4"
The following standard formatters are provided within the StrJoin()
API:
AlphaNumFormatter()
(the default)StreamFormatter()
formats its arguments using the « operator.PairFormatter()
formats a std::pair
by putting a given separator between
the pair’s .first
and .second
members.DereferenceFormatter()
formats its argument by dereferencing it and then
applying the given formatter. This formatter is useful for formatting a
container of pointer-to-T. This pattern often shows up when joining repeated
fields in protocol buffers.absl::Substitute()
for String SubstitutionFormatting strings for display to users typically has different needs.
Traditionally, most C++ code used built-in functions such as sprintf()
and
snprintf()
; these functions have some problems in that they don’t support
absl::string_view
and the memory of the formatted buffer must be managed.
// Bad. Need to worry about buffer size and NUL-terminations.
std::string GetErrorMessage(char *op, char *user, int id) {
char buffer[50];
sprintf(buffer, "Error in %s for user %s (id %i)", op, user, id);
return buffer;
}
// Better. Using absl::StrCat() avoids the pitfalls of sprintf() and is faster.
std::string GetErrorMessage(absl::string_view op, absl::string_view user, int id) {
return absl::StrCat("Error in ", op, " for user ", user, " (", id, ")");
}
// Best. Using absl::Substitute() is easier to read and to understand.
std::string GetErrorMessage(absl::string_view op, absl::string_view user, int id) {
return absl::Substitute("Error in $0 for user $1 ($2)", op, user, id);
}
absl::Substitute()
combines the efficiency and type-safe nature of
absl::StrCat()
with the argument-binding of conventional functions like
sprintf()
. absl::Substitute
uses a format string that contains positional
identifiers indicated by a dollar sign ($) and single digit positional ids to
indicate which substitution arguments to use at that location within the format
string.
std::string s = Substitute("$1 purchased $0 $2. Thanks $1!", 5, "Bob", "Apples");
// Produces the string "Bob purchased 5 Apples. Thanks Bob!"
std::string s = "Hi. ";
SubstituteAndAppend(&s, "My name is $0 and I am $1 years old.", "Bob", 5);
// Produces the string "Hi. My name is Bob and I am 5 years old."
Note however, that absl::Substitute()
, because it requires parsing a format
string at run-time, is slower than absl::StrCat()
. Choose Substitute()
over
StrCat()
only when code clarity is more important than speed.
StringPrintf()
absl::Substitute
differs from StringPrintf()
in the following ways:
absl::Substitute()
is significantly faster than StringPrintf()
. For very
large strings, it may be orders of magnitude faster.absl::Substitute()
understands the following types:
absl::string_view
, std::string
, const char*
(null is equivalent to “”)int32_t
, int64_t
, uint32_t
, uint64_t
float
, double
bool
(Printed as “true” or “false”)0x<lower case hex string>
,
except that null is printed as “NULL”)absl::StrContains()
for String MatchingThe Abseil strings library also contains simple utilities for performing string
matching checks. All of their function parameters are specified as
absl::string_view
, meaning that these functions can accept std::string
,
absl::string_view
or NUL-terminated C-style strings.
// Assume "msg" is a line from a logs entry
if (absl::StrContains(msg, "ERROR")) {
*has_error = true;
}
if (absl::StrContains(msg, "WARNING")) {
*has_warning = true;
}
Note: The order of parameters in these functions is designed to mimic the order
an equivalent member function would exhibit; e.g. s.Contains(x)
==>
absl::StrContains(s, x)
.
Specialty functions for converting strings to numeric types within the
absl/strings
library are defined within
numbers.h. The following
functions are of particular use:
absl::SimpleAtoi()
converts a string into integral types.absl::SimpleAtof()
converts a string into a float.absl::SimpleAtod()
converts a string into a double.absl::SimpleAtob()
converts a string into a boolean.For conversion of numeric types into strings, use absl::StrCat()
and
absl::StrAppend()
. You can use StrCat/StrAppend
to convert int32
,
uint32
, int64
, uint64
, float
, and double
types into strings:
std::string foo = StrCat("The total is ", cost + tax + shipping);
To extend formatting to your type using AbslStringify()
, provide an
AbslStringify()
overload as a friend
function template definition. If the
function cannot be provided in the type itself, it must be defined in the same
namespace for the purposes of ADL look-up. The strings
library will check for
such an overload when formatting user-defined types.
An AbslStringify()
overload should have the following signature:
template <typename Sink>
void AbslStringify(Sink& sink, const UserDefinedType& value);
Note: AbslStringify()
utilizes a generic “sink” buffer to construct its
string. For more information about supported operations on AbslStringify()
’s
sink, see https://abseil.io/docs/cpp/guides/abslstringify.
An example usage within a user-defined type is shown below:
struct Point {
...
// Strings library support is added to the Point class through an
// AbslStringify() friend declaration.
template <typename Sink>
friend void AbslStringify(Sink& sink, const Point& p) {
absl::Format(&sink, "(%d, %d)", p.x, p.y);
}
int x;
int y;
}
Given this definition, Point
formatting will be supported by a variety of
strings
functions.
absl::StrCat("The point is ", p);
absl::StrAppend(&str, p);
absl::Substitute("The point is $0", p);
absl::StrJoin(vector_of_points, ",");
Additionally, AbslStringify()
itself can use %v
within its own format
strings to perform this type deduction. Our Point
above could be formatted as
"(%v, %v)"
for example, and deduce the int
values as %d
.