The Abseil container
library includes b-tree containers that generally conform
to the STL sorted container (e.g. std::map
) APIs, but are more efficient in
most cases. These containers are:
absl::btree_set
(meant to replace usage of std::set
)absl::btree_map
(meant to replace usage of std::map
)absl::btree_multiset
(meant to replace usage of std::multiset
)absl::btree_multimap
(meant to replace usage of std::multimap
)B-trees have a different implementation from STL std::map
containers, which
require binary trees commonly implemented as red-black trees.
While a red-black tree is limited to single-element nodes, with precisely two
children, a B-tree may contain multiple values per node (M), with each node
having (M+1) children. Having more values and children per node is more cache
friendly because nodes are generally allocated separately so accessing
additional nodes often results in cache misses.
For search, insertion, and deletion, the number of nodes that need to be
accessed in a sorted tree is proportional to the height of the tree. Binary
trees have two children per node so the height for a balanced tree is
~log2(N), where N is the number of values in the tree. B-tree nodes
have more children (e.g., for absl::btree_set<int32_t>
, nodes currently have
62 children) so the tree height for balanced B-trees is ~logM(N),
where M is the number of children. As a result, the height in a B-tree can be
lower than in a balanced binary tree of the same size by a factor of
~log2(M), which is ~6 for M=62.
For iteration, it is most cache efficient to store all values contiguously. B-trees stores values contiguously in each node so it’s also generally more efficient to iterate through a B-tree than a binary tree.
B-trees also use significantly less memory per value in the tree because overhead is per node, and there are fewer nodes per value in B-trees. There is also an optimization in Abseil B-tree in which leaf nodes don’t store child pointers. Since the vast majority of nodes are leaf nodes (due to the higher branching factor due to more children per non-leaf node), this ends up saving a significant amount of memory.
As an example, currently in 64-bit mode, the libstdc++ implementation of
std::set<int32_t>
allocates 40 bytes per value inserted into the container.
absl::btree_set<int32_t>
generally uses ~4.3 to ~5.1 bytes per value
(depending on usage).
When values are inserted or removed from a B-tree, nodes can be split or merged and values can be moved within and between nodes (for the purpose of maintaining tree balance). This means that when values are inserted or removed from a B-tree container, pointers and iterators to other values in the B-tree can be invalidated. Abseil B-trees therefore do not provide pointer stability or iterator stability - this is in contrast to STL sorted containers that do provide these guarantees.
B-trees are a good default choice for sorted containers, however, there are cases in which the STL alternatives may be more appropriate.
value_type
is large, fewer values can be stored per node so the
benefits of B-tree are lessened.value_type
is expensive to move, B-tree may become more expensive than
STL alternatives because B-tree needs to move values within and between nodes
to maintain balance, whereas binary trees can just move pointers instead.
std::array<int32_t, 16>
is an example of a value_type
for which STL sorted
containers currently outperform B-trees.