B-tree Ordered Containers

By Evan Brown, Engineer

The Abseil container library now includes B-tree containers that generally conform to the STL sorted container APIs:

  • absl::btree_set (meant to replace usage of std::set)
  • absl::btree_map (meant to replace usage of std::map)
  • absl::btree_multiset (meant to replace usage of std::multiset)
  • absl::btree_multimap (meant to replace usage of std::multimap)

You use a B-tree container just as you would an STL ordered container:

#include "absl/container/btree_map.h"

absl::btree_map<int, std::string> ducks =
    {{2, "dewey"}, {1, "huey"}, {3, "louie"},};

// Prints "huey, dewey, louie "
for (const auto& n : ducks) {
    std::cout << n.second << ", ";

B-trees have a different implementation from STL std::map containers, which require binary trees commonly implemented as red-black trees. While a red-black tree is limited to single-element nodes, with precisely two children, a B-tree may contain multiple values per node (M), with each node having (M+1) children. Having more values and children per node is more cache friendly because nodes are generally allocated separately so accessing additional nodes often results in cache misses.

Cache Friendliness

For search, insertion, and deletion, the number of nodes that need to be accessed in a sorted tree is proportional to the height of the tree. In balanced trees, this height is ~logC(N), where C is the number of children per node and N is the number of values in the container; because b-trees have more children per node than binary trees, their heights are lower, and searching is faster.

For iteration, it is most cache efficient to store all values contiguously. B-trees store values contiguously in each node so it’s also generally more efficient to iterate through a B-tree than a binary tree.

Memory Overhead

B-trees also use significantly less memory per value in the tree because overhead is per node, and there are fewer nodes per value in B-trees. There is also an optimization in Abseil B-tree in which leaf nodes don’t store child pointers. Since the vast majority of nodes are leaf nodes (because of the higher branching factor due to more children per non-leaf node), this ends up saving a significant amount of memory.

API Difference from STL Sorted Containers

When values are inserted or removed from a B-tree, nodes can be split or merged and values can be moved within and between nodes (for the purpose of maintaining tree balance). This means that when values are inserted or removed from a B-tree container, pointers and iterators to other values in the B-tree can be invalidated. Abseil B-trees therefore do not provide pointer stability or iterator stability - this is in contrast to STL sorted containers that do provide these guarantees.

When to Use B-trees

B-trees are a good default choice for sorted containers, however, there are cases in which the STL alternatives may be more appropriate.

  • When value_type is large, fewer values can be stored per node so the benefits of B-tree are lessened.
  • When value_type is expensive to move, B-tree may become more expensive than STL alternatives because B-tree needs to move values within and between nodes to maintain balance, whereas binary trees can just move pointers instead. std::array<int32_t, 16> is an example of a value_type for which STL sorted containers currently outperform B-trees.
  • When pointer stability or iterator stability is required, B-trees aren’t a viable option (although usually code can be refactored to avoid these requirements).

For more information, consult the Abseil Container library documentation. Check out the B-tree Design Note for information on B-tree’s implementation.

Subscribe to the Abseil Blog