30 December, 2021

How do I implement a C++ bump allocator that works with standard library containers?

Bump Allocators in C++

Here are some notes on using std::pmr::monotonic_buffer_resource as a bump allocator for use with standard library containers.

When to use a Bump Allocator

For certain applications, memory management can be an unnecessary overhead. This is particularly true of applications in which memory has to be acquired temporarily to perform some work, then disposed of once that work is done.

For example, the application might receive a request, do some work to process that request, then reply to the sender with the result of that work. If the point of the work is to respond to the request, then most of the memory allocated during processing will not be needed once the response is sent. In this scenario, memory allocations are short-lived and temporary.

If such an application uses C++ standard library containers, then a measurable and often significant portion of its time will be spent allocating memory for containers, reallocating and copying when the containers are resized, and deallocating when the containers are no longer required.

If speed is of the essence, and the time taken for allocations and deallocations is significant, then a strategy to handle this is to use a bump allocator.

However, if it is not used carefully, then a bump allocator may end up using significantly more memory, particularly when used with containers that resize when they need more space.

std::pmr::monotonic_buffer_resource

C++17 introduced polymorphic allocators, backed by memory resources. Both of these come from the <memory_resource> header. One such memory resource is std::pmr::monotonic_buffer_resource, a name that hardly rolls off the tongue, but is nonetheless useful as it makes it relatively easy to write a bump allocator for use with standard library containers.

The main characteristic of std::pmr::monotonic_buffer_resource is that it allocates during its lifetime, but only deallocates when the resource is destroyed. In other words, it acts as a bump allocator.

Using a monotonic buffer resource as a bump allocator

Creating a monotonic buffer resource

There are a number of constructors for std::pmr::monotonic_buffer_resource whose parameters include:

an upstream resource, to be called for more memory when the resource is exhausted. If no upstream resource is supplied then this defaults to std::pmr::get_default_resource()
an initial buffer, along with its size in bytes. The resource’s current buffer starts off as null unless an initial buffer is provided.
a next size parameter that determines the size of the next buffer to be allocated

In this example, the resource is given an initial buffer and the buffer size. When the initial buffer is exhausted then the resource will use the resource returned by std::pmr::get_default_resource() to get more memory.

std::array<std::byte, 2048> buffer;                                    // The initial buffer.
std::pmr::monotonic_buffer_resource mbr{buffer.data(), buffer.size()}; // The resource itself.

Here’s a polymorphic allocator that uses the monotonic buffer resource, and a vector that uses the allocator. Note that std::pmr::vector<T> is an alias for std::vector<T, std::pmr::polymorphic_allocator<T>>.

std::pmr::polymorphic_allocator<int> bumpAllocator{&mbr}; // An allocator using the resource.
std::pmr::vector<int> vectorOfInt{bumpAllocator};         // A vector using the allocator.

Using an upstream resource

Here’s a resource, upstream, backed by a vector that is used as its initial buffer. When this resource’s buffer is exhausted then it will use std::pmr::get_default_resource() to get more memory.

std::vector<std::byte> bigBuffer{65536};
std::pmr::monotonic_buffer_resource upstream{bigBuffer.data(), bigBuffer.size()};

Here’s another resource that uses upstream as its upstream resource. When this resource is exhausted then it will use the upstream resource to get more memory.

std::pmr::monotonic_buffer_resource mbr{&upstream};

Limiting allocations with the null memory resource

Sometimes you may want to limit the amount of memory that a resource can use. Do this by setting its upstream resource to the return value of std::pmr::null_memory_resource().

When the resource in the following example is exhausted then it will attempt to use the null memory resource to get more memory, but that will throw std::bad_alloc.

std::vector<std::byte> bigBuffer{65536};
std::pmr::monotonic_buffer_resource mbr{bigBuffer.data(), bigBuffer.size(),
                                        std::pmr::null_memory_resource()};

Tradeoffs

There are tradeoffs to using bump allocators, particularly with containers that resize. Consider std::vector’s behaviour when it exceeds its capacity. It allocates new memory, copies its contents to that memory, and deallocates the old memory.

But with an allocator that doesn’t deallocate, then that can quickly use up memory.

Example

Take the following example. There’s a 128 byte buffer that has capacity for 32 x 4-byte integers, a bump allocator backed by a memory resource that uses the buffer, and a vector that uses the bump allocator.

// A buffer with capacity for 32 x 4-byte integers.
std::array<std::byte, 128> tinyBuffer;

// A memory resource that uses it, with its upstream resource set to the null resource.
std::pmr::monotonic_buffer_resource tinyMbr{tinyBuffer.data(), tinyBuffer.size(),
                                            std::pmr::null_memory_resource()};

// A bump allocator, backed by the memory resource.
std::pmr::polymorphic_allocator<int> bumpAllocator{&tinyMbr};

// A vector that uses the bump allocator.
std::pmr::vector<int> v{bumpAllocator};

The following code will throw std::bad_alloc long before the vector contains 32 integers.

try
{
    for (auto i = 0; i < 32; i++)
    {
        // Show the offset of the vector's data relative to the start of the buffer, the
        // vector's size and capacity, and what we're trying to push onto it.
        std::ptrdiff_t offset = reinterpret_cast<std::byte*>(v.data()) - tinyBuffer.data();
        std::cout << "offset " << (offset > 0 ? offset : 0) << ", size " << v.size()
                    << ", capacity " << v.capacity() << " - pushing " << i << '\n';

        v.push_back(i);
    }
}
catch (const std::bad_alloc&)
{
    std::cout << "Earth shattering kaboom!\n";
}

Here’s the output from running the above with a 32-bit debug build. It threw std::bad_alloc when trying to push the 10th element onto the vector.

offset 0, size 0, capacity 0 - pushing 0
offset 8, size 1, capacity 1 - pushing 1
offset 12, size 2, capacity 2 - pushing 2
offset 20, size 3, capacity 3 - pushing 3
offset 32, size 4, capacity 4 - pushing 4
offset 48, size 5, capacity 6 - pushing 5
offset 48, size 6, capacity 6 - pushing 6
offset 72, size 7, capacity 9 - pushing 7
offset 72, size 8, capacity 9 - pushing 8
offset 72, size 9, capacity 9 - pushing 9
Earth shattering kaboom!

Used in this manner, the 128-byte buffer will never be able to hold 32 x 4-byte integers, because the vector resizes each time it exceeds its capacity. The vector is using a bump allocator, so the memory that it used previously is never deallocated and is never reused. The next point of allocation always moves forward - that’s one of the reasons why bump allocators are so quick.

In this particular example, using a debug build, the vector used the buffer as follows…

8 + 4 * 1 bytes for the first allocation - cumulative 8 bytes
plus another 4 * 2 bytes for the second allocation - cumulative 16 bytes
plus another 4 * 3 bytes for the third allocation - cumulative 28 bytes
plus another 4 * 4 bytes for the fourth allocation - cumulative 44 bytes
plus another 4 * 6 bytes for the fifth allocation - cumulative 68 bytes
plus another 4 * 9 bytes for the sixth allocation - cumulative 104 bytes

It used 104 bytes to hold 9 x 4-byte integers, then failed when trying to add a 10th, because the next allocation exceeded the available space and the upstream resource was the null allocator.

Interestingly, after the first allocation, the initial offset of the vector’s data in the buffer was 8 bytes not zero. However, this seems to be an artefact of the build - with a release build the initial offset was zero.

This underlines the point that for containers such as vectors, having an idea of the size up front can reduce the number of allocations. Normally reducing allocations is sensible for performance, because allocations have a cost in terms of time, but for a bump allocator the cost is memory, because allocations are not deallocated.

Taking the same example, and adding a single line before the loop to reserve capacity in the vector, the number of allocations is greatly reduced because the vector never has to resize.

v.reserve(32); // Give the vector an initial capacity of 32.

Here’s the output from running the modified code with a 32-bit release build. This time there has been just one allocation, and the entire buffer has been successfully populated with 32 x 4-byte integers.

offset 0, size 0, capacity 32 - pushing 0
offset 0, size 1, capacity 32 - pushing 1
...
offset 0, size 30, capacity 32 - pushing 30
offset 0, size 31, capacity 32 - pushing 31