Why are you using lower_bound then insert, why not just insert the node handle directly?

Why are you using vector::reserve(n) followed by vector::resize(n), why not just resize?

You can replace your entire StackArena and Map and pool of Map::node_type with this:

alignas(std::max_align_t) char buf_[kArenaCapacity];
std::pmr::monotonic_buffer_resource bufres_{buf_, sizeof(buf_),
std::pmr::null_memory_resource()};
std::pmr::unsynchronized_pool_resource res_{&bufres_};
std::flat_map<int, int, std::less<int>, std::pmr::vector<int>, std::pmr::vector<int>> map_{&res_};

This will use a stack buffer (not malloc) and maintain a free list of deallocated nodes in an object pool.

This uses a flat_map instead of RB tree, which IIUC has a better match for your requirements.

The full code is here https://godbolt.org/z/WW9n7qPEY

On Mon, 6 Apr 2026 at 21:57, Jonathan Wakely <cxx@kayari.org> wrote:

This seems to unfairly pessimize the RB tree benchmark:

// Picks a pseudo-random map entry, extracts it, and returns it to the pool.
void DoRemove(int iter, uint32_t seed, bool verbose)
{
// const std::map<int, int, std::less<int>, StackAllocator<std::pair<const int, int>, kArenaCapacity>>::iterator
const Map::iterator it = std::next(
map_.begin(),
static_cast<std::ptrdiff_t>(seed % map_.size()));

You're incrementing a bidirectional iterator N times as part of the Remove step, which is not testing how fast it is to remove an element.

On Mon, 6 Apr 2026 at 21:20, Jonathan Wakely <cxx@kayari.org> wrote:

On Mon, 6 Apr 2026, 21:09 Jonathan Wakely, <cxx@kayari.org> wrote:

On Mon, 6 Apr 2026, 18:28 Adrian Johnston, <ajohnston4536@gmail.com> wrote:

Hello Jonathan,

> it seems like you described a flat map for C, but you aren't using
> one for C++. Why not?

The goal was to assess/evaluate the design for the standard interface for red black trees. The C version was intended as a baseline, not an apples to apples comparison.

So you're comparing a RB tree to an array. Is that meaningful?

Specifically, you said "The resulting code is somewhat peculiar and the results are not favorable for C++" but it seems that the results are not for "C++" but instead for comparing two different data structures, where anybody could have predicted that a RB tree would perform worse than an array. And the complexity of the code is because of the decision to work with nodes that you create+manage by hand, which was never how std::map is meant to be used.

When I used to write real time code full time we would just use an array and linear probing for something small like this. If your array is unordered then you can just swap the end element down when erasing an element.

From my own code:

template<hxarray_concept_ T_, size_t capacity_>
void hxarray<T_, capacity_>::erase_unordered(const T_* pos_) {
    hxassertmsg(pos_ >= this->data() && pos_ < m_end_, "invalid_iterator");
    if(pos_ != --m_end_) {
        *const_cast<T_*>(pos_) = hxmove(*m_end_);
    }
    m_end_->~T_();
}

This is something I think the std::vector could definitely use.

> Why not use an allocator that has a fixed capacity with enough space for the
> nodes you plan to allocate, then just let the nodes get allocated as needed?

This requires me to call constructors and destructors on the map nodes

Why? I don't understand this claim at all.

which, on a non-trivial program, will likely require further allocations and deallocations.

Eh?

This could become a headache compared to maintaining a free list of non-trivial nodes.

In my ideal world a node type would be nothing but a contract to avoid multiple inheritance.