Date: Sun, 5 Apr 2026 16:25:02 -0700
Hello,
I am interested in designing container classes that look like the STL but
are adapted for use in embedded systems, specifically in real-time contexts
where dynamic memory allocation is not allowed for safety and performance
reasons. The Joint Strike Fighter Air Vehicle C++ Coding Standards would be
an example of such requirements.
Ideally, programmers familiar with the C++ standard library would not need
to learn entirely new interfaces. This led me to study the design of the
existing library from that perspective. In order to better understand the
design and quality of std::map::insert( const_iterator pos, node_type&& nh ),
I set up a benchmark for that method. The resulting code is somewhat
peculiar and the results are not favorable for C++.
I would like to present my findings not as a proposal but as a request for
comments, as I am not yet convinced that a red-black tree is even ideal for
real-time use, even though it is popular in places like the Linux kernel.
For this test I created a C++ version and a C version. The C++ version
uses std::map<Key,
T, Compare, Allocator>::lower_bound and iterator std::map<Key, T, Compare,
Allocator>::insert( const_iterator pos, node_type&& nh ). The C version
implements a map using a simple sorted array, binary search for lookup, and
memmove to create an empty slot. Both tests use a small custom stack
allocator that allocates memory by simply incrementing a pointer.
Please keep in mind that these results were not designed to favor the
algorithmic advantages of a tree data structure. They were instead intended
to be a representative example of ordinary code. My initial goal was to
provide an example of interface design, not to make a point about
performance numbers, although I found the performance numbers to be
somewhat concerning as well.
With that said, let me present the performance numbers first as they are
easier to summarize. Each test either adds or removes a node from the map
32 times. Each test is run 10 times as a warm up first, so there should be
no cache effects. It is then run 100 times in a loop and the results timed.
Each C++ run took 365.6 ns on average. Each C run took 194.4 ns on average,
making the C++ version 1.88x slower. The C++ test allocated 1216 bytes for
node storage while the C test used 128 bytes for its arrays, a difference
of 9.5x. Finally, the C++ executable used 4240 more bytes when stripped.
Knuth has already addressed cache coherence concerns around linear probing
versus trees, but I would also note that the locality of reference with
node handles is worse in this case and was hopefully not measured.
The interface requirements of std::map::insert( const_iterator pos,
node_type&& nh ) are well known to this list, but I would like to show some
of the code I ended up with. What made my usage example particularly
convoluted was the addition of a C++ custom allocator framework. The JSF AV
requirements prohibit globals, so the requirement that containers be
constructed with an allocator handle instance does make sense. Again, this
is primarily a request for comments.
The first major issue was using a temporary std::map to allocate node
handles during startup. If handles cannot be allocated at all later on at
all, this seems like a very strange ceremony to go through to obtain them
all up front. A template<class... Args> std::map::make_handle(Args&&...
args) function returning a handle would seem far more sensible.
If the programmer already knows they have a fixed array of map nodes to
work with after startup, the overhead of stripping used nodes out of a map
so they can be recycled without being destroyed also seems excessive.
Ideally, both an array of available nodes and a map containing them that is
not needed anymore should be resettable by simply updating a few pointers.
Note how much simpler the C version is in this regard.
I am going to attach the code as it runs on far too long for an email.
However, let me quote some of the types involved. While using statements
bring the complexity under control, I want to illustrate the cognitive load
involved in reading compiler errors generated by this code. The array or
node handles is quite a beast.
using PairAlloc = StackAllocator<std::pair<const int, int>, kArenaCapacity>;
using Map = std::map<int, int, std::less<int>, PairAlloc>;
using NodeAlloc = StackAllocator<Map::node_type, kArenaCapacity>;
using IntAlloc = StackAllocator<int, kArenaCapacity>;
// Arena must be declared before any member that holds a StackAllocator,
// so it is constructed first and destroyed last.
StackArena<kArenaCapacity> arena_;
// std::map<int, int, std::less<int>,
// StackAllocator<std::pair<const int, int>, kArenaCapacity>>
// map_{StackAllocator<std::pair<const int, int>,
kArenaCapacity>(arena_)};
Map map_{PairAlloc(arena_)};
// std::vector
// std::map<int, int, std::less<int>,
// StackAllocator<std::pair<const int, int>,
kArenaCapacity>>::node_type,
// StackAllocator
// std::map<int, int, std::less<int>,
// StackAllocator<std::pair<const int, int>,
kArenaCapacity>>::node_type,
// kArenaCapacity>>
// pool_{StackAllocator
// std::map<int, int, std::less<int>,
// StackAllocator<std::pair<const int, int>,
kArenaCapacity>>::node_type,
// kArenaCapacity>(arena_)};
std::vector<Map::node_type, NodeAlloc> pool_{NodeAlloc(arena_)};
// std::vector<int, StackAllocator<int, kArenaCapacity>>
// free_indices_{StackAllocator<int, kArenaCapacity>(arena_)};
std::vector<int, IntAlloc> free_indices_{IntAlloc(arena_)};
Attached are the C++ and C files. I don't think either are great examples
of programming. They are both designed to be idiomatic examples of a
certain test in modern versions of each language.
The git repo for these tests is here:
https://github.com/whatchamacallem/map-nodes
I would be interested in any design feedback for making this kind of
interface simpler and more efficient. Things get a lot easier when you
drop the requirement for node handles, and start working with pointers to
nodes instead, but that works against the desire for safety as we move
forward.
Regards,
Adrian Johnston
I am interested in designing container classes that look like the STL but
are adapted for use in embedded systems, specifically in real-time contexts
where dynamic memory allocation is not allowed for safety and performance
reasons. The Joint Strike Fighter Air Vehicle C++ Coding Standards would be
an example of such requirements.
Ideally, programmers familiar with the C++ standard library would not need
to learn entirely new interfaces. This led me to study the design of the
existing library from that perspective. In order to better understand the
design and quality of std::map::insert( const_iterator pos, node_type&& nh ),
I set up a benchmark for that method. The resulting code is somewhat
peculiar and the results are not favorable for C++.
I would like to present my findings not as a proposal but as a request for
comments, as I am not yet convinced that a red-black tree is even ideal for
real-time use, even though it is popular in places like the Linux kernel.
For this test I created a C++ version and a C version. The C++ version
uses std::map<Key,
T, Compare, Allocator>::lower_bound and iterator std::map<Key, T, Compare,
Allocator>::insert( const_iterator pos, node_type&& nh ). The C version
implements a map using a simple sorted array, binary search for lookup, and
memmove to create an empty slot. Both tests use a small custom stack
allocator that allocates memory by simply incrementing a pointer.
Please keep in mind that these results were not designed to favor the
algorithmic advantages of a tree data structure. They were instead intended
to be a representative example of ordinary code. My initial goal was to
provide an example of interface design, not to make a point about
performance numbers, although I found the performance numbers to be
somewhat concerning as well.
With that said, let me present the performance numbers first as they are
easier to summarize. Each test either adds or removes a node from the map
32 times. Each test is run 10 times as a warm up first, so there should be
no cache effects. It is then run 100 times in a loop and the results timed.
Each C++ run took 365.6 ns on average. Each C run took 194.4 ns on average,
making the C++ version 1.88x slower. The C++ test allocated 1216 bytes for
node storage while the C test used 128 bytes for its arrays, a difference
of 9.5x. Finally, the C++ executable used 4240 more bytes when stripped.
Knuth has already addressed cache coherence concerns around linear probing
versus trees, but I would also note that the locality of reference with
node handles is worse in this case and was hopefully not measured.
The interface requirements of std::map::insert( const_iterator pos,
node_type&& nh ) are well known to this list, but I would like to show some
of the code I ended up with. What made my usage example particularly
convoluted was the addition of a C++ custom allocator framework. The JSF AV
requirements prohibit globals, so the requirement that containers be
constructed with an allocator handle instance does make sense. Again, this
is primarily a request for comments.
The first major issue was using a temporary std::map to allocate node
handles during startup. If handles cannot be allocated at all later on at
all, this seems like a very strange ceremony to go through to obtain them
all up front. A template<class... Args> std::map::make_handle(Args&&...
args) function returning a handle would seem far more sensible.
If the programmer already knows they have a fixed array of map nodes to
work with after startup, the overhead of stripping used nodes out of a map
so they can be recycled without being destroyed also seems excessive.
Ideally, both an array of available nodes and a map containing them that is
not needed anymore should be resettable by simply updating a few pointers.
Note how much simpler the C version is in this regard.
I am going to attach the code as it runs on far too long for an email.
However, let me quote some of the types involved. While using statements
bring the complexity under control, I want to illustrate the cognitive load
involved in reading compiler errors generated by this code. The array or
node handles is quite a beast.
using PairAlloc = StackAllocator<std::pair<const int, int>, kArenaCapacity>;
using Map = std::map<int, int, std::less<int>, PairAlloc>;
using NodeAlloc = StackAllocator<Map::node_type, kArenaCapacity>;
using IntAlloc = StackAllocator<int, kArenaCapacity>;
// Arena must be declared before any member that holds a StackAllocator,
// so it is constructed first and destroyed last.
StackArena<kArenaCapacity> arena_;
// std::map<int, int, std::less<int>,
// StackAllocator<std::pair<const int, int>, kArenaCapacity>>
// map_{StackAllocator<std::pair<const int, int>,
kArenaCapacity>(arena_)};
Map map_{PairAlloc(arena_)};
// std::vector
// std::map<int, int, std::less<int>,
// StackAllocator<std::pair<const int, int>,
kArenaCapacity>>::node_type,
// StackAllocator
// std::map<int, int, std::less<int>,
// StackAllocator<std::pair<const int, int>,
kArenaCapacity>>::node_type,
// kArenaCapacity>>
// pool_{StackAllocator
// std::map<int, int, std::less<int>,
// StackAllocator<std::pair<const int, int>,
kArenaCapacity>>::node_type,
// kArenaCapacity>(arena_)};
std::vector<Map::node_type, NodeAlloc> pool_{NodeAlloc(arena_)};
// std::vector<int, StackAllocator<int, kArenaCapacity>>
// free_indices_{StackAllocator<int, kArenaCapacity>(arena_)};
std::vector<int, IntAlloc> free_indices_{IntAlloc(arena_)};
Attached are the C++ and C files. I don't think either are great examples
of programming. They are both designed to be idiomatic examples of a
certain test in modern versions of each language.
The git repo for these tests is here:
https://github.com/whatchamacallem/map-nodes
I would be interested in any design feedback for making this kind of
interface simpler and more efficient. Things get a lot easier when you
drop the requirement for node handles, and start working with pointers to
nodes instead, but that works against the desire for safety as we move
forward.
Regards,
Adrian Johnston
Received on 2026-04-05 23:25:19
