Date: Wed, 27 May 2026 21:55:44 -0700
In a previous thread I was asked what I meant by a "non-reallocating
version of the standard."
Before I begin, please allow me to explain this is actually a thing that
real-time programmers and embedded systems programmers have been dealing
with since those job descriptions were invented. Just for example, on
Bjarne Stroustrup's website he has a link to the Joint Strike Fighter Air
Vehicle C++ Coding Standards. And in section 4.26 Memory Allocation it
states: "Allocation/deallocation from/to the free store (heap) shall not
occur after initialization." https://www.stroustrup.com/JSF-AV-rules.pdf
That would be an example of an environment where memory allocator
fragmentation is considered too dangerous to allow.
For my part, I received an $8 Raspberry Pi computer in the mail today and
I'm looking forward to tinkering with it. It has 1/2 MB of RAM and
considering the current price of RAM I think I made an excellent purchasing
decision. I'm hoping to remove malloc and free from libc and set up my own
address space for some simple allocators using the linker.
I do understand that *std::pmr::polymorphic_allocator* exists and that a
range of solutions could be implemented using it. Personally, I consider a
small block allocator a solution that would need to be studied carefully to
prove 100% it would never exhaust all available memory and that it would be
the memory efficient choice. I'd also rather not go down a rabbit hole
debating whether or not a particular allocation strategy is a universal
solution. In my experience in industry the problem was not a technical
problem at all, but an ongoing problem of resolving which person exactly
was responsible for the project being out of memory and a complex technical
solution only obscured the answer to that.
All that said, this is what a non-reallocating set of containers would look
like to me:
*vector* - The vector is sized to its maximum size before use. It will
simply assert or call a hardened library handler if enough memory was not
reserved in advance. This also avoids generating code for reallocation all
over. (I fondly remember a Technical Director dragging me over to his desk
to point out *std::vector::push_back* was taking 3% of CPU when it wasn't
supposed to even exist in an optimized build.)
*deque* - The deque is implemented as an array sized to a maximum
power-of-two size before use. It is used as a ring buffer where all
operations rely on masking off low bits. E.g. to increment the tail one
would use: *m_tail_ = (m_tail_ + 1u) & m_mask_* This is a classic algorithm
and a non-intrusive design
*hash_table/map/set* - Again, the hash table is sized to a maximum
power-of-two before use. This has the effect of substantially reducing the
number of hash collisions during normal operation. Then an embedded singly
linked list is used to insert nodes. This doesn't have to be too intrusive,
it can rely on a C++20 requires statement that requires the node class to
provide an interface for use by the hash table to implement the s-list
instead of using a base class or macros.
Pardon me if I am about to put my foot in my mouth, I don't mean to annoy
the entire list. To my eye, this doesn't look so different than having well
defined *map::node_type*. Specifically if *std::map<T>::node_type* was
defined as *std::unique_ptr<T>* and had the additional requirement that:
*template<typename node_t_>concept std::hash_table_node_concept_
=requires(node_t_& node_, const node_t_& const_node_) { {
node_.hash_next() } -> std::same_as<node_t_*&>; // Returns a reference
to a pointer { const_node_.hash_next() } -> std::same_as<node_t_*>; //
Returns a pointer};*
In this example, the hash table would take ownership from a
*std::unique_ptr* and return ownership via a *std::unique_ptr*. Then it
would use *hash_next()* to manipulate the embedded list pointer. This would
make explicit and safe management of the lifecycle of T a little easier.
Having *std::do_not_delete* as a no-op destruction policy may also be
convenient.
If you have ever had a visualizer that showed allocator memory utilization
as a 2D bitmap, it is really amazing how creating a hash table and stuffing
10000 nodes in it incrementally fragments a general purpose allocator.
Oddly enough, I have worked with one of those once.
*(s)list* - The node could be defined using a concept and operations could
involve *std::unique_ptr* as described above. For maximum optimality the
Sentinel Pattern and the XOR Doubly Linked List Pattern could be used to
fit a doubly linked list into a single *intptr_t* per-node with no null
checks. If the sentinel node also functioned as the *end()* iterator you
have branchless list iteration with a single xor to decode the next
address. The only downside is that the Sentinel Pattern would require the
nodes to use a base class. Unless I am somehow lacking imagination.
After taking a stab at it, I ended up deciding not to recommend
non-allocating ordered maps and sets. The amount of generated code didn't
seem appropriate for environments motivated by efficiency.
I'm not putting forward a proposal here, I think someone would have to do a
lot more work to wrangle the ideas I listed here into something ready to be
standardized. But this is what I am talking about. I did spend some time
putting together a working version of all this, mostly as a strange hobby.
I started my career as a C++ programmer doing real-time 30Hz 3D programming
on a 33MHz processor with 2MB RAM (and no FPU!!!) and I thought it would be
cool to be able to show people how it was done.
https://github.com/whatchamacallem/libhatchet
Regards,
Adrian
version of the standard."
Before I begin, please allow me to explain this is actually a thing that
real-time programmers and embedded systems programmers have been dealing
with since those job descriptions were invented. Just for example, on
Bjarne Stroustrup's website he has a link to the Joint Strike Fighter Air
Vehicle C++ Coding Standards. And in section 4.26 Memory Allocation it
states: "Allocation/deallocation from/to the free store (heap) shall not
occur after initialization." https://www.stroustrup.com/JSF-AV-rules.pdf
That would be an example of an environment where memory allocator
fragmentation is considered too dangerous to allow.
For my part, I received an $8 Raspberry Pi computer in the mail today and
I'm looking forward to tinkering with it. It has 1/2 MB of RAM and
considering the current price of RAM I think I made an excellent purchasing
decision. I'm hoping to remove malloc and free from libc and set up my own
address space for some simple allocators using the linker.
I do understand that *std::pmr::polymorphic_allocator* exists and that a
range of solutions could be implemented using it. Personally, I consider a
small block allocator a solution that would need to be studied carefully to
prove 100% it would never exhaust all available memory and that it would be
the memory efficient choice. I'd also rather not go down a rabbit hole
debating whether or not a particular allocation strategy is a universal
solution. In my experience in industry the problem was not a technical
problem at all, but an ongoing problem of resolving which person exactly
was responsible for the project being out of memory and a complex technical
solution only obscured the answer to that.
All that said, this is what a non-reallocating set of containers would look
like to me:
*vector* - The vector is sized to its maximum size before use. It will
simply assert or call a hardened library handler if enough memory was not
reserved in advance. This also avoids generating code for reallocation all
over. (I fondly remember a Technical Director dragging me over to his desk
to point out *std::vector::push_back* was taking 3% of CPU when it wasn't
supposed to even exist in an optimized build.)
*deque* - The deque is implemented as an array sized to a maximum
power-of-two size before use. It is used as a ring buffer where all
operations rely on masking off low bits. E.g. to increment the tail one
would use: *m_tail_ = (m_tail_ + 1u) & m_mask_* This is a classic algorithm
and a non-intrusive design
*hash_table/map/set* - Again, the hash table is sized to a maximum
power-of-two before use. This has the effect of substantially reducing the
number of hash collisions during normal operation. Then an embedded singly
linked list is used to insert nodes. This doesn't have to be too intrusive,
it can rely on a C++20 requires statement that requires the node class to
provide an interface for use by the hash table to implement the s-list
instead of using a base class or macros.
Pardon me if I am about to put my foot in my mouth, I don't mean to annoy
the entire list. To my eye, this doesn't look so different than having well
defined *map::node_type*. Specifically if *std::map<T>::node_type* was
defined as *std::unique_ptr<T>* and had the additional requirement that:
*template<typename node_t_>concept std::hash_table_node_concept_
=requires(node_t_& node_, const node_t_& const_node_) { {
node_.hash_next() } -> std::same_as<node_t_*&>; // Returns a reference
to a pointer { const_node_.hash_next() } -> std::same_as<node_t_*>; //
Returns a pointer};*
In this example, the hash table would take ownership from a
*std::unique_ptr* and return ownership via a *std::unique_ptr*. Then it
would use *hash_next()* to manipulate the embedded list pointer. This would
make explicit and safe management of the lifecycle of T a little easier.
Having *std::do_not_delete* as a no-op destruction policy may also be
convenient.
If you have ever had a visualizer that showed allocator memory utilization
as a 2D bitmap, it is really amazing how creating a hash table and stuffing
10000 nodes in it incrementally fragments a general purpose allocator.
Oddly enough, I have worked with one of those once.
*(s)list* - The node could be defined using a concept and operations could
involve *std::unique_ptr* as described above. For maximum optimality the
Sentinel Pattern and the XOR Doubly Linked List Pattern could be used to
fit a doubly linked list into a single *intptr_t* per-node with no null
checks. If the sentinel node also functioned as the *end()* iterator you
have branchless list iteration with a single xor to decode the next
address. The only downside is that the Sentinel Pattern would require the
nodes to use a base class. Unless I am somehow lacking imagination.
After taking a stab at it, I ended up deciding not to recommend
non-allocating ordered maps and sets. The amount of generated code didn't
seem appropriate for environments motivated by efficiency.
I'm not putting forward a proposal here, I think someone would have to do a
lot more work to wrangle the ideas I listed here into something ready to be
standardized. But this is what I am talking about. I did spend some time
putting together a working version of all this, mostly as a strange hobby.
I started my career as a C++ programmer doing real-time 30Hz 3D programming
on a 33MHz processor with 2MB RAM (and no FPU!!!) and I thought it would be
cool to be able to show people how it was done.
https://github.com/whatchamacallem/libhatchet
Regards,
Adrian
Received on 2026-05-28 04:56:01
