In a previous thread I was asked what I meant by a "non-reallocating version of the standard."
Before I begin, please allow me to explain this is actually a thing that real-time programmers and embedded systems programmers have been dealing with since those job descriptions were invented. Just for example, on Bjarne Stroustrup's website he has a link
to the Joint Strike Fighter Air Vehicle C++ Coding Standards. And in section 4.26 Memory Allocation it states: "Allocation/deallocation from/to the free store (heap) shall not occur after initialization."
https://www.stroustrup.com/JSF-AV-rules.pdf That would be an example of an environment where memory allocator fragmentation is considered too dangerous to allow.
For my part, I received an $8 Raspberry Pi computer in the mail today and I'm looking forward to tinkering with it. It has 1/2 MB of RAM and considering the current price of RAM I think I made an excellent purchasing decision. I'm hoping to remove malloc and
free from libc and set up my own address space for some simple allocators using the linker.
I do understand that
std::pmr::polymorphic_allocator exists and that a range of solutions could be implemented using it. Personally, I consider a small block allocator a solution that would need to be studied
carefully to prove 100% it would never exhaust all available memory and that it would be the memory efficient choice. I'd also rather not go down a rabbit hole debating whether or not a particular allocation strategy is a universal solution. In my experience
in industry the problem was not a technical problem at all, but an ongoing problem of resolving which person exactly was responsible for the project being out of memory and a complex technical solution only obscured the answer to that.
All that said, this is what a non-reallocating set of containers would look like to me:
vector - The vector is sized to its maximum size before use. It will simply assert or call a hardened library handler if enough memory was not reserved in advance. This also avoids generating code for reallocation all over. (I fondly remember a Technical
Director dragging me over to his desk to point out
std::vector::push_back was taking 3% of CPU when it wasn't supposed to even exist in an optimized build.)
deque - The deque is implemented as an array sized to a maximum power-of-two size before use. It is used as a ring buffer where all operations rely on masking off low bits. E.g. to increment the tail one would use:
m_tail_ = (m_tail_ + 1u) & m_mask_ This is a classic algorithm and a non-intrusive design
hash_table/map/set - Again, the hash table is sized to a maximum power-of-two before use. This has the effect of substantially reducing the number of hash collisions during normal operation. Then an embedded singly linked list is used to insert nodes.
This doesn't have to be too intrusive, it can rely on a C++20 requires statement that requires the node class to provide an interface for use by the hash table to implement the s-list instead of using a base class or macros.
Pardon me if I am about to put my foot in my mouth, I don't mean to annoy the entire list. To my eye, this doesn't look so different than having well defined
map::node_type. Specifically if
std::map<T>::node_type was defined as
std::unique_ptr<T> and had the additional requirement that:
template<typename node_t_>
concept std::hash_table_node_concept_ =
requires(node_t_& node_, const node_t_& const_node_) {
{ node_.hash_next() } -> std::same_as<node_t_*&>; // Returns a reference to a pointer
{ const_node_.hash_next() } -> std::same_as<node_t_*>; // Returns a pointer
};
In this example, the hash table would take ownership from a
std::unique_ptr and return ownership via a
std::unique_ptr. Then it would use
hash_next() to manipulate the embedded list pointer. This would make explicit and safe management of the lifecycle of T a little easier. Having
std::do_not_delete as a no-op destruction policy may also be convenient.
If you have ever had a visualizer that showed allocator memory utilization as a 2D bitmap, it is really amazing how creating a hash table and stuffing 10000 nodes in it incrementally fragments a general purpose allocator. Oddly enough, I have worked with one
of those once.
(s)list - The node could be defined using a concept and operations could involve
std::unique_ptr as described above. For maximum optimality the Sentinel Pattern and the XOR Doubly Linked List Pattern could be used to fit a doubly linked list into a single
intptr_t per-node with no null checks. If the sentinel node also functioned as the
end() iterator you have branchless list iteration with a single xor to decode the next address. The only downside is that the Sentinel Pattern would require the nodes to use a base class. Unless I am somehow
lacking imagination.
After taking a stab at it, I ended up deciding not to recommend non-allocating ordered maps and sets. The amount of generated code didn't seem appropriate for environments motivated by efficiency.