On Thu, Nov 5, 2020 at 9:35 AM Scott Michaud via Std-Proposals <std-proposals@lists.isocpp.org> wrote:

My algorithm uses two stacks, one at the bottom of the page allocating up, and one at the top of the page allocating down. [...]
Godbolt: https://godbolt.org/z/znq1j3
GitHub: https://github.com/ScottMichaud/MiscCpp_DoubleStackAllocator

My comment on the benchmark:
It looks to me as if you have another constraint on your allocation pattern. Memory allocated in thread A cannot be deallocated in thread B. I can't do, like,
    std::vector<int, DoubleStackAllocator<int>> v = {1,2,3};
    std::thread t = std::thread([v = std::move(v)]() {
        // do nothing
    });
    t.join();
Is my understanding accurate?  If so, I think you should consider whether you're really doing an apples-to-apples comparison in your benchmark there.

But anyway, I think the benchmark is mostly a red herring. Everyone should agree that sometimes allocator A will outperform allocator B, and sometimes vice versa, depending on the workload. We don't really need benchmarks to prove that, IMHO.

I recommend that you take your DoubleStackAllocator<T> and re-code it as a class derived from C++17's std::pmr::memory_resource, so that this code works—

    DoubleStackMemoryResource mr;
    std::pmr::vector<int> v({1, 2, 3}, &mr);

Notice that now it becomes obvious that `mr` needs somewhere to live, and it needs some way to explain where it's getting its initial chunk of memory from.
In your current code, you kind of handwave that dependency by hiding it in a global variable (namely, that `thread_local std::vector<DoubleStack>` inside a static member function of DoubleStackAllocatorImpl). Pulling it out into the sunlight will improve the design.

"But std::pmr relies on virtual function calls, which are slow!" True. But I think the community knows how to deal with that, if one really needs to. I recall a conversation with Bryce Adelstein Lelbach and (I think) Michal Dominiak, several years ago, when the Thrust library was trying to do allocators for the first time. Basically, std::pmr::polymorphic_allocator<T> is a poor-man's-type-erased allocator that can point to any memory_resource; you just need to implement a monomorphic_allocator that can point to only one kind of memory_resource. Then the user gets to choose which one to use — polymorphic or monomorphic.

Something like this:
https://godbolt.org/z/rax7Wv
Unfortunately I do see a virtual call still happening in the right-hand pane, though; I'm not sure why Clang isn't devirtualizing it based on the knowledge that FinalOf<std::pmr::monotonic_buffer_resource> is a final class.

Note that my toy monomorphic_allocator<T,MR> is missing a lot of stuff that the real one would have to deal with, such as a converting constructor and POCMA/POCCA/POCS. I don't think that the diff in the godbolt above is attributable to its lack of any of that stuff, though.

Anyway, you should definitely look into C++17 std::pmr; it's doing basically the same kind of thing you're interested in.
I also highly recommend my own talk on the subject, "An Allocator is a Handle to a Heap," if you haven't already seen it:
https://www.youtube.com/watch?v=0MdSJsCTRkY

–Arthur