std-proposals: Re: Request for Opinion: Goal-based Memory Allocation

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Thu, 5 Nov 2020 17:41:09 -0500

On Thu, Nov 5, 2020 at 9:35 AM Scott Michaud via Std-Proposals <
std-proposals_at_[hidden]> wrote:

>
> My algorithm uses two stacks, one at the bottom of the page allocating up,
> and one at the top of the page allocating down. [...]
>
>
> Godbolt: https://godbolt.org/z/znq1j3
> GitHub: https://github.com/ScottMichaud/MiscCpp_DoubleStackAllocator
>

My comment on the benchmark:
It looks to me as if you have another constraint on your allocation
pattern. Memory allocated in thread A cannot be deallocated in thread B. I
can't do, like,
    std::vector<int, DoubleStackAllocator<int>> v = {1,2,3};
    std::thread t = std::thread([v = std::move(v)]() {
        // do nothing
    });
    t.join();
Is my understanding accurate? If so, I think you should consider whether
you're really doing an apples-to-apples comparison in your benchmark there.

But anyway, I think the benchmark is mostly a red herring. Everyone should
agree that sometimes allocator A will outperform allocator B, and sometimes
vice versa, depending on the workload. We don't *really* need benchmarks to
prove that, IMHO.

I recommend that you take your DoubleStackAllocator<T> and re-code it as a
class derived from C++17's std::pmr::memory_resource, so that this code
works—

    DoubleStackMemoryResource mr;
    std::pmr::vector<int> v({1, 2, 3}, &mr);

Notice that now it becomes obvious that `mr` needs somewhere to live, and
it needs some way to explain where it's getting its initial chunk of memory
from.
In your current code, you kind of handwave that dependency by hiding it in
a global variable (namely, that `thread_local std::vector<DoubleStack>`
inside a static member function of DoubleStackAllocatorImpl). Pulling it
out into the sunlight will improve the design.

"But std::pmr relies on virtual function calls, which are slow!" True. But
I think the community knows how to deal with that, if one really needs to.
I recall a conversation with Bryce Adelstein Lelbach and (I think) Michal
Dominiak, several years ago, when the Thrust library was trying to do
allocators for the first time. Basically,
std::pmr::polymorphic_allocator<T> is a poor-man's-type-erased allocator
that can point to *any* memory_resource; you just need to implement a
monomorphic_allocator that can point to only *one* kind of memory_resource.
Then the user gets to choose which one to use — polymorphic or monomorphic.

Something like this:
https://godbolt.org/z/rax7Wv
Unfortunately I do see a virtual call still happening in the right-hand
pane, though; I'm not sure why Clang isn't devirtualizing it based on the
knowledge that FinalOf<std::pmr::monotonic_buffer_resource> is a final
class.

Note that my toy monomorphic_allocator<T,MR> is missing a lot of stuff that
the real one would have to deal with, such as a converting constructor and
POCMA/POCCA/POCS. I don't *think* that the diff in the godbolt above is
attributable to its lack of any of that stuff, though.

Anyway, you should definitely look into C++17 std::pmr; it's doing
basically the same kind of thing you're interested in.
I also highly recommend my own talk on the subject, "An Allocator is a
Handle to a Heap," if you haven't already seen it:
https://www.youtube.com/watch?v=0MdSJsCTRkY

–Arthur

Received on 2020-11-05 16:41:22