Date: Thu, 5 Nov 2020 18:17:21 -0500
Arthur,
Wow, thanks for the code review!
The goal when I was designing it was to handle per-thread temporaries in
a thread-pool job system. I envisioned it as a sort of expansion pack for
a thread's stack. The allocator isn't meant for elements that persist
longer than a blocking function call, so I never considered allocations
moving from thread to thread.
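Roughly the usage I had in mind, as a sketch (it assumes the
DoubleStackAllocator from my repo; the header name here is made up):

    #include <vector>
    #include "double_stack_allocator.h"  // illustrative header name only

    // A job body: every allocation is a temporary that dies before the
    // job returns, so nothing ever needs to be freed on another thread.
    void job() {
        // an "expansion pack" for this thread's stack
        std::vector<int, DoubleStackAllocator<int>> scratch;
        scratch.assign({10, 20, 30});
        // ... use scratch ...
    }   // released here, on the same thread that allocated it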
This indirectly touches on the core point, though.
My proposal suggests that memory allocation should be goal-centric.
* Would enumerating goals be so narrow that users wouldn't care?
* Do the goals in real-world memory allocation overlap too much?
* Would drilling down to that level of use case provide too little
benefit to matter?
My thought is that this would be easy to teach and flexible to
implement, especially if a vendor wants to try something way outside the
box, like the "skipping main RAM and mapping directly to a cache"
example.
If we say "have a stack-based allocator," then the implementation's
hands are tied to a specific algorithm. If we say "have an allocator
that's fast for locals," then implementers can take that and run as far
as they want.
But again, that's why I'm requesting feedback: I might be miles away
from what typical stakeholders care about.
Thanks again!
On 11/5/2020 5:41 PM, Arthur O'Dwyer wrote:
> On Thu, Nov 5, 2020 at 9:35 AM Scott Michaud via Std-Proposals
> <std-proposals_at_[hidden]> wrote:
>
>
> My algorithm uses two stacks, one at the bottom of the page
> allocating up, and one at the top of the page allocating down. [...]
>
> Godbolt: https://godbolt.org/z/znq1j3
> GitHub: https://github.com/ScottMichaud/MiscCpp_DoubleStackAllocator
>
>
> My comment on the benchmark:
> It looks to me as if you have another constraint on your allocation
> pattern. Memory allocated in thread A cannot be deallocated in thread
> B. I can't do, like,
>     std::vector<int, DoubleStackAllocator<int>> v = {1, 2, 3};
>     std::thread t = std::thread([v = std::move(v)]() {
>         // do nothing
>     });
>     t.join();
> Is my understanding accurate? If so, I think you should consider
> whether you're really doing an apples-to-apples comparison in your
> benchmark there.
>
> But anyway, I think the benchmark is mostly a red herring. Everyone
> should agree that sometimes allocator A will outperform allocator B,
> and sometimes vice versa, depending on the workload. We don't /really/
> need benchmarks to prove that, IMHO.
>
> I recommend that you take your DoubleStackAllocator<T> and re-code it
> as a class derived from C++17's std::pmr::memory_resource, so that
> this code works—
>
>     DoubleStackMemoryResource mr;
>     std::pmr::vector<int> v({1, 2, 3}, &mr);
>
> Notice that now it becomes obvious that `mr` needs somewhere to live,
> and it needs some way to explain where it's getting its initial chunk
> of memory from.
> In your current code, you kind of handwave that dependency by hiding
> it in a global variable (namely, that `thread_local
> std::vector<DoubleStack>` inside a static member function of
> DoubleStackAllocatorImpl). Pulling it out into the sunlight will
> improve the design.
>
> "But std::pmr relies on virtual function calls, which are slow!" True.
> But I think the community knows how to deal with that, if one really
> needs to. I recall a conversation with Bryce Adelstein Lelbach and (I
> think) Michal Dominiak, several years ago, when the Thrust library was
> trying to do allocators for the first time. Basically,
> std::pmr::polymorphic_allocator<T> is a poor-man's-type-erased
> allocator that can point to */any/* memory_resource; you just need to
> implement a monomorphic_allocator that can point to only /*one*/ kind
> of memory_resource. Then the user gets to choose which one to use —
> polymorphic or monomorphic.
>
> Something like this:
> https://godbolt.org/z/rax7Wv
> Unfortunately I do see a virtual call still happening in the
> right-hand pane, though; I'm not sure why Clang isn't devirtualizing
> it based on the knowledge that
> FinalOf<std::pmr::monotonic_buffer_resource> is a final class.
>
> Note that my toy monomorphic_allocator<T,MR> is missing a lot of stuff
> that the real one would have to deal with, such as a converting
> constructor and POCMA/POCCA/POCS. I don't /think/ that the diff in the
> godbolt above is attributable to its lack of any of that stuff, though.
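>
> To spell out the rough shape of that idea (a sketch only, not the code
> in the godbolt; POCCA/POCMA/POCS and the rest of the machinery are
> still omitted):
>
>     #include <cstddef>
>     #include <memory_resource>
>
>     // Hard-wired to one concrete resource type MR, so the compiler
>     // sees the static type and can, in principle, devirtualize the
>     // allocate/deallocate calls.
>     template <class T, class MR>
>     struct monomorphic_allocator {
>         using value_type = T;
>         MR *mr;
>         explicit monomorphic_allocator(MR *r) noexcept : mr(r) {}
>         template <class U>
>         monomorphic_allocator(const monomorphic_allocator<U, MR> &rhs) noexcept
>             : mr(rhs.mr) {}
>         T *allocate(std::size_t n) {
>             return static_cast<T *>(mr->allocate(n * sizeof(T), alignof(T)));
>         }
>         void deallocate(T *p, std::size_t n) {
>             mr->deallocate(p, n * sizeof(T), alignof(T));
>         }
>     };
>     template <class T, class U, class MR>
>     bool operator==(const monomorphic_allocator<T, MR> &a,
>                     const monomorphic_allocator<U, MR> &b) noexcept {
>         return a.mr == b.mr;
>     }
>     template <class T, class U, class MR>
>     bool operator!=(const monomorphic_allocator<T, MR> &a,
>                     const monomorphic_allocator<U, MR> &b) noexcept {
>         return a.mr != b.mr;
>     }
>
>     // Usage sketch:
>     //   std::pmr::monotonic_buffer_resource buf;
>     //   using A = monomorphic_allocator<int, std::pmr::monotonic_buffer_resource>;
>     //   std::vector<int, A> v(A(&buf));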
>
> Anyway, you should definitely look into C++17 std::pmr; it's doing
> basically the same kind of thing you're interested in.
> I also highly recommend my own talk on the subject, "An Allocator is a
> Handle to a Heap," if you haven't already seen it:
> https://www.youtube.com/watch?v=0MdSJsCTRkY
>
> –Arthur
Received on 2020-11-05 17:17:28