std-proposals: Re: Request for Opinion: Goal-based Memory Allocation

From: Jason McKesson <jmckesson_at_[hidden]>
Date: Thu, 5 Nov 2020 23:53:30 -0500

On Thu, Nov 5, 2020 at 6:17 PM Scott Michaud via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> Arthur,
>
> Wow thanks for the code review!
>
> The goal when I was designing it was for per-thread temporaries in a thread pool job system. I envisioned it as a sort of expansion pack for a thread's stack. Elements that would persist longer than a blocking function call are not designed for that allocator, so I didn't consider moving from thread to thread.
>
> This tangentially hits the core point, though.
>
> My proposal was suggesting that memory allocation should be goal-centric.
>
> Would enumerating goals be too focused to the point that users wouldn't care?
> Do the actual goals in real-world memory allocation overlap too much?
> Would going down to that level of use case not provide enough benefits to matter?
>
> My thought is that this would be easy to teach and flexible to implement, especially if a vendor wants to try something way outside-the-box, like the "skipping main RAM and mapping directly to a cache" example.
>
> If we say "have a stack-based allocator" then the implementation's hands are tied to an algorithm. If we say "have an allocator that's fast for locals" then they can take it and run as far as they want.
>
> But again that's why I'm requesting feedback. I might be miles away from what typical stakeholders care about.
>
> Thanks again!

Your notion of "goals" rather than "algorithms" feels a bit like
`std::async`, but worse. Here's what I mean.

If you launch an `async` task with `launch::deferred`, you get a
specific algorithm: the task executes on the thread that gets the
future value. If you launch an `async` task with the `launch::async`
policy, you also get a specific algorithm: a new thread is created
(along with all side-effects thereof) in which the task will be
executed.

But if you specify both launch policies, then you don't get an
algorithm; you get a goal of sorts. That goal being "be asynchronous".
Maybe the implementation uses a thread-pool for these, or maybe it
just launches a whole new thread for each one. Or something inbetween;
it's all "quality of implementation".

But that's the problem: the implementation details actually *matter*.
If the implementation is creating individual threads for each task,
then you can't use `std::async` in a scenario where you're launching
thousands of short tasks every second.

Of course, there are users for whom these details just don't matter.
Maybe they're not launching thousands of tasks per second. Maybe they
just need to do some work off-thread and need a way to get that data
back, so they're not picky. Etc.

But even so, the variance of implementations of
`launch::async|deferred` `std::async` calls are never actually
advantageous to the user. Having real control over what's going on may
not be necessary for every user, but lacking said control is never
helpful either. The algorithms being employed matter,

Your suggestion is similar to this. You want to transition from a
model where a user specifies the mechanics of an allocator and moves
to one where it expresses the intent of the allocator's usage.

There's one problem though. In the `std::async` case, the "goal" form
of `async` was still useful to some users because they're not picky
about the performance characteristics of that form of task
parallelism. Optimization isn't important to those use cases, so
letting the implementation choose the algorithm is adequate.

But in your case, that never actually happens.

If I need to do some special memory management shenanigans, that is
almost certainly because of a performance issue. That is, I cannot
afford to heap-allocate new storage within the context where I'm doing
this. So from my perspective, it is *really important* that I know
what algorithm my allocator is using.

Or at least, what algorithm my allocator is *not* using.

The concept "local allocator" tells me nothing useful. If I don't know
that the allocator will never heap-allocate (after creation), then I
cannot use it where I can't allow heap allocation. You might be able
to put some kind of restriction on implementers, similar to how
standard algorithms have big-O requirements. But that just restricts
which algorithms can be used, to the point where there's basically
only one (significant) way to implement most algorithms.

If I'm in a performance-critical scenario, and there's one specific
algorithm that I know will solve that problem, why would I want a
"goal" instead of the algorithm that I know will solve my problem? And
if I'm not in a performance-critical scenario... why wouldn't I just
use the heap? Using non-`std::allocator`s is troublesome; I'd only go
through that trouble if I need to. And if I need to, then I *need* a
specific algorithm with known characteristics.

Received on 2020-11-05 22:53:46