Date: Fri, 6 Nov 2020 22:42:05 -0500
Jason,
Thanks for the feedback! Sorry for the day delay. It took a little time
to write a response.
First, if the standard wants to define specific algorithms -- that's
fine. Some use cases may be specific enough such that it needs to be
defined down to the algorithm. I do not want to suggest something that
takes away something good from stakeholders.
That said, the argument against goal-based allocators is that some users
exist who wouldn't use anything but std::allocator, and some users exist
who want to know exactly what the implementation is doing. However, it
does not say that no users exist between those two extremes (let alone
how many) and it does not say that well-described goals cannot exist
that serve the middle group, as well maybe even some of the two extremes
in some situations. (Even if they made the best-performing allocators
for their main problem... maybe they'll use a debug allocator once or
twice? Maybe they'll use one of our allocators when they encounter
performance issues with secondary problems? etc.) The std::async example
is one that shows poorly-described goals can exist, and that's fair --
but that's possible any time you define a specification. I'm more
curious how much value can be derived from goal-based allocators, and
what market will it serve.
* How big is that group?
* How easily can they be served?
* What are their goals?
* How much effort will it take (in development and education) before
new behaviour becomes commonplace -- maybe even a default?
* What do they want that behaviour to even be?
* What do platforms want that behaviour to be?
* Is there any long-term benefits to platforms?
* Is there any long-term benefits to the language?
And yes, if we define use cases terribly, then this could be awful...
and it's not entirely clear what awful is without thought.
* What is being too vague?
* What is being too explicit?
* What aligns with other initiatives (such as performance, tooling, etc.)?
* What needs to happen before it's a pain?
I am curious to see the range in what platforms want versus the range in
what developers want. I know from myself that I was frustrated with
being stuck with std::allocator for cases that I knew very well could be
done better based on how the data was being accessed. I didn't want to
import a new third-party dependency... and I didn't want to go through
making it myself.
However, I did end up going through making it myself, although it will
still need a bit of testing and performance tuning before I trust it
enough to put it in production somewhere. That's not an effort that most
users should need to go through. It's also holding benefits hostage from
those who could very much use it.
But, before we go about solving the problems, we need to define what
they are... which means we need to know what goals actual users and
platforms have.
Thanks again for the thoughtful remarks!
On 11/5/2020 11:53 PM, Jason McKesson via Std-Proposals wrote:
> On Thu, Nov 5, 2020 at 6:17 PM Scott Michaud via Std-Proposals
> <std-proposals_at_[hidden]> wrote:
>> Arthur,
>>
>> Wow thanks for the code review!
>>
>> The goal when I was designing it was for per-thread temporaries in a thread pool job system. I envisioned it as a sort of expansion pack for a thread's stack. Elements that would persist longer than a blocking function call are not designed for that allocator, so I didn't consider moving from thread to thread.
>>
>> This tangentially hits the core point, though.
>>
>> My proposal was suggesting that memory allocation should be goal-centric.
>>
>> Would enumerating goals be too focused to the point that users wouldn't care?
>> Do the actual goals in real-world memory allocation overlap too much?
>> Would going down to that level of use case not provide enough benefits to matter?
>>
>> My thought is that this would be easy to teach and flexible to implement, especially if a vendor wants to try something way outside-the-box, like the "skipping main RAM and mapping directly to a cache" example.
>>
>> If we say "have a stack-based allocator" then the implementation's hands are tied to an algorithm. If we say "have an allocator that's fast for locals" then they can take it and run as far as they want.
>>
>> But again that's why I'm requesting feedback. I might be miles away from what typical stakeholders care about.
>>
>> Thanks again!
> Your notion of "goals" rather than "algorithms" feels a bit like
> `std::async`, but worse. Here's what I mean.
>
> If you launch an `async` task with `launch::deferred`, you get a
> specific algorithm: the task executes on the thread that gets the
> future value. If you launch an `async` task with the `launch::async`
> policy, you also get a specific algorithm: a new thread is created
> (along with all side-effects thereof) in which the task will be
> executed.
>
> But if you specify both launch policies, then you don't get an
> algorithm; you get a goal of sorts. That goal being "be asynchronous".
> Maybe the implementation uses a thread-pool for these, or maybe it
> just launches a whole new thread for each one. Or something inbetween;
> it's all "quality of implementation".
>
> But that's the problem: the implementation details actually *matter*.
> If the implementation is creating individual threads for each task,
> then you can't use `std::async` in a scenario where you're launching
> thousands of short tasks every second.
>
> Of course, there are users for whom these details just don't matter.
> Maybe they're not launching thousands of tasks per second. Maybe they
> just need to do some work off-thread and need a way to get that data
> back, so they're not picky. Etc.
>
> But even so, the variance of implementations of
> `launch::async|deferred` `std::async` calls are never actually
> advantageous to the user. Having real control over what's going on may
> not be necessary for every user, but lacking said control is never
> helpful either. The algorithms being employed matter,
>
> Your suggestion is similar to this. You want to transition from a
> model where a user specifies the mechanics of an allocator and moves
> to one where it expresses the intent of the allocator's usage.
>
> There's one problem though. In the `std::async` case, the "goal" form
> of `async` was still useful to some users because they're not picky
> about the performance characteristics of that form of task
> parallelism. Optimization isn't important to those use cases, so
> letting the implementation choose the algorithm is adequate.
>
> But in your case, that never actually happens.
>
> If I need to do some special memory management shenanigans, that is
> almost certainly because of a performance issue. That is, I cannot
> afford to heap-allocate new storage within the context where I'm doing
> this. So from my perspective, it is *really important* that I know
> what algorithm my allocator is using.
>
> Or at least, what algorithm my allocator is *not* using.
>
> The concept "local allocator" tells me nothing useful. If I don't know
> that the allocator will never heap-allocate (after creation), then I
> cannot use it where I can't allow heap allocation. You might be able
> to put some kind of restriction on implementers, similar to how
> standard algorithms have big-O requirements. But that just restricts
> which algorithms can be used, to the point where there's basically
> only one (significant) way to implement most algorithms.
>
> If I'm in a performance-critical scenario, and there's one specific
> algorithm that I know will solve that problem, why would I want a
> "goal" instead of the algorithm that I know will solve my problem? And
> if I'm not in a performance-critical scenario... why wouldn't I just
> use the heap? Using non-`std::allocator`s is troublesome; I'd only go
> through that trouble if I need to. And if I need to, then I *need* a
> specific algorithm with known characteristics.
Thanks for the feedback! Sorry for the day delay. It took a little time
to write a response.
First, if the standard wants to define specific algorithms -- that's
fine. Some use cases may be specific enough such that it needs to be
defined down to the algorithm. I do not want to suggest something that
takes away something good from stakeholders.
That said, the argument against goal-based allocators is that some users
exist who wouldn't use anything but std::allocator, and some users exist
who want to know exactly what the implementation is doing. However, it
does not say that no users exist between those two extremes (let alone
how many) and it does not say that well-described goals cannot exist
that serve the middle group, as well maybe even some of the two extremes
in some situations. (Even if they made the best-performing allocators
for their main problem... maybe they'll use a debug allocator once or
twice? Maybe they'll use one of our allocators when they encounter
performance issues with secondary problems? etc.) The std::async example
is one that shows poorly-described goals can exist, and that's fair --
but that's possible any time you define a specification. I'm more
curious how much value can be derived from goal-based allocators, and
what market will it serve.
* How big is that group?
* How easily can they be served?
* What are their goals?
* How much effort will it take (in development and education) before
new behaviour becomes commonplace -- maybe even a default?
* What do they want that behaviour to even be?
* What do platforms want that behaviour to be?
* Is there any long-term benefits to platforms?
* Is there any long-term benefits to the language?
And yes, if we define use cases terribly, then this could be awful...
and it's not entirely clear what awful is without thought.
* What is being too vague?
* What is being too explicit?
* What aligns with other initiatives (such as performance, tooling, etc.)?
* What needs to happen before it's a pain?
I am curious to see the range in what platforms want versus the range in
what developers want. I know from myself that I was frustrated with
being stuck with std::allocator for cases that I knew very well could be
done better based on how the data was being accessed. I didn't want to
import a new third-party dependency... and I didn't want to go through
making it myself.
However, I did end up going through making it myself, although it will
still need a bit of testing and performance tuning before I trust it
enough to put it in production somewhere. That's not an effort that most
users should need to go through. It's also holding benefits hostage from
those who could very much use it.
But, before we go about solving the problems, we need to define what
they are... which means we need to know what goals actual users and
platforms have.
Thanks again for the thoughtful remarks!
On 11/5/2020 11:53 PM, Jason McKesson via Std-Proposals wrote:
> On Thu, Nov 5, 2020 at 6:17 PM Scott Michaud via Std-Proposals
> <std-proposals_at_[hidden]> wrote:
>> Arthur,
>>
>> Wow thanks for the code review!
>>
>> The goal when I was designing it was for per-thread temporaries in a thread pool job system. I envisioned it as a sort of expansion pack for a thread's stack. Elements that would persist longer than a blocking function call are not designed for that allocator, so I didn't consider moving from thread to thread.
>>
>> This tangentially hits the core point, though.
>>
>> My proposal was suggesting that memory allocation should be goal-centric.
>>
>> Would enumerating goals be too focused to the point that users wouldn't care?
>> Do the actual goals in real-world memory allocation overlap too much?
>> Would going down to that level of use case not provide enough benefits to matter?
>>
>> My thought is that this would be easy to teach and flexible to implement, especially if a vendor wants to try something way outside-the-box, like the "skipping main RAM and mapping directly to a cache" example.
>>
>> If we say "have a stack-based allocator" then the implementation's hands are tied to an algorithm. If we say "have an allocator that's fast for locals" then they can take it and run as far as they want.
>>
>> But again that's why I'm requesting feedback. I might be miles away from what typical stakeholders care about.
>>
>> Thanks again!
> Your notion of "goals" rather than "algorithms" feels a bit like
> `std::async`, but worse. Here's what I mean.
>
> If you launch an `async` task with `launch::deferred`, you get a
> specific algorithm: the task executes on the thread that gets the
> future value. If you launch an `async` task with the `launch::async`
> policy, you also get a specific algorithm: a new thread is created
> (along with all side-effects thereof) in which the task will be
> executed.
>
> But if you specify both launch policies, then you don't get an
> algorithm; you get a goal of sorts. That goal being "be asynchronous".
> Maybe the implementation uses a thread-pool for these, or maybe it
> just launches a whole new thread for each one. Or something inbetween;
> it's all "quality of implementation".
>
> But that's the problem: the implementation details actually *matter*.
> If the implementation is creating individual threads for each task,
> then you can't use `std::async` in a scenario where you're launching
> thousands of short tasks every second.
>
> Of course, there are users for whom these details just don't matter.
> Maybe they're not launching thousands of tasks per second. Maybe they
> just need to do some work off-thread and need a way to get that data
> back, so they're not picky. Etc.
>
> But even so, the variance of implementations of
> `launch::async|deferred` `std::async` calls are never actually
> advantageous to the user. Having real control over what's going on may
> not be necessary for every user, but lacking said control is never
> helpful either. The algorithms being employed matter,
>
> Your suggestion is similar to this. You want to transition from a
> model where a user specifies the mechanics of an allocator and moves
> to one where it expresses the intent of the allocator's usage.
>
> There's one problem though. In the `std::async` case, the "goal" form
> of `async` was still useful to some users because they're not picky
> about the performance characteristics of that form of task
> parallelism. Optimization isn't important to those use cases, so
> letting the implementation choose the algorithm is adequate.
>
> But in your case, that never actually happens.
>
> If I need to do some special memory management shenanigans, that is
> almost certainly because of a performance issue. That is, I cannot
> afford to heap-allocate new storage within the context where I'm doing
> this. So from my perspective, it is *really important* that I know
> what algorithm my allocator is using.
>
> Or at least, what algorithm my allocator is *not* using.
>
> The concept "local allocator" tells me nothing useful. If I don't know
> that the allocator will never heap-allocate (after creation), then I
> cannot use it where I can't allow heap allocation. You might be able
> to put some kind of restriction on implementers, similar to how
> standard algorithms have big-O requirements. But that just restricts
> which algorithms can be used, to the point where there's basically
> only one (significant) way to implement most algorithms.
>
> If I'm in a performance-critical scenario, and there's one specific
> algorithm that I know will solve that problem, why would I want a
> "goal" instead of the algorithm that I know will solve my problem? And
> if I'm not in a performance-critical scenario... why wouldn't I just
> use the heap? Using non-`std::allocator`s is troublesome; I'd only go
> through that trouble if I need to. And if I need to, then I *need* a
> specific algorithm with known characteristics.
Received on 2020-11-06 21:42:07