It was referring to points #3 and #5 in the first email:

Going back to the LIFO, per-thread example:

Some vendors might want to use the double-stack allocator.

Some vendors might want to use a single-stack allocator.

==> Some vendors might want to bypass RAM altogether for the first page(s) of each thread.

Why not let vendors set aside some asymmetric memory per hardware thread?

Some vendors may just use std::allocator (and that’s fine).

==> Embedded device vendors might have a different idea altogether.

I was mostly thinking of weird vendors at first (embedded, maybe a video game console, or maybe a cloud / HPC platform). Even then, it was kind of a low-probability thing... more of an "if we can hit the balance JUST RIGHT, then we can get a bunch of software using certain allocators in very predictable ways", which would let vendors do optimizations, even if those only apply 80% of the time. Something like this can already happen with any old allocator, like when Intel decided to use their Iris Pro GPU's embedded DRAM as an L4 cache for general compute a few years ago; it gave speed improvements, but through blind luck. What if you could (haphazardly or otherwise) give them a hint?

Of course, that would take the goal-based allocators being worded just right: focused enough to be usable, but not so focused that they become limiting... and whether that balance can even be struck is still up in the air. Even what those goals would be is up in the air at the moment.

It will take a lot of discussion from both vendors and programmers, and I have no idea where it will land. I just know that I follow specific patterns when I program stuff, and that opens up room to optimize.

It would also take C++ allocators becoming very easy to use... which is a whole other topic, but one that's already being discussed.
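To make the single-stack, per-thread LIFO case from the list above a bit more concrete, here is a rough sketch of the kind of thing I mean. Everything named here (`StackArena`, `thread_arena`, `LifoAllocator`, `demo`, the 4096-byte size) is made up for illustration. It is not a proposed interface, and a vendor could back the arena with whatever memory they like.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size LIFO arena; the 4096-byte capacity is an arbitrary assumption.
class StackArena {
public:
    // Bump-allocate `bytes` at the requested power-of-two alignment
    // (assumed to be no stricter than alignof(std::max_align_t)).
    void* allocate(std::size_t bytes, std::size_t align) {
        std::size_t start = (top_ + align - 1) & ~(align - 1);
        if (start + bytes > sizeof(buffer_)) throw std::bad_alloc{};
        top_ = start + bytes;
        return buffer_ + start;
    }

    // LIFO rule: memory is reclaimed only if this block is the top of the stack.
    // Out-of-order frees are ignored until the blocks above them are released.
    void deallocate(void* p, std::size_t bytes) noexcept {
        auto* bp = static_cast<unsigned char*>(p);
        if (bp + bytes == buffer_ + top_) {
            top_ = static_cast<std::size_t>(bp - buffer_);
        }
    }

    std::size_t used() const noexcept { return top_; }

private:
    alignas(std::max_align_t) unsigned char buffer_[4096];
    std::size_t top_ = 0;
};

// One arena per thread, so no locking is needed.
inline StackArena& thread_arena() {
    thread_local StackArena arena;
    return arena;
}

// Minimal standard-style allocator that forwards to the calling thread's arena.
template <class T>
struct LifoAllocator {
    using value_type = T;

    LifoAllocator() = default;
    template <class U>
    LifoAllocator(const LifoAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(thread_arena().allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept {
        thread_arena().deallocate(p, n * sizeof(T));
    }
};

template <class T, class U>
bool operator==(const LifoAllocator<T>&, const LifoAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const LifoAllocator<T>&, const LifoAllocator<U>&) noexcept { return false; }

// Usage: a vector whose storage comes from the calling thread's arena.
// Returns how many arena bytes are still in use once the vector is gone.
inline std::size_t demo() {
    std::size_t before = thread_arena().used();
    {
        std::vector<int, LifoAllocator<int>> v;
        v.reserve(16);
        for (int i = 0; i < 16; ++i) v.push_back(i);
        assert(thread_arena().used() >= before + 16 * sizeof(int));
    }
    return thread_arena().used() - before;
}
```

The point is less the implementation than the usage pattern: if lots of code allocates in this predictable, stack-like way, a vendor has room to swap in something smarter behind the same interface.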

On 11/6/2020 8:07 AM, Matthew Woehlke wrote:

On 05/11/2020 18.17, Scott Michaud via Std-Proposals wrote:

[...] especially if a vendor wants to try something way outside-the-box, like the "skipping main RAM and mapping directly to a cache" example.

I don't think that's a viable thing to do, at least outside of very niche usages like real-time programming. In a modern, general-purpose system, task switching (or worse, your process being suspended or the machine entering sleep) needs to be able to push that memory all the way down to disk.

If you just mean the allocation doesn't hit RAM until and unless it actually needs to, I don't see why that can't happen today with any old allocator.