Re: Cache Alloc - 1.3x to 13.4x faster

From: Phil Bouchard <boost_at_[hidden]>
Date: Tue, 27 Jul 2021 12:26:33 -0400
On 7/27/21 11:53 AM, Jason McKesson via Std-Proposals wrote:
> On Tue, Jul 27, 2021 at 11:21 AM Phil Bouchard via Std-Proposals
> <std-proposals_at_[hidden]> wrote:
>> So did you guys have a chance to try it out? Any questions?
> My main question is this. If I care about performance, and I'm in some
> hot loop code where I need a stack/queue, and that stack/queue needs
> to be so large that a stack buffer or fixed-size globally allocated
> buffer won't do (a case with so many qualifiers that makes this a
> rarified circumstance)... why would I *ever* use `std::list` for this?
> `std::list` is almost always the *wrong* type if performance matters
> to you.

Caring about performance is the main reason people use C++ in the
first place, so C++ should offer the most efficient algorithms available.

Sorry if my questions sound stupid, but I don't understand your dislike
of std::list. Yes, I could use a more efficient chained buffer, but the
algorithmic complexity remains the same.


>
> Consider this case. Each "page" allocation for a simple
> `std::list<int>` will be *much* larger than it needs to be. That
> substantially damages the cache locality of accessing the elements.

By cache locality are you referring to the bus cache?


> This is more important for a queue than a stack, but it still matters
> when you're actually *doing something* in the hot loop.
>
> `std::list` is good for maybe one or two use cases, all of which are
> primarily about insertion/removal from the *middle*, not from the
> ends.
>
> A dedicated stack or queue object that knows what it's doing will beat
> it every single time.
>
> Why is an allocator-based solution to this problem (which has only
> nebulously been specified, BTW) a good idea? That's the question you
> haven't answered as of yet.

What the allocator does is gather allocations of the same type
together, so heap objects are not scattered around memory, which would
fragment it.
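
Roughly, the idea is something along these lines (just a sketch with
made-up names like pool_alloc and slab_elems, not the actual Cache
Alloc code, and with per-type static state for brevity, so it is not
thread-safe):

#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical illustration: carve same-type nodes out of contiguous
// slabs so that the nodes of one container type stay packed together
// instead of being scattered by the general-purpose heap.
template <class T>
struct pool_alloc {
    using value_type = T;
    static constexpr std::size_t slab_elems = 4096;  // nodes per slab (arbitrary)

    pool_alloc() = default;
    template <class U> pool_alloc(const pool_alloc<U>&) noexcept {}  // rebind support

    T* allocate(std::size_t n) {
        if (n != 1)                        // array requests: fall back to the heap
            return static_cast<T*>(::operator new(n * sizeof(T)));
        if (free_list_) {                  // reuse a node handed back earlier
            slot* s = free_list_;
            free_list_ = s->next;
            return reinterpret_cast<T*>(s);
        }
        if (used_ == slab_elems) {         // current slab exhausted: add another
            slabs_.push_back(std::make_unique<slot[]>(slab_elems));
            used_ = 0;
        }
        return reinterpret_cast<T*>(&slabs_.back()[used_++]);
    }

    void deallocate(T* p, std::size_t n) noexcept {
        if (n != 1) { ::operator delete(p); return; }
        auto* s = reinterpret_cast<slot*>(p);
        s->next = free_list_;              // O(1) push onto the per-type free list
        free_list_ = s;
    }

    friend bool operator==(const pool_alloc&, const pool_alloc&) noexcept { return true; }
    friend bool operator!=(const pool_alloc&, const pool_alloc&) noexcept { return false; }

private:
    union slot {                           // big/aligned enough for a T or a link
        alignas(T) unsigned char storage[sizeof(T)];
        slot* next;
    };
    static inline std::vector<std::unique_ptr<slot[]>> slabs_{};
    static inline slot* free_list_ = nullptr;
    static inline std::size_t used_ = slab_elems;  // forces a slab on first use
};

With something like that, std::list<int, pool_alloc<int>> keeps all of
its nodes in a handful of large contiguous blocks instead of doing one
heap allocation per node.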

This generic allocator can be used for overlaid lists, stacks, queues,
maps, multimaps, unordered maps, etc. of the same element type and
usage frequency.
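
As a rough illustration of that usage pattern with facilities already
in the standard (std::pmr, which similarly groups same-sized blocks
into pools, though it makes the choice at run time rather than at
compile time):

#include <list>
#include <map>
#include <unordered_map>
#include <memory_resource>

int main() {
    std::pmr::unsynchronized_pool_resource pool;  // groups same-size blocks

    std::pmr::list<int>               lst(&pool); // list (or a stack/queue adaptor
    std::pmr::map<int, int>           m(&pool);   // over a pmr container), map and
    std::pmr::unordered_map<int, int> um(&pool);  // unordered map share one pool

    for (int i = 0; i < 1000; ++i) {
        lst.push_back(i);
        m[i] = i;
        um[i] = i;
    }
}   // the pool releases everything at once when it goes out of scope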

Yes, it does add a header to each allocated element, but sometimes
speed is what is required, at the expense of more memory chips.

My main goal is to prove that this compile-time allocator is more
efficient than these LD_PRELOAD-style allocators out there.
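
(By LD_PRELOAD-style allocators I mean the ones that interpose
malloc/free for the whole process at load time, such as tcmalloc or
jemalloc. Mechanically that looks roughly like this on Linux; just a
sketch of the interposition itself, ignoring re-entrancy, calloc,
aligned allocation and so on; build with -shared -fPIC and run with
LD_PRELOAD=./shim.so ./app:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   // for RTLD_NEXT
#endif
#include <cstddef>
#include <dlfcn.h>

// Every malloc/free in the process resolves to these definitions first,
// no matter which container or allocator the code was compiled with.
extern "C" void* malloc(std::size_t size) {
    static auto real_malloc =
        reinterpret_cast<void* (*)(std::size_t)>(dlsym(RTLD_NEXT, "malloc"));
    return real_malloc(size);   // a real shim applies its own policy here
}

extern "C" void free(void* p) {
    static auto real_free =
        reinterpret_cast<void (*)(void*)>(dlsym(RTLD_NEXT, "free"));
    real_free(p);
}

A compile-time allocator, by contrast, is selected per container type,
so it can keep per-type pools without touching the global malloc path.)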

-- 
*Phil Bouchard*
Founder & CTO
C.: (819) 328-4743
Fornux <http://www.fornux.com>

Received on 2021-07-27 11:26:37