sg14: Re: [SG14] Question re: colony and new SIMD-related function

From: Niall Douglas <s_sourceforge_at_[hidden]>
Date: Fri, 2 Aug 2019 10:58:37 +0100

On 01/08/2019 22:50, Matt Bentley via SG14 wrote:
> Thanks Staffan & Niall,
> my understanding is that gather-scatter has greatly improved in AVX512

Sure, but I've not personally tested how well it works. It's much like
how hardware memory transactions were billed as amazing, but in fact are
amazingly slow on Intel, and are not actually of much use at all in the
real world.

> In terms of my query, I guess the main point is:
> If SIMD processing of elements is worthwhile in a given use-case, do you
> think it's worth exposing colony internals so that the programmer can
> create a gather-mask by parallel-processing the internal skipfields,
> or do you think it's going to be just as well-performing for the
> programmer to construct the mask in serial via iterating over the colony
> as per usual?

I'll be blunt here.

I remain very unconvinced that any of the alt-containers which have
landed before WG21 in recent years are worth the committee time. I don't
even think Abseil's or Folly's containers are worth the committee time,
given the cost-benefit they offer.

I would observe that a lot of what committee folk do nowadays in their
day jobs is write custom containers for bespoke use cases - indeed, just
last week I wrote yet another open addressed hash table.

Why do we all keep reimplementing containers? Because if one is tightly
integrating data layout with the container, that yields a unique
container design each and every time. And by definition, if the STL
containers aren't good enough, that's because performance in this area
is important, and now it's worth rolling a bespoke localised solution.

Hence, for me personally, any container which looks like an STL container
isn't worth the committee time. I'd personally be much keener on
containers which look *very* different to STL containers. Like ones
which are coroutinised, and have no allocators, because all dynamic
memory allocation is implemented using commit-on-first-write page
faulting, and during that page fault service the coroutine suspends and
the CPU does other work until the TLB shootdown has completed.

Now *those* are containers worth standardising because they deliver
*orders of magnitude* gains over the conventional ones. Even your basic
vector<T> becomes oodles better during capacity expansion, because
instead of doing no work during capacity expansion, you are always doing
as much work as possible.
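
To make the allocator-free half of that idea concrete, here is a minimal
sketch, not a real implementation. It assumes a POSIX system with mmap()
and lazy page commit (e.g. Linux), handles only trivially destructible T,
and leaves out the coroutine suspension during the page fault entirely.
The name reserved_vector and its interface are hypothetical. The point it
shows is that capacity expansion involves no reallocation and no copying:
the address range is reserved once, and the kernel commits pages as they
are first written.

```cpp
#include <sys/mman.h>   // mmap, munmap (assumes a POSIX platform)
#include <cstddef>
#include <new>
#include <stdexcept>
#include <type_traits>

// Hypothetical illustration: a vector that reserves address space up front
// and relies on the kernel to commit pages on first write.
template <class T>
class reserved_vector {
    static_assert(std::is_trivially_destructible<T>::value,
                  "sketch only handles trivially destructible T");
    T* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t max_ = 0;

public:
    // max_elems must be non-zero: mmap rejects a zero-length mapping.
    explicit reserved_vector(std::size_t max_elems) : max_(max_elems) {
        // Reserve virtual address space only; no physical pages are
        // committed until they are touched.
        void* p = ::mmap(nullptr, max_ * sizeof(T), PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) throw std::bad_alloc();
        data_ = static_cast<T*>(p);
    }
    ~reserved_vector() { ::munmap(data_, max_ * sizeof(T)); }
    reserved_vector(const reserved_vector&) = delete;
    reserved_vector& operator=(const reserved_vector&) = delete;

    void push_back(const T& v) {
        if (size_ == max_) throw std::length_error("reservation exhausted");
        // The first write into a fresh page triggers the page fault that
        // commits it; existing elements never move.
        ::new (static_cast<void*>(data_ + size_)) T(v);
        ++size_;
    }
    T& operator[](std::size_t i) { return data_[i]; }
    const T* data() const { return data_; }
    std::size_t size() const { return size_; }
};
```

Used as reserved_vector<int> v(1u << 20), pushing elements never changes
v.data(), which is the property a conventional vector<T> cannot offer
when it grows.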

Anyway, that's my ha'penny's worth. And please don't be dissuaded from
your efforts and work, it's just my opinion about cost-benefit
tradeoffs. I am obviously quite biased by writing the kind of containers
I want all day, and being consistently sad that WG21 doesn't and
probably can't understand.

Niall

Received on 2019-08-02 05:00:40