Subject: Re: Question re: colony and new SIMD-related function
From: Matt Bentley (mattreecebentley_at_[hidden])
Date: 2019-08-01 16:50:02
Thanks Staffan & Niall,
my understanding is that gather-scatter has greatly improved in AVX512
skylake and onwards, see the following links for details:
but I take your point that Actual benefits may be minuscule depending on
In terms of my query, I guess the main point is:
If SIMD processing of elements is worthwhile in a given use-case, do you
think it's worth exposing colony internals so that the programmer can
create a gather-mask by parallel-processing the internal skipfields,
or do you think it's going to be just as well-performing for the
programmer to construct the mask in serial via iterating over the colony
as per usual?
On 2/08/2019 12:20 AM, Niall Douglas via SG14 wrote:
> On 01/08/2019 13:12, Tjernstrom, Staffan via SG14 wrote:
>> Hey Matt,
>> I'm dubious, but not staunchly against.
>> My anecdotal (no current measurements) reason being that I've seen naive code outperform SIMD code for short memory arenas (say < 10 cache lines), presumably due to the down-clocking effect of the later SIMD instruction sets.
> Also, Haswell and later have become remarkably good at rewriting short
> runs of scalar code to perform exactly as if they were written using
> SIMD. About 200-300 instructions, I've generally found. If you're
> looking at a loop of less than 200 instructions, and you can guarantee a
> newer CPU, chances are high that a SIMD rewrite will confer negligible
> Where scatter-gather SIMD really shines is on ARM NEON, but NEON has
> actually useful scatter-gather. One can often avoid whole memory copies
> using their support. I wish Intel would do the same.
> (Some tell me AVX-512 does now have this, but I haven't investigated)
> SG14 mailing list
SG14 list run by firstname.lastname@example.org
Older Archives on Google Groups