sg14

[SG14] Question re: colony and new SIMD-related function

From: Matt Bentley <mattreecebentley_at_[hidden]>
Date: Thu, 1 Aug 2019 16:41:19 +1200
Hi all,
I've made a modification to plf::colony that enables it to be used with
SIMD gather techniques, using the skipfields as masks for processing.
I'd like some feedback on how useful this is, and whether it's worth
keeping or throwing away.

The way it works in colony is that the skipfield value is 0 when an
element is active (unskipped) and non-zero when it is skipped. Giving a
developer direct access to the skipfield allows them to use SIMD to build
a gather mask, transforming the skipfield in parallel into whatever
format the particular architecture requires. For example, for AVX2 and
above the mask is a vector of integers, one per lane, and each lane whose
mask integer has its highest bit set is read into the SIMD register.
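As a rough sketch of that transformation, the loop below turns one block's skipfield into an AVX2-style gather mask: one 32-bit lane per element, all bits set (so the highest bit is one) where the skipfield is 0, and 0 elsewhere. It assumes a 16-bit skipfield type, and it uses a plain branch-free loop rather than intrinsics so it compiles anywhere (a compiler may auto-vectorize it). The name `make_gather_mask` is hypothetical and not part of colony.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of plf::colony). Assumes a 16-bit skipfield
// where 0 means "active" and any non-zero value means "skipped".
// Produces one 32-bit mask lane per element in the layout AVX2 gathers
// expect: highest bit set = load this lane, highest bit clear = leave it.
std::vector<std::int32_t> make_gather_mask(const std::uint16_t* skipfield,
                                           std::size_t count)
{
    std::vector<std::int32_t> mask(count);
    for (std::size_t i = 0; i != count; ++i)
    {
        // (skipfield[i] == 0) is 1 or 0; negating gives -1 (all bits set,
        // highest bit one) for active elements and 0 for skipped ones.
        mask[i] = -static_cast<std::int32_t>(skipfield[i] == 0);
    }
    return mask;
}
```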

The new function (with the underwhelming name of
"get_raw_memory_block_pointers()") returns a pointer to a
dynamically allocated struct of the following type:

struct raw_memory_block_pointers : private uchar_allocator_type
{
 // array of pointers to element memory blocks:
 aligned_pointer_type *element_memory_block_pointers;
 
 // array of pointers to skipfield memory blocks:
 skipfield_pointer_type *skipfield_memory_block_pointers;
 
 // array of the number of elements in each memory block:
 skipfield_type *block_sizes;
 
 // size of each array:
 size_type number_of_blocks;
};

The struct has a destructor, so all of its internal arrays are
deallocated when the returned struct is deleted.
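To make the ownership and layout concrete, here is a sketch built on a hypothetical stand-in, `mock_raw_memory_block_pointers`, that mirrors the fields above with concrete types substituted (an assumption: `T*` for `aligned_pointer_type`, `std::uint16_t` for `skipfield_type`, `std::size_t` for `size_type`) and plain `new[]`/`delete[]` in place of colony's allocator. It also assumes `block_sizes[b]` is the number of skipfield slots to scan in block `b`; the demo in the test suite is the authority on the real semantics. `make_mock_raw` and `sum_active` are likewise illustrative names.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in mirroring colony's raw_memory_block_pointers.
// It owns only the three arrays of pointers/sizes, not the memory blocks
// they point to (those remain owned by the container).
template <typename T>
struct mock_raw_memory_block_pointers
{
    T** element_memory_block_pointers = nullptr;
    std::uint16_t** skipfield_memory_block_pointers = nullptr;
    std::uint16_t* block_sizes = nullptr;
    std::size_t number_of_blocks = 0;

    explicit mock_raw_memory_block_pointers(std::size_t blocks)
        : element_memory_block_pointers(new T*[blocks]),
          skipfield_memory_block_pointers(new std::uint16_t*[blocks]),
          block_sizes(new std::uint16_t[blocks]),
          number_of_blocks(blocks)
    {}

    // Deleting the struct frees all three arrays in one step.
    ~mock_raw_memory_block_pointers()
    {
        delete[] element_memory_block_pointers;
        delete[] skipfield_memory_block_pointers;
        delete[] block_sizes;
    }

    mock_raw_memory_block_pointers(const mock_raw_memory_block_pointers&) = delete;
    mock_raw_memory_block_pointers& operator=(const mock_raw_memory_block_pointers&) = delete;
};

// The caller owns the returned pointer; holding it in a std::unique_ptr
// means the struct (and therefore its arrays) is deleted automatically.
template <typename T>
std::unique_ptr<mock_raw_memory_block_pointers<T>> make_mock_raw(std::size_t blocks)
{
    return std::unique_ptr<mock_raw_memory_block_pointers<T>>(
        new mock_raw_memory_block_pointers<T>(blocks));
}

// Walks every block and sums the elements whose skipfield value is 0.
template <typename T>
T sum_active(const mock_raw_memory_block_pointers<T>& raw)
{
    T total{};
    for (std::size_t b = 0; b != raw.number_of_blocks; ++b)
    {
        const T* elements = raw.element_memory_block_pointers[b];
        const std::uint16_t* skipfield = raw.skipfield_memory_block_pointers[b];
        for (std::size_t i = 0; i != raw.block_sizes[b]; ++i)
        {
            if (skipfield[i] == 0)
                total += elements[i];
        }
    }
    return total;
}
```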


Using these arrays, a programmer can build a mask for each memory block
by transforming its skipfield into a separate array or vector, then use
that mask in a gather operation on the corresponding element memory
block to load the active elements into SIMD registers. If desired, and
if the processor supports it, a subsequent scatter can write the results
back into the colony's element memory blocks (AVX2 provides gather but
not scatter instructions; scatter arrived with AVX-512).
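The per-block flow just described (mask, gather, process, scatter) can be sketched as follows. This is a scalar emulation rather than intrinsics: each inner loop mimics what an 8-lane masked gather (such as AVX2's `_mm256_mask_i32gather_epi32`) or a masked scatter does per lane, so the sketch runs on any compiler. It assumes 32-bit elements and a 16-bit skipfield, and `add_to_active_elements` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: emulates, in scalar code, the per-lane behaviour of an
// 8-lane masked gather followed by a masked scatter over one element block.
// Lanes whose mask has its highest bit set are loaded and written back;
// other lanes keep a fallback value (0) and are never written.
void add_to_active_elements(std::int32_t* elements,
                            const std::uint16_t* skipfield,
                            std::size_t count, std::int32_t addend)
{
    constexpr std::size_t lanes = 8; // 8 x 32-bit lanes, as in a 256-bit register
    for (std::size_t base = 0; base < count; base += lanes)
    {
        std::array<std::int32_t, lanes> mask{}; // lanes past the end stay 0 (inactive)
        std::array<std::int32_t, lanes> reg{};  // stands in for a SIMD register

        // Build this chunk's mask from the skipfield: -1 = active, 0 = skipped.
        for (std::size_t l = 0; l != lanes && base + l < count; ++l)
            mask[l] = (skipfield[base + l] == 0) ? -1 : 0;

        // "Gather": load only lanes whose mask has its highest bit set
        // (for a 32-bit signed int, that is the same as being negative).
        for (std::size_t l = 0; l != lanes; ++l)
            if (mask[l] < 0)
                reg[l] = elements[base + l];

        // Operate on every lane at once, as a SIMD add would.
        for (std::size_t l = 0; l != lanes; ++l)
            reg[l] += addend;

        // "Scatter": write back only the active lanes.
        for (std::size_t l = 0; l != lanes; ++l)
            if (mask[l] < 0)
                elements[base + l] = reg[l];
    }
}
```

Skipped elements are never read or written, so erased slots in the block are left untouched.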


I'm wondering how useful this is compared to building a mask manually by
iterating over the colony in the conventional way. That approach also
ignores memory block boundaries, so it may be more useful in that
respect, even though SIMD can't be used to build the mask in that case.
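For comparison, one possible reading of the iterator-based alternative is sketched below, under that assumption: iterate normally (for a colony, iterators visit only active elements), record each element's address in a flat list that spans all memory blocks, then gather and scatter through that list. Both function names are hypothetical and this is not code from colony.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of plf::colony). Walks the container with
// ordinary iterators and records each element's address. The resulting list
// spans every memory block, so block boundaries never need explicit handling.
template <typename Container>
std::vector<typename Container::value_type*> collect_active_addresses(Container& c)
{
    std::vector<typename Container::value_type*> addresses;
    addresses.reserve(c.size());
    for (auto& element : c)
        addresses.push_back(&element);
    return addresses;
}

// Software "gather, process, scatter" through the address list: every entry
// already refers to an active element, so no mask is needed at this stage.
template <typename T, typename Function>
void apply_through_addresses(const std::vector<T*>& addresses, Function f)
{
    for (T* p : addresses)
        *p = f(*p);
}
```

The trade-off is that the list is built serially, one iterator step at a time, whereas the skipfield-based mask can be built with SIMD.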

Most of you have more experience with SIMD than I do, so I defer to your
wisdom.

There's a demo of how to use the function, as it currently stands, in
the test suite, which I've just updated in the repo.
Thanks,
Matt

Received on 2019-07-31 23:43:22