Date: Thu, 1 Aug 2019 12:12:10 +0000
Hey Matt,
I'm dubious, but not staunchly against.
My reason is anecdotal (no current measurements): I've seen naive code outperform SIMD code for short memory arenas (say < 10 cache lines), presumably due to the down-clocking effect of the later SIMD instruction sets.
-----Original Message-----
From: SG14 [mailto:sg14-bounces_at_[hidden]] On Behalf Of Matt Bentley via SG14
Sent: Thursday, 01 August, 2019 00:41
To: Low Latency:Game Dev/Financial/Trading/Simulation/Embedded Devices <sg14_at_[hidden]>
Cc: Matt Bentley <mattreecebentley_at_[hidden]>
Subject: [SG14] Question re: colony and new SIMD-related function
Hi all,
I made a modification to plf::colony which enables it to be used with SIMD gather techniques, using the skipfields as masks for processing.
I'd like some feedback on how useful this is, and whether it's worth keeping or throwing away.
The way it works in colony is that the skipfield is 0 when an item is unskipped and non-zero when it is skipped. Giving a developer direct access to the skipfield allows them to use SIMD to create a gather mask by transforming the skipfield in parallel into whatever the particular architecture requires, e.g. for AVX2 and above, this is a vector of integers in which each element whose highest bit is one marks an element that will be read into the SIMD registers.
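As a rough illustration of the transformation described above, here is a minimal scalar sketch. It assumes a 16-bit skipfield type and 32-bit mask lanes; the name make_gather_mask is hypothetical and not part of plf::colony. A SIMD version would apply the same compare-against-zero per lane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for colony's skipfield_type (an assumption for this sketch).
using skipfield_type = std::uint16_t;

// Converts one block's skipfield into a mask in the AVX2 gather convention:
// a lane whose highest bit is set is loaded, a lane of zero is not.
// Skipfield convention: 0 == element present, non-zero == element skipped.
std::vector<std::int32_t> make_gather_mask(const skipfield_type* skipfield,
                                           std::size_t count)
{
    assert(skipfield != nullptr || count == 0);
    std::vector<std::int32_t> mask(count);
    for (std::size_t i = 0; i != count; ++i)
    {
        // -1 has every bit set, including the highest one. The expression is
        // branch-free, so a compiler is free to vectorise this loop.
        mask[i] = -static_cast<std::int32_t>(skipfield[i] == 0);
    }
    return mask;
}
```

Calling make_gather_mask on one skipfield block yields one mask lane per element slot, which could then be loaded into a SIMD register in chunks of the register width.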
The new function (with the underwhelming name of "get_raw_memory_block_pointers()") returns a pointer to a dynamically-allocated struct of the following type:
struct raw_memory_block_pointers : private uchar_allocator_type {
    // array of pointers to element memory blocks:
    aligned_pointer_type *element_memory_block_pointers;

    // array of pointers to skipfield memory blocks:
    skipfield_pointer_type *skipfield_memory_block_pointers;

    // array of the number of elements in each memory block:
    skipfield_type *block_sizes;

    // size of each array:
    size_type number_of_blocks;
};
There is a destructor so all the sub-struct deallocations are taken care of upon 'delete' of the returned struct.
Using these data sets, a programmer can create an appropriate mask for
each memory block by processing the skipfield into a separate
array/vector, then use that mask to perform a Gather operation on the
element memory blocks, fetching active elements into SIMD registers.
If desired, and if the processor supports it, a subsequent Scatter can
write the results back into the colony element memory blocks.
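To make the gather/scatter step concrete, here is a scalar model of the semantics, not real intrinsics and not colony code. The names masked_gather and masked_scatter are hypothetical, float elements are assumed, and the lane indices are simply 0..n-1 within one block, so this behaves like a masked load and store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a masked gather: lane i is read from block[i] only when the
// highest bit of mask[i] is set (mask[i] < 0); other lanes keep the fallback.
std::vector<float> masked_gather(const float* block,
                                 const std::vector<std::int32_t>& mask,
                                 float fallback)
{
    std::vector<float> lanes(mask.size(), fallback);
    for (std::size_t i = 0; i != mask.size(); ++i)
    {
        if (mask[i] < 0) lanes[i] = block[i];
    }
    return lanes;
}

// Scalar model of a masked scatter: only lanes whose mask high bit is set are
// written back, so skipped (erased) slots in the block are never touched.
void masked_scatter(float* block,
                    const std::vector<float>& lanes,
                    const std::vector<std::int32_t>& mask)
{
    assert(lanes.size() == mask.size());
    for (std::size_t i = 0; i != mask.size(); ++i)
    {
        if (mask[i] < 0) block[i] = lanes[i];
    }
}
```

A typical round trip would gather a block's active elements, transform the lanes, and scatter them back. On real hardware the gather is available from AVX2, while a hardware scatter needs AVX-512, which matches the "if the processor supports it" caveat.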
I am wondering how useful this is in comparison to manually creating a
mask by iterating over the colony conventionally. The latter approach
also ignores memory block boundaries, so might be more useful in that
respect, even if you can't use SIMD to create the mask in that case.
Most of you have more experience with SIMD than I do, so I defer to your
wisdom.
There's a demo of how to use the existing function in the test suite,
which I've just updated in the repo.
Thanks,
Matt
_______________________________________________
SG14 mailing list
SG14_at_[hidden]
http://lists.isocpp.org/mailman/listinfo.cgi/sg14
________________________________
IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.
Received on 2019-08-01 07:14:09