Date: Tue, 4 Aug 2020 11:09:15 +1200
Just an update regarding the presence of reserve() in colony. As
discussed in the June meeting, reserve() could be useful in the sense of
(a) moving allocation cost to non-hot-loop areas of code and
(b) preventing the establishment of many smaller-sized element memory
blocks during singular insertions, if the approximate number of overall
insertions is known in advance. This in turn increases cache locality for
elements.
The current reserve function only allows for the creation of one memory
block, which is of course not suitable for the end specification,
as the size of memory blocks is limited by various factors and a reserve
function which does not necessarily reserve the capacity requested is not a
good fit for the C++ standard.
As mentioned in the meeting, my previous tests of a version which
allowed for multiple empty memory blocks "in storage" showed untenable
performance detriments, to the tune of 2% on average.
In addition such a function necessitates another function to free unused
memory blocks without reallocating elements (as opposed to shrink_to_fit
which reallocates).
Anyway, I thought I'd try again and see whether I could come up with a
better solution this time, one without any performance detriments.
I didn't look at the old code, so as to obviate any chance of me recreating
the same possibly-slower-performing pattern.
Unfortunately the same results were found - specifically, 2% slower on
average for both my general use and 'referencer' (game-scenario) benchmarks.
But most importantly, in the key areas where colony overtakes all other
containers/container-adaptations in terms of performance (general use with
high ratios of insert/erase to iteration), colony became 10-11% slower. I
believe this was also the case the last time I tried this. This is
untenable and reserve will be removed from future versions of the
specification.
I tried many different approaches and compared their results, but no amount
of optimization would shift this figure significantly.
I still have a few more things to implement and test before I release the
next version of the specification, but it shouldn't be too far off.
In terms of the useful aspects of reserve as mentioned above, these can be
replicated by:
(a) using allocators with colony to move the allocation costs away from any
hot loops
(b) changing the minimum element memory block capacity sizes before
insertion to guarantee the capacity of the first block allocated.
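
As a rough illustration of point (a) only, here is a minimal sketch of
paying the allocation cost up front with an allocator. It is not colony
code: it uses std::pmr::list as a stand-in container, and the names
counting_resource and upstream_allocations_in_hot_loop, along with the
64-bytes-per-node estimate, are assumptions made for this example. The
counting resource exists only to show that the insertion loop never asks
the upstream allocator for memory.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory_resource>
#include <vector>

// Hypothetical upstream resource that counts how often it is asked for memory.
class counting_resource : public std::pmr::memory_resource {
public:
    std::size_t allocations = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        ++allocations;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Inserts n elements and returns how many upstream allocations happened
// inside the insertion loop itself.
std::size_t upstream_allocations_in_hot_loop(std::size_t n) {
    counting_resource upstream;

    // Allocation cost is paid here, once, outside the hot loop.
    // 64 bytes per node is a generous guess for a list<int> node.
    std::vector<std::byte> buffer(n * 64);
    std::pmr::monotonic_buffer_resource pool(buffer.data(), buffer.size(), &upstream);

    // Stand-in container; any allocator-aware container could sit here.
    std::pmr::list<int> elements(&pool);

    const std::size_t before = upstream.allocations;
    for (std::size_t i = 0; i < n; ++i) {
        elements.push_back(static_cast<int>(i));  // served from the pre-allocated buffer
    }
    assert(elements.size() == n);
    return upstream.allocations - before;
}
```

If the buffer estimate is too small, the monotonic resource falls back to
the upstream allocator and the count becomes non-zero, so the sizing guess
matters in the same way an approximate reserve() figure would.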
While reserve would've been useful, I believe its cost in this case
outweighs its benefits.
In case anyone's interested, both the highest-performing version of the
with-reserve() colony.h, and the spreadsheet of the general use performance
results are attached. The implementation lacks re-use of memory blocks in
the range-insert/fill-insert/initializer-list-insert/splice functions, and
probably a few other functions that weren't critical for benchmarking, so
is not usable in production.
Thanks-
Matt
Received on 2020-08-03 18:13:17