Date: Sun, 28 Jun 2020 14:09:16 +1200
Hi all,
so,
on Jens' suggestion I split the sort technique out of plf::colony, and then
improved it so it was more efficient with it's use of memory.
That's been released now here:
https://plflib.org/indiesort.htm
And the benchmarks for it on various container types are here:
https://plflib.org/benchmarks_haswell_gcc.htm#sort
My question is whether it's worth pursuing this in a paper as an additional
algorithm for the standard library. I'd like your opinions as to why/why
not.
The basic benefits of the algorithm are:
* You get a sort to use with non-random-access containers/iterators, like
colony and std::list - there are currently no sort algorithms in the
standard library which sort non-random-access iterators - aside from
std::list's internal sort function
* Better sort performance for random-access containers/arrays/iterators and
types larger than 152 bytes (average +130% with vector and 496 byte
structs), vs std::sort.
* Better sort performance for std::list with types smaller than 272 bytes
(average +28%), vs it's own internal sort(). Although std::list would be
greatly improved to have a new function of it's own based on the same
concept, as that could just write next/previous pointers instead of
relocating elements.
The downsides are:
* increased temporary memory use: (N * (sizeof(T *) + sizeof(size_t)))
* decreased performance for random-access containers with types smaller
than 152 bytes. Both this and memory usage for random access ranges this
may be improved in a specialization for random-access iterators, but I
haven't implemented or tested that theory yet
* current implementation uses std::sort internally - I'm pretty sure the
algorithm could be improved somehow if the current gather and sort phases
(possibly scatter phase too) were combined in a single algorithm
At the moment the existing implementation could be improved by allowing the
user to specify an alternative sort algorithm without use of macros, but I
couldn't figure it out.
It could also be improved by being implemented as allocator-aware. That may
happen at some point.
Anyway.
Your thoughts?
M@
so,
on Jens' suggestion I split the sort technique out of plf::colony, and then
improved it so it was more efficient with it's use of memory.
That's been released now here:
https://plflib.org/indiesort.htm
And the benchmarks for it on various container types are here:
https://plflib.org/benchmarks_haswell_gcc.htm#sort
My question is whether it's worth pursuing this in a paper as an additional
algorithm for the standard library. I'd like your opinions as to why/why
not.
The basic benefits of the algorithm are:
* You get a sort to use with non-random-access containers/iterators, like
colony and std::list - there are currently no sort algorithms in the
standard library which sort non-random-access iterators - aside from
std::list's internal sort function
* Better sort performance for random-access containers/arrays/iterators and
types larger than 152 bytes (average +130% with vector and 496 byte
structs), vs std::sort.
* Better sort performance for std::list with types smaller than 272 bytes
(average +28%), vs it's own internal sort(). Although std::list would be
greatly improved to have a new function of it's own based on the same
concept, as that could just write next/previous pointers instead of
relocating elements.
The downsides are:
* increased temporary memory use: (N * (sizeof(T *) + sizeof(size_t)))
* decreased performance for random-access containers with types smaller
than 152 bytes. Both this and memory usage for random access ranges this
may be improved in a specialization for random-access iterators, but I
haven't implemented or tested that theory yet
* current implementation uses std::sort internally - I'm pretty sure the
algorithm could be improved somehow if the current gather and sort phases
(possibly scatter phase too) were combined in a single algorithm
At the moment the existing implementation could be improved by allowing the
user to specify an alternative sort algorithm without use of macros, but I
couldn't figure it out.
It could also be improved by being implemented as allocator-aware. That may
happen at some point.
Anyway.
Your thoughts?
M@
Received on 2020-06-27 21:13:06