ISOCPP std-proposals List: [std-proposals] Generalizing radix sort

From: Adrian Johnston <adrian3_at_[hidden]>
Date: Thu, 28 Aug 2025 16:35:13 -0700

Hello,

I'm sorry if I am raising an old topic, but I would like to put
forward a possible radix sort implementation. This tries to address
the issues with that algorithm.

Radix sort achieves linear time Θ(n) by sorting keys incrementally,
one "digit" at a time. It was originally used to sort punch cards.
There are lots of great articles about it on the internet.

The primary limitation with radix sort is that it depends on having a
fundamental type sort key. This doesn't have to be as bad as it is
widely believed to be. For example, it is actually easy to use
floating point numbers and signed numbers with radix sort.

Here is how to convert a 32-bit float for use with radix sort. It
expects a sign-extended right shift.

    uint32_t t;
    ::memcpy(&t, &key, sizeof t);
    return t ^ (uint32_t)(((int32_t)t_ >> 31) | 0x80000000);

Furthermore, if you have a sort key with multiple fields it would be
possible to use radix sort on the first field to get things to the
point they were almost sorted. At that point, if everything is within
a small distance of its destination then insertion sort could be used.
Having things end with using insertion sort is quite common in sorting
algorithm design.

I have an example implementation that uses either 8-bit digits or
11-bit digits. The 11-bit version does use a 20 kilobyte histogram
but that might be useful for very large data sets. This version is
designed as a thin template wrapper around a reusable implementation
to avoid code bloat. It also observes if the most significant digits
are all zero and avoids processing them.

To make this more consistent with the standard I might suggest a
version of std::radix_sort() that takes std::random_access_iterator to
std::pair<K, V> where K was required to be a fundamental type.
Temporary memory would also be required to be provided somehow.

A mixture of radix sort on the most significant digits and insertion
sort on the remainder might also be appropriate for 64-bit types.
However, it is a bit of a weird thing to be doing.

Here is the example implementation that is shared between template instances:

https://github.com/whatchamacallem/hatchlingplatform/blob/master/src/hxradix_sort.cpp

Here is the corresponding header. It isn't part of this proposal as it
has a different allocation strategy, and is a bit corny, but it is
here if anyone wants to see more details.

https://github.com/whatchamacallem/hatchlingplatform/blob/master/include/hx/hxradix_sort.hpp

This sort of thing starts to matter in computer graphics when you have
a very complex scene and the graphics hardware will operate more
efficiently when rendering from front to back. It also matters in
collision detection when trying to find the intersection of a large
number of volumes. If you find the axis along which the volumes are
best distributed and use radix sort to sort their extents, then single
axis sweep and prune can be used. These turn out to be important due
to their frame rate stability and throughput.

Again, let me know if you would like a more formal proposal that looks
like the rest of the standard.

Regards,
Adrian

Received on 2025-08-28 23:35:26