std-proposals: Re: Distributed random number ordering

From: Moritz Klammler <moritz_at_[hidden]>
Date: Wed, 12 May 2021 21:25:53 +0200

The fact that the random /engines/ do have their algorithms specified
but the random /distributions/ don't has caused me discomfort in the
past as well. As you say, it makes writing code that wants to reproduce
the same results on every platform painfully difficult. It also makes
for unit tests that are either fragile or much more complicated than
they would have to be, could the algorithm be relied upon. So as far as
I am concerned, I would be very happy if the algorithm were defined.
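
To make the asymmetry concrete, here is a small sketch (the function
names are made up for illustration). The engine check relies on the
standard's requirement that the 10000th value of a default-constructed
std::mt19937 is 4123659995; nothing comparable exists for
std::normal_distribution, so the second function may return different
values on different standard libraries.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: checks the one value the standard pins down for
// std::mt19937. This is expected to hold on every conforming library.
inline bool engine_is_portable()
{
    std::mt19937 engine;    // default-constructed, i.e. seeded with 5489
    engine.discard(9999);   // skip the first 9999 values
    return engine() == 4123659995u;  // the 10000th value
}

// Hypothetical helper: the result is deterministic for a given standard
// library, but the standard does not say which value it must be, so a
// test that hard-codes the result is tied to one implementation.
inline double first_normal_draw(std::uint32_t seed)
{
    std::mt19937 engine(seed);
    std::normal_distribution<double> dist(0.0, 1.0);
    return dist(engine);
}
```

A unit test can safely assert on engine_is_portable(), whereas any
expected value for first_normal_draw() would have to be recorded per
standard library.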

Mandating reliable results for the discrete distributions should be
doable; the real-valued ones would be much more challenging, I suppose,
even if the same underlying floating-point implementation could be
assumed. So I'm not sure whether that is realistic.
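
As an illustration of why the discrete case looks tractable, here is a
minimal sketch of a uniform integer draw that uses only integer
arithmetic on the (specified) engine output. The function name and the
choice of rejection sampling are my assumptions, not anything the
standard or an existing library prescribes.

```cpp
#include <cstdint>
#include <random>

// Hypothetical function: uniform integer in [0, n) for n > 0, built only
// from integer operations on std::mt19937 output. Because the engine's
// output sequence is specified and no floating point is involved, the
// result should be the same on every conforming implementation.
inline std::uint32_t portable_uniform_below(std::mt19937& engine,
                                            std::uint32_t n)
{
    // 2^32 mod n: the number of low values to reject so that the
    // accepted range has a size that is an exact multiple of n.
    const std::uint64_t range = std::uint64_t{1} << 32;
    const std::uint32_t threshold = static_cast<std::uint32_t>(range % n);

    for (;;) {
        // mt19937 yields values in [0, 2^32 - 1], whatever the width
        // of its result_type.
        const std::uint32_t x = static_cast<std::uint32_t>(engine());
        if (x >= threshold) {
            return x % n;  // unbiased after rejection
        }
    }
}
```

Two engines constructed with the same seed then yield the same sequence
of draws, independent of the standard library in use.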

Anyway, I don't think that the complexity of adding a parameter would be
warranted. That would mean that all standard library
implementations would have to implement all variants. I'm not aware of
any actual use cases where having this flexibility would be beneficial
for a user. And if there is a real choice to be made for some current or
future distributions, different types like, say, a hypothetical
fast_triangular_distribution and correct_triangular_distribution could
always be used instead.
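
For the polar-method ordering issue raised in the original message, a
separate type with a documented order could look roughly like the
sketch below. Everything here is hypothetical (the class name, the
mapping from engine output to the open interval (-1, 1), and the choice
to return the u-based value first). It fixes the ordering, but it
still calls std::log, whose results are not guaranteed to be
bit-identical across math libraries.

```cpp
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical type, not part of any standard library. It implements the
// Marsaglia polar method and documents the order of the two values that
// each accepted point produces: first u * scale, then v * scale.
class fixed_order_polar_normal
{
public:
    double operator()(std::mt19937& engine)
    {
        if (has_saved_) {
            has_saved_ = false;
            return saved_;           // second value of the pair
        }
        double u, v, s;
        do {
            u = uniform_pm1(engine);  // always drawn first
            v = uniform_pm1(engine);  // always drawn second
            s = u * u + v * v;
        } while (s >= 1.0 || s == 0.0);  // keep points inside the unit disc

        const double scale = std::sqrt(-2.0 * std::log(s) / s);
        saved_ = v * scale;
        has_saved_ = true;
        return u * scale;            // first value of the pair
    }

private:
    // Maps one 32-bit engine output to the open interval (-1, 1) using
    // only exactly representable operations on double.
    static double uniform_pm1(std::mt19937& engine)
    {
        const double x =
            static_cast<double>(static_cast<std::uint32_t>(engine()));
        return (x + 0.5) / 2147483648.0 - 1.0;  // divide by 2^31
    }

    double saved_ = 0.0;
    bool has_saved_ = false;
};
```

A type that returned the pair in the opposite order would be a second,
equally small class, so no extra parameter on the standard
distribution would be needed.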

Finally, I'm worried that defining the algorithms now (even and
especially for those distributions where doing so would be
straightforward) would cause many unhappy users who have come to rely
upon the implementation-defined behavior of their current standard
library...

On 5/10/21 1:13 PM, RICHINGS James via Std-Proposals wrote:
> Dear Std-Proposals,
>
> When developing programs for scientific applications the numerical reproducibility of code is of paramount importance if reliable results are to be obtained.
>
> One of the key sources of error is in the sequence in which distributions of random numbers are generated.
>
> By way of example, currently the standard does not require that a given implementation reproduce the same sequence of normally distributed random numbers as it leaves the implementer freedom to choose the algorithm by which to implement the normal distribution. However, this leaves the order in which normally distributed random numbers are generated out of the control of the user. This is particularly troubling when currently gcc and llvm have both used the polar method to implement normally distributed random numbers but have decided on different orderings to output the distributed random numbers. This results in a permutation of the even and odd values in a list of normally distributed numbers (1,3,9,4 -> 3,1,4,9). This is not a bug in either implementation as both sequences are valid random numbers, but we cannot control the ordering with the current interface.
>
> This is an issue because it makes it difficult to verify our code against multiple implementations of the standard, which we find important when running across multiple machines with different HPC architectures.
>
> Ideally the standard should specify that the order in which random numbers are returned by a distribution is controllable via a parameter, so that the sequence is not dependent on the implementation, and that multiple algorithms (if implemented) should be selectable by the user, making it possible to always fix the order to a desired convention. This would allow the existing default behaviour to persist but additional control to be added.
>
> Any thoughts on this issue are welcome.
>
> Regards,
>
> James Richings
>
> Research Software Engineer
> James Clerk Maxwell Building
> University of Edinburgh
> Edinburgh
> EH9 3FD
>

Received on 2021-05-12 14:25:59