Re: [std-proposals] solution proposal for Issue 2524: generate

Trying to save the idea: reuse the generate_canonical without the P0952 post-processing, subnormal numbers included, for the case of -log(generate_canonical()).

Internally, -log(generate_canonical()) is used.

The case of 0 could be mapped to a number slightly lower or higher than 1 to keep statistical properties (like expected value)?

Numbers around 1 cannot be represented with high accuracy, but -log(1) = -0 can.

The overall problem with statistical properties is this: taking a discrete distribution with a given expected value and variance and putting it through a non-linear function like log yields an expected value and variance that differ slightly from those of the continuous distribution.

That effect is present even when using the fixed-point random numbers of P0952.

If the discreteness is known, the distributions can apply correction factors.
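To illustrate the size of that effect, here is a minimal sketch. It assumes a uniform grid {1/N, ..., 1} with N = 2^bits as a stand-in for a fixed-point generator; the function name discrete_exponential_mean is made up for this example and is not taken from P0952 or any library. For a continuous uniform on (0, 1] the mean of -log(u) would be exactly 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: mean of -log(u) when u is uniform on the grid
// {1/N, 2/N, ..., N/N} with N = 2^bits (assumed model of a fixed-point
// generator on (0, 1]). The continuous counterpart has mean exactly 1.
double discrete_exponential_mean(unsigned bits) {
    assert(bits > 0 && bits < 31);
    const std::uint32_t n = std::uint32_t{1} << bits;
    double sum = 0.0;
    for (std::uint32_t k = 1; k <= n; ++k) {
        sum += -std::log(static_cast<double>(k) / n);
    }
    return sum / n;
}
```

On this grid the mean comes out slightly below 1, and the gap shrinks as bits grows. A distribution that knows N could compensate for it.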

-----Original Message-----
From: Lénárd Szolnoki <cpp@lenardszolnoki.com>
Sent: Mon 08.12.2025 01:00
Subject: Re: [std-proposals] solution proposal for Issue 2524: generate_canonical can occasionally return 1.0
To: std-proposals@lists.isocpp.org;
CC: Sebastian Wittmeier <wittmeier@projectalpha.org>;

On 07/12/2025 14:50, Sebastian Wittmeier via Std-Proposals wrote:
> I meant something like
>
> val = generate_canonical();
>
> if (val==0) val=1;
>
> Or would the remaining subnormal numbers violate the lower bound of the interval, as they
> are sometimes rounded to 0?

If I used 32-bit IEEE float and made a uniform distribution where every representable
value was possible on the selected range, then a uniform distribution on (0, 1] would roll
1 with probability 2^-24 or 2^-25, depending on rounding strategy.

For a uniform distribution on [0, 1), 0 is rolled with probability 2^-149 or 2^-150,
depending on rounding strategy and assuming that subnormals are also rolled with uniform
probabilities and no representable value is skipped. The set of representable values is
much denser near 0 than near 1, hence the different probabilities.
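The difference in density can be checked directly. This is a small sketch assuming IEEE 754 binary32 float with subnormals enabled (no flush-to-zero); the helper names gap_below_one and gap_above_zero are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Assumes float is IEEE 754 binary32.
static_assert(std::numeric_limits<float>::is_iec559, "IEEE 754 float expected");

// Distance from 1.0f down to the next smaller representable float (2^-24).
float gap_below_one() {
    return 1.0f - std::nextafter(1.0f, 0.0f);
}

// Distance from 0.0f up to the next larger representable float,
// i.e. the smallest positive subnormal (2^-149).
float gap_above_zero() {
    return std::nextafter(0.0f, 1.0f);
}
```

The two gaps differ by a factor of 2^125, which is where the very different probabilities for the endpoints come from.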

I don't think that changing such a distribution by adjusting 0 to 1 has the desired
effect, at least the resulting distribution doesn't resemble a uniform distribution on (0,
1] that one would write from scratch.

Having said that, if I read it right, generate_canonical as specified in P0952 effectively
treats float as a fixed-point number with 24 bits of mantissa, and never accesses the extra
precision that is available on [0, 0.5), always skipping over subnormals and more. So
there adjusting 0 to 1 works as intended, as it always produces a distribution where each
possible value has the same probability. But 1-x works here as well, as there is no more
precision to lose.
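A minimal sketch of that last point, assuming the fixed-point reading above: values are modelled as k * 2^-24 with k in [0, 2^24). The names fixed_point_value, reflect and zero_to_one are hypothetical and only serve this example; none of them is the P0952 algorithm itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed model: float used as 24-bit fixed point, x = k * 2^-24.
float fixed_point_value(std::uint32_t k) {
    assert(k < (std::uint32_t{1} << 24));
    // k < 2^24 converts to float exactly; ldexp only adjusts the exponent.
    return std::ldexp(static_cast<float>(k), -24);
}

// Maps [0, 1) onto (0, 1] by reflection. On this grid the subtraction is
// exact, because (2^24 - k) * 2^-24 is itself representable.
float reflect(float x) {
    return 1.0f - x;
}

// Maps 0 to 1 and leaves every other grid value unchanged.
float zero_to_one(float x) {
    return x == 0.0f ? 1.0f : x;
}
```

Both mappings send the grid on [0, 1) one-to-one onto the grid on (0, 1], so every value keeps the same probability under either of them.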