ISOCPP std-proposals List: Re: [std-proposals] solution proposal for Issue 2524: generate

From: Lénárd Szolnoki <cpp_at_[hidden]>
Date: Mon, 8 Dec 2025 00:00:35 +0000

On 07/12/2025 14:50, Sebastian Wittmeier via Std-Proposals wrote:
> I meant something like
>
> val = generate_canonical();
>
> if (val==0) val=1;
>
> Or would the remaining subnormal numbers violate the lower bound of the interval as they
> sometimes are rounded to 0?

Suppose I used 32-bit IEEE float and built a uniform distribution in which every
representable value in the selected range is possible. Then a uniform distribution on
(0, 1] rolls 1 with probability 2^-24 or 2^-25, depending on the rounding strategy.

A uniform distribution on [0, 1) rolls 0 with probability 2^-149 or 2^-150, depending on
the rounding strategy, assuming that subnormals are also rolled with uniform probability
and no representable value is skipped. The set of representable values is much denser
near 0 than near 1, hence the different probabilities.
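
A small sketch of where these figures come from, assuming IEEE 754 binary32 float and an
idealised uniform real number that is then rounded to float. The function names are
made up for illustration, and this is not how any standard library generates its values.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559,
              "sketch assumes IEEE 754 float");

// Distance from 1.0f down to the previous float: 2^-24.
// A real number in (0, 1] rounds *up* to 1 exactly when it lies in this gap.
inline double p_one_round_up() {
    return 1.0 - static_cast<double>(std::nextafter(1.0f, 0.0f));
}

// With round-to-nearest only the upper half of that gap rounds to 1: 2^-25.
inline double p_one_round_nearest() { return p_one_round_up() / 2; }

// Smallest positive (subnormal) float: 2^-149.
// A real number in [0, 1) rounds *down* to 0 exactly when it is below this.
inline double p_zero_round_down() {
    return static_cast<double>(std::numeric_limits<float>::denorm_min());
}

// With round-to-nearest only the lower half of that gap rounds to 0: 2^-150.
inline double p_zero_round_nearest() { return p_zero_round_down() / 2; }

// Compares the four figures against the powers of two quoted in the text.
inline void check_figures() {
    assert(p_one_round_up() == std::ldexp(1.0, -24));
    assert(p_one_round_nearest() == std::ldexp(1.0, -25));
    assert(p_zero_round_down() == std::ldexp(1.0, -149));
    assert(p_zero_round_nearest() == std::ldexp(1.0, -150));
}
```

Calling check_figures() from main should pass silently on a platform with IEEE float and
subnormal support.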

I don't think that adjusting 0 to 1 in such a distribution has the desired effect: 1 would
then be rolled with probability of about 2^-149 instead of about 2^-24, so the result
doesn't resemble a uniform distribution on (0, 1] that one would write from scratch.

Having said that, if I read it right, generate_canonical as specified in P0952 effectively
treats float as a fixed-point number with 24 bits of mantissa and never uses the extra
precision that is available on [0, 0.5), always skipping over subnormals and more. So
there, adjusting 0 to 1 works as intended: it still produces a distribution where each
possible value has the same probability. But 1-x works here as well, as there is no more
precision to lose.
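
A minimal sketch of that fixed-point picture, assuming IEEE 754 float. It models the
output as k * 2^-24 for a 24-bit integer k, which is only a reading of the effective
behaviour and not the actual P0952 algorithm; grid_value, zero_to_one, reflect and the
two checks are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical model: the k-th point of a 24-bit fixed-point grid on [0, 1).
// Exact for 0 <= k < 2^24, since k fits in float's 24-bit significand.
inline float grid_value(std::uint32_t k) {
    return std::ldexp(static_cast<float>(k), -24);
}

// Option 1: move the single point 0 to 1, leave everything else alone.
inline float zero_to_one(float x) { return x == 0.0f ? 1.0f : x; }

// Option 2: reflect the grid. On this grid 1 - k/2^24 == (2^24 - k)/2^24,
// which is again exactly representable, so no rounding happens.
inline float reflect(float x) { return 1.0f - x; }

// True if reflecting grid point k lands exactly on grid point 2^24 - k.
inline bool reflect_is_exact(std::uint32_t k) {
    const std::uint32_t n = std::uint32_t{1} << 24;
    return reflect(grid_value(k)) == std::ldexp(static_cast<float>(n - k), -24);
}

// Contrast: a nonzero value finer than the grid is absorbed by 1 - x.
inline bool reflect_loses(float x) {
    return x != 0.0f && reflect(x) == 1.0f;
}
```

Both maps send the 2^24 grid points one-to-one onto {2^-24, 2*2^-24, ..., 1}, so each
value keeps probability 2^-24. By contrast, reflect_loses(std::ldexp(1.0f, -30)) is true:
a value below the grid spacing is rounded away by 1-x, which is the precision problem
that appears once the generator does use the extra precision near 0.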

>
> -----Original Message-----
> *From:* Lénárd Szolnoki <cpp_at_[hidden]>
> *Sent:* Sun 07.12.2025 12:56
> *Subject:* Re: [std-proposals] solution proposal for Issue 2524: generate_canonical
> can occasionally return 1.0
> *To:* std-proposals_at_[hidden];
> *CC:* Sebastian Wittmeier <wittmeier_at_[hidden]>;
>
>
> On 07/12/2025 10:57, Sebastian Wittmeier via Std-Proposals wrote:
> > Changing from [0; 1) to (0; 1] and vice versa is simple on the call site, just one
> > conditional. So the exponential distribution could fix it without a new
> > generate_canonical?
>
> Can you elaborate what the simple fix is from changing [0, 1) to (0, 1]? Apart from 1-x,
> which has the precision problem.
>
>
> >
> >
> > -----Original Message-----
> > *From:* Lénárd Szolnoki via Std-Proposals <std-proposals_at_[hidden]>
> > *Sent:* Sun 07.12.2025 09:49
> > *Subject:* Re: [std-proposals] solution proposal for Issue 2524: generate_canonical
> > can occasionally return 1.0
> > *To:* std-proposals_at_[hidden]; Jonathan Wakely <cxx_at_[hidden]>;
> > *CC:* Lénárd Szolnoki <cpp_at_[hidden]>; pnash44_at_[hidden]; Juan Lucas Rey
> > <juanlucasrey_at_[hidden]>;
> >
> >
> > On 05/12/2025 18:25, Jonathan Wakely via Std-Proposals wrote:
> > >
> > >
> > > On Fri, 5 Dec 2025 at 14:34, Jonathan Wakely <cxx_at_[hidden]> wrote:
> > >
> > >
> > >
> > > On Fri, 5 Dec 2025 at 14:06, Juan Lucas Rey <juanlucasrey_at_[hidden]> wrote:
> > >
> > > "You keep saying "canonical_distribution" ... do you mean
> > > std::generate_canonical, or one of the random number distributions in
> > > <random>, or some non-standard random number distribution in your own
> > > code?"
> > >
> > > I mean std::generate_canonical, yes.
> > >
> > > "But your proposal returns negative numbers for those 10 values. It's
> > > highly debatable whether that is a "better distribution" given that
> > > those values are outside the [0,1) range!
> > > Your results might be more uniformly distributed over some range, but
> > > it's a different range!"
> > >
> > > My suggestion to use "generate_canonical_centered" inside
> > > "std::exponential_distribution" (as proposed in the sample file I
> > > sent) does return 10 different values for the extremes. What libstdc++
> > > is proposing is to return the same value for those 10 cases. As
> > > explained before, the purpose here is to have that different range,
> > > containing better precision, especially in the right limit, being
> > > properly handled in the other distributions.
> > >
> > >
> > >
> > > Your proposal needs to say that then. Because currently it says:
> > >
> > > *0.4 3. Proposal*
> > > Add the following to <random>:
> > > namespace std {
> > > template<class RealType = double, int bits, class URNG>
> > > RealType generate_canonical_centered(URNG& g);
> > > }
> > >
> > > Is that it? That's the whole proposal?!
> > > Apparently not, apparently you want to change std::exponential_distribution too. What
> > > about the other 20+ places that use std::generate_canonical?
> > >
> > > So in summary:
> > >
> > > You should explain that where P0952R2 says "In particular, code that depends on a
> > > specific sequence of results from repeated invocations, *or on a particular number of
> > > calls to the URBG argument*, will be broken" that it's the second part (in bold) that
> > > is a problem for you. Based on your initial PDF proposal there is no clue whether the
> > > compatibility you're talking about is the exact sequence of values returned, or the
> > > number of invocations of the URBG. The word "discard" doesn't even appear in the
> > > proposal.
> > >
> > > Your abstract says "without altering existing behavior". I think you mean "without
> > > altering the C++23 behaviour", but you should be clear about what you mean by
> > > "existing". P0952R2 is already part of the C++26 draft. Assuming you mean "without
> > > changing the C++23 behaviour", how does proposing a completely different function
> > > help? The P0952R2 changes would still be in C++26, and so that's still a change from
> > > C++23. How does a different function with different behaviour undo the changes to
> > > std::generate_canonical?!
> > >
> > > You need to be clear about what you're actually proposing, and the impact on
> > > implementations (they would need to replace some or all internal uses of
> > > std::generate_canonical with your new function, and adjust to deal with a completely
> > > different output range?)
> > >
> > > Currently the proposal is vague and contradictory and confusing.
> > >
> > >
> > > Finally, I don't see how making more use of the increased precision near zero actually
> > > helps. The purpose of std::generate_canonical is to produce values uniformly distributed
> > > in the range [0,1). Producing more values close to zero because there is a higher density
> > > of representable values there does not meet the contract.
> >
> > The way it helps is that, because of how the centered distribution is sliced and
> > rearranged, it produces a uniform distribution on (0, 1], and then the produced value
> > is used directly as -log(u).
> >
> > The way libstdc++ (and I assume other implementations as well) does it is that it
> > produces a uniform distribution on [0, 1), and then uses it as -log(1-u). 1-u has
> > reduced precision close to 0 (in fact on the whole range of (0, 0.5)). Using 1-u is
> > effectively equivalent to generating a fixed-point number between 0 and 1 with 24 bits
> > of mantissa in terms of precision (assuming float).
> >
> > If we deem the resulting exponential distribution acceptable then this algorithm is
> > quite wasteful in how it uses the random generator, as it only uses a fixed 24 bits of
> > entropy, but consumes a lot more bits from the generator to generate the intermediate
> > generate_canonical.
> >
> > >
> > > If I have four buckets of different sizes, 5L, 3L, 2L, and 1L, and I have to evenly
> > > distribute 4L of water into those buckets, putting more in the 5L bucket because it has
> > > more capacity does not make sense. There should be exactly 1L in each bucket. This seems
> > > analogous to saying that we should return more results near zero, because there are more
> > > representable values near zero.
> > >
> > > Ideally, we want 25% of all results to be in the interval [0, 0.25) and 25% of all results
> > > to be in the interval [0.75, 1.0). We don't want there to be more than 25% of results in
> > > the first interval just because it's a bigger bucket that can represent more distinct
> > > values, due to the higher precision.
> > >
> > > And really finally finally, one of the P0952R2 authors reminded me that the standard
> > > already has a note giving you the guarantee that you want:
> > > https://eel.is/c++draft/rand.util.canonical#note-1
> > > When the full range of the URBG is (2^N - 1) for any N, there is never a need to discard
> > > any values from the URBG. So if you are only concerned with the additional discards being
> > > done, just make sure your URBG is sensible. If your URBG returns any value in the range
> > > [0,UINT_MAX] or [0,ULLONG_MAX] then there will be no discarded values.
> > >
> > >
> > >
> > > --
> > > Std-Proposals mailing list
> > > Std-Proposals_at_[hidden]
> > > https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
> >

Received on 2025-12-08 00:00:55