Re: [std-proposals] solution proposal for Issue 2524: generate_canonical can occasionally return 1.0

From: Lénárd Szolnoki <cpp_at_[hidden]>
Date: Sun, 7 Dec 2025 08:49:28 +0000
On 05/12/2025 18:25, Jonathan Wakely via Std-Proposals wrote:
>
>
> On Fri, 5 Dec 2025 at 14:34, Jonathan Wakely <cxx_at_[hidden]> wrote:
>
>
>
> On Fri, 5 Dec 2025 at 14:06, Juan Lucas Rey <juanlucasrey_at_[hidden]> wrote:
>
> "You keep saying "canonical_distribution" ... do you mean
> std::generate_canonical, or one of the random number distributions in
> <random>, or some non-standard random number distribution in your own
> code?"
>
> I mean std::generate_canonical, yes.
>
> "But your proposal returns negative numbers for those 10 values. It's
> highly debatable whether that is a "better distribution" given that
> those values are outside the [0,1) range!
> Your results might be more uniformly distributed over some range, but
> it's a different range!"
>
> My suggestion to use "generate_canonical_centered" inside
> "std::exponential_distribution" (as proposed in the sample file I
> sent) does return 10 different values for the extremes. What libstdc++
> is proposing is to return the same value for those 10 cases. As
> explained before, the purpose here is to have that different range,
> which offers better precision, especially at the right limit, and to
> have it handled properly in the other distributions.
>
>
>
> Your proposal needs to say that then. Because currently it says:
>
> *0.4 3. Proposal*
> Add the following to <random>:
> namespace std {
> template<class RealType = double, int bits, class URNG>
> RealType generate_canonical_centered(URNG& g);
> }
>
> Is that it? That's the whole proposal?!
> Apparently not, apparently you want to change std::exponential_distribution too. What
> about the other 20+ places that use std::generate_canonical?
>
> So in summary:
>
> You should explain that where P0952R2 says "In particular, code that depends on a
> specific sequence of results from repeated invocations, *or on a particular number of
> calls to the URBG argument*, will be broken", it's the second part (in bold) that
> is a problem for you. Your initial PDF proposal gives no clue whether the
> compatibility you're talking about is the exact sequence of values returned, or the
> number of invocations of the URBG. The word "discard" doesn't even appear in the proposal.
>
> Your abstract says "without altering existing behavior". I think you mean "without
> altering the C++23 behaviour", but you should be clear about what you mean by
> "existing". P0952R2 is already part of the C++26 draft. Assuming you mean "without
> changing the C++23 behaviour", how does proposing a completely different function
> help? The P0952R2 changes would still be in C++26, and so that's still a change from
> C++23. How does a different function with different behaviour undo the changes to
> std::generate_canonical?!
>
> You need to be clear about what you're actually proposing, and the impact on
> implementations (they would need to replace some or all internal uses of
> std::generate_canonical with your new function, and adjust to deal with a completely
> different output range?)
>
> Currently the proposal is vague and contradictory and confusing.
>
>
> Finally, I don't see how making more use of the increased precision near zero actually
> helps. The purpose of std::generate_canonical is to produce values uniformly distributed
> in the range [0,1). Producing more values close to zero because there is a higher density
> of representable values there does not meet the contract.

It helps because the centered distribution, once sliced and rearranged, yields a uniform
distribution on (0, 1], and the resulting value u is then used directly in -log(u).

libstdc++ (and, I assume, other implementations as well) produces a uniform distribution
on [0, 1) and then uses it as -log(1-u). 1-u has reduced precision close to 0 (in fact on
the whole range (0, 0.5)). In terms of precision, using 1-u is effectively equivalent to
generating a fixed-point number between 0 and 1 with 24 bits of mantissa (assuming float).
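
To illustrate the precision point, here is a minimal sketch, assuming IEEE 754 binary32 for float. The helper names on_24_bit_grid and count_off_grid are hypothetical and are not taken from libstdc++ or from the proposal. The sketch checks whether a value is an integer multiple of 2^-24 and counts how many draws of 1-u fall off that grid.

```cpp
#include <cmath>
#include <random>

// Hypothetical helper: true if x (expected in [0, 1]) is an integer
// multiple of 2^-24, i.e. representable as 24-bit fixed point.
inline bool on_24_bit_grid(float x) {
    // Scaling by a power of two is exact for values in this range.
    const float scaled = std::ldexp(x, 24);
    return scaled == std::floor(scaled);
}

// Hypothetical helper: draws `samples` values u from std::generate_canonical
// and counts how many times 1 - u is NOT a multiple of 2^-24.
template <class URBG>
int count_off_grid(URBG& g, int samples) {
    int off = 0;
    for (int i = 0; i < samples; ++i) {
        const float u = std::generate_canonical<float, 24>(g);
        const float v = 1.0f - u;  // the value that would be passed to -log(v)
        if (!on_24_bit_grid(v)) {
            ++off;
        }
    }
    return off;
}
```

Under that assumption, count_off_grid should report zero for any generator, because 1.0f - u always rounds onto the 2^-24 grid. A small float such as 2^-30 is itself off the grid, which is the extra precision near zero that -log(u) on (0, 1] could use.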

If we deem the resulting exponential distribution acceptable, then this algorithm is quite
wasteful in how it uses the random generator: it only uses a fixed 24 bits of entropy, but
consumes many more bits from the generator to produce the intermediate
generate_canonical result.

>
> If I have four buckets of different sizes, 5L, 3L, 2L, and 1L, and I have to evenly
> distribute 4L of water into those buckets, putting more in the 5L bucket because it has
> more capacity does not make sense. There should be exactly 1L in each bucket. This seems
> analogous to saying that we should return more results near zero, because there are more
> representable values near zero.
>
> Ideally, we want 25% of all results to be in the interval [0, 0.25) and 25% of all results
> to be in the interval [0.75, 1.0). We don't want there to be more than 25% of results in
> the first interval just because it's a bigger bucket that can represent more distinct
> values, due to the higher precision.
>
> And really finally finally, one of the P0952R2 authors reminded me that the standard
> already has a note giving you the guarantee that you want:
> https://eel.is/c++draft/rand.util.canonical#note-1
> When the full range of the URBG is (2^N - 1) for any N, there is never a need to discard
> any values from the URBG. So if you are only concerned with the additional discards being
> done, just make sure your URBG is sensible. If your URBG returns any value in the range
> [0,UINT_MAX] or [0,ULLONG_MAX] then there will be no discarded values.
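
The condition described in the quoted paragraph (g.max() - g.min() of the form 2^N - 1) can be checked at compile time. This is only a sketch: has_power_of_two_range is a hypothetical helper name, not a standard facility, and it says nothing about how a given implementation actually decides when to discard.

```cpp
#include <random>

// Hypothetical helper: true if URBG::max() - URBG::min() has the form 2^N - 1,
// i.e. the number of distinct outputs is a power of two.
template <class URBG>
constexpr bool has_power_of_two_range() {
    using result_type = typename URBG::result_type;
    constexpr result_type span = URBG::max() - URBG::min();
    // span is 2^N - 1 exactly when it shares no set bit with span + 1.
    // Unsigned wrap-around makes this work when span is the type's maximum.
    return (span & (span + 1)) == 0;
}

static_assert(has_power_of_two_range<std::mt19937>(), "range is [0, 2^32 - 1]");
static_assert(has_power_of_two_range<std::mt19937_64>(), "range is [0, 2^64 - 1]");
static_assert(!has_power_of_two_range<std::minstd_rand>(), "range is [1, 2^31 - 2]");
```
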
>
>
>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2025-12-07 08:49:42