On Wed, Nov 17, 2021 at 1:27 PM Dimitrij Mijoski via Std-Proposals <std-proposals@lists.isocpp.org> wrote:
The papers P0205 and P2060 like to improve the code for proper seeding of the standard Mersenne twister. Both papers cite the following article https://www.pcg-random.org/posts/cpp-seeding-surprises.html by Melissa O’Neill. I will start with that article and their claim that the standard Mersenne twister (std::mt19937) can never produce some integers on the first call after seeding. In particular, the numbers 7 are 13 are mentioned. I state that such claim is false with a simple counterexample in code [...]
	auto rnd = mt19937(1080100664);
	assert(rnd() == 7);

Your code is correct; the code in O'Neill's essay is wrong. O'Neill writes:

> If you look online for how to initialize a C++11 random number generator, you'll see code that looks like
>     std::mt19937 my_rng(std::random_device{}());
> or possibly the more explicit (but equivalent) version,
>     std::random_device rdev;
>     uint32_t random_seed = rdev();
>     std::seed_seq seeder{random_seed};
>     std::mt19937 my_rng(seeder);

> Footnote: This example is specific to the Mersenne Twister. For some other generators,
> especially the simpler ones, if you just specify a single integer for the seed, as we did
> in the first piece of code, it uses it directly without involving seed_seq.

The footnote hints at her mistake. I don't know whether the footnote was ever correct, but certainly as of C++20 (and even in the C++11 standard I just checked), these two snippets are not equivalent.
The first snippet, as you discovered, does produce 7 (twice) and 13 (once). But it never produces 0, 1, or 42.
The second snippet is the one that never produces 7 or 13.  (And yes, I just checked this. It took hours.)

Back to Dimitrij Mijoski:
> Please consider this brute force approach in the future. And please do not reference O’Neill’s article in
> the future, it probably has more false claims. I can only speculate why that article was written. My initial
> guess is that it is meant to be an advertisement of PCG and not real science or engineering. The
> experiment presented there is too slow to actually prove anything. Additionally, the author could
> have used bugged std::random_device.

I think that's a bit of an overreaction on your part.
Heck, I'll email Melissa O'Neill right now with the (IMHO fairly minor) correction.
I admit I'm surprised it took 6 years for anyone to fact-check that code, though. :)
 
Now, I’ll present my second point.

Both papers and the article do not acknowledge that the seeding of mt19937 with a single integer uses an algorithm that properly fills the whole state with pseudo-random integers. It is not just one integer and all zeroes. The algorithm was published here http://www.math.sci.hiroshima-u.ac.jp/m-mat/MT/MT2002/emt19937ar.html by the same authors that published the original Mersenne twister. The C++ standard demands this algorithm. It isn’t that terrible to seed mt19937 with only a single integer as that article and the papers say. It’s OK if you do it.

It's "okay" in the sense of, "Well, I guess it's okay if you do it." It's not OK in the mathematical sense of "it'll give you an actual random sample of the entire possible state space." Does the average programmer care about the latter sense? No, I don't think they do. But the average programmer also has no reason to be attached to MT19937 in the first place. They might prefer a simpler/smaller/faster generator. I personally have written
https://github.com/Quuxplusone/Xoshiro256ss/blob/main/xoshiro256ss.h
(49 lines) for the sole purpose of copy-pasting into my own toy projects that need a fast PRNG.  If you seed Xoshiro256** with a merely 32-bit seed, then use it to produce 32 bits, it'll never produce 7, or 13 — or any of 3760 other numbers between 1 and 10000. But guess what? My test program to verify that ran in 11 seconds flat. The same thing for MT19937 (which I had to run to produce my comments above) takes literally hours. MT19937 is just awful if what you care about is getting random numbers out of it in a timely fashion. It's the Java of PRNGs.

So what is the proper way to seed this generator? Well, a single integer is actually good for the majority of users, and if you really need wider state space, use seed_seq with only few integers.

If you're already paying for heap-allocation (std::seed_seq is basically std::vector with extra steps), then you might as well fill up the seed. I claim there's no good reason to "under-seed" a generator. If you're only going to use 256 bits of seed, then you should just use a generator with 256 bits of state in the first place.

> I personally think that filling the whole state (624 integers) with values from random_device
> is wrong because it wastes the system entropy. If not wrong then slow and pointless.

Sure, but that's a problem with MT19937 asking for too much seed material, not a problem with the core idea of "fully seeding" a generator.

Lastly, some API for novice users can be added. Some was actually 
already proposed in TS v2. To this I would add note for implementers that the global thread-local RNG should not be initialized and
random_device should not be called if this global RNG is never used. That can be achieved either with global variable with
deferred dynamic initialization, or with 
static/thread-local local variable.
The "novice user" API was proposed (with my strong if irrelevant support) in P0205R0, and it looks like this:

    std::random_device rd;
    auto mt = std::mt19937(rd);

Sadly, P0205R1 seems to have complicated things back into expert territory; it now proposes

    std::random_device rd;
    auto adaptor = std::seed_adaptor(rd);
    auto mt = std::mt19937(adaptor);

I don't immediately understand why this complication was added between R0 and R1, and I'd recommend to Moritz that he re-introduce the original simple approach. Perhaps even provide rvalue overloads taking `SeedGenerator&&`, so that this could work too:

    auto mt = std::mt19937(std::random_device());  // awesome, one line
    auto xo = xoshiro256ss(std::random_device());  // awesome, one line, but way faster and less entropy-hogging

–Arthur