std::mt19937) can never produce some integers on the first call after seeding.
In particular, the numbers 7 and 13 are mentioned. This claim is false, and here is a simple counterexample in code:
#include <random>
#include <cassert>
using namespace std;
int main()
{
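// Seeds found by brute force. Build without -DNDEBUG, otherwise assert() does nothing.
auto rnd = mt19937(1080100664);
assert(rnd() == 7); // first output after seeding is 7
rnd = mt19937(736640520);
assert(rnd() == 7);
rnd = mt19937(1292535796);
assert(rnd() == 13); // first output after seeding is 13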
}
How did I find the seeds? By brute force. Here is the code:
#include <random>
#include <iostream>
#include <limits>
using namespace std;
int main()
{
// Try every 32-bit seed; OpenMP splits the range across threads.
# pragma omp parallel for
for(auto i = 0ull; i <= numeric_limits<unsigned int>::max() ; ++i ) {
auto mt = mt19937(i);
auto n = mt(); // first output after seeding
if (n == 7 || n == 13) {
# pragma omp critical
cout << "Hurray, seed = " << i << ", n = " << n << '\n';
}
if (i % (1<<24) == 0) {
# pragma omp critical
cout << i << endl; // progress report
}
}
}
For speed, compile with: g++ file.cpp -O3 -fopenmp
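The same search can also be written without OpenMP as a plain function over a sub-range of seeds. This is handy for re-checking a single claimed value on a small slice. What follows is a minimal sketch, and seeds_with_first_output is a hypothetical helper name of my own:

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: returns every seed in [lo, hi) for which the first
// output of std::mt19937, seeded with that single integer, equals `target`.
// 64-bit bounds let hi reach 2^32, i.e. cover every 32-bit seed.
std::vector<std::uint32_t> seeds_with_first_output(std::uint32_t target,
                                                   std::uint64_t lo,
                                                   std::uint64_t hi)
{
    std::vector<std::uint32_t> found;
    for (std::uint64_t s = lo; s < hi; ++s) {
        std::mt19937 mt(static_cast<std::uint32_t>(s));
        if (mt() == target) // first call after seeding
            found.push_back(static_cast<std::uint32_t>(s));
    }
    return found;
}
```

Calling it with lo = 0 and hi = 2^32 covers the same seeds as the program above, only single-threaded.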
Please consider checking such claims with this kind of brute-force search in the future. And please do not cite O’Neill’s article in the future; it probably contains more false claims. I can only speculate why that article was written. My initial guess is that it is meant to be an advertisement for PCG rather than real science or engineering. The experiment presented there is too slow to actually prove anything. Additionally, the author could have used a buggy std::random_device. For example, that facility was useless with GCC on Windows until v9.2.0. See bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85494.
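Whether a given std::random_device has this kind of problem can be checked cheaply. Below is a minimal heuristic sketch. It assumes that a broken implementation falls back to a fixed-seed generator, so every instance produces the same sequence. The function name is hypothetical and the check is my own, not part of any standard API:

```cpp
#include <random>

// Heuristic sanity check (hypothetical helper): two freshly constructed
// random_device objects backed by a real entropy source should essentially
// never agree on their first four 32-bit values (chance about 2^-128).
// A fixed-seed fallback produces identical sequences and is flagged.
bool random_device_looks_deterministic()
{
    std::random_device a, b;
    for (int i = 0; i < 4; ++i)
        if (a() != b())
            return false; // outputs differ: not obviously deterministic
    return true;          // identical sequences: entropy source is suspect
}
```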
Now, I’ll present my second point.
Neither the papers nor the article acknowledges that seeding mt19937 with a single integer uses an algorithm that properly fills the whole state with pseudo-random integers. The state is not just one integer followed by zeroes. The algorithm was published at
http://www.math.sci.hiroshima-u.ac.jp/m-mat/MT/MT2002/emt19937ar.html by the same authors who published the original Mersenne Twister, and the C++ standard mandates it. Seeding mt19937 with a single integer is not as terrible as that article and the papers say. It’s OK if you do it.
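To make the seeding algorithm concrete, here is a small sketch that re-implements mt19937 and compares its outputs with std::mt19937. It uses the single-integer initialization from the MT2002 page, where each state word is x[i] = 1812433253 * (x[i-1] ^ (x[i-1] >> 30)) + i. The names TinyMT19937 and matches_std_mt19937 are hypothetical, and the constants are the usual mt19937 parameters:

```cpp
#include <array>
#include <cstdint>
#include <random>

// Hypothetical re-implementation of 32-bit MT19937, only to show that the
// single-integer seeding fills all 624 state words.
class TinyMT19937 {
    static constexpr int n = 624, m = 397;
    std::array<std::uint32_t, n> x{};
    int idx = n; // forces a twist before the first output

    void twist() {
        for (int i = 0; i < n; ++i) {
            std::uint32_t y = (x[i] & 0x80000000u) | (x[(i + 1) % n] & 0x7fffffffu);
            x[i] = x[(i + m) % n] ^ (y >> 1) ^ ((y & 1u) ? 0x9908b0dfu : 0u);
        }
        idx = 0;
    }

public:
    explicit TinyMT19937(std::uint32_t seed) {
        x[0] = seed;
        // init_genrand: every word is derived from the previous one, so the
        // whole state is filled, not just the first word.
        for (int i = 1; i < n; ++i)
            x[i] = 1812433253u * (x[i - 1] ^ (x[i - 1] >> 30)) + static_cast<std::uint32_t>(i);
    }

    const std::array<std::uint32_t, n>& state() const { return x; }

    std::uint32_t operator()() {
        if (idx == n) twist();
        std::uint32_t y = x[idx++];
        // Tempering, with the same constants as std::mt19937.
        y ^= y >> 11;
        y ^= (y << 7) & 0x9d2c5680u;
        y ^= (y << 15) & 0xefc60000u;
        y ^= y >> 18;
        return y;
    }
};

// Compares the first `count` outputs with std::mt19937 seeded the same way.
bool matches_std_mt19937(std::uint32_t seed, int count) {
    TinyMT19937 mine(seed);
    std::mt19937 theirs(seed);
    for (int i = 0; i < count; ++i)
        if (mine() != theirs()) return false;
    return true;
}
```

With a conforming standard library, matches_std_mt19937 should return true for any seed. In other words, std::mt19937 derives all 624 words of its state from the single integer.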
So what is the proper way to seed this generator? Well, a single integer is actually good enough for the majority of users, and if you really need a larger seed space, use
seed_seq with only a few integers:
#include <random>
using namespace std;
int main()
{
random_device rd;
seed_seq sd{rd(), rd(), rd(), rd()}; // 4 values are enough. 8 max.
auto g = mt19937(sd); // advanced seeding
}
In this example you get about 2^128 different seeds, which is more than enough.
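As I read the standard's description of the seed-sequence constructor, mt19937(sd) calls sd.generate to stretch the few input values into 624 32-bit words, which then become the engine's state (apart from a fix-up for an all-zero state). The sketch below shows that expansion. The helper names expand_seed and make_engine are hypothetical:

```cpp
#include <array>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: expands four 32-bit values into the 624 words that
// std::mt19937 (n = 624, w = 32) takes as its state from a seed_seq.
std::vector<std::uint32_t> expand_seed(const std::array<std::uint32_t, 4>& v)
{
    std::seed_seq sd(v.begin(), v.end());
    std::vector<std::uint32_t> words(624);
    sd.generate(words.begin(), words.end());
    return words;
}

// Hypothetical helper: builds an engine from the same four values.
std::mt19937 make_engine(const std::array<std::uint32_t, 4>& v)
{
    std::seed_seq sd(v.begin(), v.end());
    return std::mt19937(sd);
}
```

A side benefit of using only 4 values is that they are easy to log, so a run can be reproduced later by passing the same values to make_engine.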
I personally think that filling the whole state (624 integers) with values from
random_device is wrong because it wastes system entropy; if not wrong, it is slow and pointless. I consider the following code to be bad:
#include <iterator> // std::begin, std::end
#include <random>
using namespace std;
int main()
{
random_device rd;
unsigned int v[624]; // no need for this large seed
for(auto& a: v)
a = rd();
seed_seq sd(begin(v), end(v));
auto g = mt19937(sd);
}
This second point mirrors my own article, https://simplecxx.github.io/2018/11/03/seed-mt19937.html.
Now, on to my last point.
How should the API for random numbers be improved? For starters, the documentation for
mt19937, seed_seq and random_device should mention that using just 4 integers in
seed_seq is actually good, perhaps with an example in the standard.
Secondly, an API for advanced users could be added that seeds an mt19937 with 4 random integers. For example:
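#include <random>
using namespace std;
// Returns an mt19937 seeded with 4 values from random_device via seed_seq.
auto get_good_mt19937()
{
random_device rd;
seed_seq sd{rd(), rd(), rd(), rd()};
return mt19937(sd);
}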
Lastly, an API for novice users could be added; some of it was already
proposed in the Library Fundamentals TS v2. To this I would add a note for implementers: the global thread-local RNG should not be initialized, and
random_device should not be called, if this global RNG is never used. That can be achieved either with a global variable with
deferred dynamic initialization or with a
static/thread-local local variable.
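The second option, a thread-local local variable, can be sketched as follows. The names global_rng and my_randint are hypothetical. The engine is mt19937 seeded with 4 values, following the seeding advice above. The counter engine_inits exists only to demonstrate the lazy initialization:

```cpp
#include <atomic>
#include <random>

// Counts how many per-thread engines have been created; only here to show
// that nothing happens until the first call.
std::atomic<int> engine_inits{0};

// Hypothetical global per-thread engine. A block-scope thread_local variable
// is initialized the first time a thread executes its declaration, so
// random_device is never touched by programs (or threads) that never call this.
std::mt19937& global_rng()
{
    thread_local std::mt19937 engine = [] {
        ++engine_inits;
        std::random_device rd;
        std::seed_seq sd{rd(), rd(), rd(), rd()};
        return std::mt19937(sd);
    }();
    return engine;
}

// Hypothetical novice-friendly function in the spirit of randint from the TS:
// returns a uniformly distributed integer in [a, b].
int my_randint(int a, int b)
{
    std::uniform_int_distribution<int> dist(a, b);
    return dist(global_rng());
}
```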