Date: Wed, 17 Nov 2021 18:26:41 +0000

The papers P0205<https://wg21.link/p0205> and P2060<https://wg21.link/p2060> like to improve the code for proper seeding of the standard Mersenne twister. Both papers cite the following article https://www.pcg-random.org/posts/cpp-seeding-surprises.html by Melissa O’Neill. I will start with that article and their claim that the standard Mersenne twister (std::mt19937) can never produce some integers on the first call after seeding. In particular, the numbers 7 are 13 are mentioned. I state that such claim is false with a simple counterexample in code.

#include <random>

#include <cassert>

using namespace std;

int main()

{

auto rnd = mt19937(1080100664);

assert(rnd() == 7);

rnd = mt19937(736640520);

assert(rnd() == 7);

rnd = mt19937(1292535796);

assert(rnd() == 13);

}

How did I found the seeds? With brute force. Here is some more code:

#include <random>

#include <iostream>

#include <limits>

#include <cassert>

using namespace std;

int main()

{

# pragma omp parallel for

for(auto i = 0ull; i <= numeric_limits<unsigned int>::max() ; ++i ) {

auto mt = mt19937(i);

auto n = mt();

if (n == 7 || n == 13)

cout << "Hurray, seed = " << i << ", n = " << n << '\n';

if (i % (1<<24) == 0)

cout << i << endl;

}

}

For speed, compile with: g++ file.cpp -03 -fopenmp

Please consider this brute force approach in the future. And please do not reference O’Neill’s article in the future, it probably has more false claims. I can only speculate why that article was written. My initial guess is that it is meant to be an advertisement of PCG and not real science or engineering. The experiment presented there is too slow to actually prove anything. Additionally, the author could have used bugged std::random_device. For example, that facility was useless on GCC on Windows until v9.2.0. See bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85494.

Now, I’ll present my second point.

Both papers and the article do not acknowledge that the seeding of mt19937 with a single integer uses an algorithm that properly fills the whole state with pseudo-random integers. It is not just one integer and all zeroes. The algorithm was published here http://www.math.sci.hiroshima-u.ac.jp/m-mat/MT/MT2002/emt19937ar.html by the same authors that published the original Mersenne twister. The C++ standard demands this algorithm. It isn’t that terrible to seed mt19937 with only a single integer as that article and the papers say. It’s OK if you do it.

So what is the proper way to seed this generator? Well, a single integer is actually good for the majority of users, and if you really need wider state space, use seed_seq with only few integers.

#include <random>

using namespace std;

int main()

{

random_device rd;

seed_seq sd{rd(), rd(), rd(), rd()}; // 4 values are enough. 8 max.

auto g = mt19937(sd); // advanced seeding

}

In this example you get about 2^128 different seeds which is more than enough.

I personally think that filling the whole state (624 integers) with values from random_device is wrong because it wastes the system entropy. If not wrong then slow and pointless. I consider the following code to be bad.

#include <random>

using namespace std;

int main()

{

random_device rd;

unsigned int v[624]; // no need for this large seed

for(auto& a: v)

a = rd();

seed_seq sd(begin(v), end(v));

auto g = mt19937(sd);

}

I have mirrored this second point from my own article written here https://simplecxx.github.io/2018/11/03/seed-mt19937.html.

Now, to finish with my last point.

How should the API for random numbers be improved? For starters, the documentation for mt19937, seed_seq and random_device should be improved by mentioning that using just 4 integers in seed_seq is actually good. Maybe this should be done with an example in the standard.

Secondly, some API for advanced users can be added that seeds an mt19937 with 4 random integers. For example:

auto get_good_mt19937()

{

random_device rd;

seed_seq sd{rd(), rd(), rd(), rd()};

return mt19937(sd);

}

Lastly, some API for novice users can be added. Some was actually already proposed in TS v2<https://en.cppreference.com/w/cpp/experimental/randint>. To this I would add note for implementers that the global thread-local RNG should not be initialized and random_device should not be called if this global RNG is never used. That can be achieved either with global variable with deferred dynamic initialization<https://en.cppreference.com/w/cpp/language/initialization#Deferred_dynamic_initialization>, or with static/thread-local local variable<https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables>.

#include <random>

#include <cassert>

using namespace std;

int main()

{

auto rnd = mt19937(1080100664);

assert(rnd() == 7);

rnd = mt19937(736640520);

assert(rnd() == 7);

rnd = mt19937(1292535796);

assert(rnd() == 13);

}

How did I found the seeds? With brute force. Here is some more code:

#include <random>

#include <iostream>

#include <limits>

#include <cassert>

using namespace std;

int main()

{

# pragma omp parallel for

for(auto i = 0ull; i <= numeric_limits<unsigned int>::max() ; ++i ) {

auto mt = mt19937(i);

auto n = mt();

if (n == 7 || n == 13)

cout << "Hurray, seed = " << i << ", n = " << n << '\n';

if (i % (1<<24) == 0)

cout << i << endl;

}

}

For speed, compile with: g++ file.cpp -03 -fopenmp

Please consider this brute force approach in the future. And please do not reference O’Neill’s article in the future, it probably has more false claims. I can only speculate why that article was written. My initial guess is that it is meant to be an advertisement of PCG and not real science or engineering. The experiment presented there is too slow to actually prove anything. Additionally, the author could have used bugged std::random_device. For example, that facility was useless on GCC on Windows until v9.2.0. See bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85494.

Now, I’ll present my second point.

Both papers and the article do not acknowledge that the seeding of mt19937 with a single integer uses an algorithm that properly fills the whole state with pseudo-random integers. It is not just one integer and all zeroes. The algorithm was published here http://www.math.sci.hiroshima-u.ac.jp/m-mat/MT/MT2002/emt19937ar.html by the same authors that published the original Mersenne twister. The C++ standard demands this algorithm. It isn’t that terrible to seed mt19937 with only a single integer as that article and the papers say. It’s OK if you do it.

So what is the proper way to seed this generator? Well, a single integer is actually good for the majority of users, and if you really need wider state space, use seed_seq with only few integers.

#include <random>

using namespace std;

int main()

{

random_device rd;

seed_seq sd{rd(), rd(), rd(), rd()}; // 4 values are enough. 8 max.

auto g = mt19937(sd); // advanced seeding

}

In this example you get about 2^128 different seeds which is more than enough.

I personally think that filling the whole state (624 integers) with values from random_device is wrong because it wastes the system entropy. If not wrong then slow and pointless. I consider the following code to be bad.

#include <random>

using namespace std;

int main()

{

random_device rd;

unsigned int v[624]; // no need for this large seed

for(auto& a: v)

a = rd();

seed_seq sd(begin(v), end(v));

auto g = mt19937(sd);

}

I have mirrored this second point from my own article written here https://simplecxx.github.io/2018/11/03/seed-mt19937.html.

Now, to finish with my last point.

How should the API for random numbers be improved? For starters, the documentation for mt19937, seed_seq and random_device should be improved by mentioning that using just 4 integers in seed_seq is actually good. Maybe this should be done with an example in the standard.

Secondly, some API for advanced users can be added that seeds an mt19937 with 4 random integers. For example:

auto get_good_mt19937()

{

random_device rd;

seed_seq sd{rd(), rd(), rd(), rd()};

return mt19937(sd);

}

Lastly, some API for novice users can be added. Some was actually already proposed in TS v2<https://en.cppreference.com/w/cpp/experimental/randint>. To this I would add note for implementers that the global thread-local RNG should not be initialized and random_device should not be called if this global RNG is never used. That can be achieved either with global variable with deferred dynamic initialization<https://en.cppreference.com/w/cpp/language/initialization#Deferred_dynamic_initialization>, or with static/thread-local local variable<https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables>.

Received on 2021-11-17 12:26:45