Re: [std-proposals] Random numbers in identical builds

From: Frederick Virchanza Gotham <cauldwell.thomas_at_[hidden]>
Date: Thu, 13 Mar 2025 12:01:17 +0000
On Wed, Mar 12, 2025 at 1:50 PM Frederick Virchanza Gotham wrote:
>
> On Wed, Mar 12, 2025 at 9:54 AM Frederick Virchanza Gotham wrote:
> >
> > a) Feed sequential numbers followed by a salt into a hash algorithm such as MD5
>
>
> Here's a constexpr implementation of the MD5 algorithm written by Wodann:
>
> https://github.com/Wodann/constexpr-md5-cpp/blob/master/include/md5.h



I won't turn this into a cryptography mailing list; I'm just going to
give minimal information to get my point across.

I wrote a short program to give sequential numbers to MD5:

    #include <iostream>
    #include "md5.h"

    int main(void)
    {
        for ( __uint128_t n = 0u; ; ++n )
        {
            md5::details::Context c;
            c.append( (char*)&n, sizeof n );
            __uint128_t const digest = c.final();
            std::cout.write( (char*)&digest, sizeof digest );
        }
    }

I then piped the output of this program into DieHarder to test whether
it's "random enough":

    myprogram | dieharder -a -g 200

And it passed all the tests. Specifically it passed the following tests:

diehard_birthdays diehard_operm5 diehard_rank_32x32 diehard_rank_6x8
diehard_bitstream diehard_opso diehard_oqso diehard_dna
diehard_count_1s_str diehard_count_1s_byt diehard_parking_lot
diehard_2dsphere diehard_3dsphere diehard_squeeze diehard_sums
diehard_runs diehard_craps marsaglia_tsang_gcd diehard_predict_mers
sts_monobit sts_runs
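
To make the counter-plus-salt idea from point (a) concrete without
depending on md5.h, here is a minimal sketch. It swaps MD5 for 64-bit
FNV-1a purely to keep the example short; FNV-1a is not a cryptographic
hash and is not expected to pass DieHarder. The names fnv1a_step and
counter_hash are made up for this example.

```cpp
#include <cstdint>

// Stand-in hash (an assumption of this sketch): one round of 64-bit FNV-1a.
// It replaces MD5 only so that the example needs no external header.
constexpr std::uint64_t fnv1a_step(std::uint64_t h, unsigned char byte)
{
    return (h ^ byte) * 1099511628211ull;          // FNV-1a 64-bit prime
}

// Hash the 8 bytes of 'counter' (least significant first), then the salt.
constexpr std::uint64_t counter_hash(std::uint64_t counter, char const *salt)
{
    std::uint64_t h = 14695981039346656037ull;     // FNV-1a 64-bit offset basis
    for (int i = 0; i < 8; ++i)
        h = fnv1a_step(h, static_cast<unsigned char>(counter >> (8 * i)));
    for (; *salt != '\0'; ++salt)
        h = fnv1a_step(h, static_cast<unsigned char>(*salt));
    return h;
}

// Both values are computed by the compiler from the source text alone.
constexpr std::uint64_t first  = counter_hash(0, "This is my salt!");
constexpr std::uint64_t second = counter_hash(1, "This is my salt!");
static_assert(first != second, "consecutive counters give distinct outputs");
```

Because nothing here depends on the build machine or the time of the
build, two builds of the same source with the same salt embed the same
numbers, which is the property wanted for identical builds.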


Previously I posted this code:

> if ( nullptr == name )
> {
>     auto const location = std::source_location::current();
>     Append( location.file_name() );
>     Append( location.function_name() );
>     Append( uint_to_base10_string(location.line()).c_str() );
>     Append( uint_to_base10_string(location.column()).c_str() );
> }



I've just realised that this doesn't make much sense at all. Here's
what would make more sense:
    (Point No. 1) If the 'uuid' function is evaluated at compile time,
use MD5 with a salt so that builds are reproducible (identical source
gives identical output)
    (Point No. 2) If the 'uuid' function is called at runtime, use
std::random_device

So if the signature of the function is as follows:

    namespace std {
        constexpr __uint128_t uuid( char const *name = nullptr );
    }

Then it would make sense to have it work as follows:
    (Point A) If invoked during constant evaluation, name must not be nullptr
    (Point B) If invoked at runtime, name must be nullptr

Actually I think it's better to split this into two functions, one
being consteval as follows:

    consteval __uint128_t uuid(char const *const name) noexcept
    {
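        // Belt and braces only: consteval already guarantees
        // compile-time evaluation, and this assertion can never fire
        // because the operand of static_assert is always
        // constant-evaluated
        static_assert( std::is_constant_evaluated() );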
        md5::details::Context c;
        c.append( name, md5::details::const_strlen(name) );
        c.append( "This is my salt!", sizeof "This is my salt!" - 1u );
        return c.final();
    }

    __uint128_t uuid(void) noexcept(false)
    {
        std::random_device rd; // might throw
        // rd() yields an unsigned integer type of implementation-defined width
        __uint128_t retval = 0u;
        static_assert( 0u == (sizeof(__uint128_t) % sizeof(rd())) );
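        constexpr unsigned how_many_uints = sizeof(__uint128_t) / sizeof(rd());
        // Shifting a 128-bit value by 128 bits would be undefined behaviour
        static_assert( how_many_uints > 1u );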
        for ( unsigned n = 0u; n < how_many_uints; ++n )
        {
            retval <<= 128u / how_many_uints;
            retval |= rd(); // might throw
        }
        return retval;
    }

I think this gives the best behaviour, because we'll get a compiler
error if we try to give a name to a UUID that won't be generated until
runtime.

I realise that the initialism UUID started out as "Universally Unique
Identifier", with versions 1, 2, 3 and so forth. But nowadays I think
it's reasonable to use UUID as a synonym for a 128-bit random number.
So I think it would be reasonable to have a function called
"std::uuid" which returns an implementation-defined type which could
be either __uint128_t or std::array< char unsigned, 128u / CHAR_BIT >.
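
If the byte-array form were chosen, the runtime path could fill it
directly. A rough sketch follows, in which uuid_bytes and
random_uuid_bytes are hypothetical names, and one draw is spent per byte
so that no shift arithmetic is needed:

```cpp
#include <array>
#include <climits>
#include <random>

// Hypothetical return type: 128 bits stored as raw bytes.
using uuid_bytes = std::array<unsigned char, 128u / CHAR_BIT>;

// Hypothetical helper, not an existing or proposed API.
inline uuid_bytes random_uuid_bytes()
{
    std::random_device rd;   // construction and each draw might throw
    uuid_bytes out{};
    for (unsigned char &byte : out)
    {
        // Keep only the low CHAR_BIT bits of each draw
        byte = static_cast<unsigned char>(rd());
    }
    return out;
}
```

This discards most of the bits of each draw, which is wasteful but
avoids any assumption about the width of unsigned int.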

Do you think this belongs in the C++ standard library?

Received on 2025-03-13 12:01:29