I think you need an actual benchmark to be able to claim the performance quality of the implementation.

For example, wouldn't it be better to just have as "isInitialized" thread local boolean and to access the object, you do:


thread_local bool isInitialized = false;
T& initialize() // mark with attribute to never inline
{
    Emplace on optional with call_once (typing on my phone :) )
    isInitialized = true;
    Return *g_obj;
}

inline T& g_obj_fn() {
     if (isInitialized) // mark this with likely attribute
        return *g_obj:
    else return initialize();
}

Then all access to the variable just need to check a boolean (in the hot path) instead of calling an opaque function.

I also don't see the need for a mutex on the initialization. The global call_once flag should be enough. The thread local variable shouldn't need to be atomic either.

Breno G.




Em qua., 28 de jun. de 2023 05:58, Frederick Virchanza Gotham via Std-Proposals <std-proposals@lists.isocpp.org> escreveu:
On Wed, Jun 28, 2023 at 1:06 AM Breno Guimarães wrote:
>
> You can do:
>
> T& g_obj_fn() {
>     static T t;
>     return t;
> }
>
> #define g_obj (g_obj_fn())
>
> Function local statics do the flag+muted+call_once for you.


That's exactly what I have in my current code, I'm just looking for
ways to make it more efficient. For example, the 'call_once' can be
taken out if we change the global function pointer. I've come up with
three changes:
(1) Mark the global function pointer as 'thread_local'
(2) Use a shared_mutex instead of a mutex
(3) Only lock the mutex the first time for each thread

I've gotten the machine code for the function "Func_PostInit" down to:

        mov     rax, OFFSET FLAT:g_optional
        ret

And the machine code for invoking the thread_local function pointer is:

        mov     rax, QWORD PTR fs:g_funcptr@tpoff
        call    rax

I might be very very close to the most efficient implementation.

So here's my code now:

    https://godbolt.org/z/sPfPsnbjv

And here it is copy-pasted:

#include <atomic>                   // atomic
#include <optional>                 // optional
#include <mutex>                    // once_flag, call_once, lock_guard
#include <shared_mutex>             // shared_mutex, shared_lock
#include <iostream>                 // Just for testing: cout, enld
#include <thread>                   // Just for testing: jthread,
get_id, sleep_for
#include <chrono>                   // Just for testing: milliseconds

struct T {
     T(void) { std::cout << "Constructing. . .\n"; }
    ~T(void) { std::cout << "Destroying. . .\n"  ; }
    void Speak(void) { std::cout << "Hi!\n"; }
};

std::optional<T> g_optional;

/* forward declaration of function */ T &Func_PreInit(void);

/* Note that the following variable is thread_local */
thread_local std::atomic<T &(*)(void)> g_funcptr{ &Func_PreInit };  /*
<<<<<<<<<<<< thread_local */

static std::shared_mutex g_mutex_for_optional;

T &Func_PostInit(void)
{
    // No protection at all in here -- no mutex, no once_flag, nothing

    std::cout << "- - - Func_PostInit " << std::this_thread::get_id()
<< std::endl;

    return *g_optional.operator->();  // crashes instead of throwing
}

std::once_flag g_flag_for_optional{};

T &Func_PreInit(void)
{
    std::cout << "- - - Func_PreInit " << std::this_thread::get_id()
<< std::endl;

    std::call_once(g_flag_for_optional, []()
        {
            std::lock_guard mylock(g_mutex_for_optional);  // exclusive lock
            g_optional.emplace();
        });

    g_funcptr.store( &Func_PostInit );  // ------------- This variable
is thread_local

    std::shared_lock mylock(g_mutex_for_optional);  // shared lock
    return *g_optional.operator->();  // crashes instead of throwing
}

#define g_obj (g_funcptr.load()())

int main(void)
{
    g_obj.Speak();
    std::jthread mythread( []()
        {
            std::this_thread::sleep_for( std::chrono::milliseconds(20u) );
            g_obj.Speak();
            std::this_thread::sleep_for( std::chrono::milliseconds(105u) );
            g_obj.Speak();
        } );

    std::this_thread::sleep_for( std::chrono::milliseconds(120u) );
    g_obj.Speak();
    g_obj.Speak();
    g_obj.Speak();
}

I think that might be the most efficient way of implementing a
'construct on demand' global object, allowing for the circumstance
that it never gets constructed.
--
Std-Proposals mailing list
Std-Proposals@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals