
Re: Slim mutexes and locks based on C++20 std::atomic::wait

From: Marko Mäkelä <marko.makela_at_[hidden]>
Date: Tue, 31 Aug 2021 22:17:57 +0300
Mon, Aug 30, 2021 at 11:25:47AM -0700, Thiago Macieira via Std-Proposals wrote:
>On Monday, 30 August 2021 10:29:48 PDT Ville Voutilainen wrote:
>> > Anyway, can I ask that we get a spinlock before we get a mutex? Before
>> > C++20 it was impossible to get it right with std::atomic_flag; now it's
>> > just non-obvious.
>> The marching order of an atomic_mutex and spinlock is completely
>> non-obvious to me; I happily don't program at these levels much,
>> so I'll also happily defer this to Marko. :)
>mutex = spinlock + futex (wait / wake) + optional goodies
>Spinlocks do suffer from the transactional memory requirement too.
>But mutexes will have more scope creep, like the "adaptive" mode (spin a
>little before going for the system call).
>An intermediary option is a ticket lock. Whenever I worked with hardware
>architects on spinlock performance issues, their first question was why I
>wasn't doing a ticket lock instead.
>And I worked with hardware people BECAUSE I was looking at naively-implemented
>spinlocks and those have extremely poor scalability. I've just noticed that
>the libstdc++ stop_token::binary_semaphore has this mistake (fortunately, it
>doesn't get used because std::binary_semaphore exists).
>The correct implementation for a non-transactional spinlock is:
>while (lock.test_and_set(std::memory_order_acquire)) {
>    while (lock.test(std::memory_order_relaxed))
>        _mm_pause();
>}

Thank you for this concrete example.

I updated https://github.com/dr-m/atomic_sync once more. With GCC
11.2.0, atomic_spin_mutex::wait_and_lock() now loops around a plain
load, instead of looping around lock cmpxchg. On my system, the
performance difference between pinning the test program to a single
NUMA node and running it normally (using both CPU sockets in the
system) became smaller.

        Marko Mäkelä

Received on 2021-08-31 14:18:09