Date: Mon, 30 Aug 2021 11:25:47 -0700
On Monday, 30 August 2021 10:29:48 PDT Ville Voutilainen wrote:
> > Anyway, can I ask that we get a spinlock before we get a mutex? Before
> > C++20 it was impossible to get it right with std::atomic_flag; right now,
> > it's just non-obvious.
>
> The marching order of an atomic_mutex and spinlock is completely
> non-obvious to me, I happily don't program
> at these levels much, so I'll also happily defer this to Marko. :)

mutex = spinlock + futex (wait / wake) + optional goodies

Spinlocks do suffer from the transactional memory requirement too. But mutexes
will have more scope creep, like the "adaptive" mode (spin a little before
going for the system call).

An intermediate option is a ticket lock. Whenever I worked with hardware
architects on spinlock performance issues, their first question was why I
wasn't using a ticket lock instead.
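
For readers who haven't seen one, a ticket lock can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: TicketLock and ticket_lock_demo are hypothetical names, and std::this_thread::yield() stands in for whatever pause instruction the target CPU offers.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration of a ticket lock: waiters are served in FIFO order.
class TicketLock {
    std::atomic<unsigned> next_{0};     // next ticket to hand out
    std::atomic<unsigned> serving_{0};  // ticket currently allowed in

public:
    void lock() {
        // Take a ticket; this is the only read-modify-write per acquisition.
        const unsigned my = next_.fetch_add(1, std::memory_order_relaxed);
        // Wait with plain loads until our number comes up.
        while (serving_.load(std::memory_order_acquire) != my)
            std::this_thread::yield();
    }

    void unlock() {
        // Only the lock holder writes serving_, so load + store is enough.
        serving_.store(serving_.load(std::memory_order_relaxed) + 1,
                       std::memory_order_release);
    }
};

// Small demo: four threads increment a shared counter under the lock.
long ticket_lock_demo() {
    TicketLock lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < 5000; ++i) {
                std::lock_guard<TicketLock> guard(lock);
                ++counter;
            }
        });
    for (auto& th : threads) th.join();
    return counter;
}
```

Each thread does one atomic increment to join the queue and then only reads, which is presumably part of why hardware people like it. The price is that a descheduled waiter stalls everyone queued behind it.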

And I worked with hardware people BECAUSE I was looking at naively implemented
spinlocks, and those scale extremely poorly. I've just noticed that the
libstdc++ stop_token::binary_semaphore has this mistake (fortunately, it
doesn't get used because std::binary_semaphore exists).

The correct implementation for a non-transactional spinlock is:

    while (lock.test_and_set(std::memory_order_acquire)) {
        while (lock.test(std::memory_order_relaxed))
            _mm_pause();
    }

See https://gcc.godbolt.org/z/zhYhz6e9M for the implementation in action as
well as a port of glibc's pthread_spin_lock() to C++. GCC and Clang generated
almost exactly the same code as is written in assembly in glibc, but
performance-wise the extra instructions for atomic_flag shouldn't make a
difference.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel DPG Cloud Engineering

Received on 2021-08-30 13:25:54