Date: Wed, 3 Jul 2024 21:24:09 +0000
Hello everyone! I have just come up with a small idea about std::atomic. I don't know whether it is useful enough, but I found that it can certainly improve the performance of std::binary_semaphore in MSVC STL.
I think std::atomic<T> could provide a member function whose declaration is `T wait_and_write_back(T old, std::memory_order order = std::memory_order::seq_cst) noexcept` (for std::memory_order, see http://zh.cppreference.com/w/cpp/atomic/memory_order). This member function uses exchange to write old into the atomic object, then compares old with the value returned by exchange. If they are equal, it blocks until it is notified and then tries again; otherwise, it returns the value that exchange returned.
The implementation of std::atomic<T>::wait() in MSVC STL is as follows:
void wait(const _TVal _Expected, const memory_order _Order = memory_order_seq_cst) const noexcept {
    _STD _Atomic_wait_direct(this, _STD _Atomic_reinterpret_as<char>(_Expected), _Order);
}
void _Atomic_wait_direct(
    const _Atomic_storage<_Ty>* const _This, _Value_type _Expected_bytes, const memory_order _Order) noexcept {
    const auto _Storage_ptr = _STD addressof(_This->_Storage);
    for (;;) {
        const _Value_type _Observed_bytes = _STD _Atomic_reinterpret_as<_Value_type>(_This->load(_Order));
        if (_Expected_bytes != _Observed_bytes) {
            return;
        }
        ::__std_atomic_wait_direct(_Storage_ptr, &_Expected_bytes, sizeof(_Value_type), __std_atomic_wait_no_timeout);
    }
}
The implementation of wait_and_write_back would be almost the same, with the load replaced by an exchange:
// Not const, because exchange() modifies the stored value.
_TVal wait_and_write_back(const _TVal _Expected, const memory_order _Order = memory_order_seq_cst) noexcept {
    return _STD _Atomic_wait_and_write_back(this, _Expected, _Order);
}
_Ty _Atomic_wait_and_write_back(
    _Atomic_storage<_Ty>* const _This, const _Ty _Expected, const memory_order _Order) noexcept {
    const auto _Storage_ptr = _STD addressof(_This->_Storage);
    auto _Expected_bytes = _STD _Atomic_reinterpret_as<_Value_type>(_Expected);
    for (;;) {
        // Write _Expected back and fetch the previous value in one atomic step.
        const _Ty _Observed = _This->exchange(_Expected, _Order);
        if (_Expected_bytes != _STD _Atomic_reinterpret_as<_Value_type>(_Observed)) {
            return _Observed;
        }
        ::__std_atomic_wait_direct(_Storage_ptr, &_Expected_bytes, sizeof(_Value_type), __std_atomic_wait_no_timeout);
    }
}
Here is why it can improve the performance of std::binary_semaphore, or more precisely, of std::binary_semaphore::acquire().
Here is the current implementation in MSVC STL:
// The only data member of std::binary_semaphore is atomic<unsigned char> _Counter
void acquire() noexcept /* strengthened */ {
    for (;;) {
        unsigned char _Prev = _Counter.exchange(0);
        if (_Prev == 1) {
            break;
        }
        _Counter.wait(0, memory_order_relaxed);
    }
}
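With the new member function, it would become:
void acquire() noexcept
{
    _Counter.wait_and_write_back(0);
}
As you can see, the body of acquire then contains fewer atomic calls and no explicit if. Each failed attempt also skips the separate load that wait() performs before blocking.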
Received on 2024-07-03 21:24:15