Date: Wed, 28 Aug 2024 15:12:02 +0200
I recently presented my idea for improving shared_ptr<> in combination
with atomic<shared_ptr<>> here, but I was misunderstood. My idea is
simply that shared_ptr<> should get a new assignment operator that takes
an atomic<shared_ptr<>> as its parameter. The trick is that this
assignment operator can first check whether both pointers point to the
same data and, if so, do nothing.
If you have an RCU-like pattern, the participating threads could simply
each keep a thread_local shared_ptr<> which is periodically updated with
such an assignment. As usual with RCU-like patterns, the central
atomic<shared_ptr<>> would be updated rarely, and the comparison would
be very quick because it mostly hits shared cache lines.
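
The pattern can be approximated with standard types, as a rough sketch
only. std::shared_ptr<> does not expose its control block, so the sketch
assumes a separate atomic raw-pointer hint next to the shared pointer.
The names rcu_cell, store(), refresh() and demo() are hypothetical and
are not part of the standard library or of my classes. It also assumes a
single writer (or externally serialized writers) and uses the older
atomic free functions for shared_ptr<> so that it builds before C++20;
with C++20 the member could be a std::atomic<std::shared_ptr<>> instead.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical helper: a central, rarely updated pointer plus a cheap hint.
// Assumption: only one thread calls store() at a time.
template<typename T>
class rcu_cell
{
public:
    explicit rcu_cell( std::shared_ptr<T const> p ) :
        m_ptr( std::move( p ) ),
        m_hint( m_ptr.get() )
    {
    }
    // writer side: publish a new object, then update the hint
    void store( std::shared_ptr<T const> p )
    {
        T const *raw = p.get();
        std::atomic_store( &m_ptr, std::move( p ) );
        m_hint.store( raw, std::memory_order_release );
    }
    // reader side: returns true only if local had to be replaced
    bool refresh( std::shared_ptr<T const> &local ) const
    {
        // fast path: a read of a mostly shared cache line, no ref-count traffic;
        // the address cannot be reused while local still owns the object
        if( local.get() == m_hint.load( std::memory_order_acquire ) )
            return false;
        // slow path: the full atomic copy with reference-count increment
        local = std::atomic_load( &m_ptr );
        return true;
    }
private:
    std::shared_ptr<T const> m_ptr;
    std::atomic<T const *> m_hint;
};

// Single-threaded walk through the reader's view; returns the number of reloads.
inline int demo()
{
    rcu_cell<int> cell( std::make_shared<int>( 1 ) );
    std::shared_ptr<int const> local;  // would be thread_local in a real reader
    int reloads = 0;
    reloads += cell.refresh( local );  // first call: local is empty, must load
    reloads += cell.refresh( local );  // unchanged: fast path, nothing to do
    cell.store( std::make_shared<int>( 2 ) );
    reloads += cell.refresh( local );  // central pointer changed: reload
    assert( *local == 2 );
    return reloads;
}
```

In a real reader each thread would keep local as a thread_local
variable and call refresh() periodically. A reader may briefly keep
using the previous object after a store(), which is the usual behaviour
of RCU-like schemes.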
With current C++, load()ing from an atomic<shared_ptr<>> is extremely
slow, even when you have a lock-free solution, since the cache-line
flipping between the cores is rather expensive.
I've written my own shared_obj<> (like shared_ptr<>) and tshared_obj<>
(like atomic<shared_ptr<>>) classes, and with this optimization I get a
speedup by a factor of 5,000 when 32 threads are contending on a single
tshared_obj<>, i.e. constantly copying its pointer to a shared_obj<>
object while the tshared_obj<> is rarely updated.
This is the code:
template<typename T>
shared_obj<T> &shared_obj<T>::operator =( tshared_obj<T> const &ts ) noexcept
{
    using namespace std;
    // important optimization for RCU-like patterns:
    // we're managing the same object like the thread-safe object-pointer ?
    if( (size_t)m_ctrl == ts.m_dca.first().load( memory_order_relaxed ) ) [[likely]]
        // yes: nothing to do
        return *this;
    // erase our object pointer
    this->~shared_obj();
    // copy() increments the reference count for ourself
    m_ctrl = ts.copy();
    return *this;
}
Received on 2024-08-28 13:12:04