
Re: [std-proposals] std::shared_ptr<> improvement / 2

From: Jonathan Wakely <cxx_at_[hidden]>
Date: Wed, 28 Aug 2024 14:36:47 +0100
On Wed, 28 Aug 2024 at 14:13, Oliver Schädlich via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> I recently presented my idea for an improvement of shared_ptr<> in
> combination with atomic<shared_ptr<>> here, but I was misunderstood. My
> idea was simply that shared_ptr<> should add a new assignment operator
> taking an atomic<shared_ptr<>> as its parameter. The trick would be that
> the assignment operator could check whether both pointers point to the
> same data and, if so, do nothing.
> If you have an RCU-like pattern, the participating threads could simply
> keep a thread_local shared_ptr<> which is periodically updated with such
> an assignment. As in RCU-like patterns, the central atomic<shared_ptr<>>
> would be updated rarely, so the comparison would be very quick on mostly
> shared cachelines.
> With current C++, load()ing from an atomic<shared_ptr<>> is extremely
> slow even when you have a lock-free implementation, since the cacheline
> ping-ponging between the cores is rather expensive.
> I've written my own shared_obj<> class (like shared_ptr<>) and
> tshared_obj<> class (like atomic<shared_ptr<>>), and with this
> optimization I get a speedup of a factor of 5,000 when 32 threads are
> contending on a single tshared_obj<>, i.e. constantly copying its
> pointer to a shared_obj<> object while the tshared_obj<> is rarely
> updated.
>
> This is the code:
>
> template<typename T>
> shared_obj<T> &shared_obj<T>::operator =( tshared_obj<T> const &ts ) noexcept
> {
>     using namespace std;
>     // important optimization for RCU-like patterns:
>     // are we managing the same object as the thread-safe object pointer?
>     if( (size_t)m_ctrl == ts.m_dca.first().load( memory_order_relaxed ) ) [[likely]]
> 

The [[likely]] branch prediction hint is specific to your use case; a
similar function in the standard couldn't make that assumption.

What is the (size_t) cast for? Should that be uintptr_t?


>         // yes: nothing to do
>         return *this;
>     // erase our object pointer
>     this->~shared_obj();
> 

This looks like undefined behaviour, since you destroy *this and don't
create a new object in its place.


>     // copy() increments the reference count on our behalf
>     m_ctrl = ts.copy();
>     return *this;
> }
> 

What's the actual optimization happening here? Not performing a reference
count increment/decrement pair in the "do nothing" case? You say above
that load() is slow; is that specifically atomic<shared_ptr<T>>::load(),
which increments the ref count, or any atomic load from another
cacheline? Your code above still does an atomic load.

Is your m_ctrl just a pointer? Because std::shared_ptr needs to update two
pointers, not one, so I'm not sure how this actually relates to
std::shared_ptr. What is the equality above equivalent to, shared_ptr's
operator== or std::owner_equal? Or both?

Received on 2024-08-28 13:38:06