Date: Sun, 16 Feb 2025 14:03:45 +0100
On Sun, Feb 16, 2025 at 4:34 AM Nate Eldredge <nate_at_[hidden]>
wrote:
>
>
> > On Feb 15, 2025, at 19:22, jenni_at_[hidden] wrote:
> >
> >> And it seems clear that shared_ptr does have to avoid data races on
> freeing the control block, which requires there to be some amount of
> ordering.
> >
> > But this ordering can be provided by the single total order on RMW
> operations of mo_relaxed, no?
>
> No. The reference counter is all that provides synchronization between
> reads of the control block by "non-last" destructors (i.e. those whose
> decrement did not result in zero), and the deletion of the control block by
> the "last" destructor. And the only way to get that synchronization is
> with both a release and an acquire. (Or a consume, but let's discount those
> for now; they say they're being removed from C++26 anyhow.)
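To make the release/acquire pairing on the reference counter concrete, here is a minimal sketch of a reference-counted control block. `ControlBlock`, `release_ref`, and `run_release_demo` are hypothetical names for illustration, not any standard library's internals. The decrement uses `memory_order_acq_rel`, so every "non-last" owner's reads of the block happen-before the `delete` performed by the last owner.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical control block; real implementations differ.
struct ControlBlock {
    std::atomic<long> use_count{1};
    int payload = 0;  // stands in for the other fields of the control block
};

// Drops one reference. Returns true if this call freed the block.
bool release_ref(ControlBlock* cb) {
    // Release half: this thread's earlier accesses to *cb are published.
    // Acquire half: if this is the last decrement, every other owner's
    // earlier accesses to *cb happen-before the delete below.
    if (cb->use_count.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete cb;
        return true;
    }
    return false;
}

// Each of `owners` threads (owners >= 1) reads the block, then drops its
// reference. Returns how many threads performed the delete.
int run_release_demo(int owners) {
    auto* cb = new ControlBlock;
    cb->use_count.store(owners, std::memory_order_relaxed);
    cb->payload = 42;
    std::atomic<int> deleters{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < owners; ++i) {
        threads.emplace_back([cb, &deleters] {
            assert(cb->payload == 42);  // "non-last" read of the control block
            if (release_ref(cb)) deleters.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& t : threads) t.join();
    return deleters.load();
}
```

Exactly one thread should observe the count reaching zero, whatever the number of owners.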
>
> > If you read a refcount of 0 it implies that all other existent
> shared_ptrs have also completed their destructors at least to the point of
> decrementing the refcount. By definition there cannot be anyone else who
> could potentially use the control block while the last owner is inside the
> == 0 block. Any reads that happen as part of the delete could potentially
> leak outside of the if statement but any writes have to remain inside, lest
> a write be invented.
>
> That's how real machines work, of course, but the C++ model doesn't
> promise it. A control dependency without an acquire barrier doesn't
> provide any happens-before ordering across threads, and doesn't save you
> from a data race and consequent UB. There's no distinction between reads
> and writes in this respect, and so either one would be allowed to "leak
> out".
>
> So as I understand, ISO C++ would allow for a hypothetical machine that
> could make a write globally visible before the read on which it depended.
> For instance, if the machine could peek into the future and see that the
> read would inevitably return the correct value. Or, if it could make the
> write globally visible while still speculative, and have some way to roll
> back the entire system (including other cores) if it turned out to be wrong.
>
> > The only thing that the acquire barrier could be synchronising in that
> case is the *content* of the control block, which shouldn't need to be
> synchronised since it's all trivially destructible and not read as part of
> the delete.
> >
> > The mo_release on the decrement also seems unnecessary if you're just
> synchronising the state of the control block. The only case you'd need that
> would be if you were modifying part of the control block and needed that to
> be visible on another thread when it does the acquire, which we've
> established isn't necessary so the paired release isn't either, putting
> aside the fact that the refcount is the only thing being modified.
>
> The release is needed for the "non-last" case, since you'll have accesses
> to the control block sequenced before the decrement, and they need to not
> race with the delete in the thread which does do the final decrement.
> Again, the memory model insists on both sides of the release-acquire pair.
> And there's no trick like with an acquire fence to have a release barrier
> conditional on the value read by a RMW.
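For comparison, the acquire-fence trick keeps every decrement a release but makes only the last owner pay for the acquire. This mirrors the reference-counting example in the Boost.Atomic documentation. `Node`, `node_release`, and `run_fence_demo` are hypothetical names, and the sketch assumes the counter is only ever modified by read-modify-write operations after initialization.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical intrusively counted object.
struct Node {
    std::atomic<int> refs{0};
    long value = 0;
};

// Every decrement is a release, so each owner's prior accesses are
// published. Only the thread that takes the count to zero issues an
// acquire fence. Because the counter is modified only by RMWs, the final
// fetch_sub reads a value in the release sequence headed by every earlier
// decrement, so the fence synchronizes with all of them.
bool node_release(Node* n) {
    if (n->refs.fetch_sub(1, std::memory_order_release) == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
        delete n;
        return true;
    }
    return false;
}

// Each of `owners` threads (owners >= 1) reads the node, then releases it.
// Returns how many threads performed the delete.
int run_fence_demo(int owners) {
    auto* n = new Node;
    n->refs.store(owners, std::memory_order_relaxed);
    n->value = 7;
    std::atomic<int> deleters{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < owners; ++i) {
        threads.emplace_back([n, &deleters] {
            assert(n->value == 7);  // access sequenced before the release decrement
            if (node_release(n)) deleters.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& t : threads) t.join();
    return deleters.load();
}
```

The fence is conditional on the value the RMW returned. As noted above, there is no corresponding way to make the release half conditional, so every decrement carries it.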
>
> > As I understand it if shared_ptr doesn't provide any ordering guarantees
> on accesses to the object then all it requires is a STO on the
> increments/decrements, which mo_relaxed provides.
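The total-order argument does hold for the increments, provided the caller already owns a reference. Copying an owner can bump the count with `memory_order_relaxed`, because nothing can free the object during the copy and the increment publishes nothing. The decrement is what needs release/acquire. A sketch with hypothetical names (`Counted`, `add_ref`, `drop_ref`, `copy_drop_demo`):

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical intrusively counted object.
struct Counted {
    std::atomic<long> refs{1};  // the creating owner holds one reference
};

// Copying an owner: an existing reference keeps the object alive during
// this call, so relaxed ordering is sufficient for the increment.
void add_ref(Counted* c) { c->refs.fetch_add(1, std::memory_order_relaxed); }

// Dropping an owner still needs release/acquire ordering.
bool drop_ref(Counted* c) {
    if (c->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete c;
        return true;
    }
    return false;
}

// Each worker takes and drops its own reference while the caller keeps the
// original one until after join(). Returns true if no worker freed the
// object and the caller's final drop did.
bool copy_drop_demo(int workers) {
    auto* c = new Counted;
    std::atomic<int> early_frees{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([c, &early_frees] {
            add_ref(c);
            if (drop_ref(c)) early_frees.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& t : threads) t.join();
    return early_frees.load() == 0 && drop_ref(c);
}
```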
> >
> >> I'd expect it to say something like this: if D is the invocation of the
> destructor that invokes the deleter, and D' is any other invocation of the
> destructor of a `shared_ptr` that shares ownership with it, then D'
> strongly happens before the invocation of the deleter.
> >
> > Second this as being a good wording to provide ordering between the
> destructor and the deleter, but would still need additional wording beyond
> this to ensure that the accesses to the shared object happen-before the
> destructor also (although it would need to perhaps cover more than just
> accesses to the shared object? If a thread modifies an object via a pointer
> stored in the shared object, should that modification be visible in the
> deleter? Current library behaviour with rel-acq ordering is yes)
>
The standard library can't directly control the ordering between the user
code that accesses the object (e.g., using the built-in dereference
operator, lvalue-to-rvalue conversions, the class member access operator,
etc.) and something else.
>
> I think it's sufficient. It puts it on the programmer to ensure that
> every access to the shared object (other than its deleter) happens-before
> the destructor (or reset, move, etc) of at least one of the shared_ptr
> objects that owns it. And I think that's what people are already doing,
> based on the general understanding of how shared_ptr "should" work, as you
> already articulated. (I presume that in the vast majority of cases, this
> happens-before is achieved simply by sequenced-before; within a single
> thread, with respect to program order, you access the object through a
> shared_ptr, then destroy that shared_ptr, and don't access the object again
> after that.)
>
+1
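The common pattern described above can be sketched as follows. Each thread accesses the object through its own `shared_ptr` copy and then gives up ownership with `reset()`, so the access is sequenced before the release of ownership. Whichever `reset()` drops the last owner invokes the custom deleter, which reads every slot. `Slots` and `deleter_sum_demo` are hypothetical names. The sketch relies on the release/acquire reference counting that current implementations provide, which is exactly what the proposed wording would guarantee.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical shared object: one slot per worker, so no two workers
// write the same memory location.
struct Slots {
    std::array<int, 8> values{};
};

// Returns the sum the deleter observed.
int deleter_sum_demo() {
    int observed = -1;  // written only by the deleter
    std::shared_ptr<Slots> owner(new Slots, [&observed](Slots* s) {
        int sum = 0;
        for (int v : s->values) sum += v;  // reads every worker's write
        observed = sum;
        delete s;
    });

    std::vector<std::thread> threads;
    for (int i = 0; i < 8; ++i) {
        // The copy `local` is made here, in the parent, before the thread starts.
        threads.emplace_back([local = owner, i]() mutable {
            local->values[i] = i + 1;  // access through this thread's copy...
            local.reset();             // ...sequenced before giving up ownership
        });
    }
    owner.reset();  // the parent gives up its ownership as well
    for (auto& t : threads) t.join();
    return observed;  // join() makes the deleter's write visible here
}
```

With slots 1 through 8, the deleter should see a sum of 36 regardless of which thread ends up running it.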
>
> Assuming they do that, then they have the guarantee you want, because
> happens-before is transitive (again I disregard consume for simplicity).
> So their modifications of the object happen-before the destructor, which
> happens-before the deleter. Therefore their modifications happen-before
> the deleter, which means the deleter must observe them.
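The same transitivity argument covers the earlier question about modifications made through a pointer stored in the shared object. If a thread's write through that pointer is sequenced before the destructor of its `shared_ptr`, the write happens-before the deleter. A sketch, again assuming release/acquire reference counting as current implementations provide. `Holder` and `indirect_write_demo` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical shared object that only holds a pointer to separate storage.
struct Holder {
    int* side;  // points into a buffer owned by the caller
};

// Returns how many buffer elements the deleter saw as written.
int indirect_write_demo() {
    std::vector<int> buffer(4, 0);
    int seen = -1;  // written only by the deleter
    std::shared_ptr<Holder> owner(new Holder{buffer.data()}, [&seen](Holder* h) {
        int count = 0;
        for (int i = 0; i < 4; ++i) count += (h->side[i] == 1);
        seen = count;
        delete h;
    });

    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) {
        threads.emplace_back([local = owner, i]() mutable {
            std::shared_ptr<Holder> mine = std::move(local);  // take the copy out of the closure
            mine->side[i] = 1;  // write through the stored pointer...
        });                     // ...sequenced before mine's destructor here
    }
    owner.reset();
    for (auto& t : threads) t.join();
    return seen;
}
```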
>
btw, memory_order_consume will be deprecated in C++26 and given the
semantics of memory_order_acquire (which is, I believe, how all
implementations currently behave anyway).
-- *Brian Bi*
Received on 2025-02-16 13:04:00