Date: Sun, 16 Feb 2025 13:40:26 +0000
Ah, OK. To follow on from this, the main thing that seems to have passed me by is this in Nate's response:
>I presume that in the vast majority of cases, this happens-before is achieved simply by sequenced-before; within a single thread, with respect to program order, you access the object through a shared_ptr, then destroy that shared_ptr, and don't access the object again after that.
How is sequenced-before equivalent to happens-before here, without any explicit ordering provisions?
My understanding is that given the following:
std::atomic<int>* foo = ...;
int* bar = ...;
int temp = *bar;
foo->fetch_sub(1, std::memory_order_relaxed);
There is nothing that requires the read from *bar to happen before the fetch_sub: they are accesses to two independent objects, and the relaxed atomic operation imposes no ordering constraints.
However, what you seem to be claiming (unless I'm misunderstanding, which I probably am) is that in the case of shared_ptr, the fact that the read is sequenced before the fetch_sub would be sufficient to prevent both the compiler and the hardware from moving the read to after the fetch_sub. The thing I'm missing is exactly how that conclusion is reached.
On 16 February 2025 13:21:31 GMT, Jennifier Burnett via Std-Discussion <std-discussion_at_[hidden]> wrote:
>Sorry just to clarify my reading of your response:
>
>> The standard library can't directly control the ordering between the user code that accesses the object (i.e. using the built-in dereference operator, lvalue-to-rvalue conversions, the class member access operator, etc) and something else.
>
>When you say the standard library can't directly control the ordering between user code accesses and anything else, that's intended to include the call to the deleter? And so in order to ensure that a read from a shared_ptr doesn't race with the deleter it's *intended* that the user needs to manually provide a release fence between the accesses and the shared_ptr destructor, to make sure the accesses stay within the critical section where the lifetime of the object is known alive?
>
>On 16 February 2025 13:03:45 GMT, Brian Bi <bbi5291_at_[hidden]> wrote:
>>On Sun, Feb 16, 2025 at 4:34 AM Nate Eldredge <nate_at_[hidden]>
>>wrote:
>>
>>>
>>>
>>> > On Feb 15, 2025, at 19:22, jenni_at_[hidden] wrote:
>>> >
>>> >> And it seems clear that shared_ptr does have to avoid data races on
>>> freeing the control block, which requires there to be some amount of
>>> ordering.
>>> >
>>> > But this ordering can be provided by the single total order on RMW
>>> operations of mo_relaxed, no?
>>>
>>> No. The reference counter is all that provides synchronization between
>>> reads of the control block by "non-last" destructors (i.e. those whose
>>> decrement did not result in zero), and the deletion of the control block by
>>> the "last" destructor. And the only way to get that synchronization is
>>> with both a release and an acquire. (Or a consume, but let's discount those
>>> for now; they say they're being removed from C++26 anyhow.)
>>>
>>> > If you read a refcount of 0 it implies that all other existent
>>> shared_ptrs have also completed their destructors at least to the point of
>>> decrementing the refcount. By definition there cannot be anyone else who
>>> could potentially use the control block while the last owner is inside the
>>> == 0 block. Any reads that happen as part of the delete could potentially
>>> leak outside of the if statement but any writes have to remain inside, lest
>>> a write be invented.
>>>
>>> That's how real machines work, of course, but the C++ model doesn't
>>> promise it. A control dependency without an acquire barrier doesn't
>>> provide any happens-before ordering across threads, and doesn't save you
>>> from a data race and consequent UB. There's no distinction between reads
>>> and writes in this respect, and so either one would be allowed to "leak
>>> out".
>>>
>>> So as I understand, ISO C++ would allow for a hypothetical machine that
>>> could make a write globally visible before the read on which it depended.
>>> For instance, if the machine could peek into the future and see that the
>>> read would inevitably return the correct value. Or, if it could make the
>>> write globally visible while still speculative, and have some way to roll
>>> back the entire system (including other cores) if it turned out to be wrong.
>>>
>>> > The only thing that the acquire barrier could be synchronising in that
>>> case is the *content* of the control block, which shouldn't need to be
>>> synchronised since it's all trivially destructible and not read as part of
>>> the delete.
>>> >
>>> > The mo_release on the decrement also seems unnecessary if you're just
>>> synchronising the state of the control block. The only case you'd need that
>>> would be if you were modifying part of the control block and needed that to
>>> be visible on another thread when it does the acquire, which we've
>>> established isn't necessary so the paired release isn't either, putting
>>> aside the fact that the refcount is the only thing being modified
>>>
>>> The release is needed for the "non-last" case, since you'll have accesses
>>> to the control block sequenced before the decrement, and they need to not
>>> race with the delete in the thread which does do the final decrement.
>>> Again, the memory model insists on both sides of the release-acquire pair.
>>> And there's no trick like with an acquire fence to have a release barrier
>>> conditional on the value read by a RMW.
>>>
>>> > As I understand it if shared_ptr doesn't provide any ordering guarantees
>>> on accesses to the object then all it requires is a STO on the
>>> increments/decrements, which mo_relaxed provides.
>>> >
>>> >> I'd expect it to say something like this: if D is the invocation of the
>>> destructor that invokes the deleter, and D' is any other invocation of the
>>> destructor of a `shared_ptr` that shares ownership with it, then D'
>>> strongly happens before the invocation of the deleter.
>>> >
>>> > Second this as being a good wording to provide ordering between the
>>> destructor and the deleter, but would still need additional wording beyond
>>> this to ensure that the accesses to the shared object happens-before the
>>> destructor also (although it would need to perhaps cover more than just
>>> accesses to the shared object? If a thread modifies an object via a pointer
>>> stored in the shared object, should that modification be visible in the
>>> deleter? Current library behaviour with rel-acq ordering is yes)
>>>
>>
>>The standard library can't directly control the ordering between the user
>>code that accesses the object (i.e. using the built-in dereference
>>operator, lvalue-to-rvalue conversions, the class member access operator,
>>etc) and something else.
>>
>>
>>>
>>> I think it's sufficient. It puts it on the programmer to ensure that
>>> every access to the shared object (other than its deleter) happens-before
>>> the destructor (or reset, move, etc) of at least one of the shared_ptr
>>> objects that owns it. And I think that's what people are already doing,
>>> based on the general understanding of how shared_ptr "should" work, as you
>>> already articulated. (I presume that in the vast majority of cases, this
>>> happens-before is achieved simply by sequenced-before; within a single
>>> thread, with respect to program order, you access the object through a
>>> shared_ptr, then destroy that shared_ptr, and don't access the object again
>>> after that.)
>>>
>>
>>+1
>>
>>
>>>
>>> Assuming they do that, then they have the guarantee you want, because
>>> happens-before is transitive (again I disregard consume for simplicity).
>>> So their modifications of the object happen-before the destructor, which
>>> happens-before the deleter. Therefore their modifications happen-before
>>> the deleter, which means the deleter must observe them.
>>>
>>
>>btw, memory_order_consume will be deprecated in C++26 and given the
>>semantics of memory_order_acquire (which is, I believe, how all
>>implementations currently behave anyway).
>>
>>
>>
>>--
>>*Brian Bi*
Received on 2025-02-16 13:40:36