Date: Sun, 16 Feb 2025 15:17:35 +0100
On Sun, Feb 16, 2025 at 3:02 PM Jennifier Burnett <jenni_at_[hidden]>
wrote:
> Right, so to put it together:
>
> - We want to ensure an acquire fence (or equivalent) between the fetch_sub
> and the deleter
> - The acquire fence won't establish a happens-before relationship with a
> relaxed decrement in another thread, even though logically we could prove
> that all the other relaxed decrements must've already happened, there's no
> "official" ordering
> - Therefore the decrement must be release or stronger in order to pair
> with the acquire
>
> Is that right? If I've got that then I'm on the same page. If the
> decrement is logically required to be a release (or equivalent) by your
> wording then my previous concerns are unfounded.
>
Yes, I think that's what we all expect implementations to do. (There's some
debate over whether it's faster to always use acquire-release when
decrementing, or just release and then the conditional acquire fence when
you're the last owner left.) And in library wording you would say that by
using the "strongly happens before" concept I mentioned.
Please file an LWG issue. Instructions can be found here:
https://cplusplus.github.io/LWG/lwg-active.html#submit_issue
(If LWG thinks that the existing wording already establishes the required
ordering somehow, at least they will explain it in their response, and
we'll all learn something.)
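For concreteness, the two strategies look roughly like this (just a sketch of
the control-block decrement, not any particular implementation's code; the
names here are made up):

#include <atomic>

template <class T, class D>
void release_ref(std::atomic<long>& refcount, T* ptr, D& deleter) {
  // Strategy 1: release decrement, with an acquire fence taken only by the
  // last owner. The fence pairs with the other owners' release decrements,
  // so their earlier accesses to the object happen before the deleter runs.
  if (refcount.fetch_sub(1, std::memory_order_release) == 1) {
    std::atomic_thread_fence(std::memory_order_acquire);
    deleter(ptr);
  }

  // Strategy 2: a single acquire-release RMW, no separate fence.
  // if (refcount.fetch_sub(1, std::memory_order_acq_rel) == 1)
  //   deleter(ptr);
}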
>
>
> On 16 February 2025 13:45:13 GMT, Brian Bi <bbi5291_at_[hidden]> wrote:
>
>>
>>
>> On Sun, Feb 16, 2025 at 2:40 PM Jennifier Burnett via Std-Discussion <
>> std-discussion_at_[hidden]> wrote:
>>
>>> Ah, OK, to follow on from this: the main thing that seems to have passed
>>> me by is this in Nate's response:
>>>
>>> >I presume that in the vast majority of cases, this happens-before is
>>> achieved simply by sequenced-before; within a single thread, with respect
>>> to program order, you access the object through a shared_ptr, then destroy
>>> that shared_ptr, and don't access the object again after that.
>>>
>>> How is sequenced-before equivalent to happens-before without any
>>> provisions of ordering?
>>>
>>> My understanding is that given the following:
>>>
>>> std::atomic<int>* foo = ...
>>> int* bar = ...
>>>
>>> int temp = *bar;
>>> foo->fetch_sub(1, std::memory_order_relaxed);
>>>
>>> There is nothing that requires the read from *bar to happen before the
>>> fetch_sub, because they're from two independent variables and the atomic
>>> operation provides no ordering constraints.
>>>
>>
>> Assuming that the last two statements are being executed by the same
>> thread, they have a happens-before relationship because of their
>> sequenced-before relationship.
>>
>> But between two threads, you only get a happens-before edge if there is
>> synchronization (which is not provided by relaxed atomic operations). So, a
>> relaxed read that reads a value stored by a relaxed write in some other
>> thread does not actually happen after the write, which is why you can't
>> conclude that anything sequenced after that read observes anything
>> sequenced before the write.
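Applied to your snippet, that looks something like this (a sketch; assume foo
starts at 1 and both threads can reach *bar):

// Thread 1
*bar = 42;                                     // A
foo->fetch_sub(1, std::memory_order_relaxed);  // B

// Thread 2
if (foo->load(std::memory_order_relaxed) == 0) {  // may read the value B stored
  int x = *bar;  // still a data race: the relaxed load does not synchronize
                 // with B, so A does not happen-before this read
}

A is sequenced before B within thread 1, but without a release/acquire pair
there is no synchronizes-with edge, hence no cross-thread happens-before.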
>>
>>
>>>
>>> However what you seem to be claiming (unless I'm misunderstanding, which
>>> I probably am) is that in the case of shared_ptr the fact that the read is
>>> sequenced-before the fetch_sub would be sufficient to prevent both the
>>> compiler and hardware from moving the read until after the fetch_sub. The
>>> thing I'm missing is exactly how that conclusion is being reached.
>>>
>>>
>>>
>>> On 16 February 2025 13:21:31 GMT, Jennifier Burnett via Std-Discussion <
>>> std-discussion_at_[hidden]> wrote:
>>>
>>>> Sorry just to clarify my reading of your response:
>>>>
>>>> > The standard library can't directly control the ordering between the
>>>> user code that accesses the object (i.e. using the built-in dereference
>>>> operator, lvalue-to-rvalue conversions, the class member access operator,
>>>> etc) and something else.
>>>>
>>>> When you say the standard library can't directly control the ordering
>>>> between user code accesses and anything else, that's intended to include
>>>> the call to the deleter? And so in order to ensure that a read from a
>>>> shared_ptr doesn't race with the deleter it's *intended* that the user
>>>> needs to manually provide a release fence between the accesses and the
>>>> shared_ptr destructor, to make sure the accesses stay within the critical
>>>> section where the object is known to be alive?
>>>>
>>>>
>>>> On 16 February 2025 13:03:45 GMT, Brian Bi <bbi5291_at_[hidden]> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Sun, Feb 16, 2025 at 4:34 AM Nate Eldredge <
>>>>> nate_at_[hidden]> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> > On Feb 15, 2025, at 19:22, jenni_at_[hidden] wrote:
>>>>>> >
>>>>>> >> And it seems clear that shared_ptr does have to avoid data races
>>>>>> on freeing the control block, which requires there to be some amount of
>>>>>> ordering.
>>>>>> >
>>>>>> > But this ordering can be provided by the single total order on RMW
>>>>>> operations of mo_relaxed, no?
>>>>>>
>>>>>> No. The reference counter is all that provides synchronization
>>>>>> between reads of the control block by "non-last" destructors (i.e. those
>>>>>> whose decrement did not result in zero), and the deletion of the control
>>>>>> block by the "last" destructor. And the only way to get that
>>>>>> synchronization is with both a release and an acquire. (Or a consume, but
>>>>>> let's discount those for now; they say they're being removed from C++26
>>>>>> anyhow.)
>>>>>>
>>>>>> > If you read a refcount of 0 it implies that all other existent
>>>>>> shared_ptrs have also completed their destructors at least to the point of
>>>>>> decrementing the refcount. By definition there cannot be anyone else who
>>>>>> could potentially use the control block while the last owner is inside the
>>>>>> == 0 block. Any reads that happen as part of the delete could potentially
>>>>>> leak outside of the if statement but any writes have to remain inside, lest
>>>>>> a write be invented.
>>>>>>
>>>>>> That's how real machines work, of course, but the C++ model doesn't
>>>>>> promise it. A control dependency without an acquire barrier doesn't
>>>>>> provide any happens-before ordering across threads, and doesn't save you
>>>>>> from a data race and consequent UB. There's no distinction between reads
>>>>>> and writes in this respect, and so either one would be allowed to "leak
>>>>>> out".
>>>>>>
>>>>>> So as I understand it, ISO C++ would allow for a hypothetical machine
>>>>>> that could make a write globally visible before the read on which it
>>>>>> depended. For instance, if the machine could peek into the future and see
>>>>>> that the read would inevitably return the correct value. Or, if it could
>>>>>> make the write globally visible while still speculative, and have some way
>>>>>> to roll back the entire system (including other cores) if it turned out to
>>>>>> be wrong.
>>>>>>
>>>>>> > The only thing that the acquire barrier could be synchronising in
>>>>>> that case is the *content* of the control block, which shouldn't need to be
>>>>>> synchronised since it's all trivially destructible and not read as part of
>>>>>> the delete.
>>>>>> >
>>>>>> > The mo_release on the decrement also seems unnecessary if you're
>>>>>> just synchronising the state of the control block. The only case you'd need
>>>>>> that would be if you were modifying part of the control block and needed
>>>>>> that to be visible on another thread when it does the acquire, which we've
>>>>>> established isn't necessary, so the paired release isn't either, putting
>>>>>> aside the fact that the refcount is the only thing being modified
>>>>>>
>>>>>> The release is needed for the "non-last" case, since you'll have
>>>>>> accesses to the control block sequenced before the decrement, and they
>>>>>> need to not race with the delete in the thread which does do the final
>>>>>> decrement. Again, the memory model insists on both sides of the
>>>>>> release-acquire pair. And there's no trick like with an acquire fence to
>>>>>> have a release barrier conditional on the value read by an RMW.
>>>>>>
>>>>>> > As I understand it if shared_ptr doesn't provide any ordering
>>>>>> guarantees on accesses to the object then all it requires is a STO on the
>>>>>> increments/decrements, which mo_relaxed provides.
>>>>>> >
>>>>>> >> I'd expect it to say something like this: if D is the invocation
>>>>>> of the destructor that invokes the deleter, and D' is any other invocation
>>>>>> of the destructor of a `shared_ptr` that shares ownership with it, then D'
>>>>>> strongly happens before the invocation of the deleter.
>>>>>> >
>>>>>> > Second this as being a good wording to provide ordering between the
>>>>>> destructor and the deleter, but would still need additional wording beyond
>>>>>> this to ensure that the accesses to the shared object happen-before the
>>>>>> destructor also (although it would need to perhaps cover more than just
>>>>>> accesses to the shared object? If a thread modifies an object via a pointer
>>>>>> stored in the shared object, should that modification be visible in the
>>>>>> deleter? Current library behaviour with rel-acq ordering is yes)
>>>>>>
>>>>>
>>>>> The standard library can't directly control the ordering between the
>>>>> user code that accesses the object (i.e. using the built-in dereference
>>>>> operator, lvalue-to-rvalue conversions, the class member access operator,
>>>>> etc) and something else.
>>>>>
>>>>>
>>>>>>
>>>>>> I think it's sufficient. It puts it on the programmer to ensure that
>>>>>> every access to the shared object (other than its deleter) happens-before
>>>>>> the destructor (or reset, move, etc) of at least one of the shared_ptr
>>>>>> objects that owns it. And I think that's what people are already doing,
>>>>>> based on the general understanding of how shared_ptr "should" work, as you
>>>>>> already articulated. (I presume that in the vast majority of cases, this
>>>>>> happens-before is achieved simply by sequenced-before; within a single
>>>>>> thread, with respect to program order, you access the object through a
>>>>>> shared_ptr, then destroy that shared_ptr, and don't access the object again
>>>>>> after that.)
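In code, that typical pattern is just (a sketch; Widget and get_widget are
placeholders):

{
  std::shared_ptr<Widget> p = get_widget();
  p->value = 42;  // access to the shared object, sequenced before...
}                 // ...p's destructor; the object is not touched after this

If p's destructor happens to run the deleter, sequencing alone orders the
access before it; if some other owner runs the deleter, the proposed "strongly
happens before" wording supplies the remaining edge.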
>>>>>>
>>>>>
>>>>> +1
>>>>>
>>>>>
>>>>>>
>>>>>> Assuming they do that, then they have the guarantee you want, because
>>>>>> happens-before is transitive (again I disregard consume for simplicity).
>>>>>> So their modifications of the object happen-before the destructor, which
>>>>>> happens-before the deleter. Therefore their modifications happen-before
>>>>>> the deleter, which means the deleter must observe them.
>>>>>>
>>>>>
>>>>> btw, memory_order_consume will be deprecated in C++26 and given the
>>>>> semantics of memory_order_acquire (which is, I believe, how all
>>>>> implementations currently behave anyway).
>>>>>
>>>>>
>>>>> --
>>> Std-Discussion mailing list
>>> Std-Discussion_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/std-discussion
>>>
>>
>>
-- *Brian Bi*
Received on 2025-02-16 14:17:57