ISOCPP Std Discussion List: Re: Lack of ordering guarantees in shared

From: Jennifier Burnett <jenni_at_[hidden]>
Date: Sun, 16 Feb 2025 14:22:32 +0000

Great, thanks for taking the time to explain.

On 16 February 2025 14:17:35 GMT, Brian Bi <bbi5291_at_[hidden]> wrote:
>On Sun, Feb 16, 2025 at 3:02 PM Jennifier Burnett <jenni_at_[hidden]>
>wrote:
>
>> Right, so to put it together:
>>
>> - We want to ensure an acquire fence (or equivalent) between the fetch_sub
>> and the deleter
>> - The acquire fence won't establish a happens-before relationship with a
>> relaxed decrement in another thread; even though logically we could prove
>> that all the other relaxed decrements must've already happened, there's no
>> "official" ordering
>> - Therefore the decrement must be release or stronger in order to pair
>> with the acquire
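The three points above describe the usual refcount-release pattern. Below is a minimal sketch of it, assuming a hypothetical hand-rolled control block (`ControlBlock`, `release_ref`, `kOwners` and `run_release_fence_demo` are illustrative names, not any standard library's internals). Each owner writes plain data, then does a release decrement; only the owner whose decrement reaches zero issues the acquire fence and deletes.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical control block, for illustration only; real shared_ptr
// implementations differ (weak counts, type-erased deleters, etc.).
constexpr int kOwners = 4;
int g_sum_seen_by_deleter = -1;  // written only by the deleting thread

struct ControlBlock {
    std::atomic<long> refs{kOwners};
    int slots[kOwners] = {};  // plain (non-atomic) data, one slot per owner
    ~ControlBlock() {
        int sum = 0;
        for (int v : slots) sum += v;  // reads every owner's earlier write
        g_sum_seen_by_deleter = sum;
    }
};

// Drops one reference; returns true if this call freed the block.
bool release_ref(ControlBlock* cb) {
    // Release: this owner's earlier accesses to *cb are published to
    // whichever thread later performs the final decrement.
    if (cb->refs.fetch_sub(1, std::memory_order_release) == 1) {
        // Acquire fence: the value read above belongs to the release sequence
        // headed by each earlier release decrement, so every one of those
        // decrements synchronizes with this fence.
        std::atomic_thread_fence(std::memory_order_acquire);
        delete cb;
        return true;
    }
    return false;  // must not touch *cb again after a non-final decrement
}

// Each owner writes its own slot, then drops its reference.
int run_release_fence_demo() {
    auto* cb = new ControlBlock;
    std::atomic<int> deleters{0};
    std::vector<std::thread> owners;
    for (int i = 0; i < kOwners; ++i) {
        owners.emplace_back([cb, i, &deleters] {
            cb->slots[i] = i + 1;  // sequenced before this owner's decrement
            if (release_ref(cb)) deleters.fetch_add(1);
        });
    }
    for (auto& t : owners) t.join();
    assert(deleters.load() == 1);  // exactly one owner ran the delete
    return g_sum_seen_by_deleter;  // 1 + 2 + 3 + 4
}
```

If the decrement were `memory_order_relaxed`, the fence would have no release operation to pair with, and the slot writes would race with the destructor's reads.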
>>
>
>> Is that right? If I've got that then I'm on the same page. If the
>> decrement is logically required to be a release (or equivalent) by your
>> wording then my previous concerns are unfounded.
>>
>
>Yes, I think that's what we all expect implementations to do. (There's some
>debate over whether it's faster to always use acquire-release when
>decrementing, or just release and then the conditional acquire fence when
>you're the last owner left.) And in library wording you would say that
>using the "strongly happens before" concept I mentioned.
>
>Please file an LWG issue. Instructions can be found here:
>https://cplusplus.github.io/LWG/lwg-active.html#submit_issue
>
>(If LWG thinks that the existing wording already establishes the required
>ordering somehow, at least they will explain it in their response, and
>we'll all learn something.)
>
>
>>
>>
>> On 16 February 2025 13:45:13 GMT, Brian Bi <bbi5291_at_[hidden]> wrote:
>>
>>>
>>>
>>> On Sun, Feb 16, 2025 at 2:40 PM Jennifier Burnett via Std-Discussion <
>>> std-discussion_at_[hidden]> wrote:
>>>
>>>> Ah, ok to follow on from this the main thing that seems to have passed
>>>> me by is this in Nate's response:
>>>>
>>>> >I presume that in the vast majority of cases, this happens-before is
>>>> achieved simply by sequenced-before; within a single thread, with respect
>>>> to program order, you access the object through a shared_ptr, then destroy
>>>> that shared_ptr, and don't access the object again after that.
>>>>
>>>> How is sequenced-before equivalent to happens-before without any
>>>> ordering provisions?
>>>>
>>>> My understanding is that given the following:
>>>>
>>>> std::atomic<int>* foo = ...;
>>>> int* bar = ...;
>>>>
>>>> int temp = *bar;
>>>> foo->fetch_sub(1, std::memory_order_relaxed);
>>>>
>>>> There is nothing that requires the read from *bar to happen before the
>>>> fetch_sub, because they're from two independent variables and the atomic
>>>> operation provides no ordering constraints.
>>>>
>>>
>>> Assuming that the last two statements are being executed by the same
>>> thread, they have a happens-before relationship because of their
>>> sequenced-before relationship.
>>>
>>> But between two threads, you only get a happens-before edge if there is
>>> synchronization (which is not provided by relaxed atomic operations). So, a
>>> relaxed read that reads a value stored by a relaxed write in some other
>>> thread does not actually happen after the write, which is why you can't
>>> conclude that anything sequenced after that read observes anything
>>> sequenced before the write.
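The distinction above (sequenced-before within one thread, synchronization between threads) is the classic message-passing pattern. A minimal sketch, with `run_message_passing` as an illustrative name: the happens-before chain from the plain write to the plain read exists only because the store is release and the load is acquire.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Plain data published to another thread through an atomic flag.
int run_message_passing() {
    int data = 0;                    // ordinary, non-atomic object
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        data = 42;                                     // A: sequenced before B
        ready.store(true, std::memory_order_release);  // B
    });

    // C: once this load reads the value stored by B, B synchronizes with C.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    // D: A sb B, B synchronizes-with C, C sb D, so A happens before D.
    // With relaxed on both B and C there is no such edge: A and D would be
    // a data race (undefined behavior), whatever program order suggests.
    int seen = data;
    producer.join();
    return seen;
}
```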
>>>
>>>
>>>>
>>>> However what you seem to be claiming (unless I'm misunderstanding, which
>>>> I probably am) is that in the case of shared_ptr the fact that the read is
>>>> sequenced-before the fetch_sub would be sufficient to prevent both the
>>>> compiler and hardware from moving the read until after the fetch_sub. The
>>>> thing I'm missing is exactly how that conclusion is being reached.
>>>>
>>>>
>>>>
>>>> On 16 February 2025 13:21:31 GMT, Jennifier Burnett via Std-Discussion <
>>>> std-discussion_at_[hidden]> wrote:
>>>>
>>>>> Sorry just to clarify my reading of your response:
>>>>>
>>>>> > The standard library can't directly control the ordering between the
>>>>> user code that accesses the object (i.e. using the built-in dereference
>>>>> operator, lvalue-to-rvalue conversions, the class member access operator,
>>>>> etc) and something else.
>>>>>
>>>>> When you say the standard library can't directly control the ordering
>>>>> between user code accesses and anything else, that's intended to include
>>>>> the call to the deleter? And so in order to ensure that a read from a
>>>>> shared_ptr doesn't race with the deleter it's *intended* that the user
>>>>> needs to manually provide a release fence between the accesses and the
>>>>> shared_ptr destructor, to make sure the accesses stay within the critical
>>>>> section where the object is known to be alive?
>>>>>
>>>>>
>>>>> On 16 February 2025 13:03:45 GMT, Brian Bi <bbi5291_at_[hidden]> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Feb 16, 2025 at 4:34 AM Nate Eldredge <
>>>>>> nate_at_[hidden]> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> > On Feb 15, 2025, at 19:22, jenni_at_[hidden] wrote:
>>>>>>> >
>>>>>>> >> And it seems clear that shared_ptr does have to avoid data races
>>>>>>> on freeing the control block, which requires there to be some amount of
>>>>>>> ordering.
>>>>>>> >
>>>>>>> > But this ordering can be provided by the single total order on RMW
>>>>>>> operations of mo_relaxed, no?
>>>>>>>
>>>>>>> No. The reference counter is all that provides synchronization
>>>>>>> between reads of the control block by "non-last" destructors (i.e. those
>>>>>>> whose decrement did not result in zero), and the deletion of the control
>>>>>>> block by the "last" destructor. And the only way to get that
>>>>>>> synchronization is with both a release and an acquire. (Or a consume, but
>>>>>>> let's discount those for now; they say they're being removed from C++26
>>>>>>> anyhow.)
>>>>>>>
>>>>>>> > If you read a refcount of 0 it implies that all other existent
>>>>>>> shared_ptrs have also completed their destructors at least to the point of
>>>>>>> decrementing the refcount. By definition there cannot be anyone else who
>>>>>>> could potentially use the control block while the last owner is inside the
>>>>>>> == 0 block. Any reads that happen as part of the delete could potentially
>>>>>>> leak outside of the if statement but any writes have to remain inside, lest
>>>>>>> a write be invented.
>>>>>>>
>>>>>>> That's how real machines work, of course, but the C++ model doesn't
>>>>>>> promise it. A control dependency without an acquire barrier doesn't
>>>>>>> provide any happens-before ordering across threads, and doesn't save you
>>>>>>> from a data race and consequent UB. There's no distinction between reads
>>>>>>> and writes in this respect, and so either one would be allowed to "leak
>>>>>>> out".
>>>>>>>
>>>>>>> So as I understand, ISO C++ would allow for a hypothetical machine
>>>>>>> that could make a write globally visible before the read on which it
>>>>>>> depended. For instance, if the machine could peek into the future and see
>>>>>>> that the read would inevitably return the correct value. Or, if it could
>>>>>>> make the write globally visible while still speculative, and have some way
>>>>>>> to roll back the entire system (including other cores) if it turned out to
>>>>>>> be wrong.
>>>>>>>
>>>>>>> > The only thing that the acquire barrier could be synchronising in
>>>>>>> that case is the *content* of the control block, which shouldn't need to be
>>>>>>> synchronised since it's all trivially destructible and not read as part of
>>>>>>> the delete.
>>>>>>> >
>>>>>>> > The mo_release on the decrement also seems unnecessary if you're
>>>>>>> just synchronising the state of the control block. The only case you'd need
>>>>>>> that would be if you were modifying part of the control block and needed
>>>>>>> that to be visible on another thread when it does the acquire, which we've
>>>>>>> established isn't necessary, so the paired release isn't either (putting
>>>>>>> aside the fact that the refcount is the only thing being modified).
>>>>>>>
>>>>>>> The release is needed for the "non-last" case, since you'll have
>>>>>>> accesses to the control block sequenced before the decrement, and they
>>>>>>> need to not race with the delete in the thread which does do the final
>>>>>>> decrement. Again, the memory model insists on both sides of the
>>>>>>> release-acquire pair. And there's no trick like with an acquire fence to
>>>>>>> have a release barrier conditional on the value read by a RMW.
>>>>>>>
>>>>>>> > As I understand it if shared_ptr doesn't provide any ordering
>>>>>>> guarantees on accesses to the object then all it requires is a STO on the
>>>>>>> increments/decrements, which mo_relaxed provides.
>>>>>>> >
>>>>>>> >> I'd expect it to say something like this: if D is the invocation
>>>>>>> of the destructor that invokes the deleter, and D' is any other invocation
>>>>>>> of the destructor of a `shared_ptr` that shares ownership with it, then D'
>>>>>>> strongly happens before the invocation of the deleter.
>>>>>>> >
>>>>>>> > I second this as good wording to provide ordering between the
>>>>>>> destructor and the deleter, but it would still need additional wording beyond
>>>>>>> this to ensure that the accesses to the shared object happen-before the
>>>>>>> destructor also (although it would need to perhaps cover more than just
>>>>>>> accesses to the shared object? If a thread modifies an object via a pointer
>>>>>>> stored in the shared object, should that modification be visible in the
>>>>>>> deleter? Current library behaviour with rel-acq ordering is yes)
>>>>>>>
>>>>>>
>>>>>> The standard library can't directly control the ordering between the
>>>>>> user code that accesses the object (i.e. using the built-in dereference
>>>>>> operator, lvalue-to-rvalue conversions, the class member access operator,
>>>>>> etc) and something else.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I think it's sufficient. It puts it on the programmer to ensure that
>>>>>>> every access to the shared object (other than its deleter) happens-before
>>>>>>> the destructor (or reset, move, etc) of at least one of the shared_ptr
>>>>>>> objects that owns it. And I think that's what people are already doing,
>>>>>>> based on the general understanding of how shared_ptr "should" work, as you
>>>>>>> already articulated. (I presume that in the vast majority of cases, this
>>>>>>> happens-before is achieved simply by sequenced-before; within a single
>>>>>>> thread, with respect to program order, you access the object through a
>>>>>>> shared_ptr, then destroy that shared_ptr, and don't access the object again
>>>>>>> after that.)
>>>>>>>
>>>>>>
>>>>>> +1
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Assuming they do that, then they have the guarantee you want, because
>>>>>>> happens-before is transitive (again I disregard consume for simplicity).
>>>>>>> So their modifications of the object happen-before the destructor, which
>>>>>>> happens-before the deleter. Therefore their modifications happen-before
>>>>>>> the deleter, which means the deleter must observe them.
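The transitivity argument above can be sketched with `std::shared_ptr` and a custom deleter. Each worker accesses the object only through its own copy and then drops that copy, so every access is sequenced before (and therefore happens before) a destructor or `reset`. This assumes the implementation orders the refcount decrements with release/acquire, as the thread says implementations are expected to; whether the current standard wording guarantees that the deleter sees the writes is exactly the open question, so read this as what implementations do, not a quoted guarantee. `run_shared_ptr_demo` and `kSpWorkers` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

constexpr int kSpWorkers = 4;

int run_shared_ptr_demo() {
    int sum_in_deleter = -1;  // written by whichever thread runs the deleter
    {
        std::shared_ptr<std::array<int, kSpWorkers>> owner(
            new std::array<int, kSpWorkers>{},
            [&sum_in_deleter](std::array<int, kSpWorkers>* p) {
                int s = 0;
                for (int v : *p) s += v;  // expected to observe every worker's write
                sum_in_deleter = s;
                delete p;
            });
        std::vector<std::thread> workers;
        for (int i = 0; i < kSpWorkers; ++i) {
            // Each worker gets its own copy and writes only its own element,
            // so workers never race with each other.
            workers.emplace_back([sp = owner, i]() mutable {
                (*sp)[i] = i + 1;  // access through this thread's copy...
                sp.reset();        // ...then drop it and never touch the object again
            });
        }
        owner.reset();  // the original owner drops out as well
        for (auto& t : workers) t.join();
    }
    return sum_in_deleter;  // read after all joins: 1 + 2 + 3 + 4
}
```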
>>>>>>>
>>>>>>
>>>>>> btw, memory_order_consume will be deprecated in C++26 and given the same
>>>>>> semantics as memory_order_acquire (which is, I believe, how all
>>>>>> implementations currently behave anyway).
>>>>>>
>>>>>>
>>>>>> --
>>>> Std-Discussion mailing list
>>>> Std-Discussion_at_[hidden]
>>>> https://lists.isocpp.org/mailman/listinfo.cgi/std-discussion
>>>>
>>>
>>>
>
>--
>*Brian Bi*

Received on 2025-02-16 14:22:41