Date: Mon, 17 Feb 2025 21:41:20 -0700
> On Feb 16, 2025, at 04:38, Jennifier Burnett <jenni_at_[hidden]> wrote:
>
>
>
>> No. The reference counter is all that provides synchronization between reads of the control block by "non-last" destructors
>
>> The release is needed for the "non-last" case, since you'll have accesses to the control block sequenced before the decrement, and they need to not race with the delete in the thread which does do the final decrement.
>
> This is the part that confuses me. What reads and accesses are you referring to that aren't already synchronised by coherency ordering on operations on a single atomic variable? Given that the control block is just the reference counter (your example implementation is slightly wrong I'll point out, the pointer to the shared object is stored in each individual shared_ptr, not the control block, so the control block would more correctly be *just* the reference counter) coherency ordering would seem sufficient to ensure that accesses to the reference counter don't escape before the constructor or after the destructor. If you move a relaxed atomic read past a relaxed atomic RMW you end up changing program-order correctness in the program, since if you had a read of use_count() before the destructor that is forwarded into the deleter, the user would read that the refcount was apparently zero before decrementing, therefore contradicting the fact that the branch has been entered
Even if we're only talking about the deletion of the reference counter itself, the issue still applies. The end of the lifetime of an atomic object is not itself an atomic operation, and it doesn't own a slot in the STO modification order. It is subject to the same rules as any other object; namely, that all accesses to the object must happen-before the end of the lifetime. So although the modification order does totally order the atomic writes to the reference counter, a non-atomic operation such as the end of lifetime can't benefit from that without additional barriers.
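To make that concrete, here is a minimal sketch of the relaxed-only variant under discussion. It assumes a control block that is nothing but the counter; the names `Counter` and `drop_relaxed` are made up for illustration and are not taken from any real shared_ptr implementation.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical control block: just the reference counter.
struct Counter {
    std::atomic<long> refs;
    explicit Counter(long n) : refs(n) {}
};

// Drops one reference using only relaxed ordering.
// Returns true if this call destroyed the counter.
bool drop_relaxed(Counter* c) {
    // (A) Atomic RMW: it has a place in the modification order of c->refs.
    if (c->refs.fetch_sub(1, std::memory_order_relaxed) == 1) {
        // (B) End of lifetime of *c: not an atomic operation, so it has no
        // place in that modification order. With several threads, nothing
        // here makes another thread's (A) happen-before this (B).
        delete c;
        return true;
    }
    return false;
}
```

Called from a single thread this is well-defined, since sequenced-before supplies the happens-before. The problem described above only arises when (A) in one thread and (B) in another are ordered solely by the counter's modification order.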
>> That's how real machines work, of course, but the C++ model doesn't promise it. A control dependency without an acquire barrier doesn't provide any happens-before ordering across threads, and doesn't save you from a data race and consequent UB.
>
> But it should protect you from transformations that are blatantly wrong on single threaded machines? If the abstract machine is able to move the destruction of the control block before it checks if the decremented reference count is zero, that would be incorrect because it would be inventing a destructor where it wouldn't otherwise exist (if it turns out that the reference count is not zero.)
I fully agree that all transformations must preserve the property that the destruction only occurs if the decremented ref counter actually *turns out to be* zero. But in principle, the implementation could make the destruction visible *to other threads* before doing the actual check, if it can deduce by some other means, with absolute certainty, that the decremented counter will inevitably be zero.
Of course it cannot make the destruction visible to *this* thread before the decrement. As you say, single-threaded semantics must be preserved.
Like I said, it is very hard to imagine a machine that could actually do this without some sort of supernatural powers, but the formal memory model doesn't forbid it.
> I believe that the destruction of the control block has a simple program-order happens-before relationship with the RMW decrement of the reference counter, so given that shouldn't the STO on operations on the reference counter ensure that all the non-zero decrements must have happened before the final zero decrement of the reference counter? And since simple coherency ordering ensures that prior accesses to the reference count can't move past the decrement in the destructor, all accesses to the refcount (and hence the control block, if the control block is just the reference count) must have transitively happened before the final decrement
Agreed, the final decrement happens-before the destruction of the control block in that same thread. But without acquire-release barriers, it does not follow that all *other* decrements happen-before the destruction; the fact that they precede the final decrement in the *modification order* does not suffice. The data race is between those *other* decrements and the destruction.
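For comparison, here is a sketch of the usual way to supply the missing happens-before edge: a release decrement in every thread, plus an acquire fence in the thread that sees the count reach zero. The names `ControlBlock`, `release_ref`, `run_demo` and `g_deletes` are hypothetical and exist only for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical control block: just the reference counter.
struct ControlBlock {
    std::atomic<long> refs;
    explicit ControlBlock(long n) : refs(n) {}
};

std::atomic<int> g_deletes{0};  // bookkeeping for the demo only

void release_ref(ControlBlock* cb) {
    // Release: this thread's earlier accesses to *cb are ordered before
    // the decrement becomes visible to a thread that acquires it.
    if (cb->refs.fetch_sub(1, std::memory_order_release) == 1) {
        // Acquire fence: the final RMW read a value from the release
        // sequence of every earlier decrement, so each of those
        // decrements now happens-before the delete below.
        std::atomic_thread_fence(std::memory_order_acquire);
        delete cb;
        g_deletes.fetch_add(1, std::memory_order_relaxed);
    }
}

// Starts n_threads owners, each dropping one reference.
// Returns how many times the control block was deleted.
int run_demo(int n_threads) {
    g_deletes.store(0);
    ControlBlock* cb = new ControlBlock(n_threads);
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i) {
        threads.emplace_back([cb] {
            // A read of the counter before the drop, like use_count().
            (void)cb->refs.load(std::memory_order_relaxed);
            release_ref(cb);
        });
    }
    for (auto& t : threads) t.join();
    return g_deletes.load();
}
```

Using `memory_order_acq_rel` directly on the `fetch_sub` should work equally well; the separate fence merely confines the acquire cost to the one thread that actually deletes.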
>> So as I understand, ISO C++ would allow for a hypothetical machine that could make a write globally visible before the read on which it depended. For instance, if the machine could peek into the future and see that the read would inevitably return the correct value. Or, if it could make the write globally visible while still speculative, and have some way to roll back the entire system (including other cores) if it turned out to be wrong.
>
> Right, this is my understanding too. But there aren't any writes in the destructor that I can see, only a RMW, so I'm not sure exactly what relevance that has to this. RMW operations on a single atomic variable have a STO without release-acquire ordering.
I'm supposing that the end of the lifetime of the reference counter modifies global state in some way that would mess up other concurrent accesses. That's the "write" I'm talking about. (For instance, it could overwrite the counter with some garbage value; or it could write to page tables that would cause the other accesses to fault.)
Received on 2025-02-18 04:41:34