C++ Logo

std-discussion

Advanced search

Lack of ordering guarantees in shared_ptr wording

From: Jennifier Burnett <jenni_at_[hidden]>
Date: Sat, 15 Feb 2025 22:30:22 +0000
Hi all, I've talked to a few people I know personally about this and also posted about it online but I've not been able to get a firm response from anyone that I'm mistaken so I'm escalating this to this mailing list in the hope that someone here might be able to point out anything I'm missing.

Basically, I think there's issues with the standard's wording around shared_ptr, and I'd prefer to make sure I've actually found something before starting a defect report against a decade old library class.

The background for this is that I was curious as to if there was a mandated acquire fence between the destructor of shared_ptr and the destructor of the owned object. Obviously all implementations of the standard I'm aware of do have an acquire fence but I was curious as to if that was mandated or if a compiler could effectively use a memory_order_consume equivalent on platforms that allow it to provide the acquire ordering constraints only within the destructor.

The problem I think I've found is that shared_ptr doesn't actually seem to mandate *any* ordering requirements, beyond the bare minimum to make the object usable simultaneously across multiple threads. By the current wording, it seems like a standard library implementation would be entirely correct if it implemented both the increment and decrement of the reference counter by using memory_order_relaxed.

The only concrete things I can find that the standard says regarding how shared_ptr orders accesses on multiple threads are:

[util.smartptr.shared]§1:
> shared_ptr implements semantics of shared ownership; the last remaining owner of the pointer is responsible for destroying the object or otherwise releasing the resources associated with the stored pointer

The fact that there is a single last remaining owner implies that there's a single total order on shared_ptr destructor calls.

[util.smartptr.shared]§4:
> Changes in use_count() do
not reflect modifications that can introduce data races.

This simply allows that shared_ptr constructors, destructors and accesses of use_count do not need to be synchronised between multiple threads.

[util.smartptr.shared.dest]§2:
> after *this has been destroyed all shared_ptr instances that shared ownership with *this will report a use_count() that is one less than its previous value.

This seems to mandate a single total order on destructor calls.

None of the above seem to provide sufficient ordering constraints (or clearly imply that ordering constraints must be provided) as to ensure that any accesses to the owned object must happens-before the destructor associated with the owned object being called on a different thread.

To give a concrete example of the problem, consider the following program:

<code>
#include <memory>
#include <semaphore>
#include <thread>

void foo(std::shared_ptr<int>* in, std::binary_semaphore* flag, int* out)
{
    std::shared_ptr<int> in_copy = *in;
    flag->release();
    int temp = *in_copy;
    in_copy.reset();
    *out = temp;
}

int bar()
{
    int out_value;
    std::binary_semaphore flag{ 0 };
    std::thread t1;

    std::shared_ptr<int> in_value = std::make_shared<int>(15);
    t1 = std::thread{ &foo, &in_value, &flag, &out_value };
    flag.acquire();
    in_value.reset();
    t1.join();
    return out_value;
}
</code>

Given this program, is there any undefined behaviour?

To make it clearer what the issue is I'll add some made-up functions to foo to indicate the internals of shared_ptr:

<code>
void foo(std::shared_ptr<int>* in, std::binary_semaphore* flag, int* out)
{
    __refcount* in_refs = in->refcount();
    int* in_copy = in->get();
    in_refs->increment();

    flag->release();

    int temp = *in_copy;

    if (in_refs->decrement_and_check_zero())
    {
        delete in_copy;
    }

    *out = temp;
}
</code>

Now, a more focused question: is the read of in_copy ordered before the call to decrement_and_check_zero()?

The problem I see with the current wording of shared_ptr is that it doesn't seem to provide an unambiguous "yes", which means if we interpret the standard such that we say "no" to that question it suddenly becomes legal to perform the following transformation:

<code>
void foo(std::shared_ptr<int>* in, std::binary_semaphore* flag, int* out)
{
    __refcount* in_refs = in->refcount();
    int* in_copy = in->get();
    in_refs->increment();

    flag->release();

    if (in_refs->decrement_and_check_zero())
    {
        *out = *in_copy;
        delete in_copy;
    }
    else
    {
        *out = *in_copy;
    }
}
</code>

Which now raises undefined behaviour. As soon as decrement_and_check_zero() returns and we enter the else branch, we enter a state in which the main thread may, at any point, delete the last remaining shared_ptr instance and call the destructor of the owned object. Since the read of in_copy in the else branch isn't sequenced relative to that destructor, we have a potential read from a object after it's lifetime has ended, and thus the program exhibits undefined behaviour. The only way to protect against this would to manually insert a std::atomic_thread_fence(std:: memory_order_release) before the reset() of the shared pointer to ensure that the read happens before the shared_ptr is destroyed.

As I've said, to my reading the standard doesn't provide sufficient wording to definitively say that the above transformation *isn't* legal. I think it's reasonable that you can interpret that the intention is that accesses to the reference counter should be considered an atomic operation (and thus some of the atomic ordering constraints such as single total order of accesses and read-read coherency ordering should apply), but it doesn't seem at all clear that it should be treated like std::mutex::lock() or unlock() where there are implicit acquire or release semantics (not least because those examples are called out specifically in the section on atomic memory orderings to give them those semantics.)

Anyway, I'm hoping someone here can point out something that I'm missing here. Obviously I don't think this is a case where everyone is using shared_ptr wrong and we need to all be adding std::atomic_thread_fence in our code using shared_ptr to provide ordering, but rather a deficiency in the standard's wording that doesn't clearly communicate the intention.

Thank you for your time,
Jennifier Burnett

Received on 2025-02-15 22:46:37