Date: Tue, 23 Jul 2024 09:13:59 -0400
> The ABA problem seems only relevant for pointer swaps (at least I cannot
think of another use-case where that would be relevant).
Where there are mixed orderings at play, that can also matter. If the LL/SC
is only lowered to cmpxchg at the machine code level (avoids compiler
reorderings and also pointer provenance issues), *and* the hardware
provides at least acquire/release semantics on every load/store, then I
believe the lowering is correct.
On Tue, 23 Jul 2024 at 09:00, Joseph Schuchart via Std-Proposals <
std-proposals_at_[hidden]> wrote:
> Thanks everyone for the valuable feedback. That's just what I was hoping
> for :)
>
> Let me try to summarize and comment:
> 1) We probably don't want to allow arbitrary code and instead provide a
> mechanism to specify predefined operations, inspired by Thiago's
> atomic_mutate implementation (thanks for sharing that!). Both RISC-V and
> arm come with restrictions on what is allowed between LL and SC so
> "portable".
> 2) I agree that probably the biggest issue is portability. An interface
> like atomic_load_store can be implemented with CAS instructions but the
> semantics would be different. Most notably it would not save us from the
> ABA problem. That of course defeats my purpose. However, ABA is specific to
> pointers and other types would work just fine.
> 3) There seems to be a way to emulate the LL/SC semantics with CAS (based
> on the LLX/SCX paper) but my understanding is that it is rather expensive
> and users wouldn't know whether the current platform provides native
> support or not. Their approach seems to require additional blocks in the
> data that we couldn't provide without extending std::atomic<T>.
> 4) I had heard about hazard pointers and had forgotten about them, thanks
> for the reference. They appear to solve a very specific problem (memory
> management, like RCU) and I'm struggling to generalize that to my use-case.
> Maybe there is something I am missing.
>
> I do believe that offering an abstraction like
> atomic_load_store/atomic_mutate has merit, beyond the narrow use-case of
> data structures I use as my motivating example. If used carefully (see 1))
> it can help reduce overheads from emulated CAS, esp with pointers, and it
> opens the door to custom atomic operations beyond what is currently
> provided by std::atomic (as demonstrated by Thiago's atomic_mutate
> examples).
>
> The ABA problem seems only relevant for pointer swaps (at least I cannot
> think of another use-case where that would be relevant). Maybe we can
> provide a facility that protects pointers by augmenting an atomic pointer
> variable, along the lines of std::atomic<std::atomic_safe_ptr<T>>. On
> systems with only CAS, this would expand to a struct {T*, intptr_t} and
> atomic_load_store would handle the ABA problem for us by incrementing the
> second part on each atomic_load_store (the only operation allowed on it).
> That would not be necessary on architectures with LL/SC so
> std::atomic_safe_ptr<T> simply becomes T*. I wonder if ABA alone is a
> common enough problem that it's worth standardizing a solution for it.
>
> I will play around with Thiago's code and try to combine it with the
> safe_ptr abstraction. I'll need to find some time on the side for this but
> I think there is something there there.
>
> Thanks again for the feedback,
> Joseph
>
> On Mon, Jul 22, 2024 at 12:51 PM Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
> wrote:
>
>> Hi Joseph,
>> Have you looked at the recently-adopted-into-C++26 library clause on
>> hazard pointers (and to a lesser extent RCU)?
>> https://eel.is/c++draft/saferecl#hp
>> I think these are higher-level than your proposed facility, but the way
>> you're using your facility to build a lock-free linked list is exactly
>> why things like hazard pointers exist, IIUC.
>>
>> I would summarize your proposed interface as "ABA-free compare-and-swap"
>> — which I guess highlights exactly why it's not implementable (more
>> charitably, "why it must be a hardware primitive").
>>
>> –Arthur
>>
>> On Sun, Jul 21, 2024 at 6:24 PM Joseph Schuchart via Std-Proposals <
>> std-proposals_at_[hidden]> wrote:
>>
>>> All,
>>>
>>> Long time reader, first time writer here. I am working on a sizable
>>> project where we have traditionally provided custom-built abstractions
>>> (pre-C/C++11) for atomic operations. In particular, we are trying to use
>>> LL/SC instructions (load-linked/store-conditional) for some data structures
>>> where available, since they provide stronger guarantees than CAS (avoiding
>>> the ABA problem).
>>>
>>> Neither C11 nor C++11 provide a primitive that lets developers utilize
>>> these primitives directly so we are using inline assembly and
>>> configure-time detection. I'd like to get to a point where we don't have to
>>> write inline assembly anymore to utilize this feature.
>>>
>>> I wonder whether there have been discussions on adding an abstraction to
>>> the standard that exposes LL/SC capabilities where available? I looked
>>> through papers in the last ten years but nothing struck me as relevant. So
>>> I wonder whether there is interest in adding a facility to utilize these
>>> instructions? Apologies if this has been discussed before, I'd appreciate a
>>> pointer to the discussion/proposal if it has.
>>>
>>> Here is my approach: Since LL/SC is a load and a (conditional) store
>>> that work in tandem with potentially arbitrary code in between (vs a single
>>> CAS instruction or well-defined operations if std::atomic is implemented
>>> using LL/SC) they should be encapsulated into one abstraction, like:
>>>
>>> ```
>>> struct lifo_t {
>>> std::atomic<lifo_element_t*> head;
>>> };
>>> lifo_element_t* new_elem = ...;
>>> /* load current head and exchange with new */
>>> while (!std::atomic_load_store(lifo->head,
>>> [&](list_element_t* elem) {
>>> new_elem->next = elem; // link
>>> return new_elem;
>>> }))
>>> { /* keep trying if the store fails */ }
>>> ```
>>>
>>> Both the load and the store will happen inside std::atomic_load_store
>>> and the provided callable takes the loaded value and returns the value to
>>> be stored in its place. If the conditional-store succeeds,
>>> std::atomic_load_store would return true, false otherwise. Similar to
>>> std::atomic_compare_exchange_weak, the store may fail spuriously. If
>>> std::atomic_load_store succeeds, we are sure that no write has happened to
>>> the atomic object in between the load and the store. This avoids the ABA
>>> problem we have with CAS and so we do not need to work on 128bit values (on
>>> 64bit machines). Beyond the case I'm describing here, this API could enable
>>> custom floating point atomic operations for example.
>>>
>>> I'm not sure that this API could be implemented everywhere but it would
>>> expose an important capability on architectures where it can be implemented
>>> (LL/SC is available, and maybe transactional memory?) so it would have to
>>> be guarded by a preprocessor macro like (__cpp_atomic_load_store) or a
>>> trait (std::have_atomic_load_store<T>) and applications would have to
>>> provide a fallback implementation.
>>>
>>> I'm happy to write a full proposal if there is interest in the community
>>> but would probably need some help to get started.
>>>
>>> I appreciate any feedback on this.
>>>
>>> Thanks
>>> Joseph
>>> --
>>> Std-Proposals mailing list
>>> Std-Proposals_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>>
>> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
think of another use-case where that would be relevant).
Where there are mixed orderings at play, that can also matter. If the LL/SC
is only lowered to cmpxchg at the machine code level (avoids compiler
reorderings and also pointer provenance issues), *and* the hardware
provides at least acquire/release semantics on every load/store, then I
believe the lowering is correct.
On Tue, 23 Jul 2024 at 09:00, Joseph Schuchart via Std-Proposals <
std-proposals_at_[hidden]> wrote:
> Thanks everyone for the valuable feedback. That's just what I was hoping
> for :)
>
> Let me try to summarize and comment:
> 1) We probably don't want to allow arbitrary code and instead provide a
> mechanism to specify predefined operations, inspired by Thiago's
> atomic_mutate implementation (thanks for sharing that!). Both RISC-V and
> arm come with restrictions on what is allowed between LL and SC so
> "portable".
> 2) I agree that probably the biggest issue is portability. An interface
> like atomic_load_store can be implemented with CAS instructions but the
> semantics would be different. Most notably it would not save us from the
> ABA problem. That of course defeats my purpose. However, ABA is specific to
> pointers and other types would work just fine.
> 3) There seems to be a way to emulate the LL/SC semantics with CAS (based
> on the LLX/SCX paper) but my understanding is that it is rather expensive
> and users wouldn't know whether the current platform provides native
> support or not. Their approach seems to require additional blocks in the
> data that we couldn't provide without extending std::atomic<T>.
> 4) I had heard about hazard pointers and had forgotten about them, thanks
> for the reference. They appear to solve a very specific problem (memory
> management, like RCU) and I'm struggling to generalize that to my use-case.
> Maybe there is something I am missing.
>
> I do believe that offering an abstraction like
> atomic_load_store/atomic_mutate has merit, beyond the narrow use-case of
> data structures I use as my motivating example. If used carefully (see 1))
> it can help reduce overheads from emulated CAS, esp with pointers, and it
> opens the door to custom atomic operations beyond what is currently
> provided by std::atomic (as demonstrated by Thiago's atomic_mutate
> examples).
>
> The ABA problem seems only relevant for pointer swaps (at least I cannot
> think of another use-case where that would be relevant). Maybe we can
> provide a facility that protects pointers by augmenting an atomic pointer
> variable, along the lines of std::atomic<std::atomic_safe_ptr<T>>. On
> systems with only CAS, this would expand to a struct {T*, intptr_t} and
> atomic_load_store would handle the ABA problem for us by incrementing the
> second part on each atomic_load_store (the only operation allowed on it).
> That would not be necessary on architectures with LL/SC so
> std::atomic_safe_ptr<T> simply becomes T*. I wonder if ABA alone is a
> common enough problem that it's worth standardizing a solution for it.
>
> I will play around with Thiago's code and try to combine it with the
> safe_ptr abstraction. I'll need to find some time on the side for this but
> I think there is something there there.
>
> Thanks again for the feedback,
> Joseph
>
> On Mon, Jul 22, 2024 at 12:51 PM Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
> wrote:
>
>> Hi Joseph,
>> Have you looked at the recently-adopted-into-C++26 library clause on
>> hazard pointers (and to a lesser extent RCU)?
>> https://eel.is/c++draft/saferecl#hp
>> I think these are higher-level than your proposed facility, but the way
>> you're using your facility to build a lock-free linked list is exactly
>> why things like hazard pointers exist, IIUC.
>>
>> I would summarize your proposed interface as "ABA-free compare-and-swap"
>> — which I guess highlights exactly why it's not implementable (more
>> charitably, "why it must be a hardware primitive").
>>
>> –Arthur
>>
>> On Sun, Jul 21, 2024 at 6:24 PM Joseph Schuchart via Std-Proposals <
>> std-proposals_at_[hidden]> wrote:
>>
>>> All,
>>>
>>> Long time reader, first time writer here. I am working on a sizable
>>> project where we have traditionally provided custom-built abstractions
>>> (pre-C/C++11) for atomic operations. In particular, we are trying to use
>>> LL/SC instructions (load-linked/store-conditional) for some data structures
>>> where available, since they provide stronger guarantees than CAS (avoiding
>>> the ABA problem).
>>>
>>> Neither C11 nor C++11 provide a primitive that lets developers utilize
>>> these primitives directly so we are using inline assembly and
>>> configure-time detection. I'd like to get to a point where we don't have to
>>> write inline assembly anymore to utilize this feature.
>>>
>>> I wonder whether there have been discussions on adding an abstraction to
>>> the standard that exposes LL/SC capabilities where available? I looked
>>> through papers in the last ten years but nothing struck me as relevant. So
>>> I wonder whether there is interest in adding a facility to utilize these
>>> instructions? Apologies if this has been discussed before, I'd appreciate a
>>> pointer to the discussion/proposal if it has.
>>>
>>> Here is my approach: Since LL/SC is a load and a (conditional) store
>>> that work in tandem with potentially arbitrary code in between (vs a single
>>> CAS instruction or well-defined operations if std::atomic is implemented
>>> using LL/SC) they should be encapsulated into one abstraction, like:
>>>
>>> ```
>>> struct lifo_t {
>>> std::atomic<lifo_element_t*> head;
>>> };
>>> lifo_element_t* new_elem = ...;
>>> /* load current head and exchange with new */
>>> while (!std::atomic_load_store(lifo->head,
>>> [&](list_element_t* elem) {
>>> new_elem->next = elem; // link
>>> return new_elem;
>>> }))
>>> { /* keep trying if the store fails */ }
>>> ```
>>>
>>> Both the load and the store will happen inside std::atomic_load_store
>>> and the provided callable takes the loaded value and returns the value to
>>> be stored in its place. If the conditional-store succeeds,
>>> std::atomic_load_store would return true, false otherwise. Similar to
>>> std::atomic_compare_exchange_weak, the store may fail spuriously. If
>>> std::atomic_load_store succeeds, we are sure that no write has happened to
>>> the atomic object in between the load and the store. This avoids the ABA
>>> problem we have with CAS and so we do not need to work on 128bit values (on
>>> 64bit machines). Beyond the case I'm describing here, this API could enable
>>> custom floating point atomic operations for example.
>>>
>>> I'm not sure that this API could be implemented everywhere but it would
>>> expose an important capability on architectures where it can be implemented
>>> (LL/SC is available, and maybe transactional memory?) so it would have to
>>> be guarded by a preprocessor macro like (__cpp_atomic_load_store) or a
>>> trait (std::have_atomic_load_store<T>) and applications would have to
>>> provide a fallback implementation.
>>>
>>> I'm happy to write a full proposal if there is interest in the community
>>> but would probably need some help to get started.
>>>
>>> I appreciate any feedback on this.
>>>
>>> Thanks
>>> Joseph
>>> --
>>> Std-Proposals mailing list
>>> Std-Proposals_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>>
>> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
Received on 2024-07-23 13:14:12