Date: Thu, 9 May 2024 02:39:58 +0300
On 5/9/24 01:05, Nate Eldredge wrote:
> On Thu, 9 May 2024, Andrey Semashev via Std-Discussion wrote:
>>
>> Another example might be no memory access instructions within the LL/SC
>> pair. Even if they are formally allowed, page faults may get in the way.
>> Which would make the whole concept extremely unreliable, as the compiler
>> normally doesn't have those kind of restrictions and e.g. is free to
>> generate loads from the function object state or bound variables. With
>> optimizations disabled, the compiler will also generate loads and stores
>> on the stack for every variable.
>
> So I guess those cases would fall back to compare-exchange also.
>
> I fully agree that this is only going to be beneficial in practice if f
> is "extremely simple", even if we cannot precisely define what that
> means in general, and that a smart programmer would only use it in that
> way. But I think it could at least still behave correctly when used in
> a non-smart way, even if not with optimal performance, by falling back
> to compare-exchange, either at compile time or run time.
Thing is, atomics are used by experts where performance matters. If a
certain mechanism doesn't provide reliable performance characteristics,
why would anyone use it? I mean, if I can't be reasonably sure that the
code generated by the compiler for this new API is good, I would rather
write the loop or even inline assembler myself. (By "good" I mean at the
very least "not broken"; an example of "broken" is when the LL/SC
section always fails, whether or not there is a fallback code path. And
the code has to be "good" regardless of the compiler's mood or
optimization level.)
> Thiago also makes a good point that there is still value in having a
> library function to perform a CAS loop, since in some cases there are
> machine-specific optimizations that could be inserted.
I can see a point in providing an abstraction specifically for the CAS
loop idiom, just for the sake of a better API for it. But this alone
doesn't seem worth it, given that it will probably be less efficient
than the manually written CAS loop anyway.
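
For concreteness, here is a minimal sketch of the two things being
compared: a manually written CAS loop and a generic wrapper around the
same idiom. The names fetch_max_manual and cas_loop_update are
hypothetical and chosen only for illustration; they are not part of the
standard library or of any proposal discussed in this thread. Default
(seq_cst) memory ordering is used for simplicity.

```cpp
#include <atomic>

// Hand-written CAS loop (hypothetical example): atomically replaces x
// with max(x, candidate) and returns the value observed before the update.
// If the current value is already >= candidate, no store is attempted.
inline int fetch_max_manual(std::atomic<int>& x, int candidate) {
    int old_val = x.load();
    while (old_val < candidate &&
           !x.compare_exchange_weak(old_val, candidate)) {
        // On failure, compare_exchange_weak stores the current value of x
        // into old_val, so the loop condition is re-evaluated with fresh data.
    }
    return old_val;
}

// Hypothetical generic wrapper around the same idiom (an assumption for
// illustration, not an existing API): applies f to the current value and
// retries until the result is stored. Returns the value that was replaced.
// f may be invoked several times, so it should be cheap and side-effect free.
template <typename T, typename F>
T cas_loop_update(std::atomic<T>& x, F f) {
    T old_val = x.load();
    for (;;) {
        const T desired = f(old_val);
        if (x.compare_exchange_weak(old_val, desired)) {
            return old_val;
        }
    }
}
```

The wrapper always attempts a store, while the manual loop can skip it
when no update is needed. This is one kind of case-specific shortcut a
generic abstraction may not be able to take.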
Received on 2024-05-08 23:40:12