std-discussion

Re: Generalized atomic updates

From: Thiago Macieira <thiago_at_[hidden]>
Date: Wed, 08 May 2024 11:53:05 -0700
On Wednesday 8 May 2024 11:17:51 GMT-7 Nate Eldredge wrote:
> In the code you linked, I only saw a generic compare-exchange
> implementation. Have you tried at all to create any machine-specific
> implementations, using inline assembly for LL/SC and/or RMW instructions
> where available? I'd be curious what kind of code the compiler can
> produce. If you haven't, I might try my hand at it, just for fun.

Yes, I tried writing x86-specific code. That's why I began this: so I could use
the new CMPccXADD instructions. The tests that I added do show the compiler
emits the correct instructions (by way of the intrinsic).

I have no interest in writing code for other architectures.

> Still, though, I feel like even when you enlarge the "menu" of available
> operations like this, there's always going to be things left out. For
> instance, an LL/SC machine can trivially do an atomic shift, which we
> currently have no way to do in C++ short of compare-exchange. Or imagine
> some sort of multi-threaded Collatz code that wants to do an atomic 3*x+1.

Indeed, I hadn't thought of shifts, but those do make sense to add.
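
For reference, this is roughly what an atomic shift looks like in today's C++,
where the only option is a compare-exchange loop. It is a minimal sketch, not
code from the linked implementation: the name fetch_shl, the 32-bit operand
type and the defaulted memory order are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a standard function): atomically replaces the
// value of obj with (value << count) and returns the previous value.
inline std::uint32_t fetch_shl(std::atomic<std::uint32_t>& obj, unsigned count,
                               std::memory_order order = std::memory_order_seq_cst)
{
    assert(count < 32 && "shift count must be less than the operand width");

    std::uint32_t expected = obj.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak stores the current value into
    // expected, so the new value is recomputed on every retry.
    while (!obj.compare_exchange_weak(expected, expected << count,
                                      order, std::memory_order_relaxed)) {
    }
    return expected;
}
```

Like the existing fetch_xxx members, the helper returns the old value and lets
the caller choose the memory order.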

More complex operations are troublesome. It's highly unlikely that hardware
will implement them in one or a few instructions, so they're only going to work
with LL/SC, and you have to know whether the operation is acceptable inside
that block. For example, I could see some low-power architectures not having a
divide instruction, so an operation that attempted a division would always fail
(and even if such an instruction is present, its taking 70-100 cycles to
complete may be too much).

It might be important to know whether the particular sequence of operations is
loop-free or not.

On the other hand, implementing a compare-exchange loop with proper PAUSE
hints and backoffs is important too. Improper looping shows up in our
benchmarking a dozen times a year...
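
A compare-exchange loop with a PAUSE hint and exponential backoff might be
sketched as follows. The names fetch_update and cpu_relax, the cap of 64 spins
and the yield() fallback on non-x86 targets are all assumptions made for
illustration; none of them is tuned or taken from the linked code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#if defined(__x86_64__) || defined(_M_X64)
#  include <immintrin.h>
#endif

// Hypothetical helper: tell the CPU we are in a spin-wait loop.
inline void cpu_relax() noexcept
{
#if defined(__x86_64__) || defined(_M_X64)
    _mm_pause();                    // emits the x86 PAUSE instruction
#else
    std::this_thread::yield();      // assumed fallback for other targets
#endif
}

// Hypothetical helper: atomically replaces the value of obj with op(value)
// and returns the previous value. op may be called more than once.
template <typename T, typename Op>
T fetch_update(std::atomic<T>& obj, Op op,
               std::memory_order order = std::memory_order_seq_cst)
{
    constexpr unsigned max_spins = 64;  // illustrative cap, not a tuned value
    unsigned spins = 1;

    T expected = obj.load(std::memory_order_relaxed);
    while (!obj.compare_exchange_weak(expected, op(expected),
                                      order, std::memory_order_relaxed)) {
        // Back off before retrying so contending threads do not hammer
        // the cache line; double the wait after each failure, up to the cap.
        for (unsigned i = 0; i < spins; ++i)
            cpu_relax();
        if (spins < max_spins)
            spins *= 2;
        // Reload after waiting, since the value may have changed again.
        expected = obj.load(std::memory_order_relaxed);
    }
    return expected;
}
```

With such a helper, the Collatz step from the quoted message could be written
as fetch_update(x, [](std::uint64_t v) { return 3 * v + 1; }).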

> (Minor digression: it's annoyed me for some time that C++ doesn't have
> fetch_xxx for all the built-in arithmetic operators, which would at least
> solve atomic shift. C allows you to apply any compound assignment
> operator to an _Atomic object, so all compilers with C front-ends must
> already have them implemented. The advantage of fetch_xxx would be that
> you can recover the old value, and specify the memory order.)


-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Principal Engineer - Intel DCAI Cloud Engineering

Received on 2024-05-08 18:53:09