Date: Mon, 29 Dec 2025 08:50:35 -0300
On Sunday, 28 December 2025 22:35:31 Brasilia Standard Time Jonathan Wakely
via Std-Proposals wrote:
> > Amazingly, this wasn't lockfree either. I'm dealing with the latest
> > 'trunk' GNU g++ compiler on x86_64 Linux, which has a CPU instruction
> > (compare and swap 16 bytes) to make this possible.
>
> You need -mcx16 to let GCC use that instruction, and even with that option
> atomic ops will call into libatomic which decides at runtime whether to use
> cmpxchg16b or not.
Indeed. Do note that a lock-free CAS isn't enough: there also needs to be an
atomic way of loading and storing 128 bits, and x86-64 lacks a dedicated
instruction for that. Without one, the implementation must lock and unlock to
perform loads and stores. Atomic 128-bit loads and stores are possible but
undocumented: all machines capable of AVX have 128-bit atomic data paths. Using
the VMOVDQA instruction will also have a much higher performance penalty due to
the register file transfers.
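
For anyone who wants to see what their own toolchain promises, here is a
minimal sketch using only the standard C++17 member
std::atomic<T>::is_always_lock_free. The Pair struct and the lock_freedom()
helper are names made up for this example. It only reports the compile-time
guarantee, which covers loads and stores as well as CAS; it does not query what
libatomic decides at runtime. I would expect the 16-byte case to come out as
"not guaranteed" with GCC, but I have not listed an output because that depends
on the compiler, the target and the flags.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical 16-byte payload, e.g. a pointer-sized value plus a counter.
struct alignas(16) Pair {
    std::uint64_t lo;
    std::uint64_t hi;
};

static_assert(sizeof(Pair) == 16, "Pair is expected to be 16 bytes");

// Reports the compile-time guarantee for std::atomic<T>. The constant is
// true only if every operation (load, store, exchange, CAS) is lock-free
// on every CPU the build targets, so a lock-free CAS alone is not enough.
template <typename T>
constexpr const char *lock_freedom()
{
    return std::atomic<T>::is_always_lock_free
               ? "always lock-free"
               : "not guaranteed lock-free";
}

// Convenience constants for the two sizes of interest.
constexpr bool u64_always_lock_free =
    std::atomic<std::uint64_t>::is_always_lock_free;
constexpr bool pair_always_lock_free =
    std::atomic<Pair>::is_always_lock_free;
```

Printing lock_freedom<std::uint64_t>() and lock_freedom<Pair>() from main(),
with and without -mcx16, shows whether the option changes the compile-time
answer on a given compiler.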
-- Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org Principal Engineer - Intel Data Center - Platform & Sys. Eng.
Received on 2025-12-29 11:50:38
