Date: Fri, 9 Jan 2026 00:24:19 +0000
On Sun, Jan 4, 2026 at 11:00 PM Jonathan Wakely wrote:
>
>
> As I said, using -mcx16 doesn't affect whether __atomic_compare_exchange
> calls into libatomic or not (it always does). Libatomic then decides at runtime
> whether to use cmpxchg16b, based on the actual CPU running the program.
>
> If you run the program on an older CPU, libatomic will not use that instruction.
Take a look at the following Godbolt link:
https://godbolt.org/z/EzncP891z
The program prints 5 lines of output.
The third line begins with: "f30f1efa41544989d1.......". If you take
that hexadecimal string and navigate to:
https://defuse.ca/online-x86-assembler.htm
you can paste it into the second text box to get the disassembly, and
it comes back with:
    0:  f3 0f 1e fa    endbr64
    4:  41 54          push   r12
    6:  49 89 d1       mov    r9,rdx
    9:  49 89 fa       mov    r10,rdi
    c:  49 89 f0       mov    r8,rsi
    f:  53             push   rbx
. . .
. . . and so on . . .
You'll see that the disassembly includes the instruction 'cmpxchg16b',
which means that __atomic_fetch_add_16 is lock-free on the Godbolt
server.
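For anyone who would rather not paste hex into a web page, here is a rough sketch of the same check done in code. It assumes the usual encoding of cmpxchg16b (a REX.W prefix, then 0F C7, then a ModRM byte with reg field 1 and a memory operand). The names hex_to_bytes and contains_cmpxchg16b are made up for illustration. It is a plain byte scan, not a disassembler, so it can report a false positive if the same bytes occur inside another instruction.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Convert a hex string such as "f30f1efa" into raw bytes.
// Conversion stops at the first pair that is not two hex digits.
std::vector<std::uint8_t> hex_to_bytes(const std::string& hex) {
    auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2) {
        const int hi = nibble(hex[i]);
        const int lo = nibble(hex[i + 1]);
        if (hi < 0 || lo < 0) break;
        out.push_back(static_cast<std::uint8_t>(hi * 16 + lo));
    }
    return out;
}

// Heuristic scan (hypothetical helper): look for
//   REX.W (0x48..0x4F), 0x0F, 0xC7, ModRM with reg == 1 and mod != 3.
bool contains_cmpxchg16b(const std::vector<std::uint8_t>& code) {
    for (std::size_t i = 0; i + 3 < code.size(); ++i) {
        const bool rex_w  = (code[i] & 0xF8) == 0x48;
        const bool opcode = code[i + 1] == 0x0F && code[i + 2] == 0xC7;
        const std::uint8_t modrm = code[i + 3];
        const bool reg_is_1 = ((modrm >> 3) & 7) == 1;
        const bool memory   = (modrm >> 6) != 3;
        if (rex_w && opcode && reg_is_1 && memory) return true;
    }
    return false;
}
```

For example, contains_cmpxchg16b(hex_to_bytes("f0480fc70f")) is true, because those bytes encode 'lock cmpxchg16b [rdi]'. Finding the instruction only shows that the code path exists in the function. It does not show that the path is taken at runtime.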
However, even though __atomic_fetch_add_16 is lock-free, we still get
'false' from "atomic< __uint128_t >{}.is_lock_free()" at runtime. Given
the following function:
    __uint128_t Func( atomic< __uint128_t > &x )
    {
        return x.fetch_add(5u);
    }
We compile it to the following assembly . . . . oh wait, it doesn't
compile. 'fetch_add' isn't implemented for "atomic< __uint128_t >".
See the failed compilation here:
https://godbolt.org/z/vPMPYe6Yh
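As a stopgap while the member function is missing, the same effect can be had from a compare-exchange loop, since compare_exchange_weak is available on the primary std::atomic template. This is only a sketch. The name fetch_add_via_cas is made up, and it is meant for unsigned types, where the addition wraps instead of overflowing.

```cpp
#include <atomic>

// Hypothetical helper: emulate fetch_add with a CAS loop.
// Returns the value held before the addition, like fetch_add does.
// Intended for unsigned T; for signed T, 'expected + delta' could overflow.
template <typename T>
T fetch_add_via_cas(std::atomic<T>& x, T delta,
                    std::memory_order order = std::memory_order_seq_cst) noexcept {
    T expected = x.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads 'expected' with the
    // current value, so the loop simply retries with fresh data.
    while (!x.compare_exchange_weak(expected,
                                    static_cast<T>(expected + delta),
                                    order,
                                    std::memory_order_relaxed)) {
    }
    return expected;
}
```

With this, Func could return fetch_add_via_cas< __uint128_t >(x, 5u). With GCC on x86_64 the 16-byte compare-exchange still goes through libatomic, so the program would normally need to link with -latomic.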
This is dire, really. When the compiler sees a specialisation of
'std::atomic' for a POD type whose size and alignment are both 16,
then "is_lock_free()" should be true at runtime on modern x86_64
CPUs. Is anyone among the GCC developers currently working on this?
Or do I need to start work on the patch myself? Also, 'fetch_add' and
the other fetch operations need to be implemented for __int128_t.
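To pin down which types I mean, here is a rough sketch of the condition as a compile-time trait. The names is_16_byte_atomic_candidate, two_words and loose_two_words are made up for illustration. I have used "trivially copyable" in place of "POD", since that is what std::atomic requires of its template argument.

```cpp
#include <type_traits>

// Hypothetical trait: true for types that a 16-byte lock-free
// std::atomic could cover, i.e. trivially copyable with size 16
// and alignment 16.
template <typename T>
struct is_16_byte_atomic_candidate
    : std::integral_constant<bool,
                             std::is_trivially_copyable<T>::value &&
                             sizeof(T) == 16 &&
                             alignof(T) == 16> {};

// Illustrative types only.
struct alignas(16) two_words {        // size 16, alignment 16
    unsigned long long lo;
    unsigned long long hi;
};

struct loose_two_words {              // size 16, but natural alignment
    unsigned long long lo;
    unsigned long long hi;
};
```

Here two_words satisfies the trait and loose_two_words does not, because its alignment is smaller than 16.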
I still think the C++ Standard may need a new type,
'atomic_pointer_pair'. I think I need to do the following:
Step 1) Patch the compiler so that "atomic< __uint128_t >", or an
atomic of any POD type with size 16 and alignment 16, is lock-free
Step 2) Implement 'atomic_pointer_pair' to use either "atomic<
__uint128_t >" or "atomic< pair<void*, void*> >"
Step 3) Implement 'atomic<chimeric_ptr>' to use 'atomic_pointer_pair'
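To make Step 2 concrete, here is a minimal sketch of what 'atomic_pointer_pair' might look like as a thin wrapper over std::atomic. Everything in it is an assumption on my part, including the interface and the helper struct pointer_pair. I have used a plain over-aligned aggregate where the step mentions pair<void*, void*>, because std::atomic needs a trivially copyable type and the Standard does not guarantee that for std::pair.

```cpp
#include <atomic>
#include <type_traits>

// Hypothetical aggregate holding two pointers. The alignas makes the
// alignment equal to the size (16 on a typical 64-bit target).
struct alignas(2 * sizeof(void*)) pointer_pair {
    void* first;
    void* second;
};

static_assert(std::is_trivially_copyable<pointer_pair>::value,
              "std::atomic<T> requires a trivially copyable T");

// Hypothetical sketch of 'atomic_pointer_pair': it forwards to
// std::atomic<pointer_pair> and adds nothing of its own.
class atomic_pointer_pair {
public:
    explicit atomic_pointer_pair(pointer_pair initial = pointer_pair{}) noexcept
        : value_(initial) {}

    pointer_pair load(std::memory_order order = std::memory_order_seq_cst) const noexcept {
        return value_.load(order);
    }

    void store(pointer_pair desired,
               std::memory_order order = std::memory_order_seq_cst) noexcept {
        value_.store(desired, order);
    }

    // Compares the stored object representation with 'expected';
    // pointer_pair has no padding, so this is a plain two-pointer compare.
    bool compare_exchange_strong(pointer_pair& expected, pointer_pair desired,
                                 std::memory_order order = std::memory_order_seq_cst) noexcept {
        return value_.compare_exchange_strong(expected, desired, order);
    }

    bool is_lock_free() const noexcept { return value_.is_lock_free(); }

private:
    std::atomic<pointer_pair> value_;
};
```

Whether is_lock_free() returns true here is exactly what Step 1 is about. With GCC on x86_64 today, calling these members goes through libatomic, so a program using them would normally need -latomic.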
Received on 2026-01-09 00:23:29
