Date: Fri, 9 Jan 2026 00:49:46 +0000
On Fri, 9 Jan 2026, 00:23 Frederick Virchanza Gotham via Std-Proposals, <
std-proposals_at_[hidden]> wrote:
> On Sun, Jan 4, 2026 at 11:00 PM Jonathan Wakely wrote:
> >
> >
> > As I said, using -mcx16 doesn't affect whether __atomic_compare_exchange
> > calls into libatomic or not (it always does). Libatomic then decides at
> > runtime whether to use cmpxchg16b, based on the actual CPU running the
> > program.
> >
> > If you run the program on an older CPU, libatomic will not use that
> > instruction.
>
>
> Looking at the following GodBolt:
>
> https://godbolt.org/z/EzncP891z
>
> The output from the program is 5 lines.
>
> The third line begins with: "f30f1efa41544989d1.......". If you take
> that hexadecimal string and navigate to:
>
> https://defuse.ca/online-x86-assembler.htm
>
> You can copy it into the second text box to get the assembler, and it
> comes back with:
>
>    0:  f3 0f 1e fa    endbr64
>    4:  41 54          push   r12
>    6:  49 89 d1       mov    r9,rdx
>    9:  49 89 fa       mov    r10,rdi
>    c:  49 89 f0       mov    r8,rsi
>    f:  53             push   rbx
> . . .
> . . . and so on . . .
>
> You'll see that the assembler includes the instruction 'cmpxchg16b',
> so this means that __atomic_fetch_add_16 is lockfree on the GodBolt
> server computer.
>
> However, even though __atomic_fetch_add_16 is lockfree, we still get
> 'false' for "atomic< __uint128_t >{}.is_lock_free()" at runtime. Given
> the following function:
>
>     __uint128_t Func( atomic< __uint128_t > &x )
>     {
>         return x.fetch_add(5u);
>     }
>
> We compile it to the following assembler . . . . oh wait it doesn't
> compile. 'fetch_add' isn't implemented for atomic 128-Bit unsigned.
> See failed compilation here:
>
> https://godbolt.org/z/vPMPYe6Yh
>
> This is dire, really. When the compiler sees a specialisation of
> 'std::atomic' for a POD type whose size and alignment are both 16,
> then "is_lock_free()" should be true at runtime on modern x86_64
> CPUs. Is anyone currently working on this among GNU gcc developers?
>
I'm not a "GNU gcc developer" but I'm working on filling in all the missing
pieces for int128 as an integral type, which includes std::atomic.
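
For illustration only, here is a minimal sketch of how the "size and
alignment are both 16" condition mentioned above could be checked at compile
time. PointerPair, has_16_16_layout and always_lock_free are hypothetical
names invented for this example; they are not part of libstdc++ or of any
proposal. The sketch assumes a 64-bit target and C++17.

```cpp
#include <atomic>
#include <cassert>      // only needed for the usage checks
#include <cstddef>
#include <type_traits>

// Hypothetical pointer pair, used only for illustration (not a standard type).
// alignas(16) requests the 16-byte alignment that cmpxchg16b requires.
struct alignas(16) PointerPair {
    void* first;
    void* second;
};

// True when T has the layout discussed in the thread: trivially copyable,
// 16 bytes in size and 16-byte aligned.
template <class T>
constexpr bool has_16_16_layout()
{
    return std::is_trivially_copyable<T>::value
        && sizeof(T) == 16
        && alignof(T) == 16;
}

// Compile-time answer only: true if std::atomic<T> is lock-free on every
// CPU the build targets. This can be false even when is_lock_free() would
// report true at runtime on a particular machine.
template <class T>
constexpr bool always_lock_free()
{
    return std::atomic<T>::is_always_lock_free;
}
```

With these helpers, has_16_16_layout<PointerPair>() is true on a 64-bit
target and has_16_16_layout<int>() is false. The sketch deliberately avoids
calling is_lock_free() on a 16-byte atomic, because with GCC that query may
need the program to be linked against libatomic.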
> Or do I need to start work on the patch myself? Also 'fetch_add' and
> the rest of them need to be implemented for __int128_t.
>
> I still think maybe the C++ Standard needs a new type,
> 'atomic_pointer_pair'. I think I need to do the following:
>
> Step 1) Patch the compiler so that "atomic< __uint128_t >" or any
> 16-sized 16-aligned atomic POD type is lockfree
> Step 2) Implement 'atomic_pointer_pair' to use either "atomic<
> __uint128_t >" or "atomic< pair<void*, void*> >"
> Step 3) Implement 'atomic<chimeric_ptr>' to use 'atomic_pointer_pair'
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
Received on 2026-01-09 00:50:06
