Date: Mon, 29 Dec 2025 14:17:54 +0000
On Mon, 29 Dec 2025, 1:35 am Jonathan Wakely wrote:
>
> You need -mcx16 to let GCC use that instruction, and even with that option
> atomic ops will call into libatomic which decides at runtime whether to use
> cmpxchg16b or not.
>
A few operating systems nowadays refuse to boot if 'cmpxchg16b' is
missing: 64-bit MS-Windows has required it since version 8.1, and openSUSE
requires it as well.
'cmpxchg16b' became commonplace 19 years ago. It doesn't make sense that a
programmer in 2025 should be burdened by old hardware from 2006.
>> Why should this type have those conversions to/from integers when
> std::atomic<void*> doesn't?
>
> Why would both signed and unsigned forms be needed?
>
Instead of dealing with a "pointer + pointer", the lockfree container might
deal with "pointer + tag" or "pointer + counter". Hence the need for
methods that work with integers. The counter might start at -1, hence
intptr_t.
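
To make the "pointer + counter" case concrete, here is a minimal sketch of
the pair layout. The names PtrAndCounter, initial_state and next_state are
made up for illustration and are not part of any proposal text. It only
shows the data layout and how the signed counter starts at -1; the actual
atomic exchange of the 16-byte pair is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical pair type: one pointer plus a signed, pointer-sized counter.
struct alignas(2 * sizeof(void*)) PtrAndCounter {
    void*         ptr;
    std::intptr_t counter;
};

// std::atomic<T> requires T to be trivially copyable, and a double-width
// compare-and-swap wants the pair to be exactly two pointers wide.
static_assert(std::is_trivially_copyable<PtrAndCounter>::value,
              "pair must be trivially copyable");
static_assert(sizeof(PtrAndCounter) == 2 * sizeof(void*),
              "pair must not contain padding");

// Initial state: no node yet, counter parked at -1 (hence a signed type).
inline PtrAndCounter initial_state()
{
    return PtrAndCounter{nullptr, -1};
}

// Build the value a compare-and-swap would try to install: the new pointer,
// with the counter advanced by one so that a recycled pointer value is not
// mistaken for the old one (the ABA problem).
inline PtrAndCounter next_state(PtrAndCounter current, void* new_ptr)
{
    return PtrAndCounter{new_ptr, current.counter + 1};
}
```

A lock-free container would read the current pair, call something like
next_state, and then try to swap the whole 16 bytes in one step.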
I think it's clear what I need to do here:
I need to edit the GNU g++ compiler to add a new command line option
"-mcx16-force", but actually I will name it "-mlockfree2ptrs". When this
command line option is given, the following boolean is true at compile time:
    std::atomic< __uint128_t >::is_always_lock_free
And when you work with this type, the 'cmpxchg16b' instruction is emitted
inline (i.e. the compiler doesn't call into libatomic).
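
For anyone who wants to see what their current toolchain reports, the
following sketch prints the compile-time constant in question. It assumes
C++17 and a 64-bit GCC or Clang target where __uint128_t exists; the names
kPointerIsLockFree, kPairIsLockFree and report_lock_freedom are made up for
this example. As I understand the reply quoted above, stock GCC on x86_64
should report the 128-bit case as not always lock-free, even with -mcx16;
"-mlockfree2ptrs" is the option proposed here and does not exist yet.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Compile-time answers only: no atomic operation is executed here, so this
// does not call into libatomic.
constexpr bool kPointerIsLockFree = std::atomic<void*>::is_always_lock_free;
constexpr bool kPairIsLockFree    = std::atomic<__uint128_t>::is_always_lock_free;

// Print what the compiler decided for a single pointer and for 16 bytes.
inline void report_lock_freedom()
{
    std::printf("atomic<void*>       always lock-free: %d\n", kPointerIsLockFree);
    std::printf("atomic<__uint128_t> always lock-free: %d\n", kPairIsLockFree);
}
```

Because the values are constexpr, code could also branch on kPairIsLockFree
with "if constexpr" or a static_assert instead of printing it.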
I will also add a second command line option, "-mlockfree2ptrs-main=main2".
If you use this command line option, then the '_start' routine gets extra
instructions as follows:
    mov   $1, %eax
    cpuid
    bt    $13, %ecx     # CF = ECX[13], the CMPXCHG16B feature bit
    jnc   main2         # bit clear: 'cmpxchg16b' is not available
So basically if 'cmpxchg16b' is not supported, it jumps into 'main2', where
you can do something like:
    #include <stdio.h>

    void main2(void)
    {
        /* Adjacent string literals are concatenated by the compiler. */
        puts("Contact Stephen on stephen_at_[hidden] to get the build you "
             "need for your system -- you need the x86_64 build without "
             "atomic pointer pairs");
    }
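
The same feature test can be sketched in C++ without touching '_start',
using the __get_cpuid helper from the <cpuid.h> header that ships with GCC
and Clang. This is only an illustration of the check, not the proposed
compiler change; the function names ecx_reports_cx16 and cpu_has_cmpxchg16b
are made up for the example.

```cpp
#include <cassert>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   // GCC/Clang helper header providing __get_cpuid
#endif

// CPUID leaf 1 reports CMPXCHG16B support in bit 13 of ECX.
inline bool ecx_reports_cx16(unsigned int ecx)
{
    return (ecx & (1u << 13)) != 0;
}

// Run-time check. Returns false on non-x86 targets, or if CPUID leaf 1
// is not available.
inline bool cpu_has_cmpxchg16b()
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
        return false;
    return ecx_reports_cx16(ecx);
#else
    return false;
#endif
}
```

A program could call cpu_has_cmpxchg16b() at the top of main() and print a
message like the one in main2 before exiting.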
And then the last thing I would do is argue to the GNU decision makers that
"-mlockfree2ptrs" should be the default, and that you should have to
disable it with "-mno-lockfree2ptrs".
To still be calling functions in libatomic for 128-bit numbers on x86_64
going into 2026 is not good enough -- performance-critical algorithms are
being slowed down on modern-day CPUs in order to accommodate old CPUs
from 19 years ago. It's not good enough.
In my work, it looks like I'll soon be tasked with a piece of software to
'speed up', and it runs on x86_64. Before I even look at the code, I think
the first thing I'll do is re-build it with my "-mlockfree2ptrs" compiler
and see if that makes it any faster.
In fact, after building my own compiler, I'll have to rebuild my own
compiler with my own compiler to make sure that libc and libstdc++ and so
on also get forced 128-bit atomics (although I think maybe the GNU build
system does this itself automatically -- I think it builds 3 times).
Oh and just as an aside, every compiler uses something like this already in
order to implement a lockfree std::atomic< std::shared_ptr<T> >.
But first things first, I will write "-mlockfree2ptrs" into the GNU g++
compiler.
Received on 2025-12-29 14:17:58
