Date: Wed, 31 Dec 2025 14:26:46 +0000
On Monday, December 29, 2025, Jonathan Wakely wrote:
>
>
> It's not a bug.
>
> -march says which instructions can be used for a given translation unit
> (or even for a given function within a translation unit) but the choice of
> how to implement atomic operations on a memory location is not local to a
> single function or a single TU.
>
> If one function uses cmpxchg16b to perform a read-modify-write operation
> on a variable and another function uses a lock, you have a problem. There
> is no requirement for all functions or all TUs to be compiled with the same
> -march option.
>
Just to drill down into this a bit.
We can already use the "-march" flag to produce object files that won't
work together. For example, I can do:
g++ -c a.cpp -march=armv9-a
g++ -c b.cpp -march=x86-64
g++ a.o b.o -march=x86-64
So I think the argument you're making, Jonathan, is that when two
architectures are involved, the GNU compiler will produce a viable program
so long as one of the architectures is a superset of the other.
"armv9-a" is neither a subset nor a superset of "x86-64", so neither should
be burdened by the other when selecting which CPU instructions to use.
"x86-64-v2", however, is a superset of "x86-64". If I understand you
correctly, Jonathan, you're saying that an "x86-64-v2" object file must be
compiled in such a way that it links successfully and works properly with
other object files compiled for x86-64, x86-64-v3, or x86-64-v4. Do I
understand correctly? That is to say, those four architectures are
considered a family, and all family members must be interoperable, right?
I do see the rationale here, and I can see why the GNU decision makers
chose to do this -- the logic is not lost on me -- but I also think it means
that x86-64-v2 users suffer an unnecessary performance penalty (i.e. calling
a library function that contains a 'cmpxchg16b' instruction instead of just
placing the instruction inline).
I think there might be a simple solution to keep everyone happy. Consider
the following line:
g++ -c b.cpp -march=x86-64-v2
As things currently stand, when the GNU compiler compiles 'b.cpp', it must
assume that 'b.o' could be linked into a program built with
"-march=x86-64", and therefore it must jump into the function
'__atomic_compare_exchange' instead of placing 'cmpxchg16b' inline.
How about having a new command line option, "-m1"? This option would tell
the compiler that all object files will be compiled with exactly the same
set of architecture flags. If you then use "-march=x86-64-v2", the compiler
can assume that all object files will use the "x86-64-v2 strategy" for
accessing a 128-bit atomic variable, and so it can place the 'cmpxchg16b'
instruction inline. It also means that
'atomic< __uint128_t >::is_always_lock_free' will be true at compile time.
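For anyone who wants to see what their own toolchain reports today, here is
a minimal sketch. It assumes GCC or Clang on a 64-bit target (for
'__uint128_t') and C++17 or later. The names 'always_lock_free_128',
'strategy_name' and 'report_128bit_strategy' are my own, made up for
illustration, and the "-m1" flag is only a proposal, so nothing below depends
on it. It only queries compile-time properties, so it should not need
libatomic at link time.

```cpp
#include <atomic>
#include <cstdio>

// Compile-time answer to: is every 128-bit atomic access lock-free?
// Per the discussion above, GCC on x86-64 is expected to say "false" here,
// even when the TU is built with -march=x86-64-v2.
inline constexpr bool always_lock_free_128 =
    std::atomic<__uint128_t>::is_always_lock_free;

// Hypothetical helper (not a library function): names the strategy that the
// compile-time answer implies for this translation unit.
constexpr const char* strategy_name(bool always_lock_free) {
    return always_lock_free ? "inline instruction" : "call into libatomic";
}

// Prints the compile-time answer, plus whether the compiler advertises a
// 16-byte compare-and-swap for this TU via its predefined macro.
inline void report_128bit_strategy() {
    std::printf("is_always_lock_free: %s (%s)\n",
                always_lock_free_128 ? "true" : "false",
                strategy_name(always_lock_free_128));
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
    std::puts("16-byte compare-and-swap is enabled for this TU");
#else
    std::puts("16-byte compare-and-swap is not enabled for this TU");
#endif
}
```

Calling 'report_128bit_strategy()' from 'main' in builds with "-march=x86-64"
and "-march=x86-64-v2" should show the macro line changing while, if my
understanding of current GCC is right, the 'is_always_lock_free' line does
not.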
I was thinking of a few names other than '-m1' for the new flag:
-mmono
-mrestrict
-mone
-msame
-mno-compatibility
-mno-mixing
Programmers could then get the absolute best performance out of their local
machine with:
g++ -march=native -m1
And we could combine those two flags into one as follows:
g++ -march=native1
I realise I'm talking in depth here about _one_ compiler and that we're at
the borderline of needing to take this over to the GNU mailing list, but
for the time being I think this is okay here because the compile-time value
of atomic< __uint128_t >::is_always_lock_free has implications for my
proposed new standard library type: std::atomic_pointer_pair.
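To make that implication concrete, here is a rough sketch of the kind of
payload I have in mind. This is not the proposed specification: the struct
'pointer_pair' and the names 'pair_always_lock_free' and 'pair_width_bits'
are hypothetical stand-ins, and it assumes C++17 or later. It only shows that
on a 64-bit target the pair needs a single 128-bit atomic access, which is
exactly the case whose compile-time lock-freedom is in question.

```cpp
#include <atomic>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the payload of the proposed
// std::atomic_pointer_pair: two pointers updated as one unit.
struct pointer_pair {
    void* first;
    void* second;
};

// std::atomic<T> requires a trivially copyable T.
static_assert(std::is_trivially_copyable_v<pointer_pair>,
              "pointer_pair must be trivially copyable");
// The two pointers are expected to be laid out with no padding.
static_assert(sizeof(pointer_pair) == 2 * sizeof(void*),
              "unexpected padding in pointer_pair");

// True only if the toolchain guarantees lock-free access for every
// object of this type; on x86-64 this is the 128-bit case.
inline constexpr bool pair_always_lock_free =
    std::atomic<pointer_pair>::is_always_lock_free;

// Width in bits of the single atomic access the pair needs.
constexpr std::size_t pair_width_bits() {
    return sizeof(pointer_pair) * 8;
}
```

On a 64-bit target 'pair_width_bits()' is 128, so whether
'pair_always_lock_free' can ever be true there comes down to the same
compiler decision discussed above.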
Received on 2025-12-31 14:26:49
