Date: Mon, 29 Dec 2025 22:11:16 +0000
On Mon, 29 Dec 2025, 14:53 Andrey Semashev via Std-Proposals, <
std-proposals_at_[hidden]> wrote:
> On 29 Dec 2025 17:17, Frederick Virchanza Gotham via Std-Proposals wrote:
> >
> > I think it's clear what I need to do here:
> >
> > I need to edit the GNU g++ compiler to add a new command line option
> > "-mcx16-force", but actually I will name it "-mlockfree2ptrs". When this
> > command line option is given, the following boolean is true at compile
> > time:
> >
> > atomic< __uint128_t >::is_always_lock_free
> >
> > And when you work with this type, the assembler is placed inline (i.e.
> > it doesn't call into libatomic).
>
> I think, this should already be the case if you specify a recent enough
> -march (or a sufficient combination of -m flags). It partly is with
> clang, but not with gcc. I think, you should report bugs to compiler devs.
>
It's not a bug.
-march says which instructions can be used for a given translation unit (or
even for a given function within a translation unit) but the choice of how
to implement atomic operations on a memory location is not local to a
single function or a single TU.
If one function uses cmpxchg16b to perform a read-modify-write operation on
a variable and another function uses a lock, you have a problem. There is
no requirement for all functions or all TUs to be compiled with the same
-march option.
So GCC will not use cmpxchg16b for __atomic_compare_exchange; instead it makes
a call to libatomic in all TUs. libatomic checks the CPU flags at runtime and
uses cmpxchg16b if it is supported, which means the choice is consistent for
all TUs doing 16-byte __atomic_compare_exchange. Either all such calls use
cmpxchg16b for a given process, or none do. The performance is fine, but the
compile-time is_always_lock_free constant is false because the answer
depends on runtime properties.
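To make the compile-time side of that concrete, here is a minimal sketch. PtrPair, always_lock_free() and describe() are names made up for illustration, not anything from GCC or libatomic. It only reads the compile-time constant. The runtime query, is_lock_free() on an object, is left out on purpose because with GCC it typically needs linking against libatomic for a 16-byte type.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical 16-byte payload standing in for the "pointer pair" case.
struct alignas(16) PtrPair {
    void*         first;
    std::uint64_t tag;
};

// Compile-time answer only: true when std::atomic<T> is guaranteed to be
// lock-free on every CPU this translation unit is allowed to run on.
template <class T>
constexpr bool always_lock_free() {
    return std::atomic<T>::is_always_lock_free;
}

// Human-readable form of the same constant. "not guaranteed" does not mean
// "uses a lock"; it means the compiler cannot promise the answer up front.
template <class T>
constexpr const char* describe() {
    return always_lock_free<T>() ? "always lock-free"
                                 : "not guaranteed at compile time";
}
```

Printing describe<int>() and describe<PtrPair>() from main() shows the difference. The result for PtrPair depends on the compiler and flags, so I am not quoting an expected output here.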
> https://godbolt.org/z/5j8aseGMG
>
> > I will also add a second command line option,
> > "-mlockfree2ptrs-main=main2". If you use this command line option, then
> > the '_start' routine gets extra instructions as follows:
> >
> > mov $1, %eax
> > cpuid
> > bt $13, %ecx # CF = ECX[13]
> > jnc main2
> >
> > So basically if 'cmpxchg16b' is not supported, it jumps into 'main2',
> > where you can do something like:
> >
> > void main2(void)
> > {
> > puts("Contact Stephen on stephen_at_[hidden] to get the build you need "
> >      "for your system -- you need the x86_64 build without atomic "
> >      "pointer pairs");
> > }
>
> Why do you need a separate main2() if you can already do all of this in
> the standard main()?
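For illustration, the same CPUID test can be sketched in ordinary code that is called at the top of main(). This assumes GCC or Clang, whose <cpuid.h> provides __get_cpuid. The helper names cpu_has_cmpxchg16b() and check_cpu_or_explain() are hypothetical, and the message text is only a placeholder.

```cpp
#include <cstdio>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang helper header (assumed available)
#endif

// True when CPUID leaf 1 reports CMPXCHG16B (ECX bit 13), the same bit the
// quoted assembly tests with `bt $13, %ecx`.
bool cpu_has_cmpxchg16b() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;  // CPUID leaf 1 is not available
    return (ecx & (1u << 13)) != 0;
#else
    return false;      // not an x86 target, so the instruction cannot exist
#endif
}

// Hypothetical startup guard, meant to be the first call in main().
// Returns 0 to continue, non-zero if the program should exit.
int check_cpu_or_explain() {
    if (cpu_has_cmpxchg16b())
        return 0;
    std::fputs("This build needs cmpxchg16b; please use the build "
               "without atomic pointer pairs.\n", stderr);
    return 1;
}
```

One caveat with doing it in main(): constructors of objects with static storage duration run before main() is entered, so a check placed there does not protect code that touches 16-byte atomics during static initialization.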
>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
Received on 2025-12-29 22:11:35
