On Mon, 29 Dec 2025, 22:28 Andrey Semashev via Std-Proposals, <std-proposals@lists.isocpp.org> wrote:
On 30 Dec 2025 01:11, Jonathan Wakely wrote:
>
>
> On Mon, 29 Dec 2025, 14:53 Andrey Semashev via Std-Proposals
> <std-proposals@lists.isocpp.org> wrote:
>
>     On 29 Dec 2025 17:17, Frederick Virchanza Gotham via Std-Proposals wrote:
>     >
>     > I think it's clear what I need to do here:
>     >
>     > I need to edit the GNU g++ compiler to add a new command line option
>     > "-mcx16-force", but actually I will name it "-mlockfree2ptrs". When this
>     > command line option is given, the following boolean is true at
>     > compile time:
>     >
>     >     atomic< __uint128_t >::is_always_lock_free
>     >
>     > And when you work with this type, the assembler is placed inline (i.e.
>     > it doesn't call into libatomic).
>
>     I think this should already be the case if you specify a recent enough
>     -march (or a sufficient combination of -m flags). It partly is with
>     clang, but not with gcc. I think you should report bugs to the compiler
>     devs.
>
> It's not a bug.
>
> -march says which instructions can be used for a given translation unit
> (or even for a given function within a translation unit) but the choice
> of how to implement atomic operations on a memory location is not local
> to a single function or a single TU.
>
> If one function uses cmpxchg16b to perform a read-modify-write operation
> on a variable and another function uses a lock, you have a problem.
> There is no requirement for all functions or all TUs to be compiled with
> the same -march option. 
>
> So GCC will not use cmpxchg16b for __atomic_compare_exchange, it will
> make a call to libatomic for all TUs. Inside libatomic it will check the
> CPU flags at runtime and use cmpxchg16b if supported, but that means
> it's consistent for all TUs doing 16-byte __atomic_compare_exchange.
> Either all such calls use cmpxchg16b for a given process, or none do.
> The performance is fine, but the compile-time is_always_lock_free
> constant is false because it depends on runtime properties. 

Thanks for the explanation. But I still think that there should be a way
to tell the compiler that it should just use cmpxchg16b because that's
my minimum requirement for a CPU. More so if the instruction is required
by the OS even before the program is run.

That's PR 94649, "16-byte aligned atomic_compare_exchange doesn not generate cmpxcg16b on x86_64":
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94649


Ironically, one could try using is_always_lock_free to enforce that
hardware requirement at compile time, but with the current
implementation that wouldn't work.
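
For illustration, here is a minimal sketch of what such a compile-time check
could look like. It is not something GCC or the standard prescribes. TwoPtrs,
kTwoPtrsAlwaysLockFree, lock_free_summary and REQUIRE_LOCKFREE_TWO_PTRS are
hypothetical names made up for this example, and C++17 or later is assumed.
As far as I understand the current behaviour, defining the macro would make
the static_assert fail with GCC on x86-64 even with -mcx16, for the reason
explained above, so the check is left off by default:

```cpp
#include <atomic>
#include <type_traits>

// Hypothetical payload: two pointers, aligned to their combined size
// (16 bytes on x86-64) so that a double-width CAS could apply to it.
struct alignas(2 * sizeof(void*)) TwoPtrs {
    void* first;
    void* second;
};

static_assert(std::is_trivially_copyable_v<TwoPtrs>,
              "std::atomic<T> requires a trivially copyable T");

// Compile-time constant: true only if the implementation guarantees that
// every std::atomic<TwoPtrs> object is lock-free, with no run-time check.
inline constexpr bool kTwoPtrsAlwaysLockFree =
    std::atomic<TwoPtrs>::is_always_lock_free;

// Hypothetical opt-in macro (not a real compiler or library flag).
// When defined, the build fails unless the guarantee holds.
#ifdef REQUIRE_LOCKFREE_TWO_PTRS
static_assert(kTwoPtrsAlwaysLockFree,
              "target does not guarantee lock-free two-pointer atomics");
#endif

// Human-readable description of what the constant says.
constexpr const char* lock_free_summary(bool always_lock_free) {
    return always_lock_free ? "always lock-free"
                            : "not guaranteed at compile time";
}
```

Only the compile-time constant is used here. Actually performing 16-byte
atomic operations, or calling is_lock_free() on such an object, may
additionally require linking with -latomic when building with GCC.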

--
Std-Proposals mailing list
Std-Proposals@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals