Date: Mon, 29 Dec 2025 23:13:19 +0000
On Mon, 29 Dec 2025, 22:28 Andrey Semashev via Std-Proposals,
<std-proposals_at_[hidden]> wrote:
> On 30 Dec 2025 01:11, Jonathan Wakely wrote:
> >
> >
> > On Mon, 29 Dec 2025, 14:53 Andrey Semashev via Std-Proposals,
> > <std-proposals_at_[hidden] <mailto:std-proposals_at_[hidden]>> wrote:
> >
> > On 29 Dec 2025 17:17, Frederick Virchanza Gotham via Std-Proposals
> > wrote:
> > >
> > > I think it's clear what I need to do here:
> > >
> > > I need to edit the GNU g++ compiler to add a new command line
> > > option "-mcx16-force", but actually I will name it "-mlockfree2ptrs".
> > > When this command line option is given, the following boolean is
> > > true at compile time:
> > >
> > > atomic< __uint128_t >::is_always_lock_free
> > >
> > > And when you work with this type, the assembler is placed inline
> > > (i.e. it doesn't call into libatomic).
> >
> > I think, this should already be the case if you specify a recent
> > enough -march (or a sufficient combination of -m flags). It partly is
> > with clang, but not with gcc. I think, you should report bugs to
> > compiler devs.
> >
> > It's not a bug.
> >
> > -march says which instructions can be used for a given translation unit
> > (or even for a given function within a translation unit) but the choice
> > of how to implement atomic operations on a memory location is not local
> > to a single function or a single TU.
> >
> > If one function uses cmpxchg16b to perform a read-modify-write operation
> > on a variable and another function uses a lock, you have a problem.
> > There is no requirement for all functions or all TUs to be compiled with
> > the same -march option.
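To make the "another function uses a lock" case concrete, here is a simplified lock-based 16-byte compare-exchange, roughly the shape of a library fallback that guards each object with a mutex chosen by hashing its address. It is a sketch, not libatomic's actual code; `Pair`, `lock_for` and `locked_compare_exchange` are hypothetical names. The key point is in the comments: the mutex only excludes other callers that take the same mutex, so a function that uses `lock cmpxchg16b` directly on the same object is not atomic with respect to it.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <mutex>

// Hypothetical 16-byte payload (for example a pointer plus an ABA counter).
struct alignas(16) Pair {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Process-wide table of mutexes. The mutex guarding an object is picked by
// hashing its address, so every caller touching the same object agrees on it.
inline std::mutex& lock_for(const void* addr) {
    static std::array<std::mutex, 64> table;
    const auto key = reinterpret_cast<std::uintptr_t>(addr) >> 4;  // 16-byte granules
    return table[key % table.size()];
}

// Lock-based compare-exchange. It is atomic only with respect to other callers
// that also go through lock_for(); a lock cmpxchg16b on the same object never
// touches the mutex, so the two strategies are not atomic with respect to each other.
inline bool locked_compare_exchange(Pair* obj, Pair& expected, const Pair& desired) {
    std::lock_guard<std::mutex> guard(lock_for(obj));
    if (std::memcmp(obj, &expected, sizeof(Pair)) == 0) {
        *obj = desired;
        return true;
    }
    expected = *obj;  // report the value actually observed, as std::atomic does
    return false;
}
```

Used consistently (every access to a given `Pair` goes through `locked_compare_exchange`), this is correct; the trouble Jonathan describes only starts when some translation unit bypasses the lock with an inline instruction.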
> >
> > So GCC will not use cmpxchg16b for __atomic_compare_exchange, it will
> > make a call to libatomic for all TUs. Inside libatomic it will check the
> > CPU flags at runtime and use cmpxchg16b if supported, but that means
> > it's consistent for all TUs doing 16-byte __atomic_compare_exchange.
> > Either all such calls use cmpxchg16b for a given process, or none do.
> > The performance is fine, but the compile-time is_always_lock_free
> > constant is false because it depends on runtime properties.
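A rough sketch of the per-process decision described above, assuming GCC or Clang on x86-64 and their `<cpuid.h>` header (CPUID leaf 1, ECX bit 13 is the CMPXCHG16B feature flag). The function names are hypothetical and this is not libatomic's source; it only illustrates that the check runs at run time, runs once, and gives every caller the same answer. Because that answer is unknown until the program runs, it cannot feed a compile-time constant such as `is_always_lock_free`.

```cpp
#if defined(__x86_64__) && defined(__GNUC__)
#include <cpuid.h>
#endif

// Queries the CPU for CMPXCHG16B support. Off x86-64 (or without a GNU-compatible
// compiler) it conservatively reports false.
inline bool cpu_has_cmpxchg16b() {
#if defined(__x86_64__) && defined(__GNUC__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & bit_CMPXCHG16B) != 0;
#else
    return false;
#endif
}

// The decision is made once and cached. A static local in an inline function is a
// single object shared by all translation units, so every caller takes the same
// path: either all 16-byte operations use cmpxchg16b, or none do.
inline bool use_cmpxchg16b() {
    static const bool decided = cpu_has_cmpxchg16b();
    return decided;
}
```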
>
> Thanks for the explanation. But I still think that there should be a way
> to tell the compiler that it should just use cmpxchg16b because that's
> my minimum requirement for a CPU. More so if the instruction is required
> by the OS even before the program is run.
>
That's PR 94649 – 16-byte aligned atomic_compare_exchange doesn not generate
cmpxcg16b on x86_64: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94649

> Ironically, one could try using is_always_lock_free to enforce that
> hardware requirement at compile time, but with the current
> implementation that wouldn't work.
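A possible compile-time check that does work today, assuming GCC or Clang: both predefine `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16` when the target flags guarantee a 16-byte compare-and-swap (on x86-64, `-mcx16` or an `-march` that includes it). Keying the requirement on that macro instead of `is_always_lock_free` sidesteps the problem. `REQUIRE_CAS16` is a hypothetical project macro, not a compiler flag.

```cpp
#include <atomic>

// GCC and Clang predefine this macro when the selected target guarantees a
// 16-byte compare-and-swap (for x86-64: -mcx16, or an -march that implies it).
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
constexpr bool target_has_cas16 = true;
#else
constexpr bool target_has_cas16 = false;
#endif

// What the library reports for the same 16-byte type. Per the discussion above,
// GCC keeps this false even when target_has_cas16 is true.
constexpr bool atomic16_always_lock_free =
    std::atomic<__uint128_t>::is_always_lock_free;

// Opt-in enforcement of the minimum-CPU requirement. REQUIRE_CAS16 is a
// hypothetical project macro, e.g. passed on the command line as -DREQUIRE_CAS16.
#if defined(REQUIRE_CAS16)
static_assert(target_has_cas16,
              "this program requires cmpxchg16b; build with -mcx16 or a suitable -march");
#endif
```

Building with `-DREQUIRE_CAS16` but without `-mcx16` (or an equivalent `-march`) then fails at compile time. Note that this only checks the target flags. With GCC, `std::atomic<__uint128_t>` operations still go through libatomic.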
>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
Received on 2025-12-29 23:13:34
