std-proposals

Re: [std-proposals] std::atomic_pointer_pair

From: Thiago Macieira <thiago_at_[hidden]>
Date: Wed, 31 Dec 2025 16:45:51 -0300
On Wednesday, 31 December 2025 15:25:52 Brasilia Standard Time Frederick
Virchanza Gotham via Std-Proposals wrote:
> The problem here can be generalised.
>
> Here's the generalisation:
> "The CPU I'm working with is version X, and I want to take full advantage
> of all the possible optimizations that come with version X. I don't want to
> be burdened by backward compatibility with versions earlier than X (nor
> forward-compatibility with versions later than X)."

There are very few operations that would burden you with those penalties. Off
the top of my head, we're discussing the only two available in the compilers.

Beyond that, we'd need a completely new ABI so *some* of the new registers
(new in the last 15 years) are callee-saved. I'd slip in the death of copy
relocations too (make -mno-direct-extern-access the default).

That's QoI (quality of implementation). A new ABI is not going to happen: it's
just too much cost for too little benefit.

> One of these backward-compatible pessimisations is solved with:
> -mno-vzeroupper

This could be fixed by the linker or dynamic linker. Both the compiler and the
linker know when you're making calls within the same shared object (if
you're doing your job right, which almost no one does), so they could be told
to skip the VZEROUPPER. The compiler could emit a relocation that the linker
relaxes to an identical-length NOP when the target is found to be in an .o file
marked with x86-64-v3 ISA. Across DSOs it's harder, but a similar solution can
be found.
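
To make the "identical-length NOP" idea concrete, here is a minimal sketch of
what such a relaxation step might look like. Everything in it is an assumption
for illustration: the VzeroupperReloc record and relax_vzeroupper() are
hypothetical names, not an existing relocation type or linker API. The only
hard facts used are the encodings: VZEROUPPER is C5 F8 77 and the 3-byte NOP
is 0F 1F 00, so the patch never changes code size or offsets.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Both instructions are exactly 3 bytes, so one can overwrite the other.
constexpr std::array<std::uint8_t, 3> kVzeroupper{0xC5, 0xF8, 0x77};
constexpr std::array<std::uint8_t, 3> kNop3{0x0F, 0x1F, 0x00};  // nop dword ptr [rax]

// Hypothetical relocation record (not a real ELF relocation type).
struct VzeroupperReloc {
    std::size_t offset;  // where the compiler emitted VZEROUPPER in .text
    bool target_is_v3;   // callee's object file is marked as x86-64-v3
};

// Replaces VZEROUPPER with a NOP at every site whose call target is known to
// be x86-64-v3 code. Returns the number of sites that were relaxed.
std::size_t relax_vzeroupper(std::vector<std::uint8_t>& text,
                             const std::vector<VzeroupperReloc>& relocs) {
    std::size_t relaxed = 0;
    for (const VzeroupperReloc& r : relocs) {
        if (!r.target_is_v3) continue;  // legacy callee: keep the instruction
        if (r.offset > text.size() || text.size() - r.offset < kVzeroupper.size())
            continue;                   // record points outside the section
        std::uint8_t* site = text.data() + r.offset;
        if (std::memcmp(site, kVzeroupper.data(), kVzeroupper.size()) != 0)
            continue;                   // not the instruction we expected
        std::memcpy(site, kNop3.data(), kNop3.size());
        ++relaxed;
    }
    return relaxed;
}
```

A real linker would do this while processing relocations rather than on a
byte vector, and the cross-DSO case would need a different mechanism.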

Is it worth it? I don't know; some benchmarking is required.

Probably not, because there's a much bigger gain to be had instead: compile
the target code with AVX. If you're interested in performance, this is what
you should do, not fix the ability to call legacy code. And you should call
upon your software vendors to have x86-64-v3 binaries for all the libraries
they provide.

> And perhaps in the future we would remedy another pessimisation with:
> -mno-lockatomic16

If libatomic uses proper IFUNC (it does), the cost of the call is pretty much
negligible when compared to the atomic operation itself, especially if the
libatomic code is in cache. And if you're using this in performance-critical
code, it probably will be; if you're not, then it hardly matters, does it?
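
For readers unfamiliar with IFUNC, the following is a rough plain-C++ analogy
of the "resolve once, then make an indirect call" pattern. It is not
libatomic's code: the sketch namespace, the sum functions and the choice of
AVX2 as the probed feature are all stand-ins chosen for illustration, and a
real IFUNC is resolved by the dynamic linker at load time rather than by a
function-local static. The probe uses the GCC/Clang x86 builtins
__builtin_cpu_init and __builtin_cpu_supports and falls back to the baseline
elsewhere.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

using SumFn = std::uint64_t (*)(const std::uint32_t*, std::size_t);

// Baseline implementation: works on every CPU.
std::uint64_t sum_baseline(const std::uint32_t* p, std::size_t n) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

// Stand-in for a CPU-specific variant: same result, different code shape.
std::uint64_t sum_tuned(const std::uint32_t* p, std::size_t n) {
    std::uint64_t a = 0, b = 0, c = 0, d = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += p[i]; b += p[i + 1]; c += p[i + 2]; d += p[i + 3];
    }
    for (; i < n; ++i) a += p[i];  // leftover elements
    return a + b + c + d;
}

// Feature probe; reports false on non-x86 targets or other compilers.
bool cpu_has_avx2() {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

// Plays the role of the IFUNC resolver: picks an implementation once.
SumFn resolve_sum() { return cpu_has_avx2() ? sum_tuned : sum_baseline; }

// Public entry point: after the first call, the only extra cost is one
// indirect call through a cached pointer.
std::uint64_t sum(const std::uint32_t* p, std::size_t n) {
    static const SumFn impl = resolve_sum();  // initialised once, thread-safe
    return impl(p, n);
}

}  // namespace sketch
```

The point of the analogy is the cost model: the selection happens once, and
every later call pays only for an indirect jump, which is small next to a
16-byte atomic operation.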

> But who wants to keep track of all these flags needed to pass to the
> compiler? There should be one flag called something like "-mno-mixing" or
> "-m1" that pulls in all these individual flags to disable these
> pessimisations.

There's a cost involved in adding them, in terms of compiler complexity and
testing. Maybe get the GCC folks to add them to the list of "things we'd do
if/when we next break BC".

Anyway, this is all off-topic for the Standards list.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Data Center - Platform & Sys. Eng.

Received on 2025-12-31 19:45:59