On Mon, 25 Aug 2025 at 02:12, Adrian Johnston via Std-Proposals <std-proposals@lists.isocpp.org> wrote:

Hello,

(If you spend a lot of time looking at generated assembly you might
want to skip this one.)

As we all know the compiler's budget for optimizing C++ is an
implementation-defined metric where it knows best what is needed and
we are not supposed to be doing the compiler's job for it.
Unfortunately, this view has never really worked well in practice.
Recently I spent some time looking at the generated assembly from gcc
and clang and it doesn't look like the situation has improved in 30
years.

This is nonsense.

Optimizations like autovectorization and conditional devirtualization do a lot that wasn't possible 30 years ago. One problem is that many programs are much bigger than they were 30 years ago.

I will make a couple of proposals, but let me explain the problems first.

The keyword const never mattered to the code generator before and it
seems the keyword constexpr doesn't matter now. The compiler may be
willing to execute constexpr code in a manifestly constant-evaluated
context, but it isn't going to do anything extra to optimize the
generated assembly otherwise. This is bothersome because everyone
learning C++ is going
around acting like the constexpr keyword results in their runtime code
being more thoroughly evaluated at compile time when it doesn't even
matter at all. This is more of an education issue, but I wanted to
point out how dangerous the wording around things being made "possible
to evaluate at compile time" was for the uninitiated.
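
To make the distinction concrete, here is a minimal sketch. The names sum_to, at_compile_time and at_run_time are made up for illustration and do not come from the post. It shows the one place where constexpr forces evaluation during translation (the initializer of a constexpr variable) next to an ordinary call, where the keyword imposes no requirement and the result depends on the optimizer.

```cpp
// Hypothetical helper, not from the original post.
constexpr int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) total += i;
    return total;
}

// Manifestly constant-evaluated: the initializer of a constexpr variable
// has to be computed during translation, at any optimization level.
constexpr int at_compile_time = sum_to(10);
static_assert(at_compile_time == 55, "evaluated by the compiler");

// Ordinary runtime call: constexpr on sum_to imposes no requirement here.
// Whether the loop is folded away is up to the optimizer, just as it
// would be for a plain inline function.
int at_run_time(int n) {
    return sum_to(n);
}
```

Comparing the generated assembly for at_run_time with and without the constexpr keyword on sum_to is one way to check the claim on a given compiler.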

Meanwhile gcc/clang are still perfectly happy to inline and execute
arbitrary non-const code at link time clear across different
translation units. Let me give you an example of code inlining using a
simple insertion sort template that just uses pointers and operator<:

int example1() {
    int x[3] = { 7, 3 };
    hxinsertion_sort(x+0, x+2);
    printf("%d %d", x[0], x[1]);
}

This results in the following assembly:

.string "%d %d"
sub rsp, 8
mov edx, 7
mov esi, 3
xor eax, eax
mov edi, OFFSET FLAT:.LC0
call "printf"

This means the two numbers were sorted at compile time without any of
the new C++ constant evaluation machinery involved. Meanwhile, if you
write that with std::sort you get this:

.string "%d %d"
movabs rax, 12884901895
sub rsp, 24
mov rdi, rsp
lea rsi, [rsp+8]
mov QWORD PTR [rsp], rax
mov DWORD PTR [rsp+8], 0
call "void std::__insertion_sort<int*, __gnu_cxx::__ops::_Iter_less_iter>(int*, int*, __gnu_cxx::__ops::_Iter_less_iter) [clone .isra.0]"
mov edx, DWORD PTR [rsp+4]
mov esi, DWORD PTR [rsp]
xor eax, eax
mov edi, OFFSET FLAT:.LC1
call "printf"
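
The post does not show the definition of hxinsertion_sort. For readers who want to reproduce the comparison, the following is a minimal sketch of what a pointer-only insertion sort using just operator< might look like. The name insertion_sort_sketch is hypothetical, and the real hxinsertion_sort may differ.

```cpp
// Hypothetical stand-in for hxinsertion_sort; the post does not show
// its definition. Sorts the half-open range [first, last) in place,
// using only raw pointers and operator<.
template <typename T>
void insertion_sort_sketch(T* first, T* last) {
    if (first == last) return;
    for (T* i = first + 1; i != last; ++i) {
        T value = *i;   // element being inserted
        T* hole = i;    // position that is currently free
        // Shift larger elements one slot to the right.
        while (hole != first && value < *(hole - 1)) {
            *hole = *(hole - 1);
            --hole;
        }
        *hole = value;
    }
}
```

A function this small is an easy inlining candidate, so it is plausible that an optimizer folds a two-element call down to constants as in the first listing.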

It appears that the real problem here is actually that the standard
template library uses iterators.

No, you're misreading the assembly. The only "iterators" here are pointers. The word "iter" in a symbol name doesn't mean there's an iterator class involved.
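
A small sketch of that point, assuming a C++17 compiler. The function name sort_first_two is made up for illustration. The static_asserts check that the "iterator" obtained from a built-in array is a plain int*, which matches the template arguments visible in the mangled symbol above.

```cpp
#include <algorithm>
#include <iterator>
#include <type_traits>
#include <utility>

// For a built-in array, std::begin yields a raw pointer, not a class type.
static_assert(
    std::is_same_v<decltype(std::begin(std::declval<int (&)[3]>())), int*>,
    "array iterators are plain pointers");

// Raw pointers satisfy the random access iterator requirements directly.
static_assert(
    std::is_same_v<std::iterator_traits<int*>::iterator_category,
                   std::random_access_iterator_tag>,
    "int* is a random access iterator");

// Hypothetical wrapper mirroring the example from the post:
// this instantiates std::sort with int* as the iterator type.
inline void sort_first_two(int (&x)[3]) {
    std::sort(x + 0, x + 2);
}
```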

Sacrilege, you say? Well, we know the
compiler has an optimization budget and we know that inlining does
require some work by the compiler... I would argue that it is
impressive that the compiler was able to call insertion sort directly
without invoking a partitioning scheme first.

So, my first proposal is to have a template library that can operate
using pointers and arrays without extra templated abstraction layers.

That already exists.
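
As an illustration of what "already exists" could mean in practice, the standard algorithms accept raw pointers and built-in arrays as they are. The helper names below (sort_array, contains_sorted, sum_range) are hypothetical and only wrap standard calls.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>

// Hypothetical helpers for illustration; none of these names come
// from the thread.

// Sorts a built-in array in place, passing raw pointers as the range.
template <typename T, std::size_t N>
void sort_array(T (&a)[N]) {
    std::sort(a, a + N);
}

// Binary search over a sorted pointer range [first, last).
inline bool contains_sorted(const int* first, const int* last, int value) {
    return std::binary_search(first, last, value);
}

// Sum of the elements in a pointer range [first, last).
inline int sum_range(const int* first, const int* last) {
    return std::accumulate(first, last, 0);
}
```

No wrapper type or adapter is involved in any of these calls; the pointers themselves are the iterators.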