I’d like to move on to less unpopular proposals after this…

“No, you're misreading the assembly. The only "iterators" here are pointers.”

I’m not claiming iterators are directly visible in the assembly. I am claiming that they add inlining overhead.

I am also not saying that the compiler was insufficiently aggressive at inlining. What I observed was that the cost of inlining was deducted from the budget for the constant folding pass, so that pass was getting cut short. That is a poor outcome in an era where constant folding is being encouraged.

It seems the compiler is just willing to do more constant folding when you tell it that the code base is more modern. I can see myself making that same call too.





On Tuesday, August 26, 2025, Jonathan Wakely <cxx@kayari.org> wrote:


On Mon, 25 Aug 2025 at 02:12, Adrian Johnston via Std-Proposals <std-proposals@lists.isocpp.org> wrote:
Hello,

(If you spend a lot of time looking at generated assembly you might
want to skip this one.)

As we all know the compiler's budget for optimizing C++ is an
implementation-defined metric where it knows best what is needed and
we are not supposed to be doing the compiler's job for it.
Unfortunately, this view has never really worked well in practice.
Recently I spent some time looking at the generated assembly from gcc
and clang and it doesn't look like the situation has improved in 30
years.

This is nonsense. 

Optimizations like autovectorization and conditional devirtualization do lots that wasn't possible 30 years ago. A problem is that many programs are much bigger than they were 30 years ago.



I will make a couple proposals but let me explain the problems first.

The keyword const never mattered to the code generator before and it
seems the keyword constexpr doesn't matter now. The compiler may be
willing to execute constexpr code in a manifestly constant-evaluated context but it
isn't going to do anything extra to optimize the generated assembly
otherwise. This is bothersome because everyone learning C++ is going
around acting like the constexpr keyword results in their runtime code
being more thoroughly evaluated at compile time when it doesn't even
matter at all. This is more of an education issue, but I wanted to
point out how dangerous the wording around things being made "possible
to evaluate at compile time" was for the uninitiated.
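
To make the distinction concrete, here is a minimal sketch. The names
sum_to, at_compile_time and maybe_at_run_time are hypothetical and do not
come from any library. Only the first use is required to be evaluated
during compilation; the second is an ordinary call as far as the language
is concerned.

```cpp
// Hypothetical helper, used only for illustration.
constexpr int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) total += i;
    return total;
}

// Manifestly constant-evaluated: the initializer of a constexpr variable
// must be computed by the compiler, otherwise the program is ill-formed.
constexpr int at_compile_time = sum_to(10);
static_assert(at_compile_time == 55, "evaluated during compilation");

// Ordinary call: constexpr on sum_to imposes no requirement here. Whether
// the loop survives into the generated code is left to the optimizer, the
// same as for a function without the keyword.
int maybe_at_run_time(int n) {
    return sum_to(n);
}
```

An optimizer may still fold maybe_at_run_time(10) to a constant after
inlining, but that is a quality-of-implementation matter rather than
something the keyword guarantees.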

Meanwhile gcc/clang are still perfectly happy to inline and execute
arbitrary non-const code at link time clear across different
translation units. Let me give you an example of code inlining using a
simple insertion sort template that just uses pointers and operator<:

int example1() {
   int x[3] = { 7, 3 };
   hxinsertion_sort(x+0, x+2);
   printf("%d %d", x[0], x[1]);
   return 0;
}

This results in the following assembly:

       .string "%d %d"
       sub     rsp, 8
       mov     edx, 7
       mov     esi, 3
       xor     eax, eax
       mov     edi, OFFSET FLAT:.LC0
       call    "printf"
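
The source of hxinsertion_sort is not included in this message. As a rough
illustration only, a pointer-based insertion sort of the kind described
could look like the sketch below. The names insertion_sort_sketch and
example1_sketch are hypothetical, and this is not claimed to be the code
that produced the assembly above.

```cpp
#include <utility>  // std::move

// Hypothetical stand-in: sorts [first, last) using only raw pointers and
// operator<, with no iterator or comparator wrapper types involved.
template <typename T>
void insertion_sort_sketch(T* first, T* last) {
    if (first == last) return;
    for (T* i = first + 1; i < last; ++i) {
        T value = std::move(*i);
        T* j = i;
        // Shift larger elements one slot to the right to open a gap.
        while (j != first && value < *(j - 1)) {
            *j = std::move(*(j - 1));
            --j;
        }
        *j = std::move(value);
    }
}

// Same shape as the example: sort the first two of three elements.
int example1_sketch() {
    int x[3] = { 7, 3 };  // x[2] is zero-initialized
    insertion_sort_sketch(x + 0, x + 2);
    return x[0] * 10 + x[1];  // encodes the sorted pair as one number
}
```

Whether a given compiler reduces example1_sketch to a constant depends on
the compiler version and flags, so the sketch makes no claim about the
code generated for it.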

This means the two numbers were sorted at compile time without any of
the new C++ constant evaluation machinery involved. Meanwhile, if you
write that with std::sort you get this:

       .string "%d %d"
       movabs  rax, 12884901895
       sub     rsp, 24
       mov     rdi, rsp
       lea     rsi, [rsp+8]
       mov     QWORD PTR [rsp], rax
       mov     DWORD PTR [rsp+8], 0
       call    "void std::__insertion_sort<int*, __gnu_cxx::__ops::_Iter_less_iter>(int*, int*, __gnu_cxx::__ops::_Iter_less_iter) [clone .isra.0]"
       mov     edx, DWORD PTR [rsp+4]
       mov     esi, DWORD PTR [rsp]
       xor     eax, eax
       mov     edi, OFFSET FLAT:.LC1
       call    "printf"
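
The std::sort version of the source is not shown in this message. The
sketch below assumes it differs from example1 only in the call. The name
example2 and the return value are hypothetical additions so the result can
be checked. The static_assert decodes the movabs immediate from the listing
above: it is the unsorted pair packed into a single 64-bit store.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// 12884901895 == 0x0000000300000007: low dword 7 (x[0]), high dword 3 (x[1]).
static_assert(12884901895ULL == ((std::uint64_t{3} << 32) | 7u),
              "movabs immediate holds the unsorted pair {7, 3}");

// Assumed to mirror example1 with only the sort call changed.
int example2() {
    int x[3] = { 7, 3 };
    std::sort(x + 0, x + 2);
    std::printf("%d %d", x[0], x[1]);
    return x[0] * 10 + x[1];  // returned only so the result can be checked
}
```

Read this way, the listing stores the array to the stack unsorted and
sorts it at run time through the out-of-line __insertion_sort call.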

It appears that the real problem here is actually that the standard
template library uses iterators.

No, you're misreading the assembly. The only "iterators" here are pointers. The word "iter" in a symbol name doesn't mean there's an iterator class involved.

 
Sacrilege, you say? Well, we know the
compiler has an optimization budget and we know that inlining does
require some work by the compiler... I would argue that it is
impressive that the compiler was able to call insertion sort directly
without invoking a partitioning scheme first.

So, my first proposal is to have a template library that can operate
using pointers and arrays without extra templated abstraction layers.

That already exists.