[std-proposals] The real problems with optimizing C++ are not getting better

From: Adrian Johnston <adrian3_at_[hidden]>
Date: Sun, 24 Aug 2025 18:12:00 -0700
Hello,

(If you spend a lot of time looking at generated assembly you might
want to skip this one.)

As we all know, the compiler's budget for optimizing C++ is
implementation-defined: the compiler supposedly knows best what is
needed, and we are not supposed to be doing the compiler's job for it.
Unfortunately, this view has never really worked well in practice.
Recently I spent some time looking at the generated assembly from gcc
and clang and it doesn't look like the situation has improved in 30
years.

I will make a couple of proposals, but let me explain the problems first.

The keyword const never mattered to the code generator before, and it
seems the keyword constexpr doesn't matter now. The compiler may be
willing to execute constexpr code in a manifestly constant-evaluated
context, but it isn't going to do anything extra to optimize the
generated assembly otherwise. This is bothersome because everyone
learning C++ is going around acting as if the constexpr keyword results
in their runtime code being more thoroughly evaluated at compile time,
when it makes no difference there at all. This is more of an education
issue, but I wanted to point out how dangerous the wording around
things being made "possible to evaluate at compile time" is for the
uninitiated.
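
To make the distinction concrete, here is a minimal sketch (C++14 or
later). The names sum_to, at_compile_time and at_run_time are made up
for illustration and do not come from any real codebase. Only the
constexpr variable forces compile-time evaluation; the ordinary call is
optimized, or not, like any other function call.

```cpp
// Hypothetical helper for illustration only.
constexpr int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// Manifestly constant-evaluated: the initializer of a constexpr variable
// has to be computed by the compiler, or the program is ill-formed.
constexpr int at_compile_time = sum_to(100);
static_assert(at_compile_time == 5050, "computed during compilation");

// Ordinary runtime call: constexpr on sum_to imposes no requirement here.
// Whether the loop is folded away is up to the optimizer, exactly as it
// would be for a function without the keyword.
int at_run_time(int n) {
    return sum_to(n);
}
```

Comparing the assembly for at_run_time with and without the constexpr
keyword on sum_to is one way to check the claim for a given compiler
and optimization level.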

Meanwhile, gcc and clang are still perfectly happy to inline and
evaluate arbitrary non-constexpr code, even at link time across
different translation units. Let me give you an example of code
inlining using a simple insertion sort template that just uses
pointers and operator<:

int example1() {
   int x[3] = { 7, 3 };
   hxinsertion_sort(x+0, x+2);
   printf("%d %d", x[0], x[1]);
   return 0; // a non-void function must return a value
}

This results in the following assembly:

       .string "%d %d"
       sub rsp, 8
       mov edx, 7
       mov esi, 3
       xor eax, eax
       mov edi, OFFSET FLAT:.LC0
       call "printf"

This means the two numbers were sorted at compile time without any of
the new C++ constant evaluation machinery involved. Meanwhile, if you
write that with std::sort you get this:

       .string "%d %d"
       movabs rax, 12884901895
       sub rsp, 24
       mov rdi, rsp
       lea rsi, [rsp+8]
       mov QWORD PTR [rsp], rax
       mov DWORD PTR [rsp+8], 0
       call "void std::__insertion_sort<int*, __gnu_cxx::__ops::_Iter_less_iter>(int*, int*, __gnu_cxx::__ops::_Iter_less_iter) [clone .isra.0]"
       mov edx, DWORD PTR [rsp+4]
       mov esi, DWORD PTR [rsp]
       xor eax, eax
       mov edi, OFFSET FLAT:.LC1
       call "printf"
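
For reference, the std::sort variant is presumably just the earlier
example with the sort call swapped. This is a reconstruction, since
only the assembly is shown above; the name example2 is made up.

```cpp
#include <algorithm>
#include <cstdio>

// Assumed reconstruction of the std::sort variant of example1.
int example2() {
    int x[3] = { 7, 3 };        // third element is zero-initialized
    std::sort(x + 0, x + 2);    // sort only the first two elements
    std::printf("%d %d", x[0], x[1]);
    return 0;
}
```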

It appears that the real problem here is actually that the standard
library's algorithms are written in terms of iterators. Sacrilege, you
say? Well, we know the compiler has an optimization budget and we know
that inlining does require some work by the compiler. I would argue
that it is impressive that the compiler was able to call insertion
sort directly without invoking a partitioning scheme first.

So, my first proposal is to have a template library that can operate
using pointers and arrays without extra templated abstraction layers.
Then the compiler does a better job, your compiler errors are nice and
clean, and the debugger is a relative joy to use. And when it comes to
safety, the clang sanitizers can be used to make raw pointers just as
safe as iterators these days.
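
As a rough sketch of what such a pointer-only routine could look like:
the following is a hypothetical stand-in and not the actual
hxinsertion_sort used in the earlier example. The name
ptr_insertion_sort is invented here. It takes a half-open pointer range
and relies only on operator< and copy assignment.

```cpp
// Hypothetical pointer-based insertion sort over [first, last).
// No iterator traits, comparator objects or helper layers are involved.
template <typename T>
void ptr_insertion_sort(T* first, T* last) {
    if (first == last) {
        return; // empty range, nothing to do
    }
    for (T* i = first + 1; i != last; ++i) {
        T value = *i;
        T* j = i;
        // Shift every larger element one slot to the right.
        while (j != first && value < *(j - 1)) {
            *j = *(j - 1);
            --j;
        }
        *j = value;
    }
}
```

With a fixed, fully visible input such as the two-element array above,
a body this small gives the optimizer very little to see through.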

If that seems unnecessary because the compiler should be up to the
task at hand, then please consider my second proposal. Why don't we
have a way to convince the compiler to do what we are telling everyone
it does? I have seen a professionally written C++ codebase spend 3% of
its time inside std::vector::operator[] with all optimizations turned
on. (This is why the standard library gets banned from real-time
embedded projects.) If we required support for -O9 and told everyone
they may have to let the compiler run overnight, then at least all this
talk about what is possible with compile-time evaluation would be less
deceptive.

I know neither of these suggestions is very savory. But the situation
isn't really improving within the current set of constraints. Again, I
am happy to write a proposal if anyone thinks it would be well
received.

Received on 2025-08-25 01:12:12