ISOCPP std-proposals List: Re: [std-proposals] The real problems with optimizing C++ are not getting better

From: Jonathan Wakely <cxx_at_[hidden]>
Date: Tue, 26 Aug 2025 11:01:27 +0100

On Mon, 25 Aug 2025 at 02:12, Adrian Johnston via Std-Proposals <std-proposals_at_[hidden]> wrote:

> Hello,
>
> (If you spend a lot of time looking at generated assembly you might
> want to skip this one.)
>
> As we all know the compiler's budget for optimizing C++ is an
> implementation-defined metric where it knows best what is needed and
> we are not supposed to be doing the compiler's job for it.
> Unfortunately, this view has never really worked well in practice.
> Recently I spent some time looking at the generated assembly from gcc
> and clang and it doesn't look like the situation has improved in 30
> years.
>

This is nonsense.

Optimizations like autovectorization and conditional devirtualization do
lots that wasn't possible 30 years ago. A problem is that many programs are
much bigger than they were 30 years ago.
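To make the autovectorization and devirtualization point concrete, here is a small sketch. The names `add_arrays`, `Shape`, `Square` and `total_area` are made up for illustration. Whether a given GCC or Clang version actually vectorizes the loop or speculatively devirtualizes the call depends on the optimization level, the target and, for devirtualization, how much of the class hierarchy the compiler can see (for example with LTO). GCC exposes the latter as `-fdevirtualize-speculatively`. Check the generated assembly rather than trusting the comments.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Element-wise loop with no loop-carried dependence: a typical
// autovectorization candidate (e.g. SIMD adds on x86-64). Because the
// pointers might overlap, the compiler may add a runtime alias check
// and keep a scalar fallback path.
void add_arrays(float* out, const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i != n; ++i)
        out[i] = a[i] + b[i];
}

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

// The static type is Shape, so s->area() is nominally an indirect call.
// With conditional (speculative) devirtualization the compiler may compare
// the object's vtable pointer with Square's, inline Square::area on that
// fast path, and keep the indirect call only as a fallback.
double total_area(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes)
        sum += s->area();
    return sum;
}
```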

> I will make a couple proposals but let me explain the problems first.
>
> The keyword const never mattered to the code generator before and it
> seems the keyword constexpr doesn't matter now. The compiler may be
> willing to execute constexpr code in a manifestly-const context but it
> isn't going to do anything extra to optimize the generated assembly
> otherwise. This is bothersome because everyone learning C++ is going
> around acting like the constexpr keyword results in their runtime code
> being more thoroughly evaluated at compile time when it doesn't even
> matter at all. This is more of an education issue, but I wanted to
> point out how dangerous the wording around things being made "possible
> to evaluate at compile time" was for the uninitiated.
>
> Meanwhile gcc/clang are still perfectly happy to inline and execute
> arbitrary non-const code at link time clear across different
> translation units. Let me give you an example of code inlining using a
> simple insertion sort template that just uses pointers and operator<:
>
> int example1() {
>     int x[3] = { 7, 3 };
>     hxinsertion_sort(x+0, x+2);
>     printf("%d %d", x[0], x[1]);
>     return 0;
> }
>
> This results in the following assembly:
>
> .string "%d %d"
> sub rsp, 8
> mov edx, 7
> mov esi, 3
> xor eax, eax
> mov edi, OFFSET FLAT:.LC0
> call "printf"
>
> This means the two numbers were sorted at compile time without any of
> the new C++ constant evaluation machinery involved. Meanwhile, if you
> write that with std::sort you get this:
>
> .string "%d %d"
> movabs rax, 12884901895
> sub rsp, 24
> mov rdi, rsp
> lea rsi, [rsp+8]
> mov QWORD PTR [rsp], rax
> mov DWORD PTR [rsp+8], 0
> call "void std::__insertion_sort<int*, __gnu_cxx::__ops::_Iter_less_iter>(int*, int*, __gnu_cxx::__ops::_Iter_less_iter) [clone .isra.0]"
> mov edx, DWORD PTR [rsp+4]
> mov esi, DWORD PTR [rsp]
> xor eax, eax
> mov edi, OFFSET FLAT:.LC1
> call "printf"
>
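`hxinsertion_sort` itself isn't shown in the thread, so the following is a hypothetical stand-in with the shape described: a template over raw pointers that only uses `operator<`. It also adds the `return` that the original `example1` is missing.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for hxinsertion_sort (its real definition is not
// shown): insertion sort over [first, last) using only pointer arithmetic
// and operator<. Elements are copied, which is fine for int.
template <typename T>
void insertion_sort(T* first, T* last) {
    if (first == last) return;
    for (T* i = first + 1; i != last; ++i) {
        T value = *i;
        T* j = i;
        // Shift larger elements one slot right until value's place is found.
        while (j != first && value < *(j - 1)) {
            *j = *(j - 1);
            --j;
        }
        *j = value;
    }
}

int example1() {
    int x[3] = { 7, 3 };              // x[2] is zero-initialized, not sorted
    insertion_sort(x + 0, x + 2);     // sorts only x[0] and x[1]
    std::printf("%d %d", x[0], x[1]); // prints "3 7"
    return 0;
}
```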
> It appears that the real problem here is actually that the standard
> template library uses iterators.

No, you're misreading the assembly. The only "iterators" here are pointers.
The word "iter" in a symbol name doesn't mean there's an iterator class
involved.
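A sketch of what the `std::sort` call above instantiates, assuming it was written as `std::sort(x + 0, x + 2)` on the same `int x[3]` (the thread doesn't show that source). The `_Iter_less_iter` in the symbol is libstdc++'s internal wrapper for the default `operator<` comparison: a comparator, not an iterator type.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <type_traits>

int x[3] = { 7, 3 };

// x + 0 and x + 2 are int*, so std::sort deduces its iterator
// template parameter as a plain pointer.
static_assert(std::is_same_v<decltype(x + 0), int*>);

// std::iterator_traits is specialized for pointers, which is what lets
// generic algorithms accept them with no wrapper class.
static_assert(std::is_same_v<std::iterator_traits<int*>::iterator_category,
                             std::random_access_iterator_tag>);

// The default comparison is operator< on the pointed-to ints.
void sort_first_two() {
    std::sort(x + 0, x + 2);
}
```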

> Sacrilege you say? Well, we know the
> compiler has an optimization budget and we know that inlining does
> require some work by the compiler.... I would argue that it is
> impressive that the compiler was able to call insertion sort directly
> without invoking a partitioning scheme first.
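For context on the "partitioning scheme": libstdc++'s `std::sort` is an introsort that leaves ranges of 16 elements or fewer unpartitioned and finishes with one insertion-sort pass. With a constant length of 2, the optimizer can discard the partitioning code and keep only the `std::__insertion_sort` call seen in the assembly above. The following is a simplified sketch of that structure, not libstdc++'s code. The names, the median-of-three pivot and the three-way partition are choices made for this sketch, and the cutoff of 16 is assumed from libstdc++'s `_S_threshold`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::ptrdiff_t kThreshold = 16; // assumed, after libstdc++

// Final pass: binary insertion sort, cheap on nearly sorted input.
template <typename T>
void final_insertion_sort(T* first, T* last) {
    for (T* i = first; i != last; ++i)
        // Rotate *i left into its sorted position within [first, i].
        std::rotate(std::upper_bound(first, i, *i), i, i + 1);
}

template <typename T>
void introsort_loop(T* first, T* last, int depth_limit) {
    // Short ranges are left for the final pass, so a 2-element input
    // never reaches the partitioning code below.
    while (last - first > kThreshold) {
        if (depth_limit-- == 0) {
            // Too many poor pivots: fall back to heapsort for this range.
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        // Median of three as the pivot value.
        T a = *first, b = first[(last - first) / 2], c = *(last - 1);
        T pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));
        // Three-way split: [first, mid1) < pivot, [mid1, mid2) == pivot,
        // [mid2, last) > pivot. The middle part is never empty.
        T* mid1 = std::partition(first, last, [&](const T& v) { return v < pivot; });
        T* mid2 = std::partition(mid1, last, [&](const T& v) { return !(pivot < v); });
        introsort_loop(mid2, last, depth_limit); // recurse on the right part
        last = mid1;                             // iterate on the left part
    }
}

template <typename T>
void intro_sort(T* first, T* last) {
    int depth = 0;
    for (std::ptrdiff_t n = last - first; n > 1; n >>= 1) ++depth;
    introsort_loop(first, last, 2 * depth);
    final_insertion_sort(first, last);
}
```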
>
> So, my first proposal is to have a template library that can operate
> using pointers and arrays without extra templated abstraction layers.
>

That already exists.


Received on 2025-08-26 10:01:48