std-proposals

Re: [std-proposals] Every variable is volatile, everything is laundered, no optimisation

From: Jason McKesson <jmckesson_at_[hidden]>
Date: Sun, 27 Aug 2023 18:18:09 -0400
On Sun, Aug 27, 2023 at 5:20 PM Frederick Virchanza Gotham via
Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> On Sun, Aug 27, 2023 at 5:03 PM Thiago Macieira wrote:
> >
> > I hear what you're saying, but what you're saying is just not how things work.
> > You're making an assumption of how compilers work and that's a flawed
> > assumption.
> >
> > I mean, under some reading, it's true. The problem is that they are not coded
> > the way your sentence, as phrased, supposed they are. It's not that they
> > detected the UB and therefore made an optimisation. If that were the case,
> > then they could emit warnings for every single possible UB and let the
> > programmer know. That is clearly not possible, therefore that's clearly not
> > how they are written.
> >
> > Instead of being written to detect undefined behaviour, they're written to
> > operate on defined behaviour. What you're effectively asking for is that they
> > begin detecting the UB possibilities (all of them) and make them specified.
> >
> > Moreover, you're not asking that they all have the same behaviour: you're
> > assuming there's an underlying true behaviour that everyone agrees upon, and
> > that can be demonstrated not to be true either. This is especially true when
> > compilers have a choice of which instructions to use but whose behaviour is
> > only the same within the defined behaviour parameters of the language.
>
>
> I've never written a compiler nor been on a team of people writing a
> compiler, nor even worked in a company that has a compiler-writing
> department, nor do I frequent the same yoga studios as Jacob Navia,
> but still. . . really. . . even in all my ignorance. . . I find that
> hard to believe.
>
> If I were to write a compiler to parse the following code:
>
> for ( int i = 0; i >= 0; ++i ) DoSomething();
>
> Then I would split the text between parentheses into three parts:
> (1) int i = 0
> (2) i >= 0
> (3) ++i
>
> I would then write the assembler for each of the three parts:
> (1) sub rsp, 8; mov [rsp], 0
> (2) cmp [rsp], 0; jlt Out_Of_Loop
> (3) add [rsp], 1

This would be part of that whole "I've never written a compiler..."
thing. While I too have never written a compiler, I know enough to
know that compilers don't generally jump straight from a parse tree
into assembly.

And you've provided an excellent example of why they don't:

> And then I'd put them together something like:
>
> sub rsp, 8
> mov [rsp], 0
> Start_Loop:
> cmp [rsp], 0
> jlt Out_Of_Loop
> call DoSomething
> add [rsp], 1
> jmp Start_Loop
> Out_Of_Loop:
> add rsp, 8
>
> If I were then some how some way to make the decision to omit the
> third and fourth instructions, I could only have come to such a
> decision by deeply analysing what's happening inside the loop and then
> also applying a piece of pre-programmed knowledge, i.e. "incrementing
> a signed integer can never go negative and so the check for negativity
> is redundant". This is a lot more extra processing and elaborate
> compiler writing than just simply doing what the programmer wrote.

But you can't do any of that anymore. All of the information needed to
do any of those things is *gone*. That information is bound up in the
parse tree which you have since discarded once you went to assembly.
That information requires an analysis of where `i` gets used within
the loop. And that's not around anymore; it's been reduced to a
meaningless register or number on the stack. You don't even know if it
was a signed integer or not, and that's *really important* to know if
you're going to apply this code transformation.

You would have to *decompile* this back into C++ to recover the
information needed to do any real optimizations based on the actual
meaning of the code in C++.
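
To put the signedness point in code, here is a small sketch of my own
(the function name is made up for illustration). The unsigned loop is
well defined and genuinely needs its exit test, because the counter
wraps around to zero. The signed loop from the quoted example appears
only in a comment: its test can become false only through signed
overflow, which is undefined behavior, and that is the sole reason a
compiler may drop the test. Both increments would lower to the same
kind of `add`.

```cpp
#include <limits>

// Counts the iterations of a loop whose unsigned counter starts five below
// the maximum. Unsigned arithmetic wraps modulo 2^N, so i takes six values
// (max-5 through max), then wraps to 0 and the test `i >= start` fails.
constexpr int unsigned_iterations() {
    const unsigned start = std::numeric_limits<unsigned>::max() - 5u;
    int count = 0;
    for (unsigned i = start; i >= start; ++i) {
        ++count;
    }
    return count;
}

// Deliberately left as a comment rather than executed:
//     for (int i = 0; i >= 0; ++i) DoSomething();
// Here `i >= 0` can only turn false if ++i overflows a signed int, which
// is undefined behavior. Knowing that `i` is signed is what lets the
// compiler treat the test as always true.

static_assert(unsigned_iterations() == 6,
              "the unsigned loop exits after six iterations");
```

Once both loops have been reduced to `add [rsp], 1`, nothing in the
instruction stream says which of the two it came from.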

So no, compilers do not optimize assembly. OK, they do have an entire
step where they specifically optimize assembly code, but that is not
*all* of the optimization they do. The optimizations based on C++
behavior are not done on assembly: they require a semantic
understanding of what the code actually means, and that information is
largely lost once the code has been turned into assembly.

The default state of a compiler is not a rote sequence of operations
that turns letters and numbers into machine language. And in fact that's
really important if you want your compiler to be able to generate
assembly for multiple different CPU architectures. You don't want to
have to redo all of your basic, structural optimization logic just
because you're compiling for ARM rather than x64.
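
As a rough illustration of that split, here is a toy sketch. It is not
how any real compiler is laid out, and every name in it (`Instr`,
`fold_constants`, `emit_x64_like`, `emit_arm_like`, `sample_program`)
is invented for this example. One constant-folding pass runs on a
small intermediate form, and two trivial back ends then print the same
optimized program as x64-flavoured and ARM-flavoured pseudo-assembly
over virtual registers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy IR, invented for this sketch: virtual registers, two instruction kinds.
struct Instr {
    enum Kind { LoadConst, Add } kind;
    int dst;  // destination virtual register
    int a;    // LoadConst: the constant value; Add: first source register
    int b;    // Add: second source register; unused for LoadConst
};
using Program = std::vector<Instr>;

// Target-independent pass: an Add of two registers known to hold constants
// becomes a LoadConst. Toy assumption: the sums fit in an int.
Program fold_constants(const Program& in) {
    std::map<int, int> known;  // virtual register -> known constant
    Program out;
    for (const Instr& ins : in) {
        assert(ins.dst >= 0 && "virtual register numbers are non-negative");
        Instr next = ins;
        if (ins.kind == Instr::Add) {
            auto x = known.find(ins.a);
            auto y = known.find(ins.b);
            if (x != known.end() && y != known.end()) {
                next = Instr{Instr::LoadConst, ins.dst, x->second + y->second, 0};
            }
        }
        if (next.kind == Instr::LoadConst) {
            known[next.dst] = next.a;
        } else {
            known.erase(next.dst);  // result is no longer a known constant
        }
        out.push_back(next);
    }
    return out;
}

// The back ends differ only in how they print the already-optimized IR.
// The text is pseudo-assembly, not real instruction encodings.
std::string emit_x64_like(const Program& p) {
    std::string s;
    for (const Instr& ins : p) {
        const std::string d = "r" + std::to_string(ins.dst);
        if (ins.kind == Instr::LoadConst) {
            s += "mov " + d + ", " + std::to_string(ins.a) + "\n";
        } else {
            s += "lea " + d + ", [r" + std::to_string(ins.a) + " + r" +
                 std::to_string(ins.b) + "]\n";
        }
    }
    return s;
}

std::string emit_arm_like(const Program& p) {
    std::string s;
    for (const Instr& ins : p) {
        const std::string d = "r" + std::to_string(ins.dst);
        if (ins.kind == Instr::LoadConst) {
            s += "mov " + d + ", #" + std::to_string(ins.a) + "\n";
        } else {
            s += "add " + d + ", r" + std::to_string(ins.a) + ", r" +
                 std::to_string(ins.b) + "\n";
        }
    }
    return s;
}

// r0 = 2; r1 = 3; r2 = r0 + r1
Program sample_program() {
    return Program{Instr{Instr::LoadConst, 0, 2, 0},
                   Instr{Instr::LoadConst, 1, 3, 0},
                   Instr{Instr::Add, 2, 0, 1}};
}
```

Calling `fold_constants(sample_program())` and handing the result to
either emitter gives both targets the folded `r2 = 5` without the
folding logic being written twice.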

> Applying the marker '__verbose' to a function would also tell the
> compiler, "Don't deeply analyse what's going on here and don't try to
> get clever with your analysis -- just convert the C++ code expression
> by expression, line by line into assembler please".
>
> I think you quite rightly mentioned in a previous post Thiago that any
> code written inside a '__verbose' function would be nonportable, and
> so why try to standardise something that's nonportable? What we can do
> is standardise a portable way of adding nonportable code to your C++
> program.

But that's not what you're asking for. You're asking for a C++
standard language feature that has specific expectations from the
implementation. But you don't say what those expectations *are*
because they're all undefined behavior.

> So if we need to compensate for a bug or for an avoidance to
> break ABI, then we can at least do it as cleanly as possible. The
> alternative would be to isolate the dodgy function into its own
> translation unit all by itself, compile it with "-O0", and then link
> it later with the rest of the program which was compiled with "-O2". Of
> course different compilers have different command line switches for
> optimisation. If we had a program that is compiled by a few different
> compilers then we could do:
>
> void Func(void) __verbose
> {
> #ifdef _BORLANC_
> // Put hack here for Borland ABI
> #elif _GNU_
> // Put hack here for GNU ABI
> #elif MICROSOFT_VCPP
> // Put hack here for Microsoft ABI
> #endif
> }
>
> This would be a lot handier than having to edit the Makefile and to
> write three different ways of isolating this function in its own
> translation unit with all optimisations disabled.

Here's the thing: hacks are, and *should be*, rare. Extremely rare.
Rare things like these don't need to be in the language.
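
For the rare case that does come up, a sketch along these lines may
already be enough without editing the Makefile. It relies on vendor
extensions rather than standard C++: Clang's `optnone` attribute and
GCC's `optimize` attribute. The macro name `NO_OPTIMIZE` and the
function `add_one` are made up for illustration. Neither attribute
promises an "expression by expression" translation; they only ask the
compiler to skip optimizing that one function.

```cpp
// Vendor extensions, not standard C++. Clang also defines __GNUC__, so it
// has to be tested for first.
#if defined(__clang__)
#  define NO_OPTIMIZE __attribute__((optnone))
#elif defined(__GNUC__)
#  define NO_OPTIMIZE __attribute__((optimize("O0")))
#else
// Other compilers: expands to nothing in this sketch. MSVC, for instance,
// uses `#pragma optimize("", off)` around the function instead.
#  define NO_OPTIMIZE
#endif

// Hypothetical stand-in for the one function that needs special treatment.
NO_OPTIMIZE int add_one(int x) {
    return x + 1;
}
```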

Received on 2023-08-27 22:18:22