
std-proposals


Re: [std-proposals] Every variable is volatile, everything is laundered, no optimisation

From: Sebastian Wittmeier <wittmeier_at_[hidden]>
Date: Sun, 27 Aug 2023 17:14:13 +0200
To stay with the simple example of the loop:

There are two errors in the program:

1. The assumption that signed integers may validly overflow. Signed overflow is UB.
2. Given 1), the condition >= 0 is always true, because the loop only ever increments.

There are two ways to handle it in the sense you mean:

A. Define a result for the undefined signed increment operation (wrap-around), which makes the program correct.

or

B. Keep the UB, but compile the program with fewer optimizations. This is essentially the -O0 solution.

Problems with those approaches:

For A) Not all UB can or should get a defined result, and not all UB has a natural result. For many programming errors, even a defined result would not help. For example, a nullptr dereference could be defined to construct a new local temporary object if the type has a default constructor. The program might then avoid crashing in 65% of cases, but it would do what was intended in 0% of them. Also, defining everything would make programs really slow, slower than today's debug builds.

For B) The programmer's intent is not obvious, because there is still UB involved. On platforms that, e.g., trap on signed overflow, one would not get the intended result.

The question remains why one would want to do that. Buggy programs should be fixed, not compiled as some program with a similar meaning. Otherwise we could just as well introduce automatic name lookup for misspelled function names, or compilers with AI models that "improve" our programs by changing them in the background. Programmers should only write valid programs with defined behavior.

If you want to hack, why not use or create implementation-specific gcc/llvm intrinsics and attributes? Then you can access the vtable, freely convert types, turn optimizations on and off, etc. You could even write static asserts that the first four entries of the vtable are compatible.
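The following is a minimal sketch of the loop alternatives discussed above. The thread never shows the body of `SomeFunc`, so the one here is a hypothetical stand-in. The names `wrapping_increment`, `checked_increment` and `first_stop` are also illustrative only.

- `wrapping_increment` applies option A by hand.
- `checked_increment` uses the GCC/Clang `__builtin_add_overflow` intrinsic, the kind of implementation-specific tool mentioned above.
- `first_stop` fixes the program instead of relying on overflow.

GCC and Clang also accept `-fwrapv`, which makes signed overflow wrap for the whole translation unit being compiled.

```cpp
#include <climits>

// Hypothetical stand-in for SomeFunc() from the thread: returns true to stop.
bool SomeFunc(int i) { return i == 1000; }

// Option A by hand: wrap-around increment without signed overflow.
// Unsigned arithmetic wraps modulo 2^N; converting the result back to int
// is modular since C++20 (implementation-defined before C++20).
int wrapping_increment(int i) {
    return static_cast<int>(static_cast<unsigned>(i) + 1u);
}

// Same check via the GCC/Clang builtin: returns false instead of overflowing.
bool checked_increment(int i, int& out) {
#if defined(__GNUC__)
    return !__builtin_add_overflow(i, 1, &out);
#else
    if (i == INT_MAX) return false;
    out = i + 1;
    return true;
#endif
}

// UB-free rewrite of `for (int i = 0; i >= 0; ++i)`: stop before the
// increment that would overflow instead of testing for a negative value.
// Returns the first i for which SomeFunc(i) is true, or -1 if there is none.
int first_stop() {
    for (int i = 0;; ++i) {
        if (SomeFunc(i)) return i;
        if (i == INT_MAX) return -1;  // ++i would overflow here
    }
}
```

Checking for `INT_MAX` before incrementing states the intended termination condition directly. The compiler then has no UB to reason from and cannot remove the check.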
-----Original Message-----
From: Frederick Virchanza Gotham via Std-Proposals <std-proposals_at_[hidden]>
Sent: Sun 27.08.2023 15:31
Subject: Re: [std-proposals] Every variable is volatile, everything is laundered, no optimisation
To: std-proposals_at_[hidden];
CC: Frederick Virchanza Gotham <cauldwell.thomas_at_[hidden]>;

I reply in series to Sebastian, Jason and Thiago below.

Sebastian wrote:
> Not all instances of UB can be detected and not all instances of UB can be defined.

I'm talking specifically about situations where the compiler makes the determination:

    "That would result in UB and I can therefore take certain liberties here, such as not bothering to check if the signed integer goes from positive to negative"

So I'm talking specifically about situations where UB is detected.

Sebastian wrote:
> If it were so easy, then the compilers would just give a compile error for all instances of UB.

Yeah, I agree with you here. If a loop checks whether a positive integer has incremented to negative, then I think the compiler should at the very least issue a warning. I was playing around with the GNU g++ compiler yesterday on GodBolt, using it in "-O3" mode, and I was surprised to see that it never neglects to check whether a positive integer has incremented to become negative. I reckon the guys at GNU figured that if a person is letting a signed integer overflow in a loop, then there are only two explanations:

(1) The programmer is unaware that signed integer overflow is UB.
(2) The programmer is aware that signed integer overflow is UB, and is familiar enough with the architecture to know that it's safe to increment INT_MAX to INT_MIN, but didn't anticipate that the compiler would optimise out the check to see if 'i' has become negative.

In either case, omitting the check on 'i' isn't helpful.

Sebastian wrote:
> What you want is a language without an abstract machine, but with a concrete machine. Not only C++,
> but even the hardware is moving further and further from it. See speculative, out-of-order, ... execution.

We still have computers... we haven't hooked up copper wires to a jellyfish's spleen yet to do all the computation. The future's coming, and we're probably gonna be doing all sorts of mad stuff 200 years from now (if we're not extinct), but for the time being we're still dealing with 0's and 1's made by voltages on transistors.

Sebastian wrote:
> My advice is: Stay within defined behavior and you have the cozy situation of not having to worry about
> all that. If you want to hack at the interface between the abstract and the concrete machine, then you have to
> embrace all those subtleties.

I'm talking about repair jobs where the damage has already been done. I'm talking about compensating for a bug, or compensating for a deliberate decision not to break ABI.

Jason wrote:
> "Obviously intended"? What makes you say that? After all, if that were
> the user's intent, then "obviously" they would write the C++ code that
> would actually *do that*, rather than relying on UB:
>
> ```
> for ( unsigned int i = 0; i < (unsigned int)std::numeric_limits<int>::max(); ++i )
> {
>   if ( SomeFunc((int)i) ) break;
> }
>
> SomeFunc(-1);
> }
> ```

I haven't done a poll, but I reckon 9 out of 10 C++ programmers don't know that the compiler _might_ neglect to check whether a positive number has incremented to become negative.

Jason wrote:
> That's why the committee standardized two's complement signed integers
> to begin with.

Maybe we should have a new keyword in the language: '_Fsigned'. An integer type marked as '_Fsigned' would have defined behaviour in which INT_MAX increments to INT_MIN.

Jason wrote:
> There's that phrase again: "obviously intended".

When I say 'obviously intended', I mean: read the code line by line and just see plainly what it's meant to do.
If you see:

    int *p = (int*)((char*)0x80000u + SomeFunc());

it's obvious that the programmer wants to call SomeFunc to retrieve an offset. That offset is applied to the address 0x80000u, and the result is then stored in a 'pointer to int' (presumably to read or write an 'int' at that address later). Another example is:

    for ( int i = 0; i >= 0; ++i ) {
        ......
    };

It's plain to see that the programmer wants to check, upon each iteration of the loop, whether 'i' has become negative.

Thiago wrote:
>
> The correct fix is then to have both of them derive from a common base
> class, which you can then cast either to. That solution would be correct and
> work on any compiler, any architecture, any word or pointer size, whether
> optimisations are turned on or not. That's quite different from:
>
> >  // ### HACK HACK HACK - Clean me up ASAP
>
> Because you don't know why it worked, you don't know how long it will keep on
> working. You don't know what the boundary conditions of it working or failing
> again are. And the only worse thing than failing spectacularly is failing
> silently, where things appear to work, but are producing silent garbage and
> that is being propagated out (note: dealing with Silent Data Errors is my
> $DAYJOB).

I can't change the SDK binary that they already have. I can check the SHA256 hash of the binary before applying the hack, so that I'm sure it'll work.

Thiago wrote:
> What you're arguing for is that the standard should define all behaviours, at
> least under a certain language mode. You're not the first to argue this --
> search the mailing list for the term "optimisationists" and you'll find posts
> by another person whose name escapes me now who was arguing that our trying to
> extract the most performance out of the computers was hurting our ability to
> have safe code.

No, that's not what I mean.
I'm specifically talking about cases where the compiler performs an optimisation because it has detected UB (such as neglecting to check whether a signed int has incremented from positive to negative).

--
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
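Thiago's suggested fix in the quoted exchange is to derive both classes from a common base class, so that code works with either one through the base. It can be sketched as below. The sketch assumes both class definitions can be changed, which the quoted reply says is not possible for the SDK binary in question. `Widget`, `OldWidget`, `NewWidget` and `describe` are hypothetical names, not taken from the thread.

```cpp
#include <string>

// Hypothetical common interface: callers use only what the base declares,
// so no assumptions about vtable layout are needed.
struct Widget {
    virtual ~Widget() = default;
    virtual std::string name() const = 0;
    virtual int size() const = 0;
};

// Hypothetical stand-ins for the two classes discussed in the thread.
struct OldWidget : Widget {
    std::string name() const override { return "old"; }
    int size() const override { return 1; }
};

struct NewWidget : Widget {
    std::string name() const override { return "new"; }
    int size() const override { return 2; }
};

// Accepts either class through a reference to the common base; the
// derived-to-base conversion is implicit and always well-defined.
std::string describe(const Widget& w) {
    return w.name() + ":" + std::to_string(w.size());
}
```

Because every call goes through virtual functions declared in `Widget`, the vtable layout remains the compiler's responsibility. This is the portability argument made in the quoted reply.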

Received on 2023-08-27 15:14:15