Date: Sun, 27 Aug 2023 14:31:13 +0100
I reply in series to Sebastian, Jason and Thiago below.
Sebastian wrote:
>
> Not all instances of UB can be detected and not all instances of UB can be defined.
I'm talking specifically about situations where the compiler makes the
determination:
"That would result in UB and I can therefore take certain
liberties here, such as not bothering to check if the signed integer
goes from positive to negative"
So I'm talking specifically about situations where UB is detected.
Sebastian wrote:
> If it would be so easy than the compilers would just give an compile error for all instances of UB.
Yeah, I agree with you here. If a loop checks whether a positive
integer has incremented to negative, then I think the compiler should
at the very least issue a warning.
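For what it's worth, GCC already has a knob in this direction: as far
as I know, "-Wstrict-overflow" warns when the optimiser simplifies an
expression on the assumption that signed overflow can't happen, and
"-fwrapv" makes the wraparound defined outright. A minimal sketch (the
exact warning level and diagnostic wording vary by GCC version):
```
// Compile with: g++ -O2 -Wstrict-overflow=3 -c demo.cpp
// GCC folds the comparison below to 'true' and, with the warning
// enabled, reports that it is assuming signed overflow does not
// occur. With -fwrapv instead, 'i + 1' genuinely wraps and the
// comparison is evaluated for real.
bool stays_positive(int i)
{
    return i + 1 > i;
}
```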
I was playing around with the GNU g++ compiler yesterday on GodBolt,
using it in the "-O3" mode, and I was surprised to see that it never
neglects to check if a positive integer has incremented to become
negative. I reckon the guys at GNU figured that if a person is letting
a signed integer overflow in a loop then there are only two
explanations:
(1) The programmer is unaware that signed integer overflow is UB
(2) The programmer is aware that signed integer overflow is UB, and
they're familiar with the architecture and know that, on their
hardware, incrementing INT_MAX safely wraps to INT_MIN, but they
didn't anticipate that the compiler would optimise out the check to
see if 'i' has become negative.
In either case, omitting the check on 'i' isn't helpful.
Sebastian wrote:
> What you want is a language without abstract machine, but with concrete machine. Not only C++,
> but even the hardware is moving further and further from it. See speculative, out-of-order, ... execution.
We still have computers... we haven't hooked up copper wires to a
jellyfish's spleen yet to do all the computation. The future's coming
and we're probably gonna be doing all sorts of mad stuff 200 years
from now (if we're not extinct), but for the time being we're still
dealing with 0s and 1s made by voltages on transistors.
Sebastian wrote:
> My advice is: Stay within defined behavior and you have the cozy situation of not having to worry about
>that all. If you want to hack at the interface between abstract and concrete machine, than you have to
> embrace all those subtleties.
I'm talking about repair jobs where the damage has already been done.
I'm talking about compensating for a bug, or compensating for a
deliberate decision not to break ABI.
Jason wrote:
> "Obviously intended"? What makes you say that? After all, if that were
> the user's intent, then "obviously" they would write the C++ code that
> would actually *do that*, rather than relying on UB:
>
> ```
> for ( unsigned int i = 0; i < (unsigned int)std::numeric_limits<int>::max(); ++i )
> {
>     if ( SomeFunc((int)i) ) break;
> }
>
> SomeFunc(-1);
> ```
I haven't done a poll, but I reckon 9 out of 10 C++ programmers don't
know that the compiler _might_ neglect to check whether a positive
number has incremented to become negative.
Jason wrote:
> That's why the committee standardized two's complement signed integers
> to begin with.
Maybe we should have a new keyword in the language: '_Fsigned'.
An integer type marked as '_Fsigned' would have defined behaviour
where INT_MAX increments to INT_MIN.
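To make concrete what I have in mind, here's a sketch of the
semantics '_Fsigned' would give, emulated today by doing the
arithmetic in unsigned (where wraparound is defined) and converting
back. Note that the unsigned-to-int conversion is only guaranteed to
be modular since C++20; before that it was implementation-defined.
The function name is mine:
```
#include <climits>
#include <iostream>

// Increment with defined wraparound: do the addition in unsigned
// arithmetic (which wraps by definition), then convert back to int
// (a modular conversion since C++20).
int fsigned_increment(int x)
{
    return static_cast<int>(static_cast<unsigned int>(x) + 1u);
}

int main()
{
    std::cout << fsigned_increment(INT_MAX) << '\n';  // prints INT_MIN
}
```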
Jason wrote:
> There's that phrase again: "obviously intended".
When I say 'obviously intended', I mean that you can read the code
line by line and see plainly what it's meant to do. If you see:
int *p = (int*)((char*)0x80000u + SomeFunc());
It's obvious that the programmer wants to call SomeFunc to retrieve an
offset, apply it to the address 0x80000u, and put the result inside a
'pointer to int' (presumably to later read from or write to an 'int'
at that address).
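If you wanted to express that intent while keeping the arithmetic
itself well-behaved, the usual trick is to do it in 'uintptr_t' and
convert once at the end; converting an integer to a pointer is
implementation-defined rather than undefined. A sketch (SomeFunc's
signature here is my invention):
```
#include <cstdint>

std::uintptr_t SomeFunc();  // hypothetical: returns a byte offset

int *make_pointer()
{
    // Do the arithmetic on integers, then convert once at the end.
    std::uintptr_t addr = 0x80000u + SomeFunc();
    return reinterpret_cast<int *>(addr);
}
```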
Another example is:
for ( int i = 0; i >= 0; ++i ) { ...... };
It's plain to see that the programmer wants to check, upon each
iteration of the loop, whether 'i' has become negative.
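And for the record, if you want that check to survive the optimiser,
here's one way of writing the same loop with no overflow at all, so
there's no UB for the compiler to 'detect' (SomeFunc is again
hypothetical, standing in for the loop body):
```
#include <climits>

bool SomeFunc(int);  // hypothetical: returns true to stop the loop

void run()
{
    // Same intent, but 'i' never increments past INT_MAX, so there
    // is no UB and the termination check cannot be optimised away.
    for ( int i = 0; ; ++i )
    {
        if ( SomeFunc(i) )
            break;
        if ( i == INT_MAX )
            break;  // where '++i' would have wrapped to negative
    }
}
```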
Thiago wrote:
>
> The correct fix is then to have both of them to derive from a common base
> class, which you can then cast either to. That solution would be correct and
> work on any compiler, any architecture, any word or pointer size, whether
> optimisations are turned on or not. That's quite different from:
>
>
> // ### HACK HACK HACK - Clean me up ASAP
>
> Because you don't know why it worked, you don't know how long it will keep on
> working. You don't know what the boundary conditions of it working or failing
> again are. And the only worse thing than failing spectacularly is failing
> silently, where things appear to work, but are producing silent garbage and
> that is being propagated out (note: dealing with Silent Data Errors is my
> $DAYJOB).
I can't change the SDK binary that they already have. I can check the
SHA-256 hash of the binary before applying the hack, so that I'm sure
it'll work.
Thiago wrote:
> What you're arguing for is that the standard should define all behaviours, at
> least under a certain language mode. You're not the first to argue this --
> search the mailing list for the term "optimisationists" and you'll find posts
> by another person whose name escapes me now who was arguing that our trying to
> extract the most performance out of the computers was hurting our ability to
> have safe code.
No, that's not what I mean. I'm specifically talking about cases where
the compiler performs an optimisation because it has detected UB (such
as neglecting to check whether a signed int has incremented from
positive to negative).
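To spell out the kind of transformation I mean, here's a
before-and-after sketch (my own illustration, with a hypothetical
SomeFunc):
```
void SomeFunc(int);  // hypothetical

// What the programmer wrote:
void original()
{
    for ( int i = 0; i >= 0; ++i )
        SomeFunc(i);
}

// What an optimiser may produce: 'i >= 0' can only become false
// after UB has already happened, so the compiler is allowed to
// delete the check and make the loop infinite.
void as_if_transformed()
{
    for ( int i = 0; ; ++i )
        SomeFunc(i);
}
```
It's that deletion of the programmer's own check that I'm objecting to.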