Date: Mon, 28 Aug 2023 03:12:50 -0400
I'm the "optimizationist" guy. What I have argued is that the standards
process has been hijacked by compiler writers who have gratuitously
introduced undefined behavior as opposed to implementation-defined behavior
so that they can produce optimised code, at the expense of violating the
intentions of the programmers who wrote the program. I think the standard
is playing a "gotcha" game with programmers, trying every trick it can to
render their code invalid, under the very troubling assumption that a
program that quickly produces a result the programmer doesn't want is
better than a program that slowly produces a desired result.
A couple of years ago, as part of the contracts discussion (I worked for
Lakos at Bloomberg then), I was in violent disagreement with members of the
committee who insisted that accesses to automatic volatile variables did
not have to be treated as side effects and could be elided.
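
For concreteness, a sketch of the kind of code that dispute was about (my
example, not the committee's):

    int probe() {
        volatile int flag = 0;  // automatic volatile variable
        flag = 1;               // the question: must these accesses be kept
        return flag;            // as observable side effects, or may they be elided?
    }
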
Incrementing a signed integer variable translates to some underlying
machine operation; instead of calling the increment of a variable
containing the maximum signed value undefined behavior, it should result in
whatever number the underlying operation produces. (This is not a
particularly strange notion; the original K&R book specified that C worked
in just this way.)
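
For example (a sketch, assuming ordinary two's-complement hardware, where the
wrapped result is INT_MIN):

    #include <climits>
    #include <cstdio>

    int main() {
        int x = INT_MAX;
        // undefined behavior today; under the rule above, x would simply hold
        // whatever the machine's add instruction produced (INT_MIN on a
        // two's-complement machine)
        ++x;
        std::printf("%d\n", x);
    }
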
Accessing through a pointer should treat the memory as if it were an object
of the type the pointer points to, regardless of what the compiler can
figure out about how the memory got that way.
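
For example (a sketch; today this read is an aliasing violation the compiler
may exploit, but under the rule above it would simply reinterpret the bits
that happen to be there):

    #include <cstdio>

    int main() {
        float f = 1.0f;
        // treat the same storage as an int, however the bytes got there
        int i = *reinterpret_cast<int*>(&f);
        std::printf("%08x\n", static_cast<unsigned>(i));  // 3f800000 on IEEE-754 hardware
    }
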
An uninitialized automatic variable should be treated as if it has some
fixed but unspecified value, which will be whatever bits it happens to
contain.
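
That is (a sketch of the intended reading):

    #include <cstdio>

    int main() {
        int x;  // uninitialized: holds whatever bits are in that storage
        // under the "fixed but unspecified value" model, x is some int, so
        // this condition is simply true; the compiler may not treat the read
        // as license to do anything it likes
        if (x < 0 || x >= 0)
            std::printf("reached\n");
    }
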
Order of evaluation should be strictly left-to-right for all expressions.
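
For example (a sketch; today the order of the two calls below is unspecified,
while strict left-to-right evaluation would guarantee that f runs before g):

    #include <cstdio>

    int f() { std::printf("f "); return 1; }
    int g() { std::printf("g "); return 2; }

    int main() {
        int sum = f() + g();  // left-to-right evaluation would always print "f g"
        std::printf("%d\n", sum);
    }
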
The whole notion that executing undefined behavior makes the behavior of
the entire program unspecified, and therefore allows compilers to pretend
that undefined behavior is never executed, is completely wrongheaded. At
the very least, it should be required that all side effects that would have
been encountered by the abstract machine ahead of the undefined behavior
must occur. The best thing would be to incorporate Ada's notion of "bounded
errors", where many things now specified as undefined behavior would become
defined to produce a range of possible outcomes, with the program
continuing to execute in most cases.
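
For example (a sketch; under the current rules a compiler is in principle free
to drop the trace line below on an execution where p is null, because the
whole execution is undefined, whereas the rule above would require that side
effect to have occurred before anything goes wrong):

    #include <cstdio>

    int read_value(int* p) {
        std::printf("about to dereference\n");  // side effect the abstract machine
                                                // performs before the dereference
        return *p;                              // undefined behavior if p is null
    }
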
I would agree that you cannot use a pointer into a static or automatic
object to access memory outside the complete object, nor should a pointer
value that exists prior to a function being called be considered to
possibly point into the automatic variables of that function.
On Sun, Aug 27, 2023 at 12:04 AM Thiago Macieira via Std-Proposals <std-proposals_at_[hidden]> wrote:
> On Saturday, 26 August 2023 17:21:02 PDT Jason McKesson via Std-Proposals wrote:
> > There's that phrase again: "obviously intended".
>
> This is the crux of the problem: if it were possible to know what the
> developer "obviously intended", we wouldn't have undefined behaviour. The
> compilers would always compile to what the developer "obviously intended".
>
> In the absence of mind-reading, there has to be a contract between developer
> and compiler. That's the standard: it defines that when the developer writes
> this, that happens. But it also sets certain boundaries, where the standard
> explicitly says "I won't make any determinations about what shall happen if
> you do this".
>
> What you're arguing for is that the standard should define all behaviours, at
> least under a certain language mode. You're not the first to argue this --
> search the mailing list for the term "optimisationists" and you'll find posts
> by another person whose name escapes me now who was arguing that our trying
> to extract the most performance out of the computers was hurting our ability
> to have safe code.
>
> I hear you both, I appreciate the problem and I do think we need to revise
> some of the recent changes, such as the start_lifetime_as(). You can see my
> post on the subject on why that one in particular should never have existed.
>
> But arguing for no UB at all is too steep of a hill to climb. In particular,
> you're arguing effectively for another language inside of C++. (Maybe you
> should give Rust a try)
>
> I'll give you two examples of where the standard explicitly leaves UB because
> it allows for "don't pay for what you don't need". And neither is signed
> integer overflow:
>
> 1) shift beyond the width of a type
>
> This is left undefined because processors have different behaviours when
> doing shifting, usually because shifting used to be very slow (1 cycle per
> shift count). Some processors will mask off the number of bits in the count,
> so shifting a 32-bit integer by 32 is the same as shifting by 0. Other
> processors may not mask at all and would then shift the input out of the
> register, resulting in zero. Some others may not mask by the type's width but
> use the same machinery as a bigger type, so shifting by 32 would return 0,
> but shifting by 64 would be a no-op.
>
> x86 is actually all of the above. The SHL instruction on 8- and 16-bit
> registers uses 5 bits of the shift count, so shifting by 16 shifts everything
> out of those. But when you use 32- and 64-bit registers, it uses the type's
> width, so shifting by 64 is a no-op.
>
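> (For illustration, a sketch of how that plays out in code; which result you
> get for the shift below is exactly the hardware-dependent part, assuming a
> 32-bit unsigned:)
>
>     #include <cstdio>
>
>     int main(int argc, char**) {
>         unsigned v = 1;
>         unsigned n = 31u + static_cast<unsigned>(argc);  // 32 when run with no arguments
>         // v << n is UB in C++ because n equals the width of unsigned;
>         // a CPU that masks the count (x86 32-bit SHL uses count & 31)
>         // gives 1, one that shifts everything out gives 0
>         std::printf("%u\n", v << n);
>     }
>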
> 2) converting an out-of-range floating point number to integer
>
> This is also left undefined because different processors will do different
> things. Some may return the saturated maximum and minimum, some others may
> return a value in range that matches the original modulo the integer's width,
> some others may return a sentinel value indicating overflow (x86/SSE is the
> latter), whereas some others may set a flag indicating overflow and return
> garbage. And then add to this FP emulation in software.
>
> The latter discussion happened recently in the IETF CBOR mailing list, which
> made me look up what Rust does. It does define what the result shall be
> (saturated values, and zero for NaN), which is safe but requires more code.
> Compare: https://rust.godbolt.org/z/8h6Mhf885
>
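> (A sketch of that saturating behaviour written out by hand in C++; the
> function name is just for illustration:)
>
>     #include <cmath>
>     #include <cstdint>
>
>     std::int32_t to_int32_saturating(double d) {
>         if (std::isnan(d)) return 0;               // Rust maps NaN to 0
>         if (d >= 2147483648.0) return INT32_MAX;   // clamp above INT32_MAX
>         if (d <= -2147483649.0) return INT32_MIN;  // clamp below INT32_MIN
>         return static_cast<std::int32_t>(d);       // in range: conversion is well-defined
>     }
>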
> If your argument is "define what is today UB", then it requires more code and
> a ton of time by standard writers and compiler implementers. If your argument
> is "generate assembly with no assumptions", then the behaviour isn't
> portable. It might not be even inside of a single processor architecture,
> q.v. fused multiply adds, the extended precision x87 stack, and the DPPS/DPPD
> instructions differing in behaviour between Atom and Big Core lines.
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Software Architect - Intel DCAI Cloud Engineering
>
>
>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
Received on 2023-08-28 07:13:05