ISOCPP liaison List: Re: [wg14/wg21 liaison] Semantics of angelic nondeterminism

From: Martin Uecker <ma.uecker_at_[hidden]>
Date: Sat, 13 Apr 2024 11:54:43 +0200

Am Samstag, dem 13.04.2024 um 11:12 +0200 schrieb Jens Maurer:
>
> On 13/04/2024 10.03, Martin Uecker via Liaison wrote:
> > Am Freitag, dem 12.04.2024 um 17:14 -0500 schrieb Davis Herring:
>
> > > I see neither how that inutility follows from a notion of local reasoning nor what available specification strategy avoids such reliance. Consider the ill-advised C program
> > >
> > > int main() {
> > > int a[2]={0};
> > > return a[flip()]; // flip from the previous example
> > > }
> > >
> > > The possible observable behaviors of this program are that it returns 0 (having accessed a[1]) or has undefined behavior (having attempted to access a[2]). Of course, that set of
> > > observable behaviors is identical to that of a program that unconditionally exhibits undefined behavior, since C says that "this document imposes no requirements" and so "the program
> > > returns 0" does not extend the set. C lacks the explicit admonition on this subject that in C++ is [intro.abstract]/5:
> > >
> > > However, if any such execution contains an undefined operation, this document places no requirement on the implementation executing that program with that input (not even with regard to
> > > operations preceding the first undefined operation).
> >
> > Indeed, C does not have this. We want to be able to
> > reason about executions where flip() returns 1 and we also
> > want to be able to reason about executions where flip() returns
> > 2 until the point the operation that has UB is encountered.
>
> Not everybody seems to share that view of the status quo of C:
>
> https://blog.regehr.org/archives/213
> quote
>
> "If any step in a program’s execution has undefined behavior,
> then the entire execution is without meaning. This is important:
> it’s not that evaluating (1<<32) has an unpredictable result,
> but rather that the entire execution of a program that evaluates
> this expression is meaningless. Also, it’s not that the execution
> is meaningful up to the point where undefined behavior happens:
> the bad effects can actually precede the undefined operation."
>
> It seems what is explicitly specified in C++ ("not even with
> regard to operations preceding the first undefined operation")
> is believed to be implied by some readers of the C standard.
> Thus, if you want to make C have the "until the UB point"
> smeantics, some changes to C are desirable.

There is certainly a lot of confusion as this can not be
derived from the wording in the C standard. Is not how most
C programmers read the C standard, or how C was traditionally
interpreted by compiler writers, e.g:

"The problem is there may be observable side effects on the *0 path
between the test and the actual *0 -- including calls to nonreturning
functions, setjmp/longjmp, things that could trap, etc. This case is
similar. We can't back-propagate the non-null status through any
statements with observable side effects."

https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590757.html

In any case, the UB study group of WG14 unanimously decided that
C should have more concrete interpretation of UB.

>
>
> Interestingly, neither gcc nor clang seem to purge the null pointer
> check in this case, for either C or C++:
>
>
> int x = 1;
>
> void f(int *p)
> {
> if(p) x = 0;
> *p = 42;
> }
>
>
> Although that would seem to be one of the easier cases
> where implementations could exploit UB back-propagation.

Indeed, probably because this would break real-world code
that relies on *p trapping.

>
> Are there some specific examples where that happens?

Yes, for example

https://godbolt.org/z/ccjaTvx6T

Which would be ok according to the "as if" rule also in C.
I have examples for GCC and clang too somewhere.

But note that MSVC was traditionally very aggressive with
such optimization to the extend of being buggy:

https://developercommunity.visualstudio.com/t/Invalid-optimization-in-CC/10337428?q=muecker

>
> > > This omission seems consistent with my understanding that WG14 and WG21 have not always agreed about the permission for an implementation to optimize a program under the assumption that
> > > undefined behavior never occurs.
> >
> > I do not understand why WG21 goes down the path of essentially
> > producing a specification that gives "no semantics" to almost
> > any real world program, while seemingly providing (almost)
> > no advantages in terms of optimization because compilers
> > can not reason globally about all paths of executions anyway.
>
> You seem to be assuming that "almost any real world program"
> encounters undefined behavior somewhere during a particular
> execution.

Yes, e.g. already by calling a non-standard API.

But even if you exclude this (assuming those are defined
elsewhere), many real world programs rely on undefined
or unspecific behavior having platform specific behavior.

> Let's assume that's true. I'm not seeing why
> the resulting potential exploitation of the undefined behavior
> by the optimizer after the undefined behavior situation is
> categorically less concerning than any potential back-propagation.

It is also concerning. But my point is how easy or difficult we
make it to complement our specification of the language with
additional assumptions about external components and/or platform
specific behavior to make sense of real-world programs.

And maybe also how hard we make it to for ourselves to later
have standard extensions related to UB (e.g. trapping for
out-of-bounds).

> In the real world, calls to unknown functions and syscalls
> are quite sturdy optimization barriers, but it's hard to
> rely on that in the specification because of LTO and other
> similar whole-program approaches.

Yes, i am not proposing that unknown functions get special
treatment. I just observe that existing C and C++ compilers
evolved in a world where they process only parts of a program.
So whole-program semantics are less natural and also useless
to them, because they have to work in scenarious with partial
information. This is also still true with LTO.

In particular, most observable behavior is hidden behind
a function-call ABI and compilers tend to not make any special
assumptions about it. This optimizations barrier *naturally*
already gives as the C semantics of UB for such observable
behavior and *not* the much more rigrous UB semantics.

So why not use this as the basis for the spec?

Martin

>
> Jens

Received on 2024-04-13 09:54:49