ISOCPP liaison List: Re: [wg14/wg21 liaison] Semantics of angelic nondeterminism

From: Martin Uecker <ma.uecker_at_[hidden]>
Date: Sat, 13 Apr 2024 10:03:29 +0200

Am Freitag, dem 12.04.2024 um 17:14 -0500 schrieb Davis Herring:
> > It is very much an essential feature of C that one can
> > reason locally. Compilers know how to reason locally,
> > because in many cases they do not see the full program.
>
> Obviously it is impossible to reason completely locally: a call to a function as simple as
>
> int range(const int *a,size_t n) {return a[n-1]-a[0];}
>
> might have defined or undefined behavior based on the size of the array into which a points (let alone more exotic possibilities like a pointing too close to the end or having the wrong
> type). The local reasoning applied by compilers is exactly that which (they can prove) would apply for any program containing a call to the function in question that has defined behavior.
> The compiler does not see the full program, but its reasoning quantifies over not only all potential executions of a program but also all programs that contain the portion under
> examination.

Yes, but we can describe the properties of the arrays as
preconditions in terms of the state of the abstract system
on function entry.

>
> > Almost all C programs are also not strictly conforming,
> > but have undefined behavior. Their practical meaning is
> > inferred by reasoning about individual parts.
>
> Well, "strictly conforming" and "undefined behavior" are not exhaustive, but maybe that's not the point.
>
> Of course global practical meanings are inferred from local (practical) meanings, but practical meanings local and global are also inferred from known behaviors of practical implementations
> that are not specified by any standard. I don't think the necessity of invoking such knowledge in many cases is a reason to suppose that a standard specification must be directly analogous
> to it (despite the resulting difficulty in representing multiple implementation possibilities).

I does not have to, but it seems obvious to me that it is
much more useful when it is.

>
> > A specification strategy that fundamentally relies on
> > quantifying over all executations of a program to decide
> > whether the whole program has defined behavior or not,
> > seems not very useful to me.
>
> I see neither how that inutility follows from a notion of local reasoning nor what available specification strategy avoids such reliance. Consider the ill-advised C program
>
> int main() {
> int a[2]={0};
> return a[flip()]; // flip from the previous example
> }
>
> The possible observable behaviors of this program are that it returns 0 (having accessed a[1]) or has undefined behavior (having attempted to access a[2]). Of course, that set of
> observable behaviors is identical to that of a program that unconditionally exhibits undefined behavior, since C says that "this document imposes no requirements" and so "the program
> returns 0" does not extend the set. C lacks the explicit admonition on this subject that in C++ is [intro.abstract]/5:
>
> However, if any such execution contains an undefined operation, this document places no requirement on the implementation executing that program with that input (not even with regard to
> operations preceding the first undefined operation).

Indeed, C does not have this. We want to be able to
reason about executions where flip() returns 1 and we also
want to be able to reason about executions where flip() returns
2 until the point the operation that has UB is encountered.

>
> This omission seems consistent with my understanding that WG14 and WG21 have not always agreed about the permission for an implementation to optimize a program under the assumption that
> undefined behavior never occurs.

I do not understand why WG21 goes down the path of essentially
producing a specification that gives "no semantics" to almost
any real world program, while seemingly providing (almost)
no advantages in terms of optimization because compilers
can not reason globally about all paths of executions anyway.

> Nonetheless, I cannot avoid the conclusion that an unspecified choice between defined and undefined behavior is undefined behavior, and therefore that the status of a program having
> defined behavior inevitably depends on the set of all possible executions. (All of this is continuing in the tradition of ignoring the possibility of input received mid-execution that
> influences whether a program exhibits undefined behavior. One thing at a time.)
>

I do not see why you cannot avoid the conclusion. We can simply
say "once an operation which has UB is encountered, the standard
allows arbitrary behavior." This eliminates a lot of problems
and is far simpler to understand and to reason about. In
particular, it makes the specification useful even in cases
where some component is involved that does not have a full
formal specification and where it is not clear what "game" this
component would play with some angels and demons in the
language spec, i.e. in 99% of all real world applications of C.

Martin

Received on 2024-04-13 08:03:33