ISOCPP liaison List: Re: [wg14/wg21 liaison] Semantics of angelic nondeterminism

From: Martin Uecker <ma.uecker_at_[hidden]>
Date: Thu, 18 Apr 2024 19:22:51 +0200

Am Mittwoch, dem 17.04.2024 um 15:18 -0600 schrieb Davis Herring:.

..
> > >

> > I
> > claim that we want it to be well-defined (although there's no guarantee
> > which branch is taken) in isolation because z might be the old pointer
> > stored in an atomic, but that
> >
> > int use_zombie(int *z) {
> > int *const p=(int*)malloc(sizeof *p);
> > *p=0;
> > *z=1;
> > const int ret=*p;
> > free(p);
> > return ret;
> > }
> >
> > can be reduced to
> >
> > int use_zombie(int *z) {
> > *z=1;
> > return 0;
> > }
> >
> > because z must point to something that exists independently of the
> > malloc (at an unknown address) and so must not alias p.

It is well defined with TS 6010.
Note that this is also true for:

int use_zombie(uintptr_t zi) {

int *const p = malloc(sizeof *p);
*p=0;
*(int*)zi = 1;
const int ret=*p;
free(p);
return ret;
}

Here, the transformation is allowed because p has a fresh
provenance which is never exposed.

On the other hand, we do *not* want the optimization to be
allowed in general for the following example.

int use_zombie(uintptr_t zi) {

int *const p = malloc(sizeof *p);
*p=0;
if ((uintptr_t)p == zi)
*(int*)zi = 1;
const int ret= *p;
free(p);
return ret;
}

The validity of such an optimization in this can not be
derived using the model in TS 6010.

It can be derived using indeterminism in a closed world
model where one could assume that a compiler demonically
moves away the allocation to a different address to make the
comparison always fail. But this is where I argue that
we do not want this in C, for several reasons:

1.) It does not buy us anything useful. In fact, it
reduces the expressiveness of the language.

2.) It makes it hard to reason about code.

3.) Most importantly, it relies on a closed world model
which is not appropriate for a systems programming language.

> >
> > > > I does not have to, but it seems obvious to me that it is
> > > > much more useful when it is.
> >
> > Certainly a specification that maps more directly onto actual
> > implementation operations is easier to deal with, but I am not at all
> > sure that there is such a "direct specification" available here.

What you mean by "here"?

> > Already the Provenance TS has to talk about pointers with a
> > superposition of provenances, and that seems consistent with the
> > effective type approach, but I suspect that attempting to formally
> > define those notions in the context of multiple threads will require
> > specification gymnastics similar in amount (if not kind) to those in
> > P2434 since there might not be any happens-before edges between the uses
> > of such pointers on multiple threads. Extending it thereafter to
> > support the pointer-zap case will not improve things.

To me pointer-zap and multiple threads all seem consistent
with a direct specification technique where we only make
specific operations UB and never quantify over all executions.
Where do you see any problems?

> >
> > > > Indeed, C does not have this. We want to be able to
> > > > reason about executions where flip() returns 1 and we also
> > > > want to be able to reason about executions where flip() returns
> > > > 2 until the point the operation that has UB is encountered.
> >
> > I think this point is best interpreted as being the motivation for your
> > WG14:N3128, and indeed you wrote to me a year ago (sorry I never got
> > back to it until now!) suggesting that my std::observable (from P1494)
> > might be more profitably considered implicit in every library I/O
> > function (recalling that that set includes fflush rather than printf).
> > I plan to present a revision of P1494 that includes that idea, as it
> > indeed seems that the value of the paper might be realized without the
> > syntax. That might make the distinction between the different models of
> > undefined behavior a question of phrasing rather than substance, which
> > is an improvement.

Yes, it is certainly mostly a question of phrasing (but see above).

The key insight is that almost all I/O is hidden behind function calls
that can not be (completely) inlined anyway because they do system
calls. And in C and C++ for function calls a compiler currently must
assume that it may not return to the caller. This automatically ensures
that UB that is triggered after the call can not in any way affect
previous observable behavior.

> >
> > > > I do not understand why WG21 goes down the path of essentially
> > > > producing a specification that gives "no semantics" to almost
> > > > any real world program, while seemingly providing (almost)
> > > > no advantages in terms of optimization because compilers
> > > > can not reason globally about all paths of executions anyway.
> >
> > Is it meaningfully different to give semantics to such programs but only
> > up to the first undefined operation?

I would say so. Think about controlling a machine: It gets easier
to prove that it gets safely shut down, when the the later error
in a diagnostic routine can affect this anymore. Or think about
a program that starts with high priveledges which are later
dropped. You can then prove useful security properties without
having to prove that the full program has no UB, which may not
even be feasible! Similar issues if you have a program that
forks where UB in the child could affect the parent. Another
example is debugging where you could make reliable inferences
from externally observing some state transition.

> >
> > > > In particular, it makes the specification useful even in cases where
> > > > some component is involved that does not have a full formal
> > > > specification and where it is not clear what "game" this component
> > > > would play with some angels and demons in the language spec, i.e. in
> > > > 99% of all real world applications of C.
> > I don't think it's actually any more clear in the absence of "quantifier
> > semantics" what happens when you call a function whose specification is
> > imprecise: it might call abort, or open an arbitrary file, or anything.
> > That's not an impediment to practical work getting done, of course, and
> > that means it's also not an impediment to precise specification of what
> > we can specify.
> >

The point is that you can make general assumptions about other parts
of the system (which could be hardware, or some remote system) and
then prove something about the local part conditional on certain
preconditions. For this you need to be able to state the preconditions
at a certain interface boundary. This is much harder if you do
not have a specification language that expresses the valid states
of the abstract machine in terms of local properties (e.g. the
pointer has provenance X) but only in terms of global validness
of a program when quantifying over all possible executions.

Martin

Received on 2024-04-18 17:22:55