ISOCPP liaison List: Re: [wg14/wg21 liaison] UB

From: Martin Uecker <ma.uecker_at_[hidden]>
Date: Fri, 26 May 2023 07:15:18 +0200

Am Donnerstag, dem 25.05.2023 um 21:09 +0000 schrieb Cranmer, Joshua:
> I've spent some time thinking about this, and my current conclusion is that there are some things about the proposal that just gives me bad vibes, even though I don't have specific examples I can give to back it up.
>
> As most of C and C++ are defined (albeit not formally) in an operational semantics manner, most of the UB we come across is similarly defined essentially operationally. Dereference a null pointer? UB. Read an uninitialized value? UB. Signed integer overflow? UB--all of the UB is defined in terms of an operational step that does not complete but instead produces UB. And this kind of UB is the one that is largely being considered when talking about this proposal.

Yes.
>
> But this is not the only kind of UB.

At least in C, it is almost all of the UB. Why? Because this is the traditional
way of thinking about UB! Run-time operations were made UB, because the
hardware could trap or or crash etc., not for optimization. So this originally
had a concrete interpretation. The few cases you mention below were all introduced
later after some more extreme concept of UB became more widely accepted. But
I think this makes all the old UB much more dangerous then originally, and
also makes the text not coherent anymore. So if we need this more extreme
kind of UB, I think we should exclusively use it for such relatively new things
where we can not avoid it, and not the millions of other things. (But I do not
think we need it for anything! Not sure about C++ though...) And we should
then not call it UB (as in "this operation has UB"), but just call the program
invalid or something.

> Some of the semantics are defined in a more axiomatic semantics manner. The most salient example is data races. Essentially, the memory model of C and C++ generally pretends to be sequentially consistent, which is realizable if code uses proper synchronization, and code that doesn't do that properly is UB. There's already been discussion of data races, so I won't cover it in detail. But the idea of just declaring the UB to happen at the moment that the conflicting access happens to cause the data race doesn't feel sufficient to me; I can't convince myself that there's no way that the reordering couldn't cause observable changes before the data race happens.

The argument with the function calls does not convince you? It basically
rules out that for non-volatile I/O compilers do reorder anything even
with the abstract interpretation. And this argument applied to data
races as well.

Also there seem to be good arguments why a concrete notion may actually
be necessary if one ever wants to fix the problems we have now in
the semantics for concurrent programs (e.g. the Hans paper)

> Others have already picked up on other cases where we have axiomatic rather than operational UB.

In C I know about infinite loops, which can easily be defined operationally.
I know C++ has a more abstract concept of forward progress, which may not fit
well with the idea of concrete UB. Are there others?

>
> Overall, I think the basic sentiment that a "store volatile" that is postdominated by a known-UB statement (in LLVM terms, the `unreachable` instruction) shall not be deleted by the optimizer is a reasonable suggestion.

People using volatile stores tell us that this is very important. But
the important thing to understand is that it is not possible to
even formulate such things as long as you also have the notion of abstract
UB that takes all semantics away. We have seen similar issues
with memset_explicit.

> I am somewhat less convinced that the "backwards time-travel" of inferred properties in such situations is reasonable, but I am not prepared to definitively rule such a rule out.
>
Note that an optimizer can always backwards time-travel inferred properties
if it can prove that this does not change the obervable behavior, i.e. in
this case the value that is stored.

> However, I don't think this sentiment is expandable to fully cover *all* kinds of UB; it's something that has to be limited to a smaller set of UB (although this probably covers most of the UB that people are worrying about!).

Yes, but then let's turn this around: If it turns out we need
such an abstract notion of UB that makes programs completely
invalid, then let's limit it to cases were it is really needed
and not everything which is now explicitely or implicitely UB.

> The wording proposed in WG14::N3128 feels insufficient to me; perhaps we need to actually provide two different UB-like concepts, and go through all existing uses of the term to work out which one is appropriate in every circumstance.

Yes, but let's figure out for what we would actually need the
abstract UB. In C, there is currently nothing which convinces
me that we need it at all. For C++ this needs to be looked at.

Martin

Received on 2023-05-26 05:15:22