ISOCPP liaison List: Re: [wg14/wg21 liaison] UB

From: Cranmer, Joshua <joshua.cranmer_at_[hidden]>
Date: Tue, 30 May 2023 22:12:49 +0000

> > Clearly then we would have to update the standard in all places where
> > it claims explicit UB and change the language to "fault".
>
> One problem is that not all UB in the standard are faults. Some are extension
> points or simple cases where we do not care exactly how the
> implementation behaves, but we expect the program to still work. So one
> would also need a new term for such things. Also one would need to go over
> the whole standard and classify things into different categories.

In any case, I think someone needs to go through each existing use of UB to determine which bucket it falls into and whether this kind of change makes sense for it.

The way I see things, there are perhaps 5 categories of UB:

* Intentional extension points for implementors (this is more common in C than in C++, I think). It's never been entirely clear to me why these end up being UB instead of implementation-defined, and I think it is long overdue that we pull these things out of UB so that it doesn't confuse users as much.

* May-trap operations. Self-explanatory, I think, at least for people who sit on standards committees (many users may not be fully educated on the insanity that can happen on traps, but that's a user education problem, not a specification problem).

* What LLVM calls poison values. The best motivating example in the C/C++ specification is an uninitialized value, but LLVM's definition is much looser. In LLVM, it is not UB to construct a poison value (so reading uninitialized memory is defined in LLVM IR terms!), but a poison value will tend to cause UB if it is used as a branch condition, as a pointer value, or in other places where a value can make a may-trap operation trap. There is an instruction to prevent further propagation of poison (freeze), but as far as I'm aware this hasn't been exposed in frontends yet. For the purposes of your proposed concrete UB rule, I think it makes sense to define the production of a poison value as the point at which the "fault" occurs.

* Violation of explicit compiler optimization hints (e.g., noreturn, restrict). Given that most of these (especially when you get into compiler extensions) tend to be for describing how to optimize calls to external functions, it seems that defining the function invocation as the point of "fault" is tenable.

* Statements of global correctness of code: e.g., data races, strict aliasing, pointer provenance, forward progress. These take a lot more thought as to whether a "fault" point can be clearly identified that isn't ruined by reasonable optimizations. I'm definitely not prepared to say that it can be done for everything in this category that we know of, and even less prepared to say that it can always be done categorically.
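
To make the may-trap and optimization-hint buckets a little more concrete, here is a minimal C sketch, not a definitive treatment. The helper names (checked_div, add_into, ranges_overlap, checked_add_into) are hypothetical and come from neither standard. The overlap test also assumes a flat address space, so that comparing pointers converted to uintptr_t is meaningful. Each guard sits at the place the categories above would call the "fault": the division itself for the may-trap case, and the function invocation for the restrict case.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* May-trap bucket: in C, x / 0 and INT_MIN / -1 are undefined, and on
   common hardware the divide instruction can raise a trap.
   Hypothetical helper: returns 1 and stores the quotient, or returns 0
   when performing the division would itself be the "fault". */
int checked_div(int a, int b, int *out)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return 0;
    *out = a / b;
    return 1;
}

/* Optimization-hint bucket: restrict is a promise by the caller that
   dst and src do not refer to overlapping storage. */
void add_into(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Assumption: a flat address space, so integer comparison of the
   converted pointers reflects actual overlap of the two n-element ranges. */
int ranges_overlap(const int *a, const int *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    uintptr_t bytes = n * sizeof(int);
    return pa < pb + bytes && pb < pa + bytes;
}

/* Hypothetical wrapper that treats the call as the "fault" point:
   it refuses to invoke add_into when the restrict promise would be
   broken, and returns 0 instead. */
int checked_add_into(int *dst, const int *src, size_t n)
{
    if (ranges_overlap(dst, src, n))
        return 0;
    add_into(dst, src, n);
    return 1;
}
```

With an array a of four ints, checked_add_into(a, a + 1, 2) returns 0 without touching a, whereas the same call on two distinct arrays performs the addition and returns 1.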

As a more minor naming issue, I'm not entirely satisfied with the name "fault" for the issue here, since it strikes me as something that would tend to imply it's the point of a hardware trap, whereas the "fault" associated with producing a poison value can occur at some distance from the actual trap, and we absolutely want to have the weird effects of UB kick in after the "fault" but before the trap.
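
The distance between the "fault" and the trap can be sketched in C as follows. This is only an illustration under stated assumptions: the table, the lookup function, and the status names are all hypothetical. In C the signed overflow is already UB at the addition, while in LLVM IR the corresponding add nsw merely yields poison, and the observable misbehaviour may only show up later, when the value is used as an index. The two checks mark those two points separately.

```c
#include <limits.h>
#include <stddef.h>

enum { TABLE_SIZE = 8 };

/* Hypothetical lookup table, used only for illustration. */
const int table[TABLE_SIZE] = {0, 1, 4, 9, 16, 25, 36, 49};

/* Hypothetical status codes naming where the problem was detected. */
enum lookup_status {
    LOOKUP_OK,
    LOOKUP_FAULT_AT_ADD,   /* the value could not be produced   */
    LOOKUP_FAULT_AT_INDEX  /* the value could not be used safely */
};

/* Unchecked shape of the same code, for comparison:
 *
 *     int idx = base + offset;   // overflow here: the "fault"
 *     ...                        // arbitrary distance
 *     return table[idx];         // bad access here: the possible trap
 */
enum lookup_status lookup(int base, int offset, int *out)
{
    /* Point 1: producing the value. Signed overflow is checked before
       the addition so the addition itself is always defined. */
    if ((offset > 0 && base > INT_MAX - offset) ||
        (offset < 0 && base < INT_MIN - offset))
        return LOOKUP_FAULT_AT_ADD;
    int idx = base + offset;

    /* Point 2: using the value. An out-of-range index is where an
       actual hardware trap could occur in the unchecked version. */
    if (idx < 0 || idx >= TABLE_SIZE)
        return LOOKUP_FAULT_AT_INDEX;

    *out = table[idx];
    return LOOKUP_OK;
}
```

Here lookup(INT_MAX, 1, &v) reports LOOKUP_FAULT_AT_ADD, while lookup(5, 5, &v) produces a perfectly good integer and only fails at the index check, which is the gap the naming concern is about.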

Received on 2023-05-30 22:12:57