
Re: [ub] C provenance semantics proposal

From: Peter Sewell <Peter.Sewell_at_[hidden]>
Date: Wed, 10 Apr 2019 21:44:45 +0100
On 10/04/2019, Martin Sebor <msebor_at_[hidden]> wrote:
> On 4/9/19 5:41 AM, Niall Douglas via cl-C-memory-object-model wrote:
>>> continuing the discussion from last year at EuroLLVM, the GNU Tools
>>> Cauldron,
>>> and elsewhere, we (the WG14 C memory object model study group) now
>>> have a detailed proposal for pointer provenance semantics, refining
>>> the "provenance not via integers (PNVI)" model presented there.
>>> This will be discussed at the ISO WG14 C standards committee at the
>>> end of April, and comments from the community before then would
>>> be very welcome. The proposal reconciles the needs of existing code
>>> and the behaviour of existing compilers as well as we can, but it
>>> doesn't
>>> exactly match any of the latter, so we'd especially like to know whether
>>> it would be feasible to implement - our hope is that it would only
>>> require
>>> minor changes.
>> Firstly thank you for the enormous amount of work that you have done on
>> this topic. I am currently at the ACCU conference. Myself, Guy, Herb and
>> Michael Wong are hoping to arrange a meeting whilst here to discuss your
>> proposals such that I can take an informed opinion to the WG14 meeting
>> end of this month.
>> On the papers themselves, for me personally, I found them a bit too
>> language-theory-centric. I think everybody gets that aliasing
>> inconsistency ought to be standardised; however, I notice a visceral
>> reaction amongst attendees here to the notion that this part of the
>> language is going to get fiddled with. In other words, I think that this
>> part of the language gives people a lot of fear and doubt, and that
>> emotional reaction may impede the progress of these reforms at WG21,
>> which is, after all, many times larger than WG14 and thus far harder
>> to achieve consensus in.
>> Personally speaking, I find that what programmers particularly hate is
>> the ground shifting beneath them without realising it. They especially
>> hate code which worked, and then subtly stops working in non-obvious
>> ways on later CPUs and toolchains. They would FAR prefer if a newer
>> compiler refused to compile code which might not behave as it did on an
>> earlier compiler, even if that meant increased upgrade costs.
>> To that end, if your papers spoke more about how compilers might refuse
>> to compile problem pieces of code found in the real world, with example
>> potential diagnostic messages, rather than reduced examples showing UB
>> vs DB, I think that would help "sell" the value vs fear in these
>> proposals.
> A lot of effort has been going into detecting these sorts of bugs
> at compile time, but it's not possible to find all of them. In
> fact, those behind some of the hardest cases of "working" code
> ceasing to work after a compiler upgrade can be impossible to
> detect statically. Besides false negatives, their detection is
> also almost always subject to false positives, so issuing errors
> is not appropriate (at least not without something like -Werror).
> There's also a catch-22 here. Detecting many of these kinds of
> bugs relies on the same kind of analyses the compiler has to do
> to optimize code. These analyses are based on the requirements
> the standard imposes on conforming programs. Making a language
> change to relax some of these requirements would prevent
> the optimizations that depend on them. There would then be
> little reason for the analyses to exist, and, in many cases,
> no justification for issuing diagnostics. Yet in many of these
> same cases the previously undefined but now sanctioned code
> would still be wrong.
> Let me give an example:

Just to clarify: this example concerns the ISO "pointer
lifetime-end zap" semantics. These provenance notes leave that
unchanged, but there is a separate proposal that discusses whether
that semantics is consistent with the code out there (especially
concurrent algorithms).

> void f (void*, void*);
> void g (void)
> {
>   void *p, *q;
>   { p = (int[2]){ 1, 2 }; }
>   { q = (int[4]){ 3, 4 }; }
>   f (p, q);
> }
> The call to f() above is undefined because p and q are
> indeterminate (the lifetimes of the compound literals they
> pointed to ended at the curly closing the nested blocks). This
> justifies issuing a diagnostic that could prevent a hard to find
> bug if f() dereferenced the objects pointed to by the pointers,
> as it in all likelihood does.
> But if the rule about reading indeterminate pointers were to
> change to make it valid (as is being proposed elsewhere), this
> kind of a bug could no longer be diagnosed. A compiler would
> need to see the pointer dereferenced in order to diagnose it.
> I chose the example above because it illustrates the concern
> you bring up. Current GCC releases hang on to the memory
> allocated for compound literals until the function returns
> (as if it were allocated by alloca), so the two objects do not
> overlap and retain their values until the pointers go out of
> scope. But GCC 9 sometimes deallocates compound literals at
> the end of the enclosing block, so with it, the values of
> the arrays pointed to by p and q are not necessarily distinct.
> (GCC bug 89990 describes where this has already caused
> trouble.)
> Martin
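For concreteness, a variant of Martin's example that is well-defined under the current rules: the compound literals are created at g's outermost block, so their lifetimes cover the call. (A sketch only; the f below is a hypothetical stand-in for the opaque callee, made to return a value so the behaviour is observable.)

```c
/* Well-defined variant of the example above: the compound literals are
   created at g's own block scope, so their lifetime extends to the end
   of g rather than ending at an inner block. f is a hypothetical
   stand-in for the opaque callee in Martin's example. */
static int f(void *p, void *q) {
    return ((int *)p)[0] + ((int *)q)[0];  /* reads both live objects */
}

static int g(void) {
    void *p, *q;
    p = (int[2]){ 1, 2 };  /* lifetime: rest of g, not an inner block */
    q = (int[4]){ 3, 4 };  /* remaining elements are zero-initialized */
    return f(p, q);        /* p and q still point at live objects */
}
```

Here g() returns 1 + 3 = 4 with no undefined behaviour, on any compiler's allocation strategy.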

Received on 2019-04-10 22:44:46