sg12: Re: [ub] C provenance semantics proposal (without pointer equality)

From: Peter Sewell <Peter.Sewell_at_[hidden]>
Date: Tue, 16 Apr 2019 12:17:18 +0100

>> >> The integer-to-pointer cast *could* be left ambiguous until it's
>> >> resolved - but that allows other strange things, eg it can be
>> >> (eventually) resolved to an object that didn't even exist earlier.
>> >> The temporal view seems (I hope :-) quite easy for programmers to
>> >> understand.
>> >>
>> >
>> > I think it's surprising either way -- either you find that a cast from
>> > integer to pointer has an execution side-effect, and it matters where you
>> > write it, or you find that you can cast an integer into a pointer to an
>> > object that doesn't yet exist at the point of the cast.
>>
>> true
>>
>> > I think the former is surprising for practicing programmers, whereas the
>> > latter is likely only surprising for language lawyers.
>>
>> hmm - the latter would be annoyingly complex to specify (like the "udi"
>> part of PNVI-ae-udi but worse, as one gradually accumulates constraints
>> on what the result of such a cast might be pointing to, based on
>> what arithmetic is done to the pointer). It's hard to imagine that becoming
>> widely understood...
>
>
> Under p0593r3 (recently approved by WG21's Evolution Working Group and on its way towards C++20), a similar "pick the answer that gives the program defined behavior" rule will already be in use in C++ for other cases. That's not to say that that means it'll be understood, but we will at least have precedent.

I'm very uneasy with that precedent, I'm afraid. It's ingenious, and
it's easy to write in words, but it'd be very hard to work with in any
precise way - for compilers, or analysis tools, or
executable-as-test-oracle semantics, or (eventually :-) compiler
correctness proof. The added quantification, and the need to look at
a potentially enormous number of complete future executions to check
which answers to a local question are allowed, are both really
awkward.
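
To make the two readings concrete, here is a toy model. It is an illustration only, not the PNVI-ae-udi semantics: the names Obj and resolve, the integer time steps and the addresses are all made up for this sketch, and it ignores exposure entirely. It records which object is live at an address at a given step, so that resolving an integer-to-pointer cast at the time of the cast can be compared with resolving it at the time of use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy model: an object occupies [base, base + size) and is
// live during the time steps [born, died).
struct Obj {
    std::string name;
    std::uintptr_t base;
    std::size_t size;
    int born;
    int died;
};

// Return the name of the first object that is live at step t and whose
// footprint contains addr, or nullopt if there is none.
std::optional<std::string> resolve(const std::vector<Obj>& objs,
                                   std::uintptr_t addr, int t) {
    for (const Obj& o : objs) {
        assert(o.born <= o.died);  // sanity check on the made-up lifetimes
        const bool live = o.born <= t && t < o.died;
        const bool covers = o.base <= addr && addr < o.base + o.size;
        if (live && covers) return o.name;
    }
    return std::nullopt;
}

// Two objects that reuse the same storage: X first, then Y.
std::vector<Obj> example_objects() {
    return {
        {"X", 0x1000, 16, 0, 2},
        {"Y", 0x1000, 16, 2, 10},
    };
}
```

With a cast at step 1 and a use at step 3, resolve(example_objects(), 0x1000, 1) picks X (the temporal reading), whereas resolve(example_objects(), 0x1000, 3) picks Y, an object that did not exist when the cast was written (the deferred reading).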

>> > Moreover, I'd expect
>> > the latter to be the model that implementations actually use,
>>
>> Can you expand on that? Not sure I understand.
>
>
> Here's an example in C++ where implementations might care about which object a pointer points to:
>
> struct A { virtual void f() = 0; };
> struct B : A { void f() override; };
> struct C : A { void f() override; };
> alignas(B) alignas(C) std::byte storage[std::max(sizeof(B), sizeof(C))];
> A *p = new (storage) B;
> intptr_t n = (intptr_t)p;
> A *q = (A*)n; // #0
> new (storage) C;
> q->f(); // #1
> q->f(); // #2
>
> The C++ rules allow us to assume that the two q->f() calls call the same function: the same pointer value is used, so the pointer must point to an object with the same dynamic type. This permits us to remove a redundant vptr load in line #2.

got it, thanks

> Under the temporal model, an implementation could go further and prove that there's a B object within its lifetime at the point of the integer-to-pointer cast in line #0, and thereby decide that #1 and #2 both call B::f instead of C::f. I'm suggesting that implementations aren't going to do that, and that instead they'll only make the more conservative assumption that #1 and #2 load the same vptr value without assuming what that value is.
>
> More broadly, I would expect the implementation model actually used for pointer-to-int conversions and int-to-pointer conversions to be based on (a conservative approximation to) determining on which pointer-to-integer conversions a given integer-to-pointer conversion is value- or control-dependent, rather than giving either conversion side-effects in the intermediate representation.
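
For anyone who wants to experiment with the storage reuse in the quoted example, here is a self-contained variant. It is a sketch under assumptions, not the example as posted: f is changed to return a char so the result can be observed, the names Result and reuse_storage are invented, and every call goes through the pointer returned by the placement new that created the object being called. It therefore sidesteps the integer round trip in #0 and says nothing about what q would denote.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <new>

struct A { virtual char f() const = 0; };
struct B : A { char f() const override { return 'B'; } };
struct C : A { char f() const override { return 'C'; } };

// Hypothetical result record for this sketch.
struct Result {
    char first;         // what f() returned while the B object was live
    char second;        // what f() returned after C reused the storage
    bool same_address;  // whether both A* values convert to the same integer
};

Result reuse_storage() {
    // Storage suitably sized and aligned for either a B or a C.
    alignas(B) alignas(C) unsigned char storage[std::max(sizeof(B), sizeof(C))];

    A* p = new (storage) B;
    const std::uintptr_t n = reinterpret_cast<std::uintptr_t>(p);
    const char first = p->f();

    // Reusing the storage ends the lifetime of the B object; B is
    // trivially destructible, so no destructor call is needed.
    A* r = new (storage) C;
    const char second = r->f();

    // Expected to be true on mainstream ABIs, but that is an assumption,
    // not something the language guarantees.
    const bool same = reinterpret_cast<std::uintptr_t>(r) == n;
    return {first, second, same};
}
```

Calling reuse_storage() reports which override ran before and after the storage was reused, and whether the integer n saved while B was live also matches the address of the later C object.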

I'd likewise expect implementations to be doing something quite
conservative w.r.t. PNVI-ae-udi. In an ideal world we'd have some
correctness proofs or (at least) testing of current alias analysis
with respect to it. No-one has done anything like that, as far as I
know, but it's worth thinking of.

best,
Peter

Received on 2019-04-16 13:17:30