sg12: Re: [ub] C provenance semantics proposal

From: Peter Sewell <Peter.Sewell_at_[hidden]>
Date: Wed, 10 Apr 2019 16:34:31 +0100

Hi Niall,

On Wed, 10 Apr 2019 at 16:13, Niall Douglas <s_sourceforge_at_[hidden]> wrote:
>
>
> >> N2363 C provenance semantics: examples.
> >> Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, Martin Uecker.
> >> This explains the proposal and its design choices with discussion of a
> >> series of examples.
> >> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
> >
> > It seems to me that most of the introductory examples have clear
> > answers in current C++. In particular:
> >
> > — If one pointer represents the address of a complete object, and another pointer represents the address
> > one past the last element of a different complete object, the result of the comparison is unspecified.
> >
> > The proposal in N2362 seems to make such a comparison yield "true".

It's important to distinguish the effect of provenance on alias analysis and
what accesses are allowed (which is the main point) and the effect
of provenance on equality comparison (which one might choose one
way or the other).

> > I actually like our current rule: never ever compare / subtract pointers
> > to different complete objects. If you must compare, use std::less.
>
> I'm just finishing off my ACCU talk there right now, and thus am finally
> knee deep into what the C provenance papers actually propose.
>
> To date, the C proposals appear to be a weaker variant of what we
> already have in C++ 20, though much depends on interpretation.
>
> In the situation above, I don't think it is correct that N2362 proposes
> to make the comparison of one past a storage instance to the subsequent
> storage instance always "true". Rather, the one-past pointer would have
> "ambiguous provenance" which becomes deambiguated by the programmer
> using the ambiguous provenance pointer in one way or another.

No, a straightforward one-past pointer keeps its original provenance.
Ambiguous provenance only occurs when one tries to (eg)
cast a one-past integer to a pointer.

> So for
> this code fragment:
>
> int x, y;
> int *p = &x + 1, *q = &y;
> if(0 == memcmp(&p, &q, sizeof(p)))
> assert(p == q);

If you want to check this kind of thing, you can try it in the
Cerberus web interface. Here:

https://cerberus.cl.cam.ac.uk/#%7B%220%22%3A%7B%221%22%3A%220%22%2C%222%22%3A%220%22%2C%223%22%3A%221%22%2C%22t%22%3A%220%22%2C%22p%22%3A%221%22%2C%22blockedPopoutsThrowError%22%3A%220%22%2C%22closePopoutsOnUnload%22%3A%220%22%2C%22showPopoutIcon%22%3A%221%22%2C%22showMaximiseIcon%22%3A%220%22%2C%22showCloseIcon%22%3A%220%22%2C%22responsiveMode%22%3A%22onload%22%2C%22tabOverlapAllowance%22%3A0%2C%22reorderOnTabMenuClick%22%3A%220%22%2C%22tabControlOffset%22%3A10%7D%2C%224%22%3A%7B%225%22%3A5%2C%226%22%3A10%2C%227%22%3A150%2C%228%22%3A20%2C%229%22%3A300%2C%22u%22%3A15%2C%22a%22%3A200%7D%2C%22b%22%3A%7B%22c%22%3A%22Close%22%2C%22d%22%3A%22Maximise%22%2C%22e%22%3A%22Minimise%22%2C%22f%22%3A%229%22%2C%22popin%22%3A%22pop%20in%22%2C%22tabDropdown%22%3A%22additional%20tabs%22%7D%2C%22g%22%3A%5B%7B%22l%22%3A%222%22%2C%22n%22%3A%220%22%2C%22t%22%3A%220%22%2C%22o%22%3A%22%22%2C%22g%22%3A%5B%7B%22l%22%3A%223%22%2C%22n%22%3A%220%22%2C%22t%22%3A%220%22%2C%22o%22%3A%22%22%2C%22k%22%3A50%2C%22g%22%3A%5B%7B%22l%22%3A%224%22%2C%22m%22%3A50%2C%22n%22%3A%220%22%2C%22t%22%3A%220%22%2C%22o%22%3A%22%22%2C%22s%22%3A0%2C%22g%22%3A%5B%7B%22l%22%3A%225%22%2C%22h%22%3A%22source%22%2C%22o%22%3A%22example.c%22%2C%22n%22%3A%221%22%2C%22t%22%3A%220%22%7D%5D%7D%2C%7B%22l%22%3A%224%22%2C%22n%22%3A%220%22%2C%22t%22%3A%220%22%2C%22o%22%3A%22%22%2C%22m%22%3A50%2C%22s%22%3A0%2C%22g%22%3A%5B%7B%22l%22%3A%225%22%2C%22h%22%3A%22tab%22%2C%22i%22%3A%7B%22tab%22%3A%22Console%22%2C%22h%22%3A%22tab%22%7D%2C%22o%22%3A%22Console%22%2C%22n%22%3A%220%22%2C%22t%22%3A%220%22%7D%5D%7D%5D%7D%2C%7B%22l%22%3A%224%22%2C%22n%22%3A%220%22%2C%22t%22%3A%220%22%2C%22o%22%3A%22%22%2C%22k%22%3A50%2C%22s%22%3A0%2C%22g%22%3A%5B%7B%22l%22%3A%225%22%2C%22h%22%3A%22tab%22%2C%22i%22%3A%7B%22tab%22%3A%22Memory%22%2C%22h%22%3A%22tab%22%7D%2C%22o%22%3A%22Memory%22%2C%22n%22%3A%220%22%2C%22t%22%3A%220%22%7D%5D%7D%5D%7D%5D%2C%22n%22%3A%220%22%2C%22t%22%3A%220%22%2C%22o%22%3A%22%22%2C%22q%22%3A%5B%5D%2C%22maximisedItemId%22%3A%7B%7D%2C%22title%22%3A%22example.c%22%2C%22source%22%3A%22%23include%3Cstring.h%3E%5Cnint%20main()%20%7B%5Cn%20%20int%20y%2Cx%3B%5Cn%20%20int%20*p%20%3D%20%26x%20%2B%201%2C%20*q%20%3D%20%26y%3B%5Cn%20%20if(0%20%3D%3D%20memcmp(%26p%2C%20%26q%2C%20sizeof(p)))%5Cn%20%20%20%20assert(p%20%3D%3D%20q)%3B%5Cn%7D%22%7D

is a link to an executable version of that example:

#include<string.h>
int main() {
  int y,x;
  int *p = &x + 1, *q = &y;
  if(0 == memcmp(&p, &q, sizeof(p)))
    assert(p == q);
}

If you step through (click on "Step Forward"), you'll see that p keeps
its original
provenance, the memcmp succeeds (with the default allocator of the web
interface),
and then the p==q is nondeterministically either true or false
(the Left and Right options at Step 21) so the assert can either
succeed or fail. This accommodates current implementation
behaviour, though for == comparison it might well be better to
require it to compare addresses and ignore provenance,

> ... the assert would succeed, because p's provenance is deambiguated
> into y's storage instance. But if one wrote this instead:
>
> int x, y;
> int *p = &x, *q = &y;
> (void) *p++;
> if(0 == memcmp(&p, &q, sizeof(p)))
> assert(p == q); // UB!!!
>
> Now p's provenance is x's storage instance, so comparing it to q is now
> a comparison of dissimilar provenance, which is UB.

I'm not sure why you think this is different to the example above.
An equality (== or !=) comparison between pointers to different storage
instances is not UB (in current ISO C and in the new proposal), and
if you run this in Cerberus you'll see basically the same as the above.

> Which isn't quite
> the C++ 20 specification which says "unspecified", but it's also not
> always true either.
>
> (Note that I understand the C proposals to not make provenance ambiguous
> on pointer increment, if they do, then Jens is right the test becomes
> always true, and I too find that inferior to the C++ 20 specification)
>
>
> My current understanding of the C proposals is that they would bring C
> much closer to C++ 20's existing memory model, though perhaps still not
> quite as strong as it, depending on how you interpret the C++ 20
> standardese.
>
> However in both the C proposals, and the C++ 20 standard, I personally
> find far too much "unspecified", "UB" and so on. I note from the C
> proposals that GCC and ICC take an opposite interpretation of
> "unspecified" to what clang (and MSVC incidentally) do, reporting false
> where GCC and ICC might report true, and so on. I find this *not good*,
> but then that's the whole point of undefined and implementation defined
> behaviours.
>
> What I'd much prefer to see is "compiler diagnostic", "contract asserted
> precondition" and so on for these situations. I want to see code like
> this fail to compile at best, fail to execute due to contract violation
> at worst:
>
> unsigned int *a = (unsigned int *) 0x42;
> ...
> unsigned int *b = (unsigned int *) 0x42;
> ...
> if(a < b) ... // This should FAIL to compile!
>
> ... because the provenance of a and b is dissimilar.
>
> I appreciate that a minority of existing code now fails to compile, and
> of course we need an escape hatch for that. But where I'd like to get to
> is that 99% of pointer and reference using code could be checked by the
> compiler for correctness, and it would refuse anything suspect.
>
> Perhaps for C++ that simply means "constexpr all things". So, by making
> a function constexpr, one switches on compiler diagnostics for
> pointer-caused UB, otherwise C++ must assume it's dealing with C-style
> code and maybe fall back onto the runtime UB sanitiser.
>
> Anyway, back to the conference!
>
> Niall

Received on 2019-04-10 17:34:43