sg12: Re: [ub] C provenance semantics proposal

From: Niall Douglas <s_sourceforge_at_[hidden]>
Date: Wed, 10 Apr 2019 16:13:13 +0100

>> N2363 C provenance semantics: examples.
>> Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, Martin Uecker.
>> This explains the proposal and its design choices with discussion of a
>> series of examples.
>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
>
> It seems to me that most of the introductory examples have clear
> answers in current C++. In particular:
>
> — If one pointer represents the address of a complete object, and another pointer represents the address
> one past the last element of a different complete object, the result of the comparison is unspecified.
>
> The proposal in N2362 seems to make such a comparison yield "true".
>
> I actually like our current rule: never ever compare / subtract pointers
> to different complete objects. If you must compare, use std::less.

I'm just finishing off my ACCU talk right now, and thus am finally
knee-deep in what the C provenance papers actually propose.

To date, the C proposals appear to be a weaker variant of what we
already have in C++ 20, though much depends on interpretation.

In the situation above, I don't think it is correct that N2362 proposes
to make the comparison of one past a storage instance to the subsequent
storage instance always "true". Rather, the one-past pointer would have
"ambiguous provenance", which is disambiguated by how the programmer
then uses that pointer. So for this code fragment:

int x, y;
int *p = &x + 1, *q = &y;
if(0 == memcmp(&p, &q, sizeof(p)))
  assert(p == q);

... the assert would succeed, because p's provenance is disambiguated
into y's storage instance. But if one wrote this instead:

int x, y;
int *p = &x, *q = &y;
(void) *p++;
if(0 == memcmp(&p, &q, sizeof(p)))
  assert(p == q); // UB!!!

Now p's provenance is x's storage instance, so comparing it to q is a
comparison of pointers with dissimilar provenance, which is UB. That
isn't quite the C++ 20 specification, which says "unspecified", but
it's also not always true either.
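
As a rough, self-contained sketch of the first fragment (an illustration
only, not taken from N2362 or N2363): the names same_representation,
Observation and observe_one_past are made up here, and whether y really
sits directly after x is entirely up to the implementation, so the two
results may or may not agree on any given compiler.

```cpp
#include <cstring>

// Hypothetical helper: true when two pointer objects hold identical bytes.
// It compares representations only and never evaluates p == q.
inline bool same_representation(const int *p, const int *q) {
    return 0 == std::memcmp(&p, &q, sizeof(p));
}

// What one run observed for the first fragment.
struct Observation {
    bool same_bytes;      // did memcmp report identical representations?
    bool compares_equal;  // what did p == q yield on this implementation?
};

inline Observation observe_one_past() {
    int x = 0, y = 0;
    int *p = &x + 1;  // one past x: fine to form, not to dereference
    int *q = &y;
    Observation o = {same_representation(p, q), p == q};
    return o;
}
```

Recording both results, rather than asserting, keeps the sketch from
aborting on an implementation where the bytes match but p == q is
evaluated as false.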

(Note that I understand the C proposals not to make provenance
ambiguous on pointer increment. If they do, then Jens is right that the
test becomes always true, and I too find that inferior to the C++ 20
specification.)

My current understanding of the C proposals is that they would bring C
much closer to C++ 20's existing memory model, though perhaps still not
quite as strong as it, depending on how you interpret the C++ 20
standardese.

However, in both the C proposals and the C++ 20 standard, I personally
find far too much "unspecified", "UB" and so on. I note from the C
proposals that GCC and ICC take the opposite interpretation of
"unspecified" to clang (and, incidentally, MSVC), which report false
where GCC and ICC might report true, and so on. I find this *not good*,
but then that's the whole point of undefined and implementation-defined
behaviours.

What I'd much prefer to see is "compiler diagnostic", "contract asserted
precondition" and so on for these situations. I want to see code like
this fail to compile at best, fail to execute due to contract violation
at worst:

unsigned int *a = (unsigned int *) 0x42;
...
unsigned int *b = (unsigned int *) 0x42;
...
if(a < b) ... // This should FAIL to compile!

... because the provenance of a and b is dissimilar.

I appreciate that a minority of existing code would then fail to
compile, and of course we need an escape hatch for that. But where I'd
like to get to is that 99% of code using pointers and references could
be checked by the compiler for correctness, and it would refuse
anything suspect.
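
On the escape hatch: C++ already has one of sorts in std::less, which
for pointers is required to give a strict total order even across
unrelated objects. A minimal sketch, where ordered_before is a made-up
helper name rather than anything from the standard or the papers:

```cpp
#include <functional>

// Hypothetical helper: orders two pointers regardless of which objects
// they point into. std::less on pointer types yields an
// implementation-defined strict total order, unlike the built-in <.
inline bool ordered_before(const void *a, const void *b) {
    return std::less<const void *>()(a, b);
}
```

For two distinct objects x and y, exactly one of ordered_before(&x, &y)
and ordered_before(&y, &x) is true; which one is up to the
implementation.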

Perhaps for C++ that simply means "constexpr all things". So, by making
a function constexpr, one switches on compiler diagnostics for
pointer-caused UB; otherwise C++ must assume it's dealing with C-style
code and maybe fall back onto the runtime UB sanitiser.
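
To make the constexpr idea concrete, a small sketch (before and arr are
made-up names): during constant evaluation, a relational comparison
whose result is unspecified is not a constant expression, so a
conforming compiler is expected to reject it, while the same-array case
goes through.

```cpp
// Hypothetical helper: a plain relational comparison that is usable in
// constant expressions.
constexpr bool before(const int *a, const int *b) { return a < b; }

constexpr int arr[2] = {1, 2};

// Same array: the result is specified, so this is a valid constant
// expression and the assertion holds at compile time.
static_assert(before(&arr[0], &arr[1]), "elements of one array are ordered");

// Different complete objects: the result is unspecified, so the
// comparison is not a constant expression. Uncommenting the lines below
// is expected to produce a compile-time diagnostic.
// constexpr int x = 0, y = 0;
// static_assert(before(&x, &y) || before(&y, &x), "unrelated objects");
```

This only covers evaluation forced at compile time; the same function
called at run time with unrelated pointers still just yields an
unspecified result.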

Anyway, back to the conference!

Niall

Received on 2019-04-10 17:13:27