sg12: Re: [ub] "Provenance" and all that

From: Freek Wiedijk <freek_at_[hidden]>
Date: Fri, 12 Apr 2019 22:57:22 +0200

Hi Peter,

Thanks for the friendly reply to my somewhat frustrated mail,
I very much appreciate that. And many thanks for adding
me to the study group list. (I don't know what happened to
that mail back then, maybe it got caught in the spam filter,
or lost in my sometimes rather chaotic mailbox.)

If you don't mind, I'll send my questions about Cerberus
to you privately for a while, in order not to bother the
people on the lists. At least not until I'm up to speed.

However, a few replies to what you write (I can't restrain
myself :-)), feel free to only answer them off-list:

>This is tied up with the many questions relating to subobjects
>and effective types. We've mostly deferred those up to now,
>to get the basic proposal clarified, but we're now turning attention
>to them.

Does the current PNVI-ae-udi semantics already have coherent
answers to these questions, and is it just a matter of
finding out what the people in the study group think about
those choices? Or will the semantics almost certainly need
modification for these issues anyway (for example for the
sub-objects), regardless of how people feel?

And how much of a formal version of this semantics is
already available? Just this online thing and an updated
PDF like N2364 to describe it? Or also something more formal
(as in a HOL4 or maybe Ott specification)?

For me "effective types", "type punning", "provenance",
"wobbly values", "lifetime-end zap", etc. all seem closely
related, and I would like a coherent model that *explains*
why these things are chosen the way they are. Probably you
feel like this too?

>The existence of several well-considered and compared alternatives
>is surely a good thing for the final candidate, not a negative :-)

I was talking about the fact that it does not yet seem
completely stable :-)

>> So the "udi" stands for "user-disambiguation". In the
>> section on "ambiguous provenance" (section 6 of N2362)
>> it is stated that if there is a cast from an integer to a
>> pointer that is also at the end of an object, then which
>> of it "is chosen by the programmer". What does this mean?
>
>It is left ambiguous until the first operation on the pointer that
>requires it to be one or the other.

This is a compile time or runtime thing? And how can it be
"required" to be a "one past" pointer? So why isn't it
always the "internal" version of the pointer then?

>> #include <stdio.h>
>> #include <stdint.h>
>>
>> int main()
>> {
>> int i = 1, j = 2;
>> uintptr_t pi1 = (uintptr_t) (&i + 1), pj = (uintptr_t) &j;
>> if (pi1 == pj) {
>> pi1 += 0*pj; pi1 -= 0*pj; /* fix provenance for PVI */
>> *(int *) pi1 = 3;
>> printf("%d,%d,%d\n", i, j, *(int *) pi1);
>> }
>> return 0;
>> }
>>
>> With sufficient optimization, gcc causes this to print
>> "1,2,3". Of all the options *only* in PNVI-ae-udi this is
>> allowed, right?
>
>I don't think that's allowed in any of the PNVI-* models.

What? You mean the optimization of GCC would need to be
*changed*, restricted, if PNVI-ae-udi++ would get into
the standard?

But I don't understand. In PNVI-ae-udi, the programmer can
choose (?) to have the coercion (int *) pi1 give a pointer
with the provenance of i, right? So then *(int *) pi1 = 3
would have undefined behavior? So why is the output "1,2,3"
not allowed in that case?

>You can try the example (with i and j flipped so that the default
>Cerberus allocation policy does what you want) in Cerberus at this
>link.

Why do i and j need to be flipped, if I want to know what
*all* possible behaviors of this program are? I would
guess that the program can: print "1,3,3" in case
((int *) pi1) gets the provenance of j, exhibit undefined
behavior which of course includes the other two behaviors in
case ((int *) pi1) gets the provenance of i, or print nothing
at all in case pi1 and pj differ. So this seems to hold
regardless of the order of i and j, why would the order of
i and j make a difference for the semantics of the program?

But I haven't tried the tool, the link got damaged in
the mail. So maybe I misunderstand what the tool gives you.

>Note that these are only proposed as *source language* models.
>The issues for a compiler intermediate language are somewhat
>different, eg as recently explored by Juneyoung Lee et al. We
>would hope that there is a refinement relation between the two models,
>not identity.

I was talking about a relation between source language
programs: one in which i += 0*(...) is present, and one
in which it isn't. I would like to be allowed to map both
programs to the same intermediate language representation
after optimizations, one that has undefined behavior.
But that has implications for the *source* language
semantics, because if the first program does *not* have
undefined behavior, this is not possible.

But, as PVI seems to be out of the window (for which I'm
grateful, the less "provenance" the better), it doesn't
matter.

>I've added you now.

Again: many thanks! And I'll mail you with many more
questions :-)

Freek

Received on 2019-04-12 22:57:25