C++ Logo


Advanced search

Re: [ub] "Provenance" and all that

From: Peter Sewell <Peter.Sewell_at_[hidden]>
Date: Fri, 12 Apr 2019 13:54:25 +0100
Hi Freek,

to answer your last point first: after the creation of the WG14 C memory
object model study group at the Brno meeting (which you were at),
I sent this around to WG14:

On 4 May 2018 at 15:34, Peter Sewell <Peter.Sewell_at_[hidden]> wrote:
Dear all,
In Brno, following a lengthy and useful discussion of provenance, we
created a new WG14 Study Group to look at the various outstanding
memory object model issues for C2x, with charter:
  This group will study the memory object model issues of C, including
  pointer provenance, uninitialised values, and related questions,
  aiming to reconcile mainstream usage, implementation practice, the
  ISO C standard text, and harmonisation with ISO C++. It will produce
  papers and proposals for C2x.
I hope many of you will contribute to this.  I've created a mailing
list for the group, with members-only posting but public archives,
and a github repository for working notes.  If you'd like to
contribute, please send me your email and (if you want) your github
id.  These are at:
You, along with anyone else here, would (of course!) be more than
welcome to join those mailing list discussions and (approx fortnightly)
teleconf discussions.  Next teleconf 4pm UK time on Monday 15th,
> To join the Meeting:
> https://bluejeans.com/242988866
> To join via Room System:
> Video Conferencing System: bjn.vc -or-
> Meeting ID : 242988866
> To join via phone :
> 1)  Dial:
>     +1.408.317.9254 (US (San Jose))
>     +1.866.226.4650 (US Toll Free)
>     (see all numbers - http://bluejeans.com/numbers)
> 2)  Enter Conference ID : 242988866
On 12/04/2019, Freek Wiedijk <freek_at_[hidden]> wrote:
> Dear all,
> I have not actively followed the discussion on this
> mailing list, and I have endlessly postponed looking at
> the Cambridge papers, as I feared they would irritate me
> (they did), but now I still would like to write up some
> opinions on these things:
> I think it is very important that the standard is as clear
> as possible about these issues, and I think that the only
> way to get this right, is to specify the rules using a
> proof system like HOL (as the Cambridge people are doing,
> I hope).  So I am very happy with the study group about this,
> and with the *aim* of the Cerberus effort.
> However, I do think that what is there is not ready for the
> standard yet.  I think something better should be possible.
> One thing that in the documents seems to be completely
> missing is unions.  For me pointers to fields of unions are
> also "concretely the same, abstractly different", so the
> kind of disambiguation of pointers that this is all about
> also need to be explained for unions.  I think.  If I take
> an integer representation of a field of a union, cast it
> back to the type of the other field, and then use that to
> access the other field, should that be guaranteed to work?
> Maybe, but currently this decision is not even mentioned.
This is tied up with the many questions relating to subobjects
and effective types.  We've mostly deferred those up to now,
to get the basic proposal clarified, but we're now turning attention
to them.  On Monday we'll be talking about this:
> There are various documents about this whole PVI/PNVI thing,
> and I honestly don't know what the current state of the
> proposals is.  The proposed changes to the standard text is
> about "PNVI-ae-udi", right?
> Are there alternatives that are
> still in the running?  Can someone list the options for me?
> In each document the list is different.
n2363 and n2364 also describe PNVI-plain, PNVI-ae, and PVI.
Of those, I think PNVI-plain is the only seriously interesting one,
and it has less support than PNVI-ae-udi.
> (That also seems to
> suggest that this is not ripe enough for the standard yet.)
The existence of several well-considered and compared alternatives
is surely a good thing for the final candidate, not a negative :-)
> So the "udi" stands for "user-disambiguation".  In the
> section on "ambiguous provenance" (section 6 of N2362)
> it is stated that if there is a cast from an integer to a
> pointer that is also at the end of an object, then which
> of it "is chosen by the programmer".  What does this mean?
It is left ambiguous until the first operation on the pointer that
requires it to be one or the other.
> I tried reading the purple texts in N2364, but it did not
> become clear to me.  If someone would explain this to me,
> I would be very grateful.
> Just to have something concrete in this rant too, look at
> the following example program:
> #include <stdio.h>
> #include <stdint.h>
> int main()
> {
>   int i = 1, j = 2;
>   uintptr_t pi1 = (uintptr_t) (&i + 1), pj = (uintptr_t) &j;
>   if (pi1 == pj) {
>     pi1 += 0*pj; pi1 -= 0*pj; /* fix provenance for PVI */
>     *(int *) pi1 = 3;
>     printf("%d,%d,%d\n", i, j, *(int *) pi1);
>   }
>   return 0;
> }
> With sufficient optimization, gcc causes this to print
> "1,2,3".  Of all the options *only* in PNVI-ae-udi this is
> allowed, right?  Or are there other options left for which
> this is the case?
I don't think that's allowed in any of the PNVI-* models.
You can try the example (with i and j flipped so that the default
Cerberus allocation policy does what you want) in Cerberus at this
link. (Sorry for the length - we were using the google link shortener
and haven't fixed an alternative since they discontinued that).
> I think a compiler should be able to optimize
Note that these are only proposed as *source language* models.
The issues for a compiler intermediate language are somewhat
different, eg as recently explored by Juneyoung Lee et al.  We
would hope that there is a refinement relation between the two models,
not identity.
>   i += "expression that at compile time can be seen to be 0"
> and
>   (int *)(uintptr_t)("some (int *) expression")
> in the obvious ways without having to worry that this
> optimization might lead to undefined behavior, where before
> the optimization the behavior was *not* undefined.  This to
> me is a strong reason to favor "PNVI-ae-udi".
> The notion of "provenance" originated in DR260, and instead
> of finding a better way to clarify the issues, this notion
> ("provenance") was taken seriously.  I very much regret that.
> I think something more abstract than "provenance" is needed
> to get the issues clear.
> So I understand that PVI is not being taken seriously
> anymore, but in case it comes back let's argue some more
> against it:
I think there's no need :-)
> What I do not like about PVI is that it allows you to
> put an arbitrary provenance ("re-provenance") on any
> integer/pointer, by allowing integers (zeroes) that have the
> appropriate provenances.  You can "clear" a provenance by
> adding a zero with a different provenance, and then "set"
> the provenance to something else by adding another zero.
> Like in the example above.  Note that it is always possible
> to clear the provenance of an integer ("un-provenance")
> -- whatever changes to the PVI are proposed -- by running
> through a loop and comparing to the "provenanced" integer.
> Anyway.
> I would like to be involved in the Cambridge effort, but
> I have the impression that they are not interested in my
> thoughts.  I always got "don't worry, father knows best"
> vibes back, when I tried for that.
Ahem.  That is certainly not our intention.
> Or maybe there is a
> mailing list about that, and Peter forgot to put me on it.
I sent around the invite above, last year, and added everyone who replied...
I've added you now.
> If so, please remedy that, so I can start spamming there :-)
> Freek
> _______________________________________________
> ub mailing list
> ub_at_[hidden]
> http://www.open-std.org/mailman/listinfo/ub

Received on 2019-04-12 14:54:26