sg12: [ub] "Provenance" and all that

From: Freek Wiedijk <freek_at_[hidden]>
Date: Fri, 12 Apr 2019 14:02:02 +0200

Dear all,

I have not actively followed the discussion on this
mailing list, and I have endlessly postponed looking at
the Cambridge papers, as I feared they would irritate me
(they did), but now I still would like to write up some
opinions on these things:

I think it is very important that the standard is as clear
as possible about these issues, and I think that the only
way to get this right, is to specify the rules using a
proof system like HOL (as the Cambridge people are doing,
I hope). So I am very happy with the study group about this,
and with the *aim* of the Cerberus effort.

However, I do think that what is there is not ready for the
standard yet. I think something better should be possible.

One thing that in the documents seems to be completely
missing is unions. For me pointers to fields of unions are
also "concretely the same, abstractly different", so the
kind of disambiguation of pointers that this is all about
also need to be explained for unions. I think. If I take
an integer representation of a field of a union, cast it
back to the type of the other field, and then use that to
access the other field, should that be guaranteed to work?
Maybe, but currently this decision is not even mentioned.

There are various documents about this whole PVI/PNVI thing,
and I honestly don't know what the current state of the
proposals is. The proposed changes to the standard text is
about "PNVI-ae-udi", right? Are there alternatives that are
still in the running? Can someone list the options for me?
In each document the list is different. (That also seems to
suggest that this is not ripe enough for the standard yet.)

So the "udi" stands for "user-disambiguation". In the
section on "ambiguous provenance" (section 6 of N2362)
it is stated that if there is a cast from an integer to a
pointer that is also at the end of an object, then which
of it "is chosen by the programmer". What does this mean?
I tried reading the purple texts in N2364, but it did not
become clear to me. If someone would explain this to me,
I would be very grateful.

Just to have something concrete in this rant too, look at
the following example program:

#include <stdio.h>
#include <stdint.h>

int main()
{
  int i = 1, j = 2;
  uintptr_t pi1 = (uintptr_t) (&i + 1), pj = (uintptr_t) &j;
  if (pi1 == pj) {
    pi1 += 0*pj; pi1 -= 0*pj; /* fix provenance for PVI */
    *(int *) pi1 = 3;
    printf("%d,%d,%d\n", i, j, *(int *) pi1);
  }
  return 0;
}

With sufficient optimization, gcc causes this to print
"1,2,3". Of all the options *only* in PNVI-ae-udi this is
allowed, right? Or are there other options left for which
this is the case?

I think a compiler should be able to optimize

  i += "expression that at compile time can be seen to be 0"

and

  (int *)(uintptr_t)("some (int *) expression")

in the obvious ways without having to worry that this
optimization might lead to undefined behavior, where before
the optimization the behavior was *not* undefined. This to
me is a strong reason to favor "PNVI-ae-udi".

The notion of "provenance" originated in DR260, and instead
of finding a better way to clarify the issues, this notion
("provenance") was taken seriously. I very much regret that.
I think something more abstract than "provenance" is needed
to get the issues clear.

So I understand that PVI is not being taken seriously
anymore, but in case it comes back let's argue some more
against it:

What I do not like about PVI is that it allows you to
put an arbitrary provenance ("re-provenance") on any
integer/pointer, by allowing integers (zeroes) that have the
appropriate provenances. You can "clear" a provenance by
adding a zero with a different provenance, and then "set"
the provenance to something else by adding another zero.
Like in the example above. Note that it is always possible
to clear the provenance of an integer ("un-provenance")
-- whatever changes to the PVI are proposed -- by running
through a loop and comparing to the "provenanced" integer.

Anyway.

I would like to be involved in the Cambridge effort, but
I have the impression that they are not interested in my
thoughts. I always got "don't worry, father knows best"
vibes back, when I tried for that. Or maybe there is a
mailing list about that, and Peter forgot to put me on it.
If so, please remedy that, so I can start spamming there :-)

Freek

Received on 2019-04-12 14:12:43