C++ Logo


Advanced search

Re: [ub] C provenance semantics proposal (without pointer equality)

From: David Vandevoorde <daveed_at_[hidden]>
Date: Fri, 12 Apr 2019 10:17:13 -0400
> On Apr 12, 2019, at 8:09 AM, Peter Sewell <Peter.Sewell_at_[hidden]> wrote:
> Hi all,
> perhaps I can reset this discussion, which got bogged down in largely
> independent questions about pointer equality, by summarising the
> basic provenance proposal. Try this below, extracted from n2363.
> Comments on this?
> best,
> Peter
> C pointer values are typically represented at runtime as simple
> concrete numeric values, but mainstream compilers routinely exploit
> information about the "provenance" of pointers to reason that they
> cannot alias, and hence to justify optimisations. This is
> long-standing practice, but exactly what it means (what programmers
> can rely on, and what provenance-based alias analysis is allowed to
> do), has never been nailed down. That's what the proposal does.
> The basic idea is to associate a *provenance* with every pointer
> value, identifying the original storage instance (or allocation, in
> other words) that the pointer is derived from. In more detail:
> - We take abstract-machine pointer values to be pairs (pi,a), adding a
> provenance pi, either @i where i is a storage instance ID, or the
> *empty* provenance, to their concrete address a.
> - On every storage instance creation (of objects with static, thread,
> automatic, and allocated storage duration), the abstract machine
> nondeterministically chooses a fresh storage instance ID i (unique
> across the entire execution), and the resulting pointer value
> carries that single storage instance ID as its provenance @i.
> - Provenance is preserved by pointer arithmetic that adds or subtracts
> an integer to a pointer.
> - At any access via a pointer value, its numeric address must be
> consistent with its provenance, with undefined behaviour
> otherwise. In particular:
> -- access via a pointer value which has provenance a single storage
> instance ID @i must be within the memory footprint of the
> corresponding original storage instance, which must still be
> live.
> -- all other accesses, including those via a pointer value with
> empty provenance, are undefined behaviour.
> Regarding such accesses as undefined behaviour is necessary to make
> optimisation based on provenance alias analysis sound: if the standard
> did define behaviour for programs that make provenance-violating
> accesses, e.g.~by adopting a concrete semantics, optimisation based on
> provenance-aware alias analysis would not be sound. In other words,
> the provenance lets one distinguish a one-past pointer from a pointer
> to the start of an adjacently-allocated object, which otherwise are
> indistinguishable.
> All this is for the C abstract machine as defined in the standard:
> compilers might rely on provenance in their alias analysis and
> optimisation, but one would not expect normal implementations to
> record or manipulate provenance at runtime (though dynamic or static
> analysis tools might).
> Then, to support low-level systems programming, C provides many other
> ways to construct and manipulate pointer values:
> - casts of pointers to integer types and back, possibly with integer
> arithmetic, e.g.~to force alignment, or to store information in
> unused bits of pointers;
> - copying pointer values with memcpy;
> - manipulation of the representation bytes of pointers, e.g.~via user
> code that copies them via char* or unsigned char* accesses;
> - type punning between pointer and integer values;
> - I/O, using either fprintf/fscanf and the %p format, fwrite/fread on
> the pointer representation bytes, or pointer/integer casts and
> integer I/O;
> - copying pointer values with realloc; and
> - constructing pointer values that embody knowledge established from
> linking, and from constants that represent the addresses of
> memory-mapped devices.
> A satisfactory semantics has to address all these, together with the
> implications on optimisation. We've explored several, but our main
> proposal is "PNVI-ae-udi" (provenance not via integers,
> address-exposed, user-disambiguation).
> This semantics does not track provenance via integers. Instead, at
> integer-to-pointer cast points, it checks whether the given address
> points within a live object that has previously been *exposed* and, if
> so, recreates the corresponding provenance.
> A storage instance is deemed exposed by a cast of a pointer to it to
> an integer type, by a read (at non-pointer type) of the representation
> of the pointer, or by an output of the pointer using %p.
> The user-disambiguation refinement adds some complexity but supports
> roundtrip casts, from pointer to integer and back, of pointers that
> are one-past a storage instance.

Thanks for that summary.

I _think_ that that also covers XOR linked lists, right? (https://en.wikipedia.org/wiki/XOR_linked_list <https://en.wikipedia.org/wiki/XOR_linked_list>)


> On 02/04/2019, Peter Sewell <Peter.Sewell_at_[hidden]> wrote:
>> Dear UB folk,
>> continuing the discussion from last year at EuroLLVM, the GNU Tools
>> Cauldron,
>> and elsewhere, we (the WG14 C memory object model study group) now
>> have a detailed proposal for pointer provenance semantics, refining
>> the "provenance not via integers (PNVI)" model presented there.
>> This will be discussed at the ISO WG14 C standards committee at the
>> end of April, and comments from the community before then would
>> be very welcome. The proposal reconciles the needs of existing code
>> and the behaviour of existing compilers as well as we can, but it doesn't
>> exactly match any of the latter, so we'd especially like to know whether
>> it would be feasible to implement - our hope is that it would only require
>> minor changes. It's presented in three documents:
>> N2362 Moving to a provenance-aware memory model for C: proposal for C2x
>> by the memory object model study group. Jens Gustedt, Peter Sewell,
>> Kayvan Memarian, Victor B. F. Gomes, Martin Uecker.
>> This introduces the proposal and gives the proposed change to the standard
>> text, presented as change-highlighted pages of the standard
>> (though one might want to read the N2363 examples before going into that).
>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf
>> N2363 C provenance semantics: examples.
>> Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, Martin
>> Uecker.
>> This explains the proposal and its design choices with discussion of a
>> series of examples.
>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
>> N2364 C provenance semantics: detailed semantics.
>> Peter Sewell, Kayvan Memarian, Victor B. F. Gomes.
>> This gives a detailed mathematical semantics for the proposal
>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
>> In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an
>> executable version of the semantics, with a web interface that
>> allows one to explore and visualise the behaviour of small test
>> programs, stepping through and seeing the abstract-machine
>> memory state including provenance information. N2363 compares
>> the results of this for the example programs with gcc, clang, and icc
>> results, though the tests are really intended as tests of the semantics
>> rather than compiler tests, so one has to interpret this with care.
>> best,
>> Peter (for the study group)
> _______________________________________________
> ub mailing list
> ub_at_[hidden]
> http://www.open-std.org/mailman/listinfo/ub

Received on 2019-04-12 16:17:17