Date: Fri, 12 Apr 2019 13:09:21 +0100
Hi all,
perhaps I can reset this discussion, which got bogged down in largely
independent questions about pointer equality, by summarising the
basic provenance proposal. Try this below, extracted from n2363.
Comments on this?
best,
Peter
C pointer values are typically represented at runtime as simple
concrete numeric values, but mainstream compilers routinely exploit
information about the "provenance" of pointers to reason that they
cannot alias, and hence to justify optimisations. This is
long-standing practice, but exactly what it means (what programmers
can rely on, and what provenance-based alias analysis is allowed to
do), has never been nailed down. That's what the proposal does.
The basic idea is to associate a *provenance* with every pointer
value, identifying the original storage instance (or allocation, in
other words) that the pointer is derived from. In more detail:
- We take abstract-machine pointer values to be pairs (pi,a), adding a
provenance pi, either @i where i is a storage instance ID, or the
*empty* provenance, to their concrete address a.
- On every storage instance creation (of objects with static, thread,
automatic, and allocated storage duration), the abstract machine
nondeterministically chooses a fresh storage instance ID i (unique
across the entire execution), and the resulting pointer value
carries that single storage instance ID as its provenance @i.
- Provenance is preserved by pointer arithmetic that adds or subtracts
an integer to a pointer.
- At any access via a pointer value, its numeric address must be
consistent with its provenance, with undefined behaviour
otherwise. In particular:
-- access via a pointer value which has provenance a single storage
instance ID @i must be within the memory footprint of the
corresponding original storage instance, which must still be
live.
-- all other accesses, including those via a pointer value with
empty provenance, are undefined behaviour.
Regarding such accesses as undefined behaviour is necessary to make
optimisation based on provenance alias analysis sound: if the standard
did define behaviour for programs that make provenance-violating
accesses, e.g.~by adopting a concrete semantics, optimisation based on
provenance-aware alias analysis would not be sound. In other words,
the provenance lets one distinguish a one-past pointer from a pointer
to the start of an adjacently-allocated object, which otherwise are
indistinguishable.
All this is for the C abstract machine as defined in the standard:
compilers might rely on provenance in their alias analysis and
optimisation, but one would not expect normal implementations to
record or manipulate provenance at runtime (though dynamic or static
analysis tools might).
Then, to support low-level systems programming, C provides many other
ways to construct and manipulate pointer values:
- casts of pointers to integer types and back, possibly with integer
arithmetic, e.g.~to force alignment, or to store information in
unused bits of pointers;
- copying pointer values with memcpy;
- manipulation of the representation bytes of pointers, e.g.~via user
code that copies them via char* or unsigned char* accesses;
- type punning between pointer and integer values;
- I/O, using either fprintf/fscanf and the %p format, fwrite/fread on
the pointer representation bytes, or pointer/integer casts and
integer I/O;
- copying pointer values with realloc; and
- constructing pointer values that embody knowledge established from
linking, and from constants that represent the addresses of
memory-mapped devices.
A satisfactory semantics has to address all these, together with the
implications on optimisation. We've explored several, but our main
proposal is "PNVI-ae-udi" (provenance not via integers,
address-exposed, user-disambiguation).
This semantics does not track provenance via integers. Instead, at
integer-to-pointer cast points, it checks whether the given address
points within a live object that has previously been *exposed* and, if
so, recreates the corresponding provenance.
A storage instance is deemed exposed by a cast of a pointer to it to
an integer type, by a read (at non-pointer type) of the representation
of the pointer, or by an output of the pointer using %p.
The user-disambiguation refinement adds some complexity but supports
roundtrip casts, from pointer to integer and back, of pointers that
are one-past a storage instance.
On 02/04/2019, Peter Sewell <Peter.Sewell_at_[hidden]> wrote:
> Dear UB folk,
>
> continuing the discussion from last year at EuroLLVM, the GNU Tools
> Cauldron,
> and elsewhere, we (the WG14 C memory object model study group) now
> have a detailed proposal for pointer provenance semantics, refining
> the "provenance not via integers (PNVI)" model presented there.
> This will be discussed at the ISO WG14 C standards committee at the
> end of April, and comments from the community before then would
> be very welcome. The proposal reconciles the needs of existing code
> and the behaviour of existing compilers as well as we can, but it doesn't
> exactly match any of the latter, so we'd especially like to know whether
> it would be feasible to implement - our hope is that it would only require
> minor changes. It's presented in three documents:
>
> N2362 Moving to a provenance-aware memory model for C: proposal for C2x
> by the memory object model study group. Jens Gustedt, Peter Sewell,
> Kayvan Memarian, Victor B. F. Gomes, Martin Uecker.
> This introduces the proposal and gives the proposed change to the standard
> text, presented as change-highlighted pages of the standard
> (though one might want to read the N2363 examples before going into that).
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf
>
> N2363 C provenance semantics: examples.
> Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, Martin
> Uecker.
> This explains the proposal and its design choices with discussion of a
> series of examples.
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
>
> N2364 C provenance semantics: detailed semantics.
> Peter Sewell, Kayvan Memarian, Victor B. F. Gomes.
> This gives a detailed mathematical semantics for the proposal
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
>
> In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an
> executable version of the semantics, with a web interface that
> allows one to explore and visualise the behaviour of small test
> programs, stepping through and seeing the abstract-machine
> memory state including provenance information. N2363 compares
> the results of this for the example programs with gcc, clang, and icc
> results, though the tests are really intended as tests of the semantics
> rather than compiler tests, so one has to interpret this with care.
>
> best,
> Peter (for the study group)
>
perhaps I can reset this discussion, which got bogged down in largely
independent questions about pointer equality, by summarising the
basic provenance proposal. Try this below, extracted from n2363.
Comments on this?
best,
Peter
C pointer values are typically represented at runtime as simple
concrete numeric values, but mainstream compilers routinely exploit
information about the "provenance" of pointers to reason that they
cannot alias, and hence to justify optimisations. This is
long-standing practice, but exactly what it means (what programmers
can rely on, and what provenance-based alias analysis is allowed to
do), has never been nailed down. That's what the proposal does.
The basic idea is to associate a *provenance* with every pointer
value, identifying the original storage instance (or allocation, in
other words) that the pointer is derived from. In more detail:
- We take abstract-machine pointer values to be pairs (pi,a), adding a
provenance pi, either @i where i is a storage instance ID, or the
*empty* provenance, to their concrete address a.
- On every storage instance creation (of objects with static, thread,
automatic, and allocated storage duration), the abstract machine
nondeterministically chooses a fresh storage instance ID i (unique
across the entire execution), and the resulting pointer value
carries that single storage instance ID as its provenance @i.
- Provenance is preserved by pointer arithmetic that adds or subtracts
an integer to a pointer.
- At any access via a pointer value, its numeric address must be
consistent with its provenance, with undefined behaviour
otherwise. In particular:
-- access via a pointer value which has provenance a single storage
instance ID @i must be within the memory footprint of the
corresponding original storage instance, which must still be
live.
-- all other accesses, including those via a pointer value with
empty provenance, are undefined behaviour.
Regarding such accesses as undefined behaviour is necessary to make
optimisation based on provenance alias analysis sound: if the standard
did define behaviour for programs that make provenance-violating
accesses, e.g.~by adopting a concrete semantics, optimisation based on
provenance-aware alias analysis would not be sound. In other words,
the provenance lets one distinguish a one-past pointer from a pointer
to the start of an adjacently-allocated object, which otherwise are
indistinguishable.
All this is for the C abstract machine as defined in the standard:
compilers might rely on provenance in their alias analysis and
optimisation, but one would not expect normal implementations to
record or manipulate provenance at runtime (though dynamic or static
analysis tools might).
Then, to support low-level systems programming, C provides many other
ways to construct and manipulate pointer values:
- casts of pointers to integer types and back, possibly with integer
arithmetic, e.g.~to force alignment, or to store information in
unused bits of pointers;
- copying pointer values with memcpy;
- manipulation of the representation bytes of pointers, e.g.~via user
code that copies them via char* or unsigned char* accesses;
- type punning between pointer and integer values;
- I/O, using either fprintf/fscanf and the %p format, fwrite/fread on
the pointer representation bytes, or pointer/integer casts and
integer I/O;
- copying pointer values with realloc; and
- constructing pointer values that embody knowledge established from
linking, and from constants that represent the addresses of
memory-mapped devices.
A satisfactory semantics has to address all these, together with the
implications on optimisation. We've explored several, but our main
proposal is "PNVI-ae-udi" (provenance not via integers,
address-exposed, user-disambiguation).
This semantics does not track provenance via integers. Instead, at
integer-to-pointer cast points, it checks whether the given address
points within a live object that has previously been *exposed* and, if
so, recreates the corresponding provenance.
A storage instance is deemed exposed by a cast of a pointer to it to
an integer type, by a read (at non-pointer type) of the representation
of the pointer, or by an output of the pointer using %p.
The user-disambiguation refinement adds some complexity but supports
roundtrip casts, from pointer to integer and back, of pointers that
are one-past a storage instance.
On 02/04/2019, Peter Sewell <Peter.Sewell_at_[hidden]> wrote:
> Dear UB folk,
>
> continuing the discussion from last year at EuroLLVM, the GNU Tools
> Cauldron,
> and elsewhere, we (the WG14 C memory object model study group) now
> have a detailed proposal for pointer provenance semantics, refining
> the "provenance not via integers (PNVI)" model presented there.
> This will be discussed at the ISO WG14 C standards committee at the
> end of April, and comments from the community before then would
> be very welcome. The proposal reconciles the needs of existing code
> and the behaviour of existing compilers as well as we can, but it doesn't
> exactly match any of the latter, so we'd especially like to know whether
> it would be feasible to implement - our hope is that it would only require
> minor changes. It's presented in three documents:
>
> N2362 Moving to a provenance-aware memory model for C: proposal for C2x
> by the memory object model study group. Jens Gustedt, Peter Sewell,
> Kayvan Memarian, Victor B. F. Gomes, Martin Uecker.
> This introduces the proposal and gives the proposed change to the standard
> text, presented as change-highlighted pages of the standard
> (though one might want to read the N2363 examples before going into that).
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf
>
> N2363 C provenance semantics: examples.
> Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, Martin
> Uecker.
> This explains the proposal and its design choices with discussion of a
> series of examples.
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
>
> N2364 C provenance semantics: detailed semantics.
> Peter Sewell, Kayvan Memarian, Victor B. F. Gomes.
> This gives a detailed mathematical semantics for the proposal
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
>
> In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an
> executable version of the semantics, with a web interface that
> allows one to explore and visualise the behaviour of small test
> programs, stepping through and seeing the abstract-machine
> memory state including provenance information. N2363 compares
> the results of this for the example programs with gcc, clang, and icc
> results, though the tests are really intended as tests of the semantics
> rather than compiler tests, so one has to interpret this with care.
>
> best,
> Peter (for the study group)
>
Received on 2019-04-12 14:09:22