sg12: Re: [ub] C provenance semantics proposal

From: Peter Sewell <Peter.Sewell_at_[hidden]>
Date: Mon, 15 Apr 2019 15:25:03 +0100

On Mon, 15 Apr 2019 at 15:05, Uecker, Martin
<Martin.Uecker_at_[hidden]> wrote:
>
>
> Dear David,
>
> thank you for your comments. I will answer some of your questions
> from my perspective.
>
> Am Montag, den 15.04.2019, 13:54 +0200 schrieb David Brown:
> > On 02/04/2019 10:16, Peter Sewell wrote:
>
> > In small systems embedded programming, it is extremely common to access
> > IO registers at fixed addresses using something like:
> >
> > #define REG1 (*(volatile uint32_t*) 0x1234)
> >
> > Then in code you write "REG1 = 123;" or "REG1 |= 0x0010;". Types are
> > sometimes given as simple types (like uint32_t), but can often be
> > complex structs, unions, bitfields, etc. The addresses are almost always
> > compile-time constants - but may be derived from other addresses.
> > Accesses are usually, but not always, volatile.
>
> Yes, this is not explicitly addressed in the document but we
> were aware of this use case.

(more precisely, we talk about it in the examples document, but it's not
covered by the concrete proposal so far)
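
For readers less familiar with the idiom David describes, here is a minimal sketch of it. The register layout, the names (demo_regs_t, DEMO_REGS, DEMO_TARGET_BUILD) and the hosted fallback are assumptions made for illustration; only the address 0x1234 and the values 123 and 0x0010 come from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block; the field names are illustrative only. */
typedef struct {
    volatile uint32_t ctrl;
    volatile uint32_t status;
} demo_regs_t;

#ifdef DEMO_TARGET_BUILD
/* On a real target: a pointer formed from a compile-time integer constant. */
#define DEMO_REGS ((demo_regs_t *)(uintptr_t)0x1234u)
#else
/* On a hosted build: ordinary storage stands in for the device,
 * so the example can run without touching a fixed address. */
static demo_regs_t demo_backing;
#define DEMO_REGS (&demo_backing)
#endif

/* Equivalent of "REG1 = 123;" */
void demo_write(demo_regs_t *r, uint32_t value)
{
    assert(r != NULL);
    r->ctrl = value;
}

/* Equivalent of "REG1 |= 0x0010;": a volatile read followed by a volatile write. */
void demo_set_bits(demo_regs_t *r, uint32_t mask)
{
    assert(r != NULL);
    r->ctrl |= mask;
}
```

Only the DEMO_TARGET_BUILD branch involves an integer-to-pointer conversion, which is the case whose provenance is under discussion here.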

> > The linked documents mention that this sort of thing should still work,
> > but as far as I can see say it is up to the implementation and that
> > perhaps the implementation will have address ranges for which this sort
> > of access is considered defined behaviour, while other accesses made by
> > integer-to-pointer conversions have provenance tracking.
> >
> > This will not work, IMHO.
> >
> > A key point is that the compiler does not know the address ranges for IO
> > registers and ram. That information is only available at linker stage,
> > because the ranges can vary for different devices within the same
> > families. So either the compiler treats all such addressing as
> > undefined (thus breaking pretty much every existing small-system
> > embedded program), or it treats all such addressing as defined, losing
> > some of the advantages of provenance tracking.
>
> I think this is not a problem: While this is (and should be) up to
> the implementation, a compiler can treat such pointers as defined
> without losing any advantage of provenance tracking. The proposed
> rules allow the optimizer to assume that such pointers do not point
> to not-exposed storage (most importantly, local variables or heap
> storage whose address does not escape). An implementation could
> simply give a generic guarantee that device-specific addresses are
> always considered "exposed" knowing that this is always safe.
> For this the compiler does not need to know the specific address
> ranges, it just needs to assume that those address ranges do not
> alias local variables or heap storage.

exactly so, y
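
A small sketch of the kind of reasoning Martin describes, as we read it. The function and variable names are hypothetical, and demo_fake_reg is ordinary storage standing in for a device register so that the code runs on a hosted system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a device register (assumption: on a real target this
 * pointer would come from an integer constant such as 0x1234). */
volatile uint32_t demo_fake_reg;

uint32_t demo_write_then_read_local(volatile uint32_t *reg)
{
    assert(reg != NULL);

    uint32_t local = 1u;  /* address never taken, so never exposed */

    *reg = 123u;          /* under the proposed rules this access cannot
                             legitimately refer to 'local' */

    return local;         /* so an optimizer may treat this as the constant 1 */
}
```

The compiler needs no knowledge of where the device registers are for this. It only relies on 'local' not being exposed.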

> > I think at a minimum these documents should make clear that all
> > /volatile/ accesses work using the simple, concrete model. All volatile
> > accesses are therefore defined behaviour. (Mixing volatile and
> > non-volatile accesses to the same concrete address may not be defined
> > behaviour - a normal write to address "x" (following provenance
> > semantics) followed by a volatile read from address "x" (with concrete
> > semantics) might not get the value first written to "x" if the
> > provenance rules were not obeyed.)
>
> The integer could potentially be the address of a local variable.
> Wouldn't this break some optimization?
>
> > It may also make sense to introduce new provenance ids such as
> > "@constant" for all addresses formed from compile-time constants, or
> > "@link" for link-time constants, and specifically allow accesses via
> > these. I have not thought through the details enough here, and I think
> > it would be important to talk with some of the embedded tool vendors
> > (such as the "gnu arm embedded" folk, CodeSourcery, and of course
> > commercial embedded toolchain vendors like IAR) to make sure that we get
> > something solid here. I would particularly like to avoid
> > "implementation dependent" solutions, and have something that all
> > embedded developers can rely on with any toolchain.
>
> In this case, maybe we could add some wording that device-specific
> addresses should always be considered exposed from the outset.
> What exactly those device-specific addresses are,
> would still be implementation-defined.

Or, in other words, we'd need there to be some implementation-defined
bounds on where heap and stack storage could be, so that
we could require constant addresses to be different (otherwise UB).
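
To make the shape of such a rule concrete, here is a rough sketch. The bounds below are invented for the example (a real implementation would document its own), and the function names are hypothetical; nothing here is taken from the proposal text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical implementation-defined range in which heap and stack
 * objects may live: [DEMO_RAM_START, DEMO_RAM_END). */
#define DEMO_RAM_START ((uintptr_t)0x20000000u)
#define DEMO_RAM_END   ((uintptr_t)0x20010000u)

static_assert(DEMO_RAM_START < DEMO_RAM_END, "object range must be non-empty");

/* True if an address could belong to heap or stack storage. */
bool demo_may_hold_objects(uintptr_t addr)
{
    return addr >= DEMO_RAM_START && addr < DEMO_RAM_END;
}

/* True if a constant address satisfies the requirement sketched above:
 * it must lie outside the range where heap and stack storage can be. */
bool demo_constant_address_ok(uintptr_t addr)
{
    return !demo_may_hold_objects(addr);
}
```

With these made-up bounds, 0x1234 would be acceptable as a constant address, while a constant inside the range would fall under the "otherwise UB" case.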

> > Another big issue for small-systems embedded systems is "home made
> > malloc". Often you do not want to use standard malloc/free, but have
> > system-specific implementations that have different characteristics or
> > guarantees. This could include multiple pools for different sizes,
> > tracked heaps, different multi-threading characteristics, etc.
>
> Yes, this is one reason why I do not like proposals which rely on
> allocation address indeterminism. Such assumptions may not be true
> with home-made allocators.

Allocation-address nondeterminism as we have it now is simply saying
that programmers cannot depend on *anything* about the allocator,
except that it provides aligned non-overlapping memory. We could add
the possibility of implementation-defined facts that programmers could
rely on, if there's a compelling use case, but I can't imagine (so far)
anything beyond very simple facts, such as the overall range of memory
that might be used. Even with a custom allocator, code presumably
shouldn't normally depend on (eg) predictable offsets between different
allocations.
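
As an illustration, a very small "home-made" allocator that offers exactly those two guarantees might look like the sketch below. The pool size and all names are assumptions for the example. It deliberately stays clear of the effective-type question raised in this thread: the code only hands out addresses and never stores typed objects into the pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_POOL_SIZE 1024u

/* Backing storage, aligned for any object type. */
static _Alignas(max_align_t) unsigned char demo_pool[DEMO_POOL_SIZE];
static size_t demo_used;  /* bytes handed out so far */

/* Bump allocator: returns aligned, non-overlapping blocks, or NULL. */
void *demo_alloc(size_t n)
{
    const size_t a = _Alignof(max_align_t);

    if (n == 0 || n > DEMO_POOL_SIZE)
        return NULL;

    size_t rounded = (n + a - 1) / a * a;  /* round up to a multiple of a */
    if (rounded > DEMO_POOL_SIZE - demo_used)
        return NULL;

    assert(demo_used % a == 0);            /* keeps every block aligned */
    void *p = &demo_pool[demo_used];
    demo_used += rounded;
    return p;
}

/* True if [p, p+n) and [q, q+m) do not overlap; both blocks must come
 * from demo_alloc so that the pointer comparisons are within one array. */
bool demo_disjoint(const void *p, size_t n, const void *q, size_t m)
{
    const unsigned char *a = p, *b = q;
    return a + n <= b || b + m <= a;
}
```

A caller can rely on alignment and on blocks not overlapping. The distance between two returned blocks is a detail of this particular allocator and not something code should depend on.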

> > At the moment, the "effective type" rules in C make this somewhere between
> > ugly, impossible and implementation-specific. The changes to the memory
> > models and the C standards proposed here are an opportunity to improve
> > the situation and allow a portable and standard way of getting pointers
> > with "no declared type" - they are also an opportunity to make matters
> > worse. Again, I have not worked through the details of the consequences
> > of the current proposals here - but I very much hope that the paper
> > authors could do so and include this in their examples to ensure that
> > toolchain vendors interpret things the same way.
>
> Thank you for your comment. We already had some discussions about this.
> "effective type" rules are out-of-scope for the current proposal but we
> will revisit this soon. I agree that it should be possible to write
> your own allocation functions.

likewise

best,
Peter

> > Small-systems embedded programming is a key area of C coding, and of
> > vital importance to many real-world devices all around us. But it is an
> > area that is often under-represented in the C standards work. I hope
> > you can give it due consideration here.
>
> I agree.
>
> Best,
> Martin

Received on 2019-04-15 16:25:16