Re: [wg14/wg21 liaison] C trap representations and unspecified values versus C++ indeterminate values

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Sat, 10 Oct 2020 10:08:41 -0400
On Sat, Oct 10, 2020 at 3:52 AM Uecker, Martin <Martin.Uecker_at_[hidden]> wrote:

> On Saturday, 10.10.2020, at 00:32 -0400, Hubert Tong via Liaison wrote:
> > On Fri, Oct 9, 2020 at 7:14 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> >
> > >
> > > Let me add a small bit of context here.
> > >
> > > On 10/10/2020 00.04, Hubert Tong via Liaison wrote:
> > > > Older thread appears to end here:
> > > > http://open-std.org/JTC1/SC22/WG14/17199, "(SC22WG14.17199)
> > > > terminology: indeterminate value"
> > > >
> > > > UB from uninitialized values emanates from indeterminate values in C++
> > > > and trap representations in C.
>
> There is also a special provision for automatic variables whose
> address was never taken.
>
Thanks, I never noticed 6.3.2.1 paragraph 3 before.


>
> > > > C further has unspecified values that become unspecified at a specific
> > > > point in time but can be copied without mutation of the value. C++ received
> > > > a National Body comment for C++14 where such cases were removed:
> > > > https://wg21.link/cwg1787.
> > >
> > > So, you're saying that the issue CWG 1787 highlighted for C++ is
> > > equally an issue in C, and should be addressed in a similar fashion.
> > >
> >
> > Yes, the specific example suffers from the combination of unsigned char
> > having no trap representations and unspecified values, once "discovered",
> > being a specific value of the type.
> >
> > The bytes of the object representation for a union that are not also bytes
> > of the object representation of the active member also become unspecified
> > values in C, erasing any trap representations (because unspecified values
> > in C are not trap representations).
>
> I am not sure what this means. Representation bytes can always
> be read using an lvalue of character type but if they are
> unspecified could then be part of a trap representation for some
> union member.
>
In general, the treatment of indeterminate values of character/std::byte
type in C++ is closer to being a trap representation than to an unspecified
value. Trying to observe the value is nigh impossible.

The C description explicitly makes the values not indeterminate in the C++
sense.

The mechanism also differs between C and C++: The injection of
indeterminate values in C++ occurs via ending the lifetime of the
previously active member (this is not particularly made clear). The
injection of unspecified values in C occurs as a side effect of modifying
the storage associated with the union. Thus, for C, writing to a union
member may mutate bytes that C++ does not mutate.


>
> > It appears C++ is less aggressive with respect to initialized padding bytes
> > in structure and union types.
> >
> >
> > >
> > > > > It is further an issue that the definition of trap representation does
>
> My main issue is that "indeterminate value" has a nonsensical
> definition in C. I am not sure what C++ does here. But the
> term is questionable in itself, as it is meant to describe
> object states that do not represent a value at all.
>
> > > not admit trap representations for the exact-width integer types (int16_t,
> > > etc.) because the lack of padding bits, combined with two's complement
> > > representation, means that every possible object representation represents
> > > a value of the type.
>
> Yes.
>
> With provenance, there is the idea that the same representation could have
> different meanings in different contexts. I am not sure whether this
> plays a role here.
>
>
> > > So, C says that uninitialized automatic variables have indeterminate value.
> > > That's fine for some types, because the implementation can choose to use
> > > a trap representation as the particular value used for that case,
> > > which makes any access undefined behavior.
> > > However, this avenue can't be taken for e.g. int16_t, because there are
> > > no bits left for a trap representation to be a possibility.
>
> (if the address is not taken, reading it is UB).
>
> >
> > Yes, for the C++-compatible range of int16_t, there are no bits left for a
> > trap representation to be a possibility.
> >
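
The "no bits left" point can be checked mechanically. The following is only a
sketch (the function name is made up, and it assumes an implementation that
provides int16_t): it installs each of the 2^16 possible object
representations into an int16_t and counts how many distinct values result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Returns the number of distinct int16_t values obtained from the 2^16
 * possible 16-bit object representations.
 */
long count_distinct_int16_values(void)
{
    static bool seen[65536];
    long distinct = 0;

    memset(seen, 0, sizeof seen);
    for (uint32_t bits = 0; bits <= 0xFFFFu; ++bits) {
        uint16_t pattern = (uint16_t)bits;
        int16_t value;
        long index;

        /* Install the bit pattern as the object representation. */
        memcpy(&value, &pattern, sizeof value);

        /* Map INT16_MIN..INT16_MAX onto 0..65535. */
        index = (long)value - INT16_MIN;
        assert(index >= 0 && index < 65536);

        if (!seen[index]) {
            seen[index] = true;
            ++distinct;
        }
    }
    return distinct;
}
```

The function returns 65536: each pattern maps to its own value, so no
representation remains that could serve as a trap representation.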
> >
> > > But it's really helpful for optimizers to assume that uninitialized
> > > automatic variables can't be read.
> > >
> > > > I think it would really help if the C committee could record a decision
> > > > to move towards harmonizing with C++ on this.
> > >
> > > Yes. In particular since optimizers are likely to be the same for C and
> > > C++, anyway.
> > >
> >
> > Yes, and we're at risk of getting a hodgepodge mix of the C and C++ rules
> > in implementations.
>
> Yes, but keep in mind that there are also many C compilers that
> do not support C++.
>
>
> Best,
> Martin
>
>

Received on 2020-10-10 09:09:00