Date: Wed, 17 Feb 2021 19:51:02 +0000
On Wednesday, 17.02.2021, at 19:13 +0100, Jens Maurer wrote:
> On 17/02/2021 14.46, Uecker, Martin via Liaison wrote:
> > in C++ "indeterminate values" exist also for expressions
> > and not just for objects (C does not really have
> > indeterminate values although the current terminology
> > implies this).
>
> Yes.
>
> > It seems that operations involving "indeterminate values"
> > are UB, except for some specific exceptions involving
> > unsigned characters. For example it seems you can
> > initialize a variable of unsigned character
> > type with an indeterminate value and this is then
> > not UB. (N4860 6.7.4)
> >
> > I am a bit puzzled what the purpose is of these rules.
> > Allowing some operations on uninitialized
> > variables could make sense to not make programs UB
> > which do propagate them accidentally in some code path
> > but then never use the value. But then why is this
> > limited to unsigned characters?
>
> We started out with the rule that reading uninitialized
> objects has undefined behavior. The C2x working draft
> has a similar rule in section 6.3.2.1
> "If the lvalue designates an object of automatic storage duration
> that could have been declared with the register storage class
> (...), and that object is uninitialized (...), the behavior is
> undefined." (This appears to make no exception for "unsigned char".)
>
> Then came C++ core issue 616
> http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#616
>
> which allowed reading uninitialized "unsigned char" memory objects.
> It's been a while, but I believe the motivation was to be able
> to use memcpy on structs with padding bytes (which remain
> uninitialized in C++). "memcpy", in an abstract sense, copies
> the object representation (i.e. the "array of unsigned char"
> that provides storage for the object).
>
> However, this seemed to pessimize optimizations:
>
> unsigned char c; // not initialized
> (void) &c; // make sure it's not in a register
> unsigned char d = c; // fixate the value of c
> unsigned char e = d;
> assert(d == e);
>
> While we don't know the value of d, the approach that only
> objects have indeterminate values, not expressions, made us
> guarantee d == e for this case (the value of c was fixated
> into d when read).
Yes, this is what C has now, although we would say the value
is unspecified.
> So, we had C++ core issue 1787
> http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1787
> which introduced the current elaborated scheme
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3914.html
>
> that established the current, more elaborate, approach.
> (The text was meanwhile reshuffled editorially.)
Ok, this is the version I have a question about.
The defect (?) is that "it would be more helpful for
optimizers". But it breaks even the minimum guarantees needed
for basic reasoning about such values, i.e. that if
you copy it you get the same value (which is what the
'assert' above encodes). Wouldn't it make sense to make
it completely undefined? In other words, how are
these residual rules still useful?
> > Accessing bytes in memory which are possibly uninitialized
> > is a useful feature and this works safely in C when
> > using character types (which do not have trap representations).
>
> I fail to reconcile this statement with the quote from the C2x
> working draft (see above), which does not seem to make an
> exception for character types.
This is only for automatic variables whose address
was not taken (those might not live in memory at all).
If you want to use a function like memcpy / memcmp you
have to take the address, so this works.
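
A minimal sketch of the memcpy case, written in C++ so that it
matches the other code in this thread. The struct and function
names are made up for illustration, and whether there is padding
between the members depends on the ABI:

```cpp
#include <cstring>

// Hypothetical struct: on typical ABIs there are padding bytes
// between 'tag' and 'value', and they are never initialized.
struct Padded {
    char tag;
    int value;
};

// Copies the whole object representation, padding included.
// memcpy is handed the addresses of both objects, so neither
// is a variable whose address was never taken.
Padded copy_bytes(const Padded &src)
{
    Padded dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}
```

Only the members are read afterwards; the copied padding bytes
are never inspected.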
> > If the byte is uninitialized the value is unspecified
> > but consistent.
> > In C++ this does not seem to work as
> > indeterminate values read then cause UB later.
>
> In C++, you are allowed to copy indeterminate unsigned char values.
> Any other dealing with indeterminate values is undefined behavior.
"copy" in the sense of using it to make some other
variable also indeterminate.
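
As I read N4860 6.7.4, a sketch of what this amounts to would be
the following (the function name is hypothetical, and a compiler
may warn about the uninitialized read):

```cpp
// Sketch only: shows the one thing that is permitted with an
// indeterminate unsigned char, namely passing it on.
unsigned char copy_indeterminate()
{
    unsigned char c;       // not initialized: indeterminate value
    unsigned char d = c;   // allowed: d is now indeterminate as well
    // Evaluating 'd == c' or 'd + 1' here would be undefined
    // behavior in C++.
    d = 42;                // overwriting gives d a definite value
    return d;
}
```

So the copy itself is fine, but nothing can be done with the
result until it has been overwritten.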
Best,
Martin
Received on 2021-02-17 13:51:09