Date: Wed, 17 Feb 2021 19:13:23 +0100
On 17/02/2021 14.46, Uecker, Martin via Liaison wrote:
> in C++ "indeterminate values" exist also for expressions
> and not just for objects (C does not really have
> indeterminate values although the current terminology
> implies this).
Yes.
> It seems that operations involving "indeterminate values"
> are UB, except for some specific exceptions involving
> unsigned characters. For example it seems you can
> initialize a variable of unsigned character
> type with an indeterminate value and this is then
> not UB. (N4860 6.7.4)
>
> I am a bit puzzled what the purpose is of these rules.
> Allowing some operations on uninitialized
> variables could make sense to not make programs UB
> which do propagate them accidentally in some code path
> but then never use the value. But then why is this
> limited to unsigned characters?
We started out with the rule that reading uninitialized
objects has undefined behavior. The C2x working draft
has a similar rule in section 6.3.2.1
"If the lvalue designates an object of automatic storage duration
that could have been declared with the register storage class
(...), and that object is uninitialized (...), the behavior is
undefined." (This appears to make no exception for "unsigned char".)
Then came C++ core issue 616
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#616
which allowed reading uninitialized "unsigned char" memory objects.
It's been a while, but I believe the motivation was to be able
to use memcpy on structs with padding bytes (which remain
uninitialized in C++). "memcpy", in an abstract sense, copies
the object representation (i.e. the "array of unsigned char"
that provides storage for the object).
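
A minimal sketch of that use case, not taken from the core issue itself: the struct `Sample` and the helper `copy_via_memcpy` are hypothetical names chosen for illustration, and the presence of padding between the two members is an assumption about typical ABIs.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical struct for illustration. On typical ABIs there are padding
// bytes between `tag` and `value` (an assumption, not a guarantee), and
// nothing ever initializes them.
struct Sample {
    char tag;
    int value;
};

// Copies the entire object representation, padding bytes included.
Sample copy_via_memcpy(const Sample& src) {
    Sample dst;
    std::memcpy(&dst, &src, sizeof(Sample));
    // Only the named members are inspected; the padding is never examined.
    assert(dst.tag == src.tag && dst.value == src.value);
    return dst;
}
```

The copy touches every byte of `src`, so it is only usable if reading the possibly uninitialized padding bytes as "unsigned char" is permitted.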
However, this seemed to pessimize optimizations:
  unsigned char c;       // not initialized
  (void) &c;             // make sure it's not in a register
  unsigned char d = c;   // fixate the value of c
  unsigned char e = d;
  assert(d == e);
While we don't know the value of d, the approach that only
objects have indeterminate values, not expressions, made us
guarantee d == e for this case (the value of c was fixated
into d when read).
So, we had C++ core issue 1787
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1787
which introduced the current, more elaborate, approach; see
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3914.html
(The text was meanwhile reshuffled editorially.)
> Accessing bytes in memory which are possibly uninitialized
> is a useful feature and this works safely in C when
> using character types (which do not have trap representations).
I fail to reconcile this statement with the quote from the C2x
working draft (see above), which does not seem to make an
exception for character types.
> If the byte is uninitialized the value is unspecified
> but consistent.
> In C++ this does not seem to work as
> indeterminate values read then cause UB later.
In C++, you are allowed to copy indeterminate unsigned char values.
Any other use of indeterminate values has undefined behavior.
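
As a hedged illustration of the "copy only" rule, here is a byte-wise copy written by hand. The names `Padded` and `copy_bytes` are hypothetical, and the sketch assumes that walking an object representation through an "unsigned char" pointer is acceptable, as is commonly done.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct; padding after `tag` is assumed, not guaranteed.
struct Padded {
    char tag;
    int value;
};

// Copies n bytes one at a time through unsigned char lvalues.
void copy_bytes(void* dst, const void* src, std::size_t n) {
    auto* d = static_cast<unsigned char*>(dst);
    auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) {
        // A plain unsigned char to unsigned char copy: this is the
        // operation that stays permitted even if s[i] is indeterminate.
        d[i] = s[i];
    }
    // By contrast, comparing or doing arithmetic on a possibly
    // indeterminate byte (e.g. `s[i] == d[i]`, `s[i] + 1`) would fall
    // under "any other use" and is deliberately avoided here.
}

// Checks only the named members, which were initialized by the caller.
bool same_members(const Padded& a, const Padded& b) {
    return a.tag == b.tag && a.value == b.value;
}
```

After `copy_bytes(&b, &a, sizeof a)`, only the named members of `b` can be relied upon; the copied padding bytes remain indeterminate.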
Jens
Received on 2021-02-17 12:13:31