On Thu, Mar 3, 2022 at 10:04 AM JeanHeyd Meneide <phdofthehouse@gmail.com> wrote:
Dear Hubert,

On Thu, Mar 3, 2022 at 1:18 AM Hubert Tong via Liaison <liaison@lists.isocpp.org> wrote:
The wording in N2900 uses the static initialization wording for initialization with {}. For the case of a union, the static initialization is described as:
the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits

Firstly, and not purely as an interoperability concern, the definition of padding here is unclear. Does this refer to padding added to the end of the union to achieve the required alignment in cases where the largest member does not have the greatest alignment requirement? Or does it refer to the union of the trailing padding in the union type with the intersection of all the alignment padding and the padding bits within integer subobjects of the union members?

It seems that the presence of a separate paragraph in 6.2.6, "Representation of types", which talks about bytes that correspond to other members of a union right after the preceding paragraph talks about padding bytes, means that the bytes corresponding to other members are not automatically considered padding. This interpretation is the one that is consistent with the idea that what constitutes padding is a static property of a type and not a dynamic property of an object.


     My understanding is that padding sits outside of **any** members of the union, and is solely for the bytes that do not belong to any subobject, named or unnamed, of the union. Therefore it's a static property of the type. The wording in a few different places supports the same interpretation: other union members that are not being initialized are not considered part of a union's padding.

The net of this is that we're getting initialization for the first member and trailing padding beyond the largest union member for alignment purposes because "as if static storage duration" does not provide as many guarantees as people think... (not so much of a problem for structs without nested unions).

I might need to open a defect for Clang's `-ftrivial-auto-var-init=pattern` to properly restrict the padding that is zeroed, but I really think the situation is a bit nonsensical.

 
Secondly, the semantics specified does not match what C++ does for the same syntax.

From [dcl.init.aggr]:
If the aggregate is a union and the initializer list is empty, then
— if any variant member has a default member initializer, that member is initialized from its default
member initializer;
— otherwise, the first member of the union (if any) is copy-initialized from an empty initializer list.

Only the first member is initialized. The "padding" is left uninitialized.

     This is an intentional departure. = { 0 } does not initialize padding in C, and for a long time this created issues where users had to use memset(...) on their structures to get guaranteed-zero padding. This caused specific problems in, for example, kernel code, where padding bits and bytes could leak data when structures were initialized "the normal way" and handed to end-users.

     The intent was for = {} to replace both memset and = {0} as a safer initialization alternative with better security properties. It is also more semantically correct because a structure with pointer or floating point members will get the correct bit representation for the null pointer or for zero in the various floating point types, which may not be an all-bits-zero representation. (IEEE floating point and decimal floating point have a compatible all-bits-zero-means-numeric-zero interpretation, but that does not cover the totality of (arcane, but nonetheless supported) floating point formats.) This is an improvement for code that wishes to have better cross-platform, standard properties versus doing a blind "memset" and then needing to individually set each member.

It seems the goal was achieved for structs (or mostly so) and not achieved for unions. It is still a question (because "any padding" may refer to "padding of the aggregate/union" instead of "padding within") whether integer subobjects may have non-zero padding bits from such an initialization.
 

     The change is compatible with C++ because C++ leaves the padding bits unspecified, and choosing a behavior is compatible when there is no specified behavior in the alternatives.

It is compatible in that a user only relying on the C++ semantics is fine. It is not compatible in the way where "correct" C code is also "correct" C++ code.


Sincerely,
JeanHeyd
 
Link to this post: http://lists.isocpp.org/liaison/2022/03/1015.php