ISOCPP liaison List: Re: [wg14/wg21 liaison] N2900 Initialization with {} and potential semantic divergence from C++ for new-to-C syntax

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Thu, 3 Mar 2022 16:52:36 -0500

On Thu, Mar 3, 2022 at 10:04 AM JeanHeyd Meneide <phdofthehouse_at_[hidden]>
wrote:

> Dear Hubert,
>
> On Thu, Mar 3, 2022 at 1:18 AM Hubert Tong via Liaison <
> liaison_at_[hidden]> wrote:
>
>> The wording in N2900 uses the static initialization wording for
>> initialization with {}. For the case of a union, the static initialization
>> is described as:
>>
>>> the first named member is initialized (recursively) according to these
>>> rules, and any padding is initialized to zero bits
>>>
>>
>> Firstly, although this is not purely an interoperability concern, the
>> definition of padding here is unclear. Does this refer to padding added
>> to the end of the
>> union to achieve the required alignment in cases where the largest member
>> does not have the greatest alignment requirement? Does it refer to the
>> union of the trailing padding in the union type and the intersection of all
>> the alignment padding and the padding bits within integer subobjects of the
>> union members?
>>
>> It seems the presence of a separate paragraph in 6.2.6, "Representation
>> of types" to talk about bytes that correspond to other members of a union
>> after talking in the preceding paragraph about padding bytes means that the
>> bytes corresponding to other members are *not* automatically considered
>> padding. This interpretation is the one that is consistent with the idea
>> that what constitutes padding is a static property of a type and not a
>> dynamic property of an object.
>>
>>
> My understanding is that padding sits outside of **any** members of
> the union, and is solely for the bytes that do not belong to any subobject,
> named or unnamed, of the union. Therefore it's a static property of the
> type. The wording in a few different places supports your same
> interpretation: other union members not being initialized are not
> considered part of a union's padding.
>

The net of this is that we get initialization only for the first member
and for the trailing padding that follows the largest union member for
alignment purposes, because "as if static storage duration" does not provide
as many guarantees as people think... (This is not so much of a problem for
structs without nested unions.)
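
To make the distinction concrete, here is a minimal sketch of the layout in
question. The union `sample` and both helper functions are hypothetical names
made up for illustration, not anything from N2900, and the sizes involved are
implementation-defined. It only computes which bytes fall into each category
under the narrow reading of "padding" discussed above.

```c
#include <stddef.h>

/* Hypothetical union for illustration only (not from N2900). */
union sample {
    int  first;   /* first named member: the one that {} initializes */
    char big[5];  /* largest member, but with the weakest alignment */
};

/* Bytes after the largest member that exist only so that sizeof is a
   multiple of the union's alignment. Under the narrow reading, these are
   the "padding" that gets zeroed. */
size_t trailing_padding(void) {
    return sizeof(union sample) - sizeof(((union sample *)0)->big);
}

/* Bytes of `big` that `first` does not overlap. They belong to another
   member, so under the narrow reading they are not padding, and they are
   not part of the first member either. */
size_t uncovered_member_bytes(void) {
    size_t big_size = sizeof(((union sample *)0)->big);
    size_t first_size = sizeof(((union sample *)0)->first);
    return big_size > first_size ? big_size - first_size : 0;
}
```

On a typical ABI with a 4-byte, 4-byte-aligned int, I would expect
sizeof(union sample) to be 8, with three trailing padding bytes and one byte
of `big` that is covered by neither guarantee.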

I might need to open a defect for Clang's `-ftrivial-auto-var-init=pattern`
to properly restrict the padding that is zeroed, but I really think the
situation is a bit nonsensical.

>
>> Secondly, the specified semantics do not match what C++ does for the
>> same syntax.
>>
>> From [dcl.init.aggr]:
>> If the aggregate is a union and the initializer list is empty, then
>> — if any variant member has a default member initializer, that member is
>> initialized from its default
>> member initializer;
>> — otherwise, the first member of the union (if any) is copy-initialized
>> from an empty initializer list.
>>
>> Only the first member is initialized. The "padding" is left uninitialized.
>>
>
> This is an intentional departure. = { 0 } does not initialize padding
> in C and for a long time created issues where users would have to use
> memset(...) to initialize their structures to get guaranteed-zero padding.
> Unfortunately this caused specific issues in e.g. kernel code and similar,
> where padding bits and bytes could leak data when structures were
> initialized "the normal way" and handed to end-users.
>
> The intent was for = {} to replace both memset and = {0} as a safer
> initialization alternative with better security properties. It is also more
> semantically correct because a structure with pointer or floating point
> members will get the correct bit representation for either the null pointer
> constant or the various floating point types, which may not be an
> all-bits-zero representation. (IEEE floating point and decimal floating
> point have a compatible all-bits-zero-means-numeric-zero interpretation,
> but that does not cover the totality of (arcane, but nonetheless supported)
> floating point formats.) This is an improvement for code that wishes to
> have better cross-platform, standard properties versus doing a blind
> "memset" and then needing to individually set each property.
>

It seems the goal was achieved for structs (or mostly so) and not achieved
for unions. It is still a question (because "any padding" may refer to
"padding of the aggregate/union" instead of "padding within") whether
integer subobjects may have non-zero padding bits from such an
initialization.
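
For reference, the memset-then-assign idiom that the quoted text describes
looks roughly like the sketch below. The struct `record` and the function
`record_clear` are hypothetical names chosen for illustration. The member
assignments after the memset are what give pointer and floating-point members
their proper null and zero representations on implementations where those are
not all-bits-zero.

```c
#include <string.h>

/* Hypothetical struct for illustration only. On most ABIs there is
   alignment padding between `tag` and `value`. */
struct record {
    char   tag;
    int    value;
    double ratio;
    void  *next;
};

/* The idiom that = {} was intended to replace: clear every byte of the
   object, padding included, then set each member by value. */
void record_clear(struct record *r) {
    memset(r, 0, sizeof *r);  /* zero all bytes of the object */
    r->tag = 0;
    r->value = 0;
    r->ratio = 0.0;           /* numeric zero, whatever its representation */
    r->next = NULL;           /* null pointer, whatever its representation */
}
```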

>
> The change is compatible with C++ because C++ leaves the padding bits
> unspecified, and choosing a behavior is compatible when there is no
> specified behavior in the alternatives.
>

It is compatible in the sense that a user relying only on the C++ semantics
is fine. It is not compatible in the sense that "correct" C code is also
"correct" C++ code.

> Sincerely,
> JeanHeyd
>
>

Received on 2022-03-03 21:53:05