ISOCPP liaison List: Re: [wg14/wg21 liaison] N2900 Initialization with {} and potential semantic divergence from C++ for new-to-C syntax

From: JeanHeyd Meneide <phdofthehouse_at_[hidden]>
Date: Thu, 3 Mar 2022 19:23:06 -0500

Dear Hubert,

On Thu, Mar 3, 2022 at 4:53 PM Hubert Tong <hubert.reinterpretcast_at_[hidden]>
wrote:

> …
> The net of this is that we're getting initialization for the first member
> and trailing padding beyond the largest union member for alignment purposes
> because "as if static storage duration" does not provide as many guarantees
> as people think... (not so much of a problem for structs without nested
> unions).
>
> I might need to open a defect for Clang's
> `-ftrivial-auto-var-init=pattern` to properly restrict the padding that is
> zeroed, but I really think the situation is a bit nonsensical.
>

     I agree that the situation is non-ideal. However, I did do my best to
represent your concerns to the Committee. In particular, I provided
Optional Change 0 as part of the proposal. It was Optional and not part of
the core proposal (therefore, it was voted on separately). The reason is
that the wording in the previous revision had attempted to mandate it,
and **most** individuals in the C Committee were miffed by such a departure
from how unions were normally treated: it has always been first-member-init.

      In an attempt to alleviate their concerns, I wrote Optional Change 0 (
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2900.htm#wording-6.7.9p10-extra)
as follows:

> …
> — if it is a union and the initializer is the empty initializer, the
> largest member is initialized (recursively) according to these rules, then
> the first named member is initialized (recursively) according to these
> rules, and any padding is initialized to zero bits;
> — if it is a union and the initializer is not an empty initializer, the
> first named member is initialized (recursively) according to these rules,
> and any padding is initialized to zero bits;
> …
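
     To make the wording above concrete, here is a minimal sketch in C. The `union packet` type and both function names are hypothetical and do not come from N2900. It only shows how many bytes of a union lie outside the first named member, which are the bytes that first-member initialization says nothing about. It uses `= {0}` rather than `= {}` so that it also compiles before C23.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical union for illustration: the first named member is much
   smaller than the largest member. */
union packet {
    unsigned char tag;      /* first named member */
    unsigned char raw[16];  /* largest member */
};

/* Number of bytes in the union that are not part of the first named
   member. First-member initialization gives no value to these bytes
   through the first member itself. */
size_t bytes_beyond_first_member(void)
{
    return sizeof(union packet) - sizeof(unsigned char);
}

/* With a non-empty initializer, the first named member is initialized;
   this returns the value it reads back as. */
unsigned char first_member_after_init(void)
{
    union packet p = {0};
    assert(sizeof p >= sizeof p.raw); /* the union holds its largest member */
    return p.tag;
}
```

     Under Optional Change 0, an empty initializer would also have initialized `raw`, the largest member; without it, only `tag` is covered by the member-initialization rule.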

      We took a vote on it. It did not pass:

> No consensus to initialize whole union then largest member for {} (N2900)
> into C23.

     Again, I sincerely tried to make what you had pointed out as part of
the semantics so that unions would be usefully included. I think folks in
WG14 would be okay if there was a separate syntax that deliberately
initialized "as-if everything including padding bits are static-init, and
then performed the correct initialization of the proper members". I think
if you presented the security benefits to them and gave them that
alternative syntax, they would like it. I think that implementing it in a
handful of compilers and advertising it would make it easy to get into the
next revision of C, and consequently C++ (both Committees need/want
something of this nature).
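
     Until such a syntax exists, the "zero everything first" behavior can be approximated by hand. The following is a minimal sketch, not a proposal: `union packet`, `packet_zero_all`, `count_nonzero_bytes`, and `demo_nonzero_after_zeroing` are all hypothetical names. It fills an automatic union with a non-zero pattern to stand in for stale stack contents, zeroes every byte with `memset`, and then counts the non-zero bytes that remain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical union: small first member, larger second member. */
union packet {
    unsigned char tag;
    unsigned char raw[16];
};

/* Hypothetical helper: zero every byte of the object representation,
   including bytes outside the first named member. */
void packet_zero_all(union packet *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof *p);
}

/* Count the bytes of an object representation that are not zero. */
size_t count_nonzero_bytes(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    size_t n = 0;
    for (size_t i = 0; i < size; ++i) {
        if (bytes[i] != 0) {
            ++n;
        }
    }
    return n;
}

/* Fill with a pattern to simulate stale contents, zero the whole
   object, and report how many non-zero bytes are left. */
size_t demo_nonzero_after_zeroing(void)
{
    union packet p;
    memset(&p, 0xAA, sizeof p);
    packet_zero_all(&p);
    return count_nonzero_bytes(&p, sizeof p);
}
```

     Note that a later store to one member lets the bytes belonging only to other members take unspecified values as far as the standard is concerned, so this sketch covers the state right after zeroing and nothing more.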

     I understand it is not ideal to have the opt-in syntax be the secure
syntax. I can't really change C or C++'s momentum in this direction,
though. Many members are adamant we do not change the semantics of existing
initialization, and stick to existing principles, no matter how dangerous
or no matter how proven they are to leak information/cause problems.

…
> It seems the goal was achieved for structs (or mostly so) and not achieved
> for unions. It is still a question (because "any padding" may refer to
> "padding of the aggregate/union" instead of "padding within") whether
> integer subobjects may have non-zero padding bits from such an
> initialization.
>

     My interpretation is that it was "padding of the aggregate/union", but
bits inside (e.g., for an integer) *could* be non-zero (since the static
init rules say it is the value of literal 0 cast to the integer type, and
if that literal zero cast to the integer type produces an integer that has
non-zero padding bits, that's fine because the value-representation bits
are already in the proper configuration to represent the cast literal 0).
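
     The "padding of the aggregate" reading can be observed with a small sketch. The `struct record` type and the function names below are hypothetical. The code inspects the object representation of a structure with static storage duration, whose padding is required to be zero bits. It assumes a mainstream implementation where `char` and `int` have no padding bits of their own, which is exactly the part the wording leaves open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure: on typical ABIs there is padding between
   kind and value so that value is suitably aligned. */
struct record {
    char kind;
    int  value;
};

/* Static storage duration: members are zero-initialized and any
   padding of the aggregate is initialized to zero bits. */
static struct record zeroed_record;

/* Bytes of the structure that belong to no member. */
size_t record_padding_bytes(void)
{
    return sizeof(struct record) - sizeof(char) - sizeof(int);
}

/* Count the bytes of an object representation that are not zero. */
size_t count_nonzero_bytes(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    size_t n = 0;
    for (size_t i = 0; i < size; ++i) {
        if (bytes[i] != 0) {
            ++n;
        }
    }
    return n;
}

/* Check the member values, then count non-zero bytes across the whole
   object, padding included. */
size_t static_record_nonzero_bytes(void)
{
    assert(zeroed_record.kind == 0 && zeroed_record.value == 0);
    return count_nonzero_bytes(&zeroed_record, sizeof zeroed_record);
}
```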

     I think it would be worth clarifying that all the padding bits within
the structure, or the first member of the union, or within any arithmetic or
pointer object, should be 0 when doing static initialization. I think that
is in the spirit
of what the wording is trying to achieve. I do not know if they were
considering the padding bits of integers when they added the padding bit
fixes for structures/unions in C99 for static initialization. This does
become more important for types like _Bool, which by default has padding
bits because the value representation only takes 1 bit while a _Bool object
occupies at least 8 bits.
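
     The _Bool case can be put into numbers with a short sketch. The function names and the `flag` object are hypothetical, and the code takes the premise above as given: one value bit, with every other bit of the object being padding. It also reads the first byte of a statically initialized _Bool, which is zero on mainstream implementations even though the wording discussed here may not strictly require that of the padding bits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: _Bool has exactly one value bit, so all remaining bits
   in its object representation are padding bits. */
size_t bool_padding_bits(void)
{
    return sizeof(_Bool) * CHAR_BIT - 1;
}

/* Static storage duration: initialized as if from the constant 0. */
static _Bool flag;

/* Read the first byte of the object representation of flag. */
unsigned char first_byte_of_static_bool(void)
{
    const unsigned char *bytes = (const unsigned char *)&flag;
    assert(flag == 0); /* the value itself is guaranteed to be false */
    return bytes[0];
}
```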

     I should ask some of the older WG14 members if they remember what was
going on at the time for C99 and that change. Maybe they did not add the
clause to arithmetic and pointer types in C99 because they just did not
consider arithmetic types with padding bits as something that could be
real/important. Or maybe they believed that no architecture would interpret
"initialized from conversion from literal 0" as producing non-zero padding
bits.

>
> …
> It is compatible in that a user only relying on the C++ semantics is fine.
> It is not compatible in the way where "correct" C code is also "correct"
> C++ code.
>

     Agreed, but this is beyond the scope of my work. I could ask C++ if
they were interested in such a change (padding bits are zero), and the
benefits it could bring to aggregates (though, perhaps not to unions, as
you detailed previously). I think it would be a positive change.
Alternatively, we could revert what C has done and specifically exempt = {}
from initializing padding bits. It would worsen security for structures and
for people who put largest-member-first in their unions. I think that would
be a negative change, but the positive is that both languages would be
equally "too bad for you, security person". (Which doesn't sound like a
benefit, but is at least some form of "equal is always fair", and therefore
dependable.)

Sincerely,
JeanHeyd

Received on 2022-03-04 00:23:19