ISOCPP liaison List: Re: [wg14/wg21 liaison] N2900 Initialization with {} and potential semantic divergence from C++ for new-to-C syntax

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Fri, 4 Mar 2022 01:02:52 -0500

On Thu, Mar 3, 2022 at 7:23 PM JeanHeyd Meneide <phdofthehouse_at_[hidden]>
wrote:

> Dear Hubert,
>
> On Thu, Mar 3, 2022 at 4:53 PM Hubert Tong <
> hubert.reinterpretcast_at_[hidden]> wrote:
>
>> …
>> The net of this is that we're getting initialization for the first member
>> and trailing padding beyond the largest union member for alignment purposes
>> because "as if static storage duration" does not provide as many guarantees
>> as people think... (not so much of a problem for structs without nested
>> unions).
>>
>> I might need to open a defect for Clang's
>> `-ftrivial-auto-var-init=pattern` to properly restrict the padding that is
>> zeroed, but I really think the situation is a bit nonsensical.
>>
>
> I agree that the situation is non-ideal. However, I did do my best to
> represent your concerns to the Committee. In particular, I provided
> Optional Change 0 as part of the proposal. It was Optional and not part of
> the core proposal (therefore, it was voted on separately). The reason is
> because the wording in the previous revision had attempted to mandate it,
> and **most** individuals in the C Committee were miffed by such a departure
> from how unions were normally treated: it has always been first-member-init.
>

I agree that Optional Change 0 is one way of approaching the gap. It sounds
from what you describe above that the primary objection to it was how it
does so. An alternative method would be to adjust the static initialization
wording (with WG 21 coordination) to clarify that (a) the representation of
the value converted from 0 that is chosen for each scalar type is
consistent for every such initialization of an object of that type, (b) all
padding bits between members of a structure and all trailing padding in a
structure are zeroed, and (c) all bytes of a union that do not overlap with
the object representation of the first member of the union are zeroed.
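
To make (b) and (c) concrete, here is a minimal C sketch. The types
`struct padded` and `union small_first` and the helper functions are
hypothetical names made up for illustration; none of them come from N2900.
The sketch looks only at objects with static storage duration, where
mainstream implementations are expected to produce all-zero bytes. The
suggestion above is about having the wording say so.

```c
#include <stddef.h>

/* Hypothetical type: most ABIs insert padding between `tag` and `value`. */
struct padded {
    char tag;
    double value;
};

/* Hypothetical type: the first member is smaller than the largest member. */
union small_first {
    char first;
    double largest;
};

/* Static storage duration, no initializer: static initialization applies. */
static struct padded static_struct;
static union small_first static_union;

/* Count the bytes of an object representation that are not zero. */
static size_t count_nonzero_bytes(const void *object, size_t size)
{
    const unsigned char *bytes = object;
    size_t count = 0;
    for (size_t i = 0; i < size; ++i) {
        if (bytes[i] != 0) {
            ++count;
        }
    }
    return count;
}

/* Case (b): every byte of the structure, padding included. */
size_t nonzero_bytes_in_static_struct(void)
{
    return count_nonzero_bytes(&static_struct, sizeof static_struct);
}

/* Case (c): only the bytes that do not overlap the first member. */
size_t nonzero_bytes_past_first_union_member(void)
{
    const unsigned char *bytes = (const unsigned char *)&static_union;
    size_t skip = sizeof static_union.first;
    return count_nonzero_bytes(bytes + skip, sizeof static_union - skip);
}
```

On the implementations I would expect people to try this with, both
functions return 0. Whether the current wording actually requires that for
the union case is the gap being discussed.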

>
> In an attempt to alleviate their concerns, I wrote Optional Change 0
> (
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2900.htm#wording-6.7.9p10-extra)
> as follows:
>
> > …
> > — if it is a union and the initializer is the empty initializer, the
> > largest member is initialized (recursively) according to these rules, then
> > the first named member is initialized (recursively) according to these
> > rules, and any padding is initialized to zero bits;
> > — if it is a union and the initializer is not an empty initializer, the
> > first named member is initialized (recursively) according to these rules,
> > and any padding is initialized to zero bits;
> > …
>
> We took a vote on it. It did not pass:
>
> > No consensus to initialize whole union then largest member for {}
> > (N2900) into C23.
>
> Again, I sincerely tried to make what you had pointed out as part of
> the semantics so that unions would be usefully included. I think folks in
> WG14 would be okay if there was a separate syntax that deliberately
> initialized "as-if everything including padding bits are static-init, and
> then performed the correct initialization of the proper members". I think
> if you presented the security benefits to them and gave them that
> alternative syntax, they would like it. I think that implementing it in a
> handful of compilers and advertising it would make it easy to get into the
> next revision of C, and consequently C++ (both Committees need/want
> something of this nature).
>

That would be a separate feature, though. It is also perhaps less likely to
be expressed as a modification of the initialization than as a modifier of
the type, because assignment has similar problems (e.g., when assigning to
"fresh" heap-allocated memory).
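
A minimal C sketch of the assignment case, assuming a hypothetical
`struct record` with padding after its first member (the type and function
names are invented for illustration). No initializer runs at any point, so
an initialization-only feature would not reach this code.

```c
#include <stdlib.h>

/* Hypothetical type with padding after `tag` on most ABIs. */
struct record {
    char tag;
    int value;
};

/*
 * malloc returns memory with indeterminate contents. The member
 * assignments give values to the members only; the padding bytes are
 * left with unspecified values.
 */
struct record *record_create(char tag, int value)
{
    struct record *r = malloc(sizeof *r);
    if (r == NULL) {
        return NULL;
    }
    r->tag = tag;
    r->value = value;
    return r;
}

/*
 * Whole-object assignment has the same property: storing a value into
 * a structure lets its padding bytes take unspecified values.
 */
void record_assign(struct record *r, char tag, int value)
{
    *r = (struct record){ .tag = tag, .value = value };
}
```

Only the member values are reliable after either function; nothing here
says anything about the bytes in between.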

>
> I understand it is not ideal to have the opt-in syntax be the secure
> syntax. I can't really change C or C++'s momentum in this direction,
> though. Many members are adamant we do not change the semantics of existing
> initialization, and stick to existing principles, no matter how dangerous
> or no matter how proven they are to leak information/cause problems.
>

Perhaps the reaction was to the scope of the change implied by Optional
Change 0, which I agree is scary if the union's first member disagrees with
its largest member as to what the byte values in the object representation
need to be. A wording change that reinforces what people probably are
assuming about static initialization might get a better reception.

>
> …
>> It seems the goal was achieved for structs (or mostly so) and not
>> achieved for unions. It is still a question (because "any padding" may
>> refer to "padding of the aggregate/union" instead of "padding within")
>> whether integer subobjects may have non-zero padding bits from such an
>> initialization.
>>
>
> My interpretation is that it was "padding of the aggregate/union",
> but bits inside (e.g., for an integer) *could* be non-zero (since the
> static init rules say it is the value of literal 0 cast to the integer
> type, and if that literal zero cast to the integer type produces an integer
> that has non-zero padding bits, that's fine because the
> value-representation bits are already in the proper configuration to
> represent the casted-literal-0).
>
> I think it would help to clarify that all the padding bits within the
> structure, or the first member of the union, or within any arithmetic or
> pointer object, should be 0 when doing static initialization. I think that
> is in the spirit
> of what the wording is trying to achieve. I do not know if they were
> considering the padding bits of integers when they added the padding bit
> fixes for structures/unions in C99 for static initialization. This does
> become more important for things like _Bool, which by default has
> padding bits because the value representation only takes 1 bit and _Bool
> will occupy at least 8 bits.
>
> I should ask some of the older WG14 members if they remember what was
> going on at the time for C99 and that change. Maybe they did not add the
> clause to arithmetic and pointer types in C99 because they just did not
> consider arithmetic types with padding bits as something that could be
> real/important. Or maybe they believed that no architecture would interpret
> "initialized from conversion from literal 0" as giving padding bits.
>

Aside from the consistency requirement in the change I suggested earlier,
the line between being a goal and a non-goal gets fuzzy beyond asking for
the all-bits-zero representation whenever that is a representation of the
value converted from 0.
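
As a rough illustration of that request, the following C sketch checks two
things for a few scalar types: whether the all-bits-zero pattern is a
representation of the value converted from 0, and whether the
implementation picks that pattern when it stores such a value. The function
names are hypothetical. For `int` the first property is required by the
language; for `double` and pointers it is a property of the implementation,
which I am assuming holds on the usual platforms.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if every byte of the object representation is zero. */
static bool is_all_zero(const void *object, size_t size)
{
    const unsigned char *bytes = object;
    for (size_t i = 0; i < size; ++i) {
        if (bytes[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Is all-bits-zero a representation of the value converted from 0? */
bool all_bits_zero_represents_zero(void)
{
    int i;
    double d;
    void *p;
    memset(&i, 0, sizeof i);
    memset(&d, 0, sizeof d);
    memset(&p, 0, sizeof p);
    return i == 0 && d == 0.0 && p == NULL;
}

/* Does the implementation choose that pattern when it converts 0? */
bool zero_is_stored_as_all_bits_zero(void)
{
    int i = 0;
    double d = 0;
    void *p = 0;
    return is_all_zero(&i, sizeof i)
        && is_all_zero(&d, sizeof d)
        && is_all_zero(&p, sizeof p);
}
```

Types with padding bits inside the scalar itself (the `_Bool` case
mentioned above, for instance) are deliberately left out of the sketch,
since that is exactly where the fuzziness starts.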

>
>>
>> …
>> It is compatible in that a user only relying on the C++ semantics is
>> fine. It is not compatible in the way where "correct" C code is also
>> "correct" C++ code.
>>
>
> Agreed, but this is beyond the scope of my work. I could ask C++ if
> they were interested in such a change (padding bits are zero), and the
> benefits it could bring to aggregates (though, perhaps not to unions, as
> you detailed previously). I think it would be a positive change.
> Alternatively, we could revert what C has done and specifically exempt = {}
> from initializing padding bits. It would worsen security for structures and
> for people who put largest-member-first in their unions. I think that would
> be a negative change, but the positive is that both languages would be
> equally "too bad for you, security person". (Which doesn't sound like a
> benefit, but is at least some form of "equal is always fair", and therefore
> dependable.)
>

Suggesting such a change for C++ is likely going to produce objections due
to reductions in efficiency for code that is "correct" today.

I would not consider this {} behaviour a solution to a "sanitized" handoff
to a different security context scenario anyway. Even assigning the "right
values" after such an initialization is not fool-proof. It's perfectly
conforming to omit zeroing of padding bytes if an assignment to a member
(which causes padding bytes to take unspecified values) occurs immediately
after. It's also perfectly conforming to copy data adjacent to the source
(that is within the same memory allocation) into the padding.
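
A minimal C sketch of the pattern in question, using a hypothetical
`struct message` and a hypothetical `message_copy_out` function. `{0}` is
used so the sketch also builds as pre-C23 code; the `{}` spelling under
discussion would sit in the same place.

```c
#include <string.h>

/* Hypothetical type with padding after `kind` on most ABIs. */
struct message {
    char kind;
    int payload;
};

/*
 * Copies the whole object representation of a freshly built message into
 * `wire`, which must have room for sizeof(struct message) bytes.
 */
void message_copy_out(unsigned char *wire, char kind, int payload)
{
    struct message m = {0};  /* `= {}` in C23 */

    /*
     * Each store to a member lets the padding bytes take unspecified
     * values, so an implementation may skip zeroing the padding when
     * these stores follow the initialization immediately.
     */
    m.kind = kind;
    m.payload = payload;

    /* The padding bytes travel along with the members. */
    memcpy(wire, &m, sizeof m);
}
```

The receiver can rely on `kind` and `payload` round-tripping. It cannot
rely on anything about the bytes between them, whichever initializer was
written.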

I think the pragmatic approach would be to make {} in C perform (for other
than static or thread storage duration) what C++ does. `{ static }` or
something as new syntax might be appropriate for saying "as if initialized
with 'zero padding'"; however, someone who needs that is probably doing
something suspect (wants security but not doing enough to have security or
is reading from an inactive member of a union). A false sense of security
is probably harmful to actual security.

My thinking on whether there should be a convenient syntax for getting the
as-if static storage duration initialization may have changed since I first
engaged with you (a year or more ago) regarding this paper. The fact that
the C syntax being made to do it does something different from what C++
does for the same syntax makes me inclined not to use that syntax for this
as-if static purpose.

TL;DR: I think {} should be better aligned to the status quo for C++. I
think static initialization wording should guarantee what people expect and
implementations provide. I think the security scenarios mentioned where {}
having the "as-if static" behavior is said to help should be addressed by a
different solution. I do realize that (with what I am suggesting) only the
first sentence ties closely with the paper; however, I hope you don't mind
also following up on the second since our discussion on that topic so far
has been productive. I am up for discussing the third with you if you wish
to explore it further.

> Sincerely,
> JeanHeyd
>
>>

Received on 2022-03-04 06:03:23