On Thu, Mar 3, 2022 at 7:23 PM JeanHeyd Meneide <phdofthehouse@gmail.com> wrote:
Dear Hubert,

On Thu, Mar 3, 2022 at 4:53 PM Hubert Tong <hubert.reinterpretcast@gmail.com> wrote:

The net of this is that we're getting initialization for the first member and trailing padding beyond the largest union member for alignment purposes because "as if static storage duration" does not provide as many guarantees as people think... (not so much of a problem for structs without nested unions).

I might need to open a defect for Clang's `-ftrivial-auto-var-init=pattern` to properly restrict the padding that is zeroed, but I really think the situation is a bit nonsensical.
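
To make the regions in question concrete, here is a minimal sketch in C. The `union sample` type and the helper names are hypothetical, chosen only for illustration. It computes how many bytes the first member covers, how many further bytes belong to larger members only, and how many bytes are trailing padding added for alignment. Under the reading described above, only the first and last of those regions are covered by the "as if static storage duration" rule.

```c
#include <stddef.h>

/* Hypothetical union for illustration: the first member is not the largest,
   and the int member raises the alignment so trailing padding is likely. */
union sample {
    char first;     /* first named member: the one initialization targets */
    char bytes[5];  /* largest member by size on typical ABIs */
    int  aligned;   /* forces int alignment on the whole union */
};

static size_t max_size(size_t a, size_t b) { return a > b ? a : b; }

/* Bytes covered by the first member's object representation. */
size_t first_member_bytes(void) {
    return sizeof(((union sample *)0)->first);
}

/* Size of the largest member (sizeof operands are not evaluated). */
size_t largest_member_bytes(void) {
    return max_size(sizeof(((union sample *)0)->bytes),
                    sizeof(((union sample *)0)->aligned));
}

/* Bytes that overlap some larger member but not the first member. */
size_t bytes_outside_first_member(void) {
    return largest_member_bytes() - first_member_bytes();
}

/* Trailing padding beyond the largest member, present for alignment. */
size_t trailing_padding_bytes(void) {
    return sizeof(union sample) - largest_member_bytes();
}
```

On a typical ABI with a 4-byte `int`, the three regions would be 1, 4 and 3 bytes of an 8-byte union, but the exact numbers are implementation-defined.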

     I agree that the situation is non-ideal. However, I did do my best to represent your concerns to the Committee. In particular, I provided Optional Change 0 as part of the proposal. It was Optional and not part of the core proposal (therefore, it was voted on separately). The reason is that the wording in the previous revision had attempted to mandate it, and **most** individuals in the C Committee were miffed by such a departure from how unions are normally treated: it has always been first-member-init.

I agree that Optional Change 0 is one way of approaching the gap. It sounds from what you describe above that the primary objection to it was how it does so. An alternative method would be to adjust the static initialization wording (with WG 21 coordination) to clarify that (a) the representation of the value converted from 0 that is chosen for each scalar type is consistent for all such initialization of an object of that type, (b) all padding bits between members of a structure and all trailing padding in a structure is zeroed, and (c) all bytes of a union that do not overlap with the object representation of the first member of the union are zeroed.
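
For the structure case in (b), a rough sketch of where the padding sits may help. The `struct record` type and helper names below are hypothetical. The code uses `offsetof` and `sizeof` to count interior and trailing padding bytes, and inspects the object representation of a static-storage object. The all-zero result of that inspection is what mainstream implementations provide and what (a) and (b) would pin down; it is an assumption here, not something the sketch proves for every implementation.

```c
#include <stddef.h>

/* Hypothetical structure: char/int/char usually yields both interior
   padding (after tag) and trailing padding (after flag). */
struct record {
    char tag;
    int  value;
    char flag;
};

/* Padding bytes after the last member. */
size_t trailing_padding(void) {
    return sizeof(struct record)
         - (offsetof(struct record, flag) + sizeof(char));
}

/* Padding bytes between members: total padding minus trailing padding. */
size_t interior_padding(void) {
    size_t member_bytes = sizeof(char) + sizeof(int) + sizeof(char);
    return (sizeof(struct record) - member_bytes) - trailing_padding();
}

/* Returns 1 if every byte of a static-storage record is zero on this
   implementation, 0 otherwise. */
int static_record_is_all_zero_bytes(void) {
    static struct record zeroed;  /* static storage duration, no initializer */
    const unsigned char *p = (const unsigned char *)&zeroed;
    for (size_t i = 0; i < sizeof zeroed; ++i) {
        if (p[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```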
 

      In an attempt to alleviate their concerns, I wrote Optional Change 0 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2900.htm#wording-6.7.9p10-extra) as follows:

> …

> — if it is a union and the initializer is the empty initializer, the largest member is initialized (recursively) according to these rules, then the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
> — if it is a union and the initializer is not an empty initializer, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
> …

      We took a vote on it. It did not pass:

> No consensus to initialize whole union then largest member for {} (N2900) into C23.

     Again, I sincerely tried to make what you had pointed out as part of the semantics so that unions would be usefully included. I think folks in WG14 would be okay if there was a separate syntax that deliberately initialized "as-if everything including padding bits are static-init, and then performed the correct initialization of the proper members". I think if you presented the security benefits to them and gave them that alternative syntax, they would like it. I think that implementing it in a handful of compilers and advertising it would make it easy to get into the next revision of C, and consequently C++ (both Committees need/want something of this nature).

That would be a separate feature though. It is also perhaps less likely to be expressed as a modification of the initialization than it is to be as a modifier of the type because assignment has similar problems (e.g., when assigning to "fresh" heap-allocated memory).
 

     I understand it is not ideal to have the opt-in syntax be the secure syntax. I can't really change C or C++'s momentum in this direction, though. Many members are adamant we do not change the semantics of existing initialization, and stick to existing principles, no matter how dangerous or no matter how proven they are to leak information/cause problems.

Perhaps the reaction was to the scope of the change implied by Optional Change 0, which I agree is scary if the union's first member disagrees with its largest member as to what the byte values in the object representation need to be. A wording change that reinforces what people probably are assuming about static initialization might get a better reception.
 


It seems the goal was achieved for structs (or mostly so) and not achieved for unions. It is still a question (because "any padding" may refer to "padding of the aggregate/union" instead of "padding within") whether integer subobjects may have non-zero padding bits from such an initialization.

     My interpretation is that it was "padding of the aggregate/union", but bits inside (e.g., for an integer) *could* be non-zero (since the static init rules say it is the value of literal 0 cast to the integer type, and if that literal zero cast to the integer type produces an integer that has non-zero padding bits, that's fine because the value-representation bits are already in the proper configuration to represent the cast literal 0).

     I think it would help to clarify that all the padding bits within the structure, or the first member of the union, or within any arithmetic or pointer object, should be 0 when doing static initialization. I think that is in the spirit of what the wording is trying to achieve. I do not know if they were considering the padding bits of integers when they added the padding bit fixes for structures/unions in C99 for static initialization. This becomes more important for things like _Bool, which by default has padding bits because the value representation only takes 1 bit and _Bool will occupy at least 8 bits.
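
A small sketch of the `_Bool` case, with hypothetical helper names: it counts the storage bits of `_Bool` beyond the single bit needed to represent 0 and 1, and checks whether a static-storage `_Bool` is all zero bytes. The second check reflects what common implementations do in practice; as discussed above, it is not clear the wording strictly requires it.

```c
#include <limits.h>
#include <stddef.h>

/* Storage bits of _Bool beyond the one bit needed for the values 0 and 1.
   CHAR_BIT is at least 8, so this is at least 7. */
size_t bool_non_value_bits(void) {
    return sizeof(_Bool) * CHAR_BIT - 1;
}

/* Returns 1 if every byte of a static-storage _Bool is zero on this
   implementation, 0 otherwise. */
int static_bool_is_all_zero_bytes(void) {
    static _Bool flag;  /* static storage duration: initialized to false */
    const unsigned char *p = (const unsigned char *)&flag;
    for (size_t i = 0; i < sizeof flag; ++i) {
        if (p[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```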

     I should ask some of the older WG14 members if they remember what was going on at the time for C99 and that change. Maybe they did not add the clause to arithmetic and pointer types in C99 because they just did not consider arithmetic types with padding bits as something that could be real/important. Or maybe they believed that no architecture would interpret "initialized from conversion from literal 0" as giving padding bits.

Aside from the consistency suggested by my change suggestion earlier, the line between being a goal and a non-goal gets fuzzy past asking for the all-bits zero representation if that is a representation of the value converted from 0.
 
 

It is compatible in that a user only relying on the C++ semantics is fine. It is not compatible in the way where "correct" C code is also "correct" C++ code.

     Agreed, but this is beyond the scope of my work. I could ask C++ if they were interested in such a change (padding bits are zero), and the benefits it could bring to aggregates (though, perhaps not to unions, as you detailed previously). I think it would be a positive change. Alternatively, we could revert what C has done and specifically exempt = {} from initializing padding bits. It would worsen security for structures and for people who put largest-member-first in their unions. I think that would be a negative change, but the positive is that both languages would be equally "too bad for you, security person". (Which doesn't sound like a benefit, but is at least some form of "equal is always fair", and therefore dependable.)

Suggesting such a change for C++ is likely going to produce objections due to reductions in efficiency for code that is "correct" today.

I would not consider this {} behaviour a solution to a "sanitized" handoff to a different security context scenario anyway. Even assigning the "right values" after such an initialization is not fool-proof. It's perfectly conforming to omit zeroing of padding bytes if an assignment to a member (which causes padding bytes to take unspecified values) occurs immediately after. It's also perfectly conforming to copy data adjacent to the source (that is within the same memory allocation) into the padding.

I think the pragmatic approach would be to make {} in C perform (for other than static or thread storage duration) what C++ does. `{ static }` or something as new syntax might be appropriate for saying "as if initialized with 'zero padding'"; however, someone who needs that is probably doing something suspect (wants security but not doing enough to have security or is reading from an inactive member of a union). A false sense of security is probably harmful to actual security.

My thinking on whether a convenient syntax for getting the as-if static storage duration initialization is desirable may have changed since I first engaged with you (a year or more ago) regarding this paper. The fact that the C syntax being made to do this does something different from what C++ does for the same syntax makes me inclined not to have that syntax for this as-if static purpose.

TL;DR: I think {} should be better aligned to the status quo for C++. I think static initialization wording should guarantee what people expect and implementations provide. I think the security scenarios mentioned where {} having the "as-if static" behavior is said to help should be addressed by a different solution. I do realize that (with what I am suggesting) only the first sentence ties closely with the paper; however, I hope you don't mind also following up on the second since our discussion on that topic so far has been productive. I am up for discussing the third with you if you wish to explore it further.


Sincerely,
JeanHeyd