> I think some kind of "pad to zero" concept could have uses beyond this
> case, and be useful in its own right. You could have:
>
> T padded_to_zero(const T& x)
>
> that would return a copy of x, where all the padding bits and bytes are
> set to zero. And you could have:
>
> void pad_to_zero(T& x)
>
> that would zero out the padding on an existing object.
>
>
> That does not make any sense in the C++ object model. Padding bits are
> bits that are not in the value representation, and they don't get
> copied/preserved when values are copied.

They are not copied or preserved by most copy operations, but they /are/
accessible via memcpy and via access through character or std::byte
pointers. So they are sometimes relevant despite not being part of the
value of the object.

You can copy them with memcpy (only at run time), but examining their values results in undefined behavior.

> If you store the result of
> padded_to_zero in a variable and copy that variable into a second one,
> the padding bits of these variables may differ.

Yes - /if/ you copy the object as a value, rather than the underlying
representation.

Suppose we have a structure with padding:

struct S { uint8_t a; uint32_t c; };

Assuming common alignments (and ignoring any other implementation
dependent details for simplicity), there will be 3 bytes of padding
between "a" and "c" - somewhat akin to a hidden field "unsigned char b[3];".

If you create an object S s1, then its "b" field is unspecified. If
you write "S s2 = s1;", then s2's "b" field is also unspecified, and you
have no reason to suppose that s2.b and s1.b would be equal (even if it
made sense to compare them). "memcmp(&s1, &s2, sizeof(S))" is not
guaranteed to give 0.

However, if you write "memcpy(&s2, &s1, sizeof(S));", then I think it is
reasonable to expect "memcmp(&s1, &s2, sizeof(S))" to give 0 - even
though the padding bytes are still unspecified - because you have copied
a block of bytes, not values of type S.

Am I correct to expect this? Or do all guarantees or information about
the padding bytes "disappear" as soon as the memcpy() is complete?

You are expecting it because it "works in practice", but it's still undefined behavior. memcpy nowadays is a compiler intrinsic, and small copies, such as eight-byte ones, may be represented internally as a single i64 value, which may do who knows what with the padding bits while that i64 lives in a register.

It's also reasonable to expect the compiler to spot a branch based on indeterminate bits within memcmp and optimize your code away entirely. Likewise, a sanitizer such as MemorySanitizer could reasonably terminate your program when you memcmp a padded type, or the compiler could treat such a call as equivalent to std::unreachable() and not even emit IR for it.

When I write structs myself that would normally contain padding, and
where I want consistency (because transferring structs into and out of
code, hashing, and data serialisation are not uncommon), I add explicit
"padding" or "dummy" fields so that I know everything is well-defined.
It would be much more convenient to be able to do this in a more
automated manner, but if there is no way to make sense of this within
the C++ object model, so be it.

I can see how such a thing would be useful, although that seems like a separate problem to me. There is also the issue of fundamental types that contain padding, like _BitInt or long double, and those would need to be broken down into smaller types by any utility that splits your types into padding and non-padding bits.

There is even the problem of types like _BitInt(3), where the padding doesn't bundle neatly into whole bytes. It is unclear how pad_to_zero could handle those, other than by bit-casting and zeroing the padding bits, which is exactly what I'm proposing.