Date: Fri, 16 Jan 2026 17:55:42 +0300
On 16 Jan 2026 17:20, Jan Schultke via Std-Proposals wrote:
> > I think some kind of "pad to zero" concept could have uses beyond this
> > case, and be useful in its own right. You could have :
> >
> > T padded_to_zero(const T& x)
> >
> > that would return a copy of x, where all the padding bits and bytes are
> > set to zero. And you could have :
> >
> > pad_to_zero(T& x)
> >
> > that would zero out the padding on an existing object.
> >
> >
> > That does not make any sense in the C++ object model. Padding bits are
> > bits that are not in the value representation, and they don't get
> > copied/preserved when values are copied.
>
> They are not copied or preserved with most copies, but they /are/
> accessible via things like memcpy and access using character or
> std::byte pointers. So sometimes they are relevant despite not being
> part of the value of the object.
>
> You can copy them with memcpy (only at run-time), but examining the
> value results in undefined behavior.
This should not be the case. I believe the reason for it being UB was
that integers used to be allowed to have trap representations, which is
no longer the case. Certainly not for raw bytes.
> > If you store the result of padded_to_zero in a variable and copy that
> > variable into a second one, the padding bits of these variables may
> > differ.
>
> Yes - /if/ you copy the object as a value, rather than the underlying
> representation.
>
> Suppose we have a structure with padding :
>
> struct S { uint8_t a; uint32_t c; };
>
> Assuming common alignments (and ignoring any other implementation
> dependent details for simplicity), there would be 3 bytes of padding
> between "a" and "c" - somewhat akin to a hidden field
> "unsigned char b[3];".
>
> If you create an object S s1, then its "b" field is unspecified. If
> you write "S s2 = s1;", then s2's "b" field is also unspecified, and you
> have no reason to suppose that s2.b and s1.b would be equal (even if it
> made sense to compare them). "memcmp(&s1, &s2, sizeof(S))" is not
> guaranteed to give 0.
>
> However, if you write "memcpy(&s2, &s1, sizeof(S));", then I think it is
> reasonable to expect "memcmp(&s1, &s2, sizeof(S))" to give 0 - even
> though the padding bytes are still unspecified - because you have copied
> a block of bytes, not values of type S.
>
> Am I correct to expect this? Or do all guarantees or information about
> the padding bytes "disappear" as soon as the memcpy() is complete?
>
> You are expecting it because it "works in practice", but it's still
> undefined behavior. memcpy nowadays is a compiler intrinsic, and small
> copies like of eight bytes may be internally represented as just passing
> an i64 around, which does god knows what with padding bits when that i64
> lives in a register.
memcpy has well-defined behavior, regardless of how it's implemented. If
the program says memcpy, then the source is copied as is to the target,
padding bits and all. I do not see a fundamental reason why the target
should then not be allowed to be used. It may have an unspecified value
where padding bits of the source map onto value representation bits of
the target, but that is no reason to treat the remaining bits, or the
whole program, as undefined.
And BTW, passing data via registers is not "doing god knows what", as
the data is stored in registers as is. Whether the compiler generates
code that does something other than just moving the data is irrelevant,
because it is not allowed to modify the value representation as part of
memcpy, and padding bits in the source are unspecified anyway.
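
To make the scenario concrete, here is a minimal sketch built around the
struct S from the quoted example. The names padding_bytes and
bytewise_copy_compares_equal are made up for illustration, and the "3 bytes"
remark assumes common alignments. Whether the memcmp result is guaranteed
is exactly what is being disputed in this thread, so this only shows the
code under discussion; it is not a statement of what the standard says
today.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// The struct from the quoted example. Its padding layout is
// implementation-defined; on common ABIs there are 3 padding bytes
// between a and c.
struct S {
    std::uint8_t a;
    std::uint32_t c;
};

static_assert(std::is_trivially_copyable_v<S>,
              "memcpy of S requires a trivially copyable type");

// Bytes of S that belong to neither member (hypothetical helper constant).
constexpr std::size_t padding_bytes =
    sizeof(S) - sizeof(std::uint8_t) - sizeof(std::uint32_t);

// Copies the object representation of src (padding included) into a new S
// and reports whether the two representations compare equal byte for byte.
bool bytewise_copy_compares_equal(const S& src) {
    S dst;
    std::memcpy(&dst, &src, sizeof(S));
    return std::memcmp(&dst, &src, sizeof(S)) == 0;
}
```

My position is that this function should be required to return true for
any S, because the copy is a copy of bytes rather than of a value of type S.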
> It's also reasonable to expect the compiler to spot a branch based on
> indeterminate bits within memcmp and optimize your code away entirely.
> It's also reasonable to expect UBSan to terminate your program when you
> memcmp a padded type, or for the compiler to treat such a call as
> equivalent to std::unreachable() and not even emit IR for it.
You are talking about tools that implement the standard. The tools will
be updated once the standard changes, which is what we're discussing here.
> There is even the problem of types like _BitInt(3) where the padding
> doesn't nicely bundle into whole bytes. It is unclear how those could be
> handled by pad_to_zero, other than to bit-cast and zero the padding
> bits, which is exactly what I'm proposing.
Padding bits are padding bits, whether they span a whole byte or not.
Thus, clear_padding/pad_to_zero would mask them to zero (e.g. with a
bitwise AND instruction).
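
As a rough sketch of the masking idea, assume a 3-bit unsigned value stored
in the low-order bits of a single byte, leaving 5 high-order padding bits.
That layout, and the names value_bits, value_mask and clear_padding_bits,
are assumptions made for illustration; the actual layout of _BitInt(3) is
up to the ABI.

```cpp
#include <cstdint>

// Assumed layout: 3 value bits in the low-order positions of one byte,
// the remaining 5 high-order bits are padding.
constexpr unsigned value_bits = 3;
constexpr std::uint8_t value_mask =
    static_cast<std::uint8_t>((1u << value_bits) - 1u);

// Returns the storage byte with every padding bit cleared by a single
// bitwise AND; the value bits are left untouched.
constexpr std::uint8_t clear_padding_bits(std::uint8_t storage) {
    return static_cast<std::uint8_t>(storage & value_mask);
}
```

The same approach extends to any type once the positions of its padding
bits are known: build a mask of the value bits and AND the storage with it.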
Received on 2026-01-16 14:55:46
