Date: Fri, 16 Jan 2026 16:13:42 +0100
I think the only solution is to create or use wider types, where the padding bits are part of the value representation.
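For example (just a sketch of what I mean, with made-up names), instead of letting the compiler insert padding, make those bytes an explicit member that is always zero, so that every byte participates in the value representation:

    #include <bit>
    #include <cstdint>

    struct alignas(8) Narrow {
        std::uint32_t v;            // 4 value bytes + 4 padding bytes
    };

    struct alignas(8) Wide {
        std::uint32_t v;
        std::uint32_t reserved = 0; // explicit "padding", part of the value
    };

    static_assert(sizeof(Narrow) == sizeof(Wide));

    // Every bit of Wide is determinate, so this is a constant expression:
    constexpr std::uint64_t bits = std::bit_cast<std::uint64_t>(Wide{42});
    static_assert(bits != 0);

With something like Wide, memcmp, hashing, and bit_cast only ever see value bytes.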
-----Original Message-----
From: Jan Schultke via Std-Proposals <std-proposals_at_[hidden]>
Sent: Fri, 16 Jan 2026 16:18
Subject: Re: [std-proposals] Fixing std::bit_cast padding bit issues
To: std-proposals_at_[hidden];
CC: Jan Schultke <janschultke_at_[hidden]>;
On Fri, 16 Jan 2026 at 15:55, Andrey Semashev via Std-Proposals <std-proposals_at_[hidden]> wrote:
On 16 Jan 2026 17:20, Jan Schultke via Std-Proposals wrote:
> > I think some kind of "pad to zero" concept could have uses beyond
> > this case, and be useful in its own right. You could have:
> >
> > T padded_to_zero(const T& x)
> >
> > that would return a copy of x, where all the padding bits and bytes
> > are set to zero. And you could have:
> >
> > pad_to_zero(T& x)
> >
> > that would zero out the padding on an existing object.
> >
> >
> > That does not make any sense in the C++ object model. Padding bits
> > are bits that are not in the value representation, and they don't
> > get copied/preserved when values are copied.
>
> They are not copied or preserved with most copies, but they /are/
> accessible via things like memcpy and access using character or
> std::byte pointers. So sometimes they are relevant despite not being
> part of the value of the object.
>
> You can copy them with memcpy (only at run-time), but examining the
> value results in undefined behavior.
This should not be the case. I believe the reason for it being UB was
that integers used to allow trap values, which are no longer allowed.
Certainly not for raw bytes.
The reason is also that granting the implementation the freedom to not copy padding bytes around is an optimization opportunity, and relying on the values of padding bytes is a massive (security-critical) bug anyway. There are also still floating-point types where certain representations result in CPU traps when used in an operation, if you find that a more convincing reason.
memcpy has well-defined behavior, regardless of how it's implemented. If
the program says memcpy, then the source is copied as is to the target,
padding bits and all. I do not see a fundamental reason why the target
should then not be allowed to be used.
Because the source could not be used either. For all intents and purposes, padding bits are eternally indeterminate quantum bits, and querying their value results in your CPU exploding.
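A minimal sketch of what I mean (the type and names are made up for illustration): the memcpy itself is fine, but the padding bytes it copies carry no portable information.

    #include <cstdint>
    #include <cstring>

    struct alignas(8) S {
        std::uint32_t v;   // bytes 0-3: value representation
    };                     // bytes 4-7: padding

    int main() {
        S s{42};
        unsigned char bytes[sizeof(S)];
        std::memcpy(bytes, &s, sizeof(S));   // well-defined; copies all 8 bytes

        // bytes[0..3] reflect s.v. bytes[4..7] came from padding: nothing
        // specifies their values, they may differ between two copies of the
        // same object, and branching on them is exactly the problem.
        return bytes[4] == 0;                // not portable, not meaningful
    }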
And BTW, passing data via registers is not "doing god knows what", as
the data is stored in registers as is.
It is doing god knows what. When a type has four value bytes and four padding bytes, moving it around may be done via a 32-bit register move. Whether that preserves the upper 32 bits or wipes them (as on x86) is a CPU architecture detail, not something you can portably rely on.
Whether the compiler generates code that does something other than just moving the data is irrelevant, because it is not allowed to modify the value representation as part of memcpy, and the padding bits in the source are unspecified anyway.
Padding bits are not part of the value representation, and the implementation is free to copy only some of the bytes in a memcpy, because there is no way of observing the values of padding bytes anyway. That is, under the as-if rule.
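Concretely (same made-up 4 + 4 layout as above):

    #include <cstdint>
    #include <cstring>

    struct alignas(8) S {
        std::uint32_t v;   // 4 value bytes + 4 padding bytes
    };

    int main() {
        S a{7};
        S b = a;   // only the value representation must be preserved;
                   // the copy may well be a single 32-bit register move

        // Compares all 8 bytes, padding included. Even though a.v == b.v,
        // the result is unreliable: the padding bytes need not match.
        return std::memcmp(&a, &b, sizeof(S)) == 0;
    }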
> It's also reasonable to expect the compiler to spot a branch based on
> indeterminate bits within memcmp and optimize your code away entirely.
> It's also reasonable to expect UBSan to terminate your program when you
> memcmp a padded type, or for the compiler to treat such a call as
> equivalent to std::unreachable() and not even emit IR for it.
You are talking about tools that implement the standard. The tools will
be updated once the standard changes, which is what we're discussing here.
Yes, but it's entirely unrealistic to expect implementations to make massive changes to how they handle padding bits, seeing that the current model works reasonably well. To my understanding, LLVM does not even have an object representation for objects during constant evaluation; it is generated on the fly when you std::bit_cast.
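For example, as I understand the current rules (and how GCC and Clang implement them), a bit_cast whose result would take bits from padding is simply rejected during constant evaluation:

    #include <array>
    #include <bit>
    #include <cstdint>

    struct alignas(8) Padded {
        std::uint32_t v;   // 4 value bytes + 4 padding bytes
    };

    // 32 bits of the uint64_t result would come from padding and thus be
    // indeterminate, so this is not a constant expression and is rejected:
    // constexpr auto bad = std::bit_cast<std::uint64_t>(Padded{1});

    // The other direction is fine: every value bit of the result is
    // determinate, and the result's padding bits are simply unspecified.
    constexpr auto ok = std::bit_cast<Padded>(std::array<std::uint32_t, 2>{1, 2});
    static_assert(ok.v == 1);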
> There is even the problem of types like _BitInt(3) where the padding
> doesn't nicely bundle into whole bytes. It is unclear how those could be
> handled by pad_to_zero, other than to bit-cast and zero the padding
> bits, which is exactly what I'm proposing.
Padding bits are padding bits, whether they span the whole byte or not.
Thus, clear_padding/pad_to_zero would mask them to zero (e.g. with a
bitwise AND instruction).
That's just undefined behavior, and it probably should stay undefined behavior.
Internally, the result of reading indeterminate/padding bits may be an LLVM poison value, and to my understanding, multiplying by zero or doing a bitwise AND with zero does not "unpoison" values. That is what you're conceptually asking for, though.
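If people really do want a pad_to_zero, the only viable shape is to write zeros into the padding rather than read it. GCC already exposes this as __builtin_clear_padding, and I believe Clang provides an equivalent. Assuming that intrinsic, a sketch of the interface quoted above would be:

    #include <type_traits>

    template <class T>
        requires std::is_trivially_copyable_v<T>
    void pad_to_zero(T& x)
    {
        __builtin_clear_padding(&x);   // stores zeros into all padding bits of x
    }

    template <class T>
        requires std::is_trivially_copyable_v<T>
    T padded_to_zero(const T& x)
    {
        T copy = x;                    // the copy preserves the value representation
        __builtin_clear_padding(&copy);
        return copy;
    }

No padding bit is ever read here, so the poison question does not arise.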
In general, I cannot empathize with this desire to manipulate and query padding bits. The current object model treats them as all but nonexistent, and it's a model that enables numerous optimizations and allows UBSan to protect people from their own stupidity and from writing security-critical bugs.
--
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
Received on 2026-01-16 15:29:44
