std-proposals

Re: [std-proposals] Fixing std::bit_cast padding bit issues

From: Sebastian Wittmeier <wittmeier_at_[hidden]>
Date: Fri, 16 Jan 2026 15:06:44 +0100
There is no guarantee that memcpy, memcmp and so on act on actual memory. (This is even more true for padding bits or bytes.)

They can act on registers, or merely be taken into account by the compiler when it works out the program flow.

E.g. if you have an int i, memcpy it to an int j, and then use j only to decide (in an if) between executing two program blocks, the actual copy may never happen. It is optimized away.

-----Original Message-----
From: David Brown via Std-Proposals <std-proposals_at_[hidden]>
Sent: Fri 16.01.2026 14:37
Subject: Re: [std-proposals] Fixing std::bit_cast padding bit issues
To: Jan Schultke <janschultke_at_[hidden]>
CC: David Brown <david.brown_at_[hidden]>; std-proposals_at_[hidden]

On 16/01/2026 13:42, Jan Schultke wrote:
>
>     As I understand it (and I am not sure here, and happy to be corrected),
>     padding has unspecified values, but when you copy them with bit_cast<>
>     you get indeterminate bits.  And then if those bits are read, you
>     have UB.
>
> Yes, that's correct.

Thanks.  These details are not always easy to get entirely correct, so I appreciate the confirmation.

>     Could it be possible to change bit_cast<> so that these copied bits
>     remain unspecified?  I believe an aim of the way bit_cast<> is defined
>     is to let it work without having to copy any padding (bits or bytes),
>     thus the returned value could have indeterminate bits.  If it is
>     possible to re-classify these bits as unspecified rather than
>     indeterminate, without loss of efficiency, then that would be an
>     improvement, I think.  (There may be some hardware platforms that track
>     determinate / indeterminate status of data.)
>
> The obvious problem is constant evaluation. In the aforementioned case
> of bit-casting long double to __int128, you don't want to end up with
> non-deterministic integer bits at compile time.
> We could say that you
> don't have a constant expression in the degenerate case, or that the
> bits are guaranteed to be zero during constant evaluation, but any
> solution here creates some deviation in behavior between run-time and
> constant evaluation.

I agree - consistency is important here.

>     I think some kind of "pad to zero" concept could have uses beyond this
>     case, and be useful in its own right.  You could have :
>
>              T padded_to_zero(const T& x)
>
>     that would return a copy of x, where all the padding bits and bytes are
>     set to zero.  And you could have :
>
>              pad_to_zero(T& x)
>
>     that would zero out the padding on an existing object.
>
> That does not make any sense in the C++ object model. Padding bits are
> bits that are not in the value representation, and they don't get
> copied/preserved when values are copied.

They are not copied or preserved with most copies, but they /are/ accessible via things like memcpy and access using character or std::byte pointers.  So sometimes they are relevant despite not being part of the value of the object.

> If you store the result of
> padded_to_zero in a variable and copy that variable into a second one,
> the padding bits of these variables may differ.

Yes - /if/ you copy the object as a value, rather than the underlying representation.

Suppose we have a structure with padding :

    struct S {
        uint8_t a;
        uint32_t c;
    };

Assuming common alignments (and ignoring any other implementation dependent details for simplicity), there will be 3 bytes of padding between "a" and "c" - somewhat akin to a hidden field "unsigned char b[3];".

If you create an object S s1, then its "b" field is unspecified.  If you write "S s2 = s1;", then s2's "b" field is also unspecified, and you have no reason to suppose that s2.b and s1.b would be equal (even if it made sense to compare them).  "memcmp(&s1, &s2, sizeof(S))" is not guaranteed to give 0.
However, if you write "memcpy(&s2, &s1, sizeof(S));", then I think it is reasonable to expect "memcmp(&s1, &s2, sizeof(S))" to give 0 - even though the padding bytes are still unspecified - because you have copied a block of bytes, not values of type S.

Am I correct to expect this?  Or do all guarantees or information about the padding bytes "disappear" as soon as the memcpy() is complete?

So if we have "S s1" and write "pad_to_zero(s1)", the result will be the same as if we had written "memset((unsigned char*) &s1 + 1, 0, 3)" to zero out the hidden "b" field.

If we then write "S s2 = s1;", then of course the padding bytes could be different. But if we write "uint32_t x = sha256((const unsigned char*) &s1, sizeof(S));", then the function that is accessing the underlying storage via a character pointer (or as std::byte) would see the struct with zeros in the padding bytes.

> It's not even clear how
> you can store the result of padded_to_zero in a variable in a way that
> keeps padding bits, considering that the implementation does not copy
> the padding bits from the function's result value into the variable at
> this time.
>
> For all intents and purposes, padding bits do not exist, and you cannot
> manipulate them or examine their value.
>
>     The key point with these would be that you would have a consistent
>     underlying representation of the object, for use with things like
>     hashing, digital signatures, memcmp-style comparisons, etc.
>
> Using memcmp on a type with padding bits always results in undefined
> behavior, and I would not want it to be any other way.

When I write structs myself that would normally contain padding, and where I want to have consistency (because transferring structs into and out of code, hashes, and data serialisation is not uncommon), I add explicit "padding" or "dummy" fields so that I know everything is well-defined.  But it would be much more convenient to be able to do this kind of thing in a more automated manner.
But if there is no way to make sense of this within the C++ object model, so be it.

>     But it would not solve the indeterminate bits in bit_cast<>, without
>     changes to the specification of bit_cast<>.  padded_to_zero(x) would
>     still have the same padding bits and bytes, and these would still be
>     indeterminate after the bit_cast<>.
>
> Yes, we need to make changes to bit_cast to deal with these issues.

I can certainly appreciate your desired changes to bit_cast, whether or not any kind of "pad_to_zero" or "padded_to_zero" could exist.

--
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2026-01-16 14:22:46
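
For readers following the thread, the explicit-padding technique David Brown describes (adding "padding" or "dummy" fields so that every byte is well-defined) might look roughly like the sketch below. It assumes C++17 and the common layout discussed in the message (8-bit bytes, uint32_t aligned to 4 bytes). The names SExplicit, pad, padding_bytes_in_S, same_bytes and check_example are hypothetical and do not come from the thread. It does not settle the question of what memcpy or memcmp guarantee for the original struct S; it only shows the workaround in which no padding remains.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// The struct from the message: on common ABIs, padding sits between a and c.
struct S {
    std::uint8_t a;
    std::uint32_t c;
};

// Hypothetical variant with the padding spelled out as a real member.
struct SExplicit {
    std::uint8_t a;
    std::uint8_t pad[3];  // explicit stand-in for the hidden "b" field
    std::uint32_t c;
};

// True only if the type has no padding bits, so equal values imply equal bytes.
// On a platform with a different layout this fails to compile instead of
// silently misbehaving.
static_assert(std::has_unique_object_representations_v<SExplicit>,
              "SExplicit still contains padding on this platform");

// Number of bytes between the end of S::a and the start of S::c.
constexpr std::size_t padding_bytes_in_S() {
    return offsetof(S, c) - (offsetof(S, a) + sizeof(std::uint8_t));
}

// Byte-wise equality over the whole object representation of SExplicit.
inline bool same_bytes(const SExplicit& x, const SExplicit& y) {
    return std::memcmp(&x, &y, sizeof(SExplicit)) == 0;
}

// Usage: two objects built differently but holding equal member values
// (including the zeroed pad member) compare equal byte-wise.
inline void check_example() {
    SExplicit x{1, {}, 2};  // aggregate initialization zeroes pad
    SExplicit y{};          // value-initialization zeroes every member
    y.a = 1;
    y.c = 2;
    assert(same_bytes(x, y));
}
```

The cost of this approach is the one noted in the message: the pad member has to be maintained by hand, and it is copied like any other member, which is exactly what makes the bytes stable across "S s2 = s1;"-style copies.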