Date: Sun, 18 Jan 2026 12:07:40 +0100
On 18/01/2026 00:24, Thiago Macieira wrote:
> On Saturday, 17 January 2026 12:03:13 Pacific Standard Time David Brown wrote:
>> But the padding /can/ be observed - and not just via bit_cast. Every
>> object in C++ can be viewed as an array of unsigned char (or std::byte)
>> - by memcpy, or by casting a pointer to the object to a pointer to
>> unsigned char.
>>
>> And there are plenty of situations where you would want the padding to
>> be seen as zeros, or at least to be consistent. Taking a hash of some
>> kind is the obvious case - whether it be for use in a hash-map
>> structure, for integrity checking (CRC, md5sum), or for secure hashing.
>> You might also want it to be zeroed out before passing the object
>> outside the program - storing it in a file, or transmitting it on a
>> network. Again, it is often helpful for data to be consistent, and more
>> paranoid people might want to avoid any risk of leakage of unintended
>> data. Often it is most efficient and convenient to access the data here
>> using std::byte or unsigned char pointers - and that means the padding
>> is visible.
>
> Most of which require a bit_cast to a byte array in the first place.
>
Do they? People have been doing this sort of thing with memcpy and
unsigned char pointers from /long/ before C++20 bit_cast<>, both in C
and C++ (AFAIK C++ defers to the C standard for memcpy). bit_cast<>
might be a more modern way to deal with this kind of thing, and it will
likely be a better route towards consistent compile-time and run-time
behaviour, but accessing representations via unsigned char pointer has
always been fundamental to low-level work in C and C++. And the unknown
contents of padding have been a PITA all that time. Rightly or wrongly,
people treat copying the padding via character pointers as getting
unspecified data, not as leading to undefined behaviour.
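For concreteness, here is a minimal sketch of the two traditional ways of getting at the representation that I mean. The type `Sample` and the helpers `bytes_of` and `byte_at` are names made up for this illustration, and the exact amount of padding depends on the ABI.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical example type: on typical ABIs there are 3 padding bytes
// between `tag` and `value`, but that is implementation-specific.
struct Sample {
    char tag;
    int  value;
};

// Route 1: copy the whole object representation into a byte array.
// Whatever happens to be in the padding bytes is copied along with the members.
std::array<unsigned char, sizeof(Sample)> bytes_of(const Sample& s) {
    std::array<unsigned char, sizeof(Sample)> out;
    std::memcpy(out.data(), &s, sizeof s);
    return out;
}

// Route 2: read a single byte of the representation in place,
// through a pointer to unsigned char.
unsigned char byte_at(const Sample& s, std::size_t i) {
    assert(i < sizeof(Sample));
    return reinterpret_cast<const unsigned char*>(&s)[i];
}
```

The bytes that belong to `tag` and `value` are well defined by either route; it is only the bytes in between whose contents you cannot rely on.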
> The only case that doesn't is using write()/send(), which is equivalent to
> memcpy() to permanent storage. That isn't known to be a problem these days
> because it's a known quantity and most of the issues that stem from it are
> known. Serialisation protocols are careful to store reproducible data without
> padding bits in the first place, instead of zeroing them - padding in storage
> is inefficient.
Of course for structs that you want to pass around, you aim to minimise
the padding - sometimes people use compiler-specific "packed" extensions
for the purpose. And some people, like me, add explicit "padding"
fields so that they are under full control. (And general serialisation
obviously can have unlimited additional complications - we are talking
about simple POD-style structs here.)
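A small sketch of what I mean by explicit "padding" fields, assuming a typical platform where a 32-bit integer is 4-byte aligned. The names `Implicit`, `Explicit` and `reserved` are invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// The compiler is free to insert padding after `kind` (typically 3 bytes).
struct Implicit {
    std::uint8_t  kind;
    std::uint32_t length;
};

// The same layout, but the gap is a named member that the programmer
// initialises and controls like any other field.
struct Explicit {
    std::uint8_t  kind;
    std::uint8_t  reserved[3];
    std::uint32_t length;
};

// Compile-time checks that no hidden padding is left in Explicit.
static_assert(sizeof(Explicit) == 8);
static_assert(offsetof(Explicit, length) == 4);
static_assert(std::has_unique_object_representations_v<Explicit>);
```

With the checks in place, a change to the struct that reintroduces hidden padding fails to compile rather than silently changing the bytes that get written out.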
>
> That said, if there is a need, std::clear_padding() could be implemented by:
>
> using Tmp = std::array<unsigned char, sizeof(T)>;
> Tmp tmp = std::bit_cast<Tmp>(src); // or std::bit_cast_clear_padding
> src = std::bit_cast<T>(tmp);
>
> We can hope the compilers see this passage through a temporary and simply
> clear the bits.
Yes, and if bit_cast<> were defined to clear padding bits and bytes,
then I would be happy with that - precisely because such an
implementation would be possible. And for hashing or simple
serialisation, you'd only need the first half of that code.
It would be nice to have such functions in the standard, because that
would increase the likelihood of compilers optimising it well, but it's
not essential. The most important point, to me, is that there must be
a way to access the representation of objects as raw bytes,
without concern that reading padding leads to UB. And it is extremely
useful to have a way to get at the raw bytes with any padding set to
zero. Ideally, this should be possible by zeroing padding in an
existing object, rather than just by copying, for efficiency reasons.
(Doing it by copying also needs to be possible, in the case of a const
original object.)
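For the copying case, what people do by hand today looks something like the sketch below: start from an all-zero buffer and copy each member to its own offset, so the padding bytes of the original are never read. `Sample`, `Image`, `put` and `canonical_bytes` are names invented for this illustration, and it has to be written out per type, which is exactly the tedium a standard facility would remove.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical example type with padding between its members on typical ABIs.
struct Sample {
    char tag;
    int  value;
};

using Image = std::array<unsigned char, sizeof(Sample)>;

// Copy one member's bytes to its offset within the image.
template <typename M>
void put(Image& out, std::size_t offset, const M& member) {
    assert(offset + sizeof member <= out.size());
    std::memcpy(out.data() + offset, &member, sizeof member);
}

// Build a byte image of a (possibly const) Sample with all padding zero.
Image canonical_bytes(const Sample& s) {
    Image out{};  // value-initialised: every byte starts as zero
    put(out, offsetof(Sample, tag), s.tag);
    put(out, offsetof(Sample, value), s.value);
    return out;
}
```

Two objects with equal members then give identical images regardless of what was in their padding, which is the property you want for hashing or for writing to a file.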
>
>> Converting a type into a similar type but with the padding bits and
>> bytes visible and zeroed out is, I think, an interesting idea - but I
>> fear it would be very difficult to specify accurately. This new type
>> could pick up the data fields of the original type, but what about its
>> methods, static data, other functions with overloads, its ancestors and
>> descendants, and anything else relevant to the type? I think all you
>> could reasonably do is convert to an array of unsigned char (or
>> std::byte) of appropriate size.
>
> That's entirely out-of-scope and the boat has already sailed on that, once
> std::bit_cast was created.
>
>
Received on 2026-01-18 11:07:44
