Re: [std-proposals] CHAR_BIT == 8 p3635r0

From: David Brown <david.brown_at_[hidden]>
Date: Thu, 17 Jul 2025 11:42:24 +0200
On 17/07/2025 02:30, Thiago Macieira via Std-Proposals wrote:
> On Wednesday, 16 July 2025 15:05:17 Pacific Daylight Time Frederick Virchanza
> Gotham via Std-Proposals wrote:
>> On Wed, Jul 16, 2025 at 12:09 PM Jan Schultke wrote:
>>> I don't see how a separate octet type would help. It couldn't actually
>>> be narrower than a byte anyway; it could just have some padding bits.
>>> The strategy on non-8-bit-platforms is to just leave some bits unused
>>> for operations that use char/unsigned char, and that seems fine for
>>> networking too. You don't actually need an octet type.
>>
>> You're allowed to have padding bits in every unsigned integer type
>> other than unsigned char. Unsigned char must have zero padding bits.
>
> I actually can't find that in the C++ standard. It is in the C one, though:
>
> "6.2.6.2 Integer types
>
> For unsigned integer types other than unsigned char, the bits of the object
> representation shall be divided into two groups: value bits and padding bits
> (there need not be any of the latter)."
>

In the C++ standard (I happen to have C++20 open at the moment, but I
don't believe this has changed) under "6.8 Types [basic.types]", we have:

"""
The object representation of an object of type T is the sequence of N
unsigned char objects taken up by the object of type T, where N equals
sizeof(T). The value representation of an object of type T is the set of
bits that participate in representing a value of type T. Bits in the
object representation that are not part of the value representation are
padding bits. For trivially copyable types, the value representation is
a set of bits in the object representation that determines a value,
which is one discrete element of an implementation-defined set of values.38


38) The intent is that the memory model of C++ is compatible with that
of ISO/IEC 9899 Programming Language C.
"""

Just after that, in the discussion of fundamental types, it says "[Note:
The signed and unsigned integer types satisfy the constraints given in
ISO C 5.2.4.2.1. — end note]".

So Frederick's one-sentence summary is accurate. But the C++ standards
are, IMHO, obtuse and sloppy here. It's fine to refer to an external
standard in a non-normative footnote ("The intent is that this is like
C"), but inappropriate to do so when defining the C++ language -
especially when the C standard is a living document that comes in
different versions, where new versions supersede older ones. If, say,
C26 were published with a change saying there can be no padding bits in
any standard integer type, that would retroactively change all earlier
C++ standards in the same way. Ridiculous!


>> I remember 20 years ago, most people got the number of value bits in
>> an integer type by doing:
>>
>> CHAR_BIT * sizeof( unsigned_integer_type )
>>
>> But really you had to do:
>>
>> CHAR_BIT * IMAX_BITS( (unsigned_integer_type) -1 )
>>
>> just in case you're dealing with a 36-Bit int that has 4 padding bits.
>> But maybe those machines went out with the dinosaurs.
>
> In C++, that's just std::numeric_limits<T>::digits.
>
>> I always thought that the biggest thing about having a fluidic
>> CHAR_BIT was to accommodate supercomputers that only have 1 integer
>> type:
>>
>> CHAR_BIT == 64
>> 1 == sizeof(char) == sizeof(short) == sizeof(int) == sizeof(long) ==
>> sizeof(long long)
>
> Indeed.
>

I would disagree about this being "the biggest thing". Yes, the early
Cray-1 supercomputers had 64-bit char. No, they did not have 64-bit
"long long", because "long long" did not exist at the time (the Cray
machines were made between 1975 and into the start of the 1990's, while
"long long" is from C99).

AFAIK (I may be missing something) there have been no other
supercomputers with 64-bit char. For "big" systems, char sizes other
than 8-bit died out with the end of the dinosaur mainframe age - long
before C++ or C99 existed. Those systems were, however, the reason C
originally supported signed integer formats other than two's complement
(support that is now gone from C and C++). While some of these
mainframes still exist (either in their original forms, or emulated),
they are not programmed in modern C or C++, and are thus not a relevant
consideration for changes to future standard versions.

The only targets that have CHAR_BIT > 8 that are possibly relevant are
DSPs and other very niche and application-specific embedded processors.
None of these are used with anything close to modern C++. But those are
far and away the biggest use of CHAR_BIT greater than 8.

Similarly, the biggest use of types with padding bits (other than
"bool") is microcontrollers with unusual internal sizes. For example,
I have used devices with 40-bit accumulators for specific functions, and
some msp430 processors have 20-bit registers (most operations on them
are 16-bit, but they support 20-bit address sizes). On those devices,
CHAR_BIT is 8, and if you want to load or store the full 20-bit
registers, you do so as a 32-bit type with 12 padding bits. However, in
the C and C++ implementations, there are no 20-bit fundamental types.

So personally, I think it would be fine to have a future C++ standard
say that CHAR_BIT is always 8, that fundamental integer types are always
power-of-two multiples of 8, and that there are no padding bits. I
can't see that being a limitation for any realistic processor which has
any chance of ever being used with C++29 (or whatever).

Perhaps a compromise would be to say that hosted implementations require
an 8-bit char and no padding bits - only freestanding implementations
could be unusual here. Then most of the library could be written on the
assumption of byte-oriented systems.

Received on 2025-07-17 09:42:33