std-proposals: Re: Restrict possible values of CHAR

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Mon, 28 Oct 2019 11:47:26 -0400

On Thu, Oct 24, 2019 at 7:51 PM Lyberta via Std-Proposals <std-proposals_at_[hidden]> wrote:

> Arthur O'Dwyer via Std-Proposals:
> > Lyberta, did your survey turn up any C++ implementations where CHAR_BIT != 8?
> > If so, what version of C++ were they: C++03, 11, 14, 17?
>
> Clang has recently been ported to a 16-bit-byte architecture:
>
>
> https://www.embecosm.com/2017/04/18/non-8-bit-char-support-in-clang-and-llvm/
>

Yes, but notice the final paragraph of that article:

One missing piece is a target to act as a test for this new behavior. At
Embecosm, we have been working on AAP
<https://www.embecosm.com/resources/appnotes/#EAN13> for just this purpose.
At the moment AAP has 8-bit byte addressed memory, however, the purpose of
the architecture is to work as a test case for interesting features, so in
order to support non 8-bit characters we are creating a version of the
architecture which is 16-bit word addressed.

That is: They did all this work to support 16-bit bytes in the abstract,
and then the only thing that was left to do was find an actual machine with
16-bit bytes. They had no real machine in mind for LLVM to target, so they
had to invent one. Except that the one they invented currently *also* has
8-bit bytes. They are in the process of modifying their contrived machine,
which they invented to serve as the only model of their contrived
abstraction.
(For "are", read "were"; the article is from April 2017.)

So it's possible that 16-bit-byte machines exist, but the Embecosm article
is not an example of any such machine.

I also wonder what such a machine would do with `char8_t`, or character
processing in general. (I mean, it could just waste half of the space in
each 16-bit word; but it seems like it would be easier to just store two
C++ "bytes" in each 16-bit word.)

–Arthur

Received on 2019-10-28 10:49:56