sg16

Re: Comments on P2513R0 char8_t Compatibility and Portability Fixes

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Mon, 24 Jan 2022 11:03:47 +0100
The paper makes an excellent case for

unsigned char foo [] = u8""; and char foo [] = u8"";

However, the paper gives no justification for signed char. UTF-8 code
units are numbers in the range [0, 255]. As such, the paper is proposing a
conversion from unsigned char to char and does not specify how
these signed char elements should be initialized in cases of overflow.

An important point was that u8 literals would always be valid, because all
code points in the sequence would have a representation in the storage. And
this is the case in phase 5.
Neither of these options seems like a satisfactory solution:

   - We will make the program ill-formed if there is an overflow (in effect
   only allowing ASCII)
   - We will just copy the bits over and now there are negative UTF-8 code
   units
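
To make the two options concrete, here is a minimal sketch in C++. The helper names (is_ascii_only, to_signed_code_unit) are hypothetical and do not come from P2513R0; they only illustrate what each option would amount to for a code unit above 0x7F, assuming 8-bit bytes and a two's complement platform.

```cpp
#include <cstddef>

// Hypothetical helper illustrating the first option: the initialization would
// only be accepted if every code unit is at most 0x7F, which in effect
// restricts the literal to ASCII.
bool is_ascii_only(const unsigned char* units, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (units[i] > 0x7F) {
            return false;  // this code unit does not fit in [0, 127]
        }
    }
    return true;
}

// Hypothetical helper illustrating the second option: copy the bits into a
// signed char. The conversion is modular since C++20 and was
// implementation-defined before that; on two's complement platforms any
// code unit above 0x7F becomes a negative value.
signed char to_signed_code_unit(unsigned char unit) {
    return static_cast<signed char>(unit);
}
```

For example, U+00E9 is encoded in UTF-8 as the code units 0xC3 0xA9. is_ascii_only rejects that sequence, and to_signed_code_unit(0xC3) yields a negative value on common platforms, although converting it back to unsigned char recovers 0xC3.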

And while it is easy to find many examples to motivate the paper in
general, the case for signed char isn't motivated *at all*.

I would very much be in favor of the paper if the wording was changed to

> Additionally, an array of char or unsigned char may be initialized by a
> UTF-8 string literal, or by such a string literal enclosed in braces.

Thanks,

Corentin




On Sat, Jan 22, 2022 at 10:31 PM Jens Maurer via SG16 <sg16_at_[hidden]>
wrote:

> Hi,
>
> Here are my comments:
>
> - Section 3.2, fifth to last word should not have an apostrophe.
>
> - The title promises "fixes", but I can see only a single fix in the
> wording: Allow initialization of an ordinary character array with
> a UTF-8 string literal. Where are the several fixes?
>
> - Wording:
>
> "Additionally, an array of ordinary character type may be initialized by a
> UTF-8 string literal, or by a char8_t-typed string-literal enclosed in braces."
>
> I agree that "may" (giving permission) is the better verb here compared to
> "can" in the preceding, existing text.
>
> However, we discuss here "UTF-8 string literal", and a few words later we
> talk about a "char8_t-typed string-literal". Is there any intended difference
> between these? If so, I need help in seeing the difference. If not, just say
> ", or by such a string literal enclosed in braces."
>
> Jens
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2022-01-24 10:03:59