Re: [SG16] [isocpp-lib-ext] Sending P1885R8 Naming Text Encodings to Demystify Them directly to electronic polling for C++23

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Thu, 14 Oct 2021 23:55:40 +0200
On 14/10/2021 23.32, Bryce Adelstein Lelbach aka wash via Lib-Ext wrote:
> Okay, LEWG, let's try this one again.
>
> P1885R8 Naming Text Encodings to Demystify Them has been once again
> sent to us by SG16.
>
> It has been reviewed multiple times by LEWG. The last time we saw it,
> we said we would send it to electronic polls, pending a revision that
> included answers to some questions raised during the reviews.
>
> I would like to see if we have support for sending P1885 directly to
> electronic polling for C++23. If you support this motion, please reply
> with a +1. If you do not support this motion and wish to spend more
> time discussing this paper in LEWG, please reply with -1.

The paper is missing a normative definition of "encoding scheme"
with particular attention to the fact that an octet is not a
C++ byte. From such a definition, I would hope to gain clarity on
how UTF-16 should be handled on a platform with CHAR_BIT == 16.
(The definition in ISO 10646 is carefully crafted to apply only
to Unicode encodings.)
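
To illustrate the octet vs. byte distinction, here is a minimal sketch
(not taken from the paper; the function name to_utf16be_octets is
hypothetical). It serializes UTF-16 code units into octets in
big-endian order. Each output element holds one octet, whatever the
width of a C++ byte is on the platform:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <vector>

// A C++ byte has CHAR_BIT bits, which is at least 8 but may be more.
static_assert(CHAR_BIT >= 8, "guaranteed by the C++ standard");

// Number of C++ bytes in one char16_t. Typically 2 when CHAR_BIT == 8;
// it could be 1 on a platform with CHAR_BIT == 16.
constexpr std::size_t bytes_per_utf16_unit = sizeof(char16_t);

// Hypothetical helper: split each UTF-16 code unit into two octets,
// most significant octet first (big-endian octet order).
std::vector<std::uint_least8_t>
to_utf16be_octets(const std::vector<char16_t>& units) {
    std::vector<std::uint_least8_t> out;
    out.reserve(units.size() * 2);
    for (char16_t u : units) {
        // char16_t may be wider than 16 bits on exotic platforms.
        assert(u <= 0xFFFF);
        out.push_back(static_cast<std::uint_least8_t>((u >> 8) & 0xFF));
        out.push_back(static_cast<std::uint_least8_t>(u & 0xFF));
    }
    return out;
}
```

With CHAR_BIT == 8 the two octets of a code unit occupy two bytes of
storage. With CHAR_BIT == 16 a whole code unit fits in one byte, so an
"octet sequence" no longer corresponds to the object representation.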

It feels that SUBSTITUTE_UTF_ENCODING(E) (or the surrounding
text) should say something about the handling of UCS2 and UCS4,
which IANA appears to define as big-endian. It seems we want
platform endianness (similar to UTF-16 and UTF-32) here, too.
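
As a rough sketch of why the octet order matters here (again not from
the paper; byte_order and ucs2_octets are hypothetical names), the
same UCS-2 code unit yields different octet sequences depending on
the order chosen:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical tag for the octet order used when serializing.
enum class byte_order { big, little };

// Hypothetical helper: the two octets of one UCS-2 code unit in the
// requested order.
std::array<std::uint_least8_t, 2> ucs2_octets(char16_t unit,
                                              byte_order order) {
    assert(unit <= 0xFFFF);  // a UCS-2 code unit is a 16-bit value
    const auto hi = static_cast<std::uint_least8_t>((unit >> 8) & 0xFF);
    const auto lo = static_cast<std::uint_least8_t>(unit & 0xFF);
    if (order == byte_order::big) {
        return {hi, lo};
    }
    return {lo, hi};
}
```

For U+20AC, byte_order::big gives the octets 0x20 0xAC, while
byte_order::little gives 0xAC 0x20. A label that implies big-endian
would therefore misdescribe the data held on a little-endian platform
unless the name is mapped to platform order. An implementation could
pick the order from std::endian::native (C++20, <bit>).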

Whether LWG will be happy to fix that on their level,
I don't know.

Jens

Received on 2021-10-14 16:55:46