sg16

Re: [SG16] [isocpp-lib-ext] Sending P1885R8 Naming Text Encodings to Demystify Them directly to electronic polling for C++23

From: Corentin <corentin.jabot_at_[hidden]>
Date: Fri, 15 Oct 2021 15:20:57 +0200
On Fri, Oct 15, 2021 at 1:14 PM Ville Voutilainen <ville.voutilainen_at_[hidden]> wrote:

> On Fri, 15 Oct 2021 at 13:55, Jens Maurer via Lib-Ext
> <lib-ext_at_[hidden]> wrote:
> >
> > On 15/10/2021 12.41, Bryce Adelstein Lelbach aka wash wrote:
> > > Jens, this does not sound like a library design matter.
> > >
> > > Can we please stop holding this paper up in LEWG unless there are
> > > library design questions?
> > > If there are questions about the specifics of wording or text/Unicode
> > > details, there are groups that can deal with that (LWG and SG16).
> > > Just because LEWG says we approve this paper does not mean it
> > > automatically goes into the standard, it just means we are happy with
> > > the library design.
> >
> > I am raising concerns I have about the current state of the paper.
> >
> > If the chair of LEWG deems those concerns not to be relevant at the
> > level of LEWG, I'm fine with that, and I'll raise them again in LWG
> > and/or plenary, as need be.
>
> The following bit has a design question in it:
>
> > > The paper is missing a normative definition of "encoding scheme"
> > > with particular attention to the fact that an octet is not a
> > > C++ byte. From such a definition, I would hope to gain clarity
> > > how UTF-16 should be handled on a platform with CHAR_BIT == 16.
>

I was not expecting this to be up for polling this week, and I have limited
time, but the intent of SG16 was made clear. I asked SG16 last time whether
they had further concerns with the design, and they did not.
The intent/model chosen is: "the data can be reinterpreted as char*, fed to
iconv, and iconv will do something sensible."
Unfortunately, that does not help with the CHAR_BIT == 16 case, as there is
no existing practice for text libraries on systems where CHAR_BIT != 8.
The definition of encoding scheme used is independent of CHAR_BIT, as long
as the bit pattern of the object representation is consistent with the
specification of an encoding.
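A minimal C++ sketch of that model, assuming CHAR_BIT == 8 and a platform that is either little- or big-endian: it copies the object representation of a char16_t string into bytes, which is what a reinterpret_cast to a char pointer would expose to an iconv-style converter. The helper names object_bytes and native_little_endian are hypothetical illustrations, not part of P1885, and iconv itself is not called here.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstring>
#include <vector>

// This sketch assumes 8-bit bytes; with CHAR_BIT == 16 a char16_t may occupy
// a single byte, which is exactly the case the model above does not settle.
static_assert(CHAR_BIT == 8, "sketch assumes CHAR_BIT == 8");

// Hypothetical helper: copy the object representation of n char16_t code
// units into a byte vector. These are the bytes a reinterpret_cast<const char*>
// view of the same storage would expose, e.g. to iconv.
std::vector<unsigned char> object_bytes(const char16_t* s, std::size_t n) {
    assert(s != nullptr || n == 0);
    std::vector<unsigned char> out(n * sizeof(char16_t));
    if (n != 0) {
        std::memcpy(out.data(), s, out.size());
    }
    return out;
}

// Hypothetical helper: true if the low-order byte of a char16_t is stored
// first, i.e. the object representation matches UTF-16LE rather than UTF-16BE.
bool native_little_endian() {
    const char16_t probe = 0x0102;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 0x02;
}
```

For u"A\u20AC" (U+0041, U+20AC), object_bytes(text, 2) yields 41 00 AC 20 on a little-endian platform and 00 41 20 AC on a big-endian one; in both cases the bytes form UTF-16 in platform endianness, which is the reading the model relies on.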

Jens, did you raise that question at the last SG16 meeting?

> The last line is a design question. SG16 may be better-equipped to
> process it than LEWG, of course, but while such questions stand, the
> paper should not be forwarded to a stage that verifies that the
> specification matches the intent, when we don't know the intent.
>
> > > It feels that SUBSTITUTE_UTF_ENCODING(E) (or the surrounding
> > > text) should say something about the handling of UCS-2 and UCS-4,
> > > which IANA appears to define as big-endian. It seems we want
> > > platform endianness (similar to UTF-16 and UTF-32) here, too.
>

No, this is a wording matter.
We established in SG16 that we want platform endianness, independently
of what IANA says.
I'm not sure LEWG wants to revisit that decision.
I was planning for this to be discussed at an upcoming LEWG meeting, but I
would not expect people who are not SG16 regulars to have opinions here.
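To make the byte-order question concrete, here is a small C++ sketch, assuming CHAR_BIT == 8 and a 4-byte char32_t: it compares the high-order-byte-first serialization that the IANA registry describes for UCS-4 with the in-memory object representation on the current platform. The names Bytes4, ucs4_big_endian, ucs4_native and native_matches_big_endian are hypothetical and not part of P1885.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumes 8-bit bytes, so a char32_t code unit occupies four bytes.
static_assert(sizeof(char32_t) == 4, "sketch assumes a 4-byte char32_t");

using Bytes4 = std::array<unsigned char, 4>;

// Hypothetical helper: serialize a UCS-4 code unit most significant byte
// first, the byte order the IANA registry gives for ISO-10646-UCS-4.
Bytes4 ucs4_big_endian(char32_t c) {
    const std::uint32_t v = static_cast<std::uint32_t>(c);
    assert(v <= 0x7FFFFFFFu); // UCS-4 values are 31-bit numbers
    return {{static_cast<unsigned char>(v >> 24),
             static_cast<unsigned char>(v >> 16),
             static_cast<unsigned char>(v >> 8),
             static_cast<unsigned char>(v)}};
}

// Hypothetical helper: the bytes of the same code unit as stored in memory,
// i.e. UCS-4 in platform endianness.
Bytes4 ucs4_native(char32_t c) {
    Bytes4 out{};
    std::memcpy(out.data(), &c, sizeof c);
    return out;
}

// True when platform endianness coincides with the IANA byte order, so both
// readings of the UCS-4 label produce the same bytes.
bool native_matches_big_endian() {
    const char32_t probe = 0x01020304;
    return ucs4_native(probe) == ucs4_big_endian(probe);
}
```

On a typical little-endian platform native_matches_big_endian() returns false, so choosing platform endianness for UCS-2 and UCS-4 changes the bytes a program would produce compared with the IANA reading; on a big-endian platform the two coincide.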

BTW, the latest version of the paper is here:
https://isocpp.org/files/papers/P1885R8.pdf




> Same here.
>

Received on 2021-10-15 08:21:10