sg16: Re: [SG16] [isocpp-lib-ext] Sending P1885R8 Naming Text Encodings to Demystify Them directly to electronic polling for C++23

From: Peter Brett <pbrett_at_[hidden]>
Date: Wed, 20 Oct 2021 10:56:24 +0000

> -----Original Message-----
> From: Lib-Ext <lib-ext-bounces_at_lists.isocpp.org> On Behalf Of Jens Maurer
> via Lib-Ext
> Sent: 20 October 2021 10:32
> To: Corentin <corentin.jabot_at_gmail.com>
>
> Except that "representation" is bytes in C++, not octets.
>
> What exactly does iconv do with the encoding "UTF-16" ?
> According to the source code,
> https://sourceware.org/git/?p=glibc.git;a=blob;f=iconvdata/utf-16.c;h=13a2a056b7175c937439cccf110b99f0684e0f9c;hb=HEAD
> (line 52), it does perform BOM interpretation
> (and appears to assume native endianness otherwise, in violation
> of ISO 10646 (which specifies big endian as the default)).
>
> It seems we've already partly given up on iconv pluggability,
> because we require the user to change UTF-16 to UTF-16LE/BE depending
> on the platform endianness to sidestep any BOM interpretation,
> for example for the string literal
>
> "\ufeff something"

The relevant SG16 poll:

    Notwithstanding the specification in ISO10646, we suggest to
    return UTF-{16,32} from literal() or wide_literal() with the
    understanding that string literals in the compiled program may not
    actually begin with a BOM and that library facilities [e.g. iconv()]
    may consume a BOM if present.

    SF F N A SA
     0 8 1 0 0

    Attendance: 10

    Consensus: Strong consensus in favour.

    Author's position(s): F, F

My recollection of the SG16 discussion is that participants viewed
the iconv() use case as a primary motivation for the P1885 feature,
but did not consider it a problem for iconv() to consume byte order
marks.
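
For illustration only, here is a minimal sketch of the BOM handling under
discussion. It is not glibc's implementation: the names decode_utf16_units,
Fallback and native_is_big_endian are hypothetical, and the fallback choice
only models the two behaviours Jens mentions (big endian per ISO 10646,
or native byte order). The point it shows is that a converter labelled
plain "UTF-16" consumes a leading U+FEFF, so a literal that begins with
that character loses it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical names throughout; a sketch, not glibc's code.
enum class Fallback { big_endian, native };

// Detect the byte order of the machine running this code.
inline bool native_is_big_endian() {
    const std::uint16_t probe = 0x0102;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 0x01;
}

// Decode bytes labelled plain "UTF-16" into 16-bit code units.
// A leading BOM selects the byte order and is consumed; without a BOM
// the caller-chosen fallback applies. A trailing odd byte is ignored.
inline std::vector<char16_t>
decode_utf16_units(const std::vector<unsigned char>& bytes, Fallback fallback) {
    bool big = (fallback == Fallback::big_endian) || native_is_big_endian();
    std::size_t pos = 0;
    if (bytes.size() >= 2) {
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) {        // BOM, big endian
            big = true;
            pos = 2;
        } else if (bytes[0] == 0xFF && bytes[1] == 0xFE) { // BOM, little endian
            big = false;
            pos = 2;
        }
    }
    std::vector<char16_t> units;
    for (; pos + 1 < bytes.size(); pos += 2) {
        const unsigned hi = big ? bytes[pos] : bytes[pos + 1];
        const unsigned lo = big ? bytes[pos + 1] : bytes[pos];
        units.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return units;
}
```

Passing the bytes FF FE 61 00 yields the single code unit for 'a' whatever
the fallback is, because the BOM both fixes the byte order and disappears
from the output. Naming the encoding UTF-16LE or UTF-16BE instead would
correspond to skipping the BOM check entirely.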

Best regards,

                      Peter

Received on 2021-10-20 05:56:43