Date: Wed, 20 Oct 2021 10:56:24 +0000
> -----Original Message-----
> From: Lib-Ext <lib-ext-bounces_at_lists.isocpp.org> On Behalf Of Jens Maurer
> via Lib-Ext
> Sent: 20 October 2021 10:32
> To: Corentin <corentin.jabot_at_gmail.com>
>
> Except that "representation" is bytes in C++, not octets.
>
> What exactly does iconv do with the encoding "UTF-16" ?
> According to the source code,
> https://sourceware.org/git/?p=glibc.git;a=blob;f=iconvdata/utf-16.c;h=13a2a056b7175c937439cccf110b99f0684e0f9c;hb=HEAD
> (line 52), it does perform BOM interpretation
> (and appears to assume native endianness otherwise, in violation
> of ISO 10646 (which specifies big endian as the default)).
>
> It seems we've already partly given up on iconv pluggability,
> because we require the user to change UTF-16 to UTF-16LE/BE depending
> on the platform endianness to sidestep any BOM interpretation,
> for example for the string literal
>
> "\ufeff something"

The relevant SG16 poll:

Notwithstanding the specification in ISO10646, we suggest to
return UTF-{16,32} from literal() or wide_literal() with the
understanding that string literals in the compiled program may not
actually begin with a BOM and that library facilities [e.g. iconv()]
may consume a BOM if present.

SF  F  N  A  SA
 0  8  1  0   0

Attendance: 10
Consensus: Strong consensus in favour.
Author's position(s): F, F
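
To make "consume a BOM if present" concrete, here is a minimal sketch of a
decoder that behaves that way. It is not glibc's implementation; the names
ByteOrder and decode_utf16_bytes are hypothetical, and 8-bit bytes are
assumed. The fallback parameter stands in for the two defaults discussed
above (big endian per ISO 10646, or the platform's native order).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names, for illustration only.
enum class ByteOrder { Big, Little };

// Decode a UTF-16 byte sequence into code units.
// A leading BOM (FE FF or FF FE) selects the byte order and is consumed,
// so it does not appear in the result. Without a BOM, `fallback` is used.
std::vector<char16_t> decode_utf16_bytes(const std::vector<std::uint8_t>& bytes,
                                         ByteOrder fallback) {
    // Precondition of this sketch: whole code units only.
    assert(bytes.size() % 2 == 0);

    ByteOrder order = fallback;
    std::size_t pos = 0;
    if (bytes.size() >= 2) {
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) {
            order = ByteOrder::Big;
            pos = 2;  // skip the BOM
        } else if (bytes[0] == 0xFF && bytes[1] == 0xFE) {
            order = ByteOrder::Little;
            pos = 2;  // skip the BOM
        }
    }

    std::vector<char16_t> units;
    for (; pos + 1 < bytes.size(); pos += 2) {
        const std::uint8_t first = bytes[pos];
        const std::uint8_t second = bytes[pos + 1];
        const unsigned value = (order == ByteOrder::Big)
            ? ((first << 8) | second)
            : ((second << 8) | first);
        units.push_back(static_cast<char16_t>(value));
    }
    return units;
}
```

With such a decoder, a literal that really does start with U+FEFF loses
that character on conversion, which is the case the poll wording accepts.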

My recollection of the SG16 discussion is that participants viewed
the iconv() use case as a primary motivation for the P1885 feature,
but did not consider it a problem for iconv() to consume byte order
marks.
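
For users who do want to sidestep BOM interpretation, the workaround Jens
describes could look roughly like the sketch below. The helper names
is_little_endian and qualify_for_iconv are hypothetical, and the sketch
assumes the iconv implementation in use accepts the "LE"/"BE" suffixed
names; the input string stands for whatever name literal() or
wide_literal() reports.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: true when the least significant byte is stored first.
bool is_little_endian() {
    const std::uint16_t probe = 0x0001;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // inspect the first byte in memory
    return first == 0x01;
}

// Hypothetical helper: turn a BOM-sensitive encoding name into one with an
// explicit byte order matching the platform. Other names pass through.
std::string qualify_for_iconv(const std::string& name) {
    assert(!name.empty());  // precondition of this sketch
    if (name == "UTF-16" || name == "UTF-32") {
        return name + (is_little_endian() ? "LE" : "BE");
    }
    return name;
}
```

The qualified name would then be passed to iconv_open() in place of the
plain "UTF-16" or "UTF-32".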

Best regards,
Peter
Received on 2021-10-20 05:56:43