Date: Mon, 19 Apr 2021 11:14:10 -0400
On 4/19/21 3:40 AM, Jens Gustedt via Liaison wrote:
> Tom,
>
> on Mon, 19 Apr 2021 00:51:26 -0400 you (Tom Honermann
> <tom_at_[hidden]>) wrote:
>
>> I'm not aware of anything in C++ that forces better Unicode support.
> In C++ the `u` and `U` prefixes guarantee UTF-16 and UTF-32
> encoding. In C they don't. Here you only have guarantees of 16- and
> 32-bit encodings, and you need to query a feature test macro to know
> whether they actually use the Unicode encodings.
That is technically correct, but surveys have (so far) failed to
identify any implementations that do not use UTF-16 and UTF-32 for `u`
and `U` prefixed literals respectively (and the intent in the original
papers was clear that these are intended for UTF-16 and UTF-32
respectively). SG16 does intend to bring a paper to WG14 to provide
this guarantee as well. In fact, my records
<https://github.com/sg16-unicode/sg16/issues/54> state that JeanHeyd
already submitted such a paper, but I don't see it in the WG14 document
log. Perhaps I was overly ambitious in slapping a "paper submitted"
label on that issue.
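
For illustration, a minimal sketch of the query Jens describes, assuming a
C11 implementation that provides <uchar.h>. The function and array names
(`utf16_guaranteed`, `utf32_guaranteed`, `grin16`, `grin32`, and the
`*_units` helpers) are hypothetical and exist only for this example; the
macros `__STDC_UTF_16__` and `__STDC_UTF_32__` are the ones C11 specifies.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <uchar.h>

/* What C11 guarantees unconditionally: the types are wide enough. */
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "char16_t has at least 16 bits");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t has at least 32 bits");

/* Returns 1 if the implementation promises UTF-16 for char16_t values. */
int utf16_guaranteed(void)
{
#ifdef __STDC_UTF_16__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the implementation promises UTF-32 for char32_t values. */
int utf32_guaranteed(void)
{
#ifdef __STDC_UTF_32__
    return 1;
#else
    return 0;
#endif
}

/* U+1F600 lies outside the BMP, so UTF-16 encodes it as a surrogate pair. */
static const char16_t grin16[] = u"\U0001F600";
static const char32_t grin32[] = U"\U0001F600";

/* Number of code units in each literal, excluding the terminating zero. */
size_t grin16_units(void) { return sizeof grin16 / sizeof grin16[0] - 1; }
size_t grin32_units(void) { return sizeof grin32 / sizeof grin32[0] - 1; }
```

Where both macros are defined, `grin16_units()` is 2 (the surrogate pair
0xD83D 0xDE00) and `grin32_units()` is 1. Where they are not defined, C11
leaves the encoding of these literals implementation-defined, so portable
code cannot assume those counts.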
Tom.
>
>> WG21's SG16 is working to improve support for Unicode, but not with
>> the intent to exclude support for legacy character sets.
> I know, but this is somewhat better encapsulated into `wchar_t`. In C
> that may spread to the other wide character and wide string types.
>
> Jens
>
>
> _______________________________________________
> Liaison mailing list
> Liaison_at_[hidden]
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/liaison
> Link to this post: http://lists.isocpp.org/liaison/2021/04/0464.php
Received on 2021-04-19 10:14:14