Re: Should we get rid of UCS-2 and ISO 10646:2003

From: Corentin <corentin.jabot_at_[hidden]>
Date: Sat, 3 Dec 2022 16:49:54 +0100
On Sat, Dec 3, 2022 at 3:48 PM Peter Bindels <peterbindels_at_[hidden]> wrote:

> The first seems good, the second one seems to always have been wrong? It
> now reads that you can transcode from utf16 to utf16, where invalid utf16
> is invalid ?!
>
> I think the second UCS2 mention should have already read UTF8. But please
> do check.
>

That's what it does, apparently:
https://en.cppreference.com/w/cpp/locale/codecvt_utf16

But I think saying undefined was wrong on my part.
It should say:

   - The facet shall convert between UTF-8 multibyte sequences and
     (depending on the size of Elem)
      - elements of the BMP encoded as UTF-16, or
      - UTF-32
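
To make the 16-bit case of that wording concrete, here is a minimal sketch of a UTF-8 decoder that emits one 16-bit element per code point and refuses anything outside the BMP. The name utf8_to_bmp is hypothetical and this is not how any standard library implements codecvt_utf8. Reporting an error (rather than leaving the behavior undefined) for non-BMP input is an assumption made only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not the standard facet): decode UTF-8 into one
// 16-bit element per code point. Returns std::nullopt for ill-formed
// input and, by assumption, for any code point outside the BMP.
std::optional<std::u16string> utf8_to_bmp(std::string_view in) {
    // Smallest code point that legitimately needs a sequence of each length.
    static constexpr std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const auto b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else return std::nullopt;                       // invalid lead byte

        if (i + len > in.size()) return std::nullopt;   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt; // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min_cp[len]) return std::nullopt;               // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;   // surrogate code point
        if (cp > 0xFFFF) return std::nullopt;                    // outside the BMP

        out.push_back(static_cast<char16_t>(cp));
        i += len;
    }
    return out;
}

// Usage: BMP text converts, a supplementary-plane code point is rejected.
inline void utf8_to_bmp_examples() {
    assert(utf8_to_bmp("\xE2\x82\xAC").value()[0] == 0x20AC);  // U+20AC EURO SIGN
    assert(!utf8_to_bmp("\xF0\x9F\x98\x80").has_value());      // U+1F600, not in the BMP
}
```

Every element produced this way is a valid single UTF-16 code unit, which is why the wording can drop UCS-2 without changing what a conforming 16-bit conversion yields for BMP input.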

What does "within the program" mean?


>
> Otherwise, :+1:
>
> On Sat, Dec 3, 2022 at 3:11 PM Corentin via SG16 <sg16_at_[hidden]>
> wrote:
>
>> Hey folks.
>> We only mention UCS-2 in [depr.locale.stdcvt.req]
>> http://eel.is/c++draft/depr#locale.stdcvt.req
>>
>> Referring to UCS-2 forces us to carry a reference to ISO 10646:2003, as this
>> encoding has long been obsolete.
>>
>> We could remove it entirely by rewording slightly, without changing any
>> effect.
>>
>> For the facet codecvt_utf8:
>>
>> - The facet shall convert between UTF-8 multibyte sequences and UCS-2
>> or UTF-32 (depending on the size of Elem) within the program.
>>
>> =>
>>
>> - The facet shall convert between UTF-8 multibyte sequences and
>> UTF-16 or UTF-32 (depending on the size of Elem) within the program.
>> - When converting to UTF-16 if any UTF-8 sequence encodes a codepoint
>> outside of the BMP, the behavior is undefined.
>>
>> For the facet codecvt_utf16:
>>
>> - The facet shall convert between UTF-16 multibyte sequences and
>> UCS-2 or UTF-32 (depending on the size of Elem) within the program
>>
>> =>
>>
>> - The facet shall convert between UTF-16 multibyte sequences and
>> UTF-16 or UTF-32 (depending on the size of Elem) within the program.
>> - When converting to UTF-16 if any UTF-8 sequence encodes a codepoint
>> outside of the BMP, the behavior is undefined.
>>
>>
>> What do you think?
>>
>> Corentin
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>

Received on 2022-12-03 15:50:07