Date: Fri, 24 Sep 2021 14:53:25 +0200
On Fri, Sep 24, 2021 at 2:05 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> On 24/09/2021 10.17, Corentin wrote:
> > Jens, Hubert.
> > Are you satisfied with the added recommended practice sections, and
> > other changes?
>
> No.
>
> Looking at https://isocpp.org/files/papers/D1885R8.pdf
>
> "[ Note: The name of each enumerator of the enumeration text_encoding::id
> is derived from the alias of each primary name that begins with "cs",
> as follows"
>
> "that begins with" refers to the "primary name".
>
> Also, the entity we're talking about here is "encoding", not really
> primary name.
> Maybe "... derived from the corresponding alias that begins with "cs", ..."
>
>
> csUnicode is renamed text_encoding::id::UCS2
>
> "is renamed to"
>
> or maybe better "is mapped to"
>
Sure
>
>
> I still feel the wording contains insufficient guidance for implementers
> to do the right thing.
> Consider a little-endian platform with UTF-16 wchar_t. What should
> wide_literal() return? UTF16 or UTF16LE ?
>
> Now consider a big-endian platform with UCS-2 wchar_t (because they never
> caught up to recent Unicode extensions). There's only UCS-2, although maybe
> something like UCS2BE might be the much more appropriate choice.
> Same question for UTF-32 = UCS-4 wchar_t.
> Should this be UCS4 or UTF32 or UTF32BE/LE?
UTF-32 and UCS-4 are not exactly the same thing (UTF-32 makes code points
above 0x10FFFF invalid), even if in practice they are treated as the same,
and in practice everybody uses and expects UTF-32.
UTF32 is an alias for either UTF32BE or UTF32LE; both are correct.
The same goes for UTF16/UCS2/UTF16LE/UTF16BE.
UCS2BE is completely made up, so it helps neither implementers nor users.
We could add a recommendation that UTF16/UTF32 be preferred over the
names that specify an endianness, as those names are a Unicode
peculiarity and users will expect UTF-16.
I'm certainly willing to do so, but I'm not sure we want to describe
every implementation in the standard.
>
> Jens
>
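To make the possible recommendation concrete, here is a minimal sketch of
"prefer UTF16/UTF32 over the endianness-qualified names". It is not wording
from the paper: encoding_id and preferred_wide_id are hypothetical names
used only for illustration, and the enumerator values are placeholders
rather than IANA MIBenum values.

```cpp
// Hypothetical stand-in for a handful of text_encoding::id enumerators.
// The underlying values are placeholders, not IANA MIBenum values.
enum class encoding_id {
    unknown,
    UCS2, UCS4,
    UTF16, UTF16BE, UTF16LE,
    UTF32, UTF32BE, UTF32LE
};

// Hypothetical helper: choose the id to report for a Unicode wide encoding.
// Assumption: the unqualified names are preferred, so the platform's byte
// order deliberately plays no part in the result.
constexpr encoding_id preferred_wide_id(unsigned code_unit_bits,
                                        bool platform_is_little_endian) {
    (void)platform_is_little_endian;  // intentionally ignored
    switch (code_unit_bits) {
        case 16: return encoding_id::UTF16;  // not UTF16LE / UTF16BE
        case 32: return encoding_id::UTF32;  // not UTF32LE / UTF32BE / UCS4
        default: return encoding_id::unknown;
    }
}

// The little-endian UTF-16 wchar_t platform from the question above.
static_assert(preferred_wide_id(16, true) == encoding_id::UTF16);
// A big-endian platform with 32-bit wchar_t.
static_assert(preferred_wide_id(32, false) == encoding_id::UTF32);
```

Under this assumption, a little-endian and a big-endian platform report the
same id, which is what users comparing against UTF16 or UTF32 would expect.
The sketch does not cover the UCS-2-only platform mentioned above.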
Received on 2021-09-24 07:53:39