On 24/09/2021 10.17, Corentin wrote:
> Jens, Hubert.
> Are you satisfied with the added recommended practice sections, and other changes?
No.
Looking at https://isocpp.org/files/papers/D1885R8.pdf
"[ Note: The name of each enumerator of the enumeration text_encoding::id is derived from
the alias of each primary name that begins with "cs", as follows"
"that begins with" refers to the "primary name".
Also, the entity we're talking about here is "encoding", not really primary name.
Maybe "... derived from the corresponding alias that begins with "cs", ..."
csUnicode is renamed text_encoding::id::UCS2
"is renamed to"
or maybe better "is mapped to"
Sure
I still feel the wording contains insufficient guidance for implementers to do
the right thing.
Consider a little-endian platform with UTF-16 wchar_t. What should wide_literal()
return? UTF16 or UTF16LE?
Now consider a big-endian platform with UCS-2 wchar_t (because they never caught
up to recent Unicode extensions). There's only UCS-2, although something
like UCS2BE might be a much more appropriate choice.
Same question for UTF-32 = UCS-4 wchar_t.
Should this be UCS4 or UTF32 or UTF32BE/LE?
UTF-32 and UCS-4 are not exactly the same thing, even if in practice they are treated as such (UTF-32 makes code points above 0x10FFFF invalid),
and in practice everybody uses and expects UTF-32.
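To make that difference concrete, here is a minimal sketch of the check that separates a valid UTF-32 code unit from an arbitrary 32-bit value. The function name is_valid_utf32 is made up for illustration and is not part of P1885. Besides the 0x10FFFF upper bound, it also rejects the surrogate range, since UTF-32 code units have to be Unicode scalar values.

```cpp
#include <cstdint>

// Hypothetical helper (not from P1885): true if `cp` is a valid UTF-32
// code unit, i.e. a Unicode scalar value.
constexpr bool is_valid_utf32(std::uint32_t cp) {
    // Reject values above U+10FFFF and the surrogate range U+D800..U+DFFF.
    return cp <= 0x10FFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu);
}

static_assert(is_valid_utf32(0x10FFFFu), "last valid code point");
static_assert(!is_valid_utf32(0x110000u), "beyond the UTF-32 range");
```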
UTF32 is an alias for either UTF32BE or UTF32LE; both are correct.
Same for UTF16/UCS2/UTF16LE/UTF16BE.
UCS2BE is completely made up, so that helps neither implementers nor users.
We could add a recommendation that UTF16/UTF32 be preferred over the names that specify an endianness, since that distinction is specific to Unicode and users will expect UTF-16,
and I'm certainly willing to do so, but I'm not sure we want to describe every implementation in the standard.
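For what it's worth, the recommendation could be sketched roughly like this. Everything here is hypothetical: wide_id is a stand-in for the handful of text_encoding::id names discussed in this thread, preferred_wide_id is not a proposed API, and the sketch assumes the wide literal encoding is Unicode, so only the code unit width matters.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical stand-in for text_encoding::id, limited to the names
// discussed in this thread.
enum class wide_id { unknown, UTF16, UTF16LE, UTF16BE, UTF32, UTF32LE, UTF32BE };

// Assumption: the wide encoding is Unicode. Pick the endianness-neutral
// name for the given code unit width instead of a BE/LE variant.
constexpr wide_id preferred_wide_id(std::size_t code_unit_bits) {
    return code_unit_bits == 16 ? wide_id::UTF16
         : code_unit_bits == 32 ? wide_id::UTF32
         : wide_id::unknown;
}

// What this sketch would report for the platform it is compiled on.
constexpr wide_id this_platform_wide_id =
    preferred_wide_id(sizeof(wchar_t) * CHAR_BIT);
```

With this, a little-endian platform with a 16-bit wchar_t would report UTF16 rather than UTF16LE, which is the name users are likely to expect.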