Date: Fri, 24 Sep 2021 15:10:35 +0200
On 24/09/2021 14.53, Corentin wrote:
>
>
> On Fri, Sep 24, 2021 at 2:05 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
> I still feel the wording contains insufficient guidance for implementers to do
> the right thing.
>
>
> Consider a little-endian platform with UTF-16 wchar_t. What should wide_literal()
> return? UTF16 or UTF16LE ?
>
> Now consider a big-endian platform with UCS-2 wchar_t (because they never caught
> up to recent Unicode extensions). There's only UCS-2, although maybe something
> like UCS2BE might be the much more appropriate choice.
>
>
> Same question for UTF-32 = UCS-4 wchar_t.
> Should this be UCS4 or UTF32 or UTF32BE/LE?
>
>
>
> UTF-32 and UCS4 are not exactly the same thing, even if in practice they are (UTF-32 makes codepoints over 0x10FFFF invalid),
Ok, but the Unicode character set doesn't contain characters with code points
above 0x10FFFF, so that seems like a differentiation without distinction.
> and in practice everybody uses and expects UTF-32.
>
> UTF32 is an alias for either UTF32BE or UTF32LE, both are correct.
I still don't know what the recommendation for implementations is on
that platform. Should they choose UTF32 or UTF32BE (or LE, as appropriate)?
> Same for UTF16/UCS2/UTF-16LE/UTF16-BE
>
> UCS2BE is completely made up so that helps neither implementer nor users
Right, maybe that's missing from the list of encodings.
> We could add some recommendation that UTF16/UTF32 are preferred over the names that specify an endianness, as this is a Unicode specificity,
How so? I'd expect any larger-than-byte encoding to have the problem of
being endianness-dependent by virtue of the platform having an endianness.
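To make the endianness point concrete, here is a minimal sketch (not taken from the paper; the helper names code_unit_bytes and stores_low_byte_first are made up for illustration, and it assumes a platform with 8-bit bytes where char16_t occupies exactly two of them). It copies one 16-bit code unit into bytes and reports which byte comes first in memory:

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy the object representation of one UTF-16
// code unit into a byte array, in the order the platform stores it.
std::array<unsigned char, sizeof(char16_t)> code_unit_bytes(char16_t unit) {
    std::array<unsigned char, sizeof(char16_t)> bytes{};
    std::memcpy(bytes.data(), &unit, sizeof unit);
    return bytes;
}

// Hypothetical helper: true if the least significant byte is stored
// first, i.e. the in-memory form matches what is usually called UTF-16LE.
// U+20AC is used because its two bytes (0x20, 0xAC) differ.
bool stores_low_byte_first() {
    return code_unit_bytes(u'\u20AC')[0] == 0xAC;
}
```

The same code unit value yields a different byte sequence depending on the platform, which is the sense in which any wider-than-byte encoding inherits the platform's endianness.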
> and users will expect UTF-16
> and I'm certainly willing to do so but... I'm not sure we want to describe in the standard every implementation.
The standard establishes rules for conformance. We want those rules to be
sufficiently firm so that useful programs can be written, and we want them
to be sufficiently loose to allow implementations to be efficient.
An interface specification talking about a fact of a platform/compiler is
unhelpful if two different implementations can give two different answers
for the same situation.
Jens
Received on 2021-09-24 08:10:48