Re: [SG16] P1885 polling

From: Corentin <corentin.jabot_at_[hidden]>
Date: Fri, 24 Sep 2021 15:20:02 +0200
On Fri, Sep 24, 2021 at 3:10 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 24/09/2021 14.53, Corentin wrote:
> >
> >
> > On Fri, Sep 24, 2021 at 2:05 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> >
> > I still feel the wording contains insufficient guidance for
> > implementers to do the right thing.
> >
> >
> > Consider a little-endian platform with UTF-16 wchar_t. What should
> > wide_literal() return? UTF16 or UTF16LE?
> >
> > Now consider a big-endian platform with UCS-2 wchar_t (because they
> > never caught up to recent Unicode extensions). There's only UCS-2,
> > although maybe something like UCS2BE might be the much more
> > appropriate choice.
> >
> >
> > Same question for UTF-32 = UCS-4 wchar_t.
> > Should this be UCS4 or UTF32 or UTF32BE/LE?
> >
> >
> >
> > UTF-32 and UCS4 are not exactly the same thing, even if in practice they
> > are (UTF-32 makes code points over 0x10FFFF invalid),
>
> Ok, but the Unicode character set doesn't contain characters with code
> points above 0x10FFFF, so that seems like a distinction without a
> difference.
>

Indeed
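
For concreteness, the difference mentioned above can be sketched as a
validity check. This is only an illustration: the names kMaxCodePoint and
is_valid_utf32 are made up for this sketch and are not part of P1885. It
assumes UTF-32 accepts exactly the Unicode scalar values (up to 0x10FFFF,
excluding surrogates), whereas UCS-4 as originally defined in ISO/IEC 10646
allowed a larger, 31-bit codespace.

```cpp
#include <cstdint>

// Largest code point in the Unicode codespace (hypothetical name).
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;

// True if `cu` is a valid UTF-32 code unit, i.e. a Unicode scalar value:
// inside the codespace and not a surrogate (U+D800..U+DFFF).
// Hypothetical helper for illustration only.
constexpr bool is_valid_utf32(std::uint32_t cu) {
    return cu <= kMaxCodePoint && !(cu >= 0xD800 && cu <= 0xDFFF);
}
```

A value such as 0x110000 fits in 32 bits but is rejected by this check,
which is the only place where the two names could in principle differ.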


>
> > and in practice everybody uses and expects UTF-32.
> >
> > UTF32 is an alias for either UTF32BE or UTF32LE, both are correct.
>
> I still don't know what the recommendation for implementations is on
> that platform. Should they choose UTF32 or UTF32BE (or LE, as
> appropriate)?
>

Cf. the mail I just sent.

>
> > Same for UTF16/UCS2/UTF16LE/UTF16BE
> >
> > UCS2BE is completely made up, so that helps neither implementers nor users
>
> Right, maybe that's missing from the list of encodings.
>
> > We could add some recommendation that UTF16/UTF32 are preferred over the
> > names that specify an endianness explicitly, as this is a Unicode
> > specificity,
>
> How so? I'd expect any larger-than-byte encoding to have the problem of
> being endianness-dependent by virtue of the platform having an endianness.
>
> > and users will expect UTF-16
> > and I'm certainly willing to do so but... I'm not sure we want to
> > describe every implementation in the standard.
>
> The standard establishes rules for conformance. We want those rules to be
> sufficiently firm so that useful programs can be written, and we want them
> to be sufficiently loose to allow implementations to be efficient.
>
> An interface specification talking about a fact of a platform/compiler is
> unhelpful if two different implementations can give two different answers
> for the same situation.
>

I agree with that, but in this specific context there are decades of mess
to contend with.
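
The endianness question raised above can be illustrated with a minimal
sketch: the same UTF-16 code unit yields different byte sequences depending
on the byte order, which is what the LE/BE names pin down and the plain
UTF16 name leaves open. ByteOrder and utf16_byte are hypothetical names
invented for this sketch, not anything proposed in P1885.

```cpp
#include <cstdint>

// Byte order used when serializing a code unit (hypothetical name).
enum class ByteOrder { little, big };

// Returns byte `index` (0 or 1) of the UTF-16 code unit `cu` when it is
// written out in the given byte order. Hypothetical helper for illustration.
constexpr std::uint8_t utf16_byte(char16_t cu, ByteOrder order, int index) {
    // Little-endian puts the low byte first; big-endian puts it second.
    return static_cast<std::uint8_t>(
        ((order == ByteOrder::little) == (index == 0)) ? (cu & 0xFF)
                                                       : (cu >> 8));
}
```

For U+00E9, the little-endian bytes are E9 00 and the big-endian bytes are
00 E9; an in-memory wchar_t has the platform's order either way.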


> Jens
>

Received on 2021-09-24 08:20:16