sg16: Re: [SG16] P1885 polling

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Fri, 24 Sep 2021 16:54:57 +0200

On 24/09/2021 15.58, Corentin wrote:
>
>
> On Fri, Sep 24, 2021 at 3:24 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
> On 24/09/2021 15.16, Corentin wrote:
> >
> >
> > On Fri, Sep 24, 2021 at 2:53 PM Corentin <corentin.jabot_at_[hidden]> wrote:
> >
> > On Fri, Sep 24, 2021 at 2:05 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> >
> > On 24/09/2021 10.17, Corentin wrote:
> > > Jens, Hubert.
> > > Are you satisfied with the added recommended practice sections, and other changes?
> >
> > No.
> >
> > Looking at https://isocpp.org/files/papers/D1885R8.pdf
> >
> > "[ Note: The name of each enumerator of the enumeration text_encoding::id is derived from
> > the alias of each primary name that begins with “cs”, as follows"
> >
> > "that begins with" refers to the "primary name".
> >
> > Also, the entity we're talking about here is "encoding", not really primary name.
> > Maybe "... derived from the corresponding alias that begins with "cs", ..."
> >
> >
> > csUnicode is renamed text_encoding::id::UCS2
> >
> > "is renamed to"
> >
> > or maybe better "is mapped to"
> >
> >
> > Sure
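For illustration, a minimal C++ sketch of the enumerator naming rule under discussion. The helper enumerator_name is hypothetical, and it assumes that, apart from the csUnicode case quoted above, an enumerator is formed simply by dropping the "cs" prefix from the alias; the draft wording may list further special cases.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, not part of P1885: derive a text_encoding::id
// enumerator name from an IANA alias that begins with "cs".
// Assumption: apart from the csUnicode special case, the "cs" prefix
// is simply dropped.
std::string enumerator_name(std::string_view cs_alias) {
    if (cs_alias == "csUnicode")
        return "UCS2";  // special case quoted from D1885R8
    if (cs_alias.substr(0, 2) == "cs")
        return std::string(cs_alias.substr(2));  // e.g. csUTF8 -> UTF8
    return std::string(cs_alias);  // not a "cs" alias: returned unchanged
}
```

Under this assumption, csUTF8 maps to UTF8 and csShiftJIS to ShiftJIS, while csUnicode needs the explicit special case.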
> >
> > I still feel the wording contains insufficient guidance for implementers to do
> > the right thing.
> >
> >
> > Consider a little-endian platform with UTF-16 wchar_t. What should wide_literal()
> > return? UTF16 or UTF16LE ?
> >
> > Now consider a big-endian platform with UCS-2 wchar_t (because they never caught
> > up to recent Unicode extensions). There's only UCS-2, although maybe something
> > like UCS2BE might be the much more appropriate choice.
> >
> >
> > Same question for UTF-32 = UCS-4 wchar_t.
> > Should this be UCS4 or UTF32 or UTF32BE/LE?
> >
> > UTF-32 and UCS4 are not exactly the same thing, even if in practice they coincide (UTF-32 makes code points above 0x10FFFF invalid),
> > and everybody uses and expects UTF-32.
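A minimal C++ sketch of the difference mentioned above, assuming UCS-4 in its original ISO/IEC 10646 form with a 31-bit code space (later editions of ISO/IEC 10646 also restrict the code space to 0x10FFFF, which is why the two coincide in practice). The function names are illustrative only.

```cpp
#include <cstdint>

// UTF-32: valid code units are Unicode scalar values, i.e. at most
// 0x10FFFF and outside the surrogate range 0xD800..0xDFFF.
constexpr bool valid_utf32(std::uint32_t c) {
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// UCS-4 in its original ISO/IEC 10646 form (assumption): a 31-bit code space.
constexpr bool valid_ucs4(std::uint32_t c) {
    return c <= 0x7FFFFFFF;
}

// Both accept U+10FFFF, the last Unicode code point...
static_assert(valid_utf32(0x10FFFF) && valid_ucs4(0x10FFFF), "");
// ...but only the 31-bit UCS-4 accepts values above it.
static_assert(!valid_utf32(0x110000) && valid_ucs4(0x110000), "");
```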
> >
> > UTF32 is an alias for either UTF32BE or UTF32LE; both are correct.
> > Same for UTF16/UCS2/UTF16LE/UTF16BE.
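To make the byte-order point concrete, a small C++ sketch that serializes the same UTF-16 code unit in each byte order. A byte stream labelled UTF-16BE or UTF-16LE commits to one of these orders; the UTF16 enumerator, as described above, may stand for either. The helper names are illustrative only.

```cpp
#include <array>
#include <cstdint>

// Serialize one UTF-16 code unit most significant byte first.
std::array<std::uint8_t, 2> to_utf16be(char16_t u) {
    return {{static_cast<std::uint8_t>(u >> 8), static_cast<std::uint8_t>(u & 0xFF)}};
}

// Serialize one UTF-16 code unit least significant byte first.
std::array<std::uint8_t, 2> to_utf16le(char16_t u) {
    return {{static_cast<std::uint8_t>(u & 0xFF), static_cast<std::uint8_t>(u >> 8)}};
}
```

For U+20AC EURO SIGN, to_utf16be yields the bytes 0x20 0xAC and to_utf16le yields 0xAC 0x20.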
> >
> > UCS2BE is completely made up, so it helps neither implementers nor users.
> > We could add a recommendation that UTF16/UTF32 are preferred over the names that specify an endianness, since this is Unicode-specific and users will expect UTF-16,
> > and I'm certainly willing to do so, but... I'm not sure we want the standard to describe every implementation.
> >
> > To summarize, I think people are asking for front-matter recommended practices.
> >
> > We have a sentence that says
> >
> > "How a text_encoding object is determined to be representative of a character encoding implemented in the translation or execution environment is implementation-defined."
> >
> > We could add beneath
> >
> > Recommended Practices
> >
> > * Implementations should prefer returning UTF-16 over UTF-16BE or UTF-16LE
> > * Implementations should prefer returning UTF-32 over UTF-32BE or UTF-32LE
> > * Implementations should not otherwise consider registered encodings interchangeable (Example: Shift_JIS and Windows-31J denote different encodings)
> > * Implementations should not use a registered encoding to describe another, similar but different, non-registered encoding, unless there is precedent for doing so on that implementation (Example: Big5)
> > * Implementations should not use an encoding specified as single-byte to describe a wide encoding
> >
> > Is that reasonable?
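As a hypothetical illustration of what the first two recommendations would mean for wide_literal(), a C++ sketch that reports the byte-order-neutral registered name. It assumes a 16-bit wchar_t holds UTF-16 and a 32-bit wchar_t holds UTF-32; a UCS-2-only platform, as in the earlier example, would need a different answer that this sketch does not model. The function wide_encoding_name is illustrative only and not part of P1885.

```cpp
#include <climits>
#include <cstddef>
#include <string_view>

// Hypothetical sketch of the proposed recommended practice: prefer the
// byte-order-neutral registered name for the wide literal encoding.
// Assumption: 16-bit wchar_t means UTF-16, 32-bit wchar_t means UTF-32.
constexpr std::string_view wide_encoding_name(std::size_t wchar_bits) {
    switch (wchar_bits) {
        case 16: return "UTF-16";   // preferred over UTF-16BE / UTF-16LE
        case 32: return "UTF-32";   // preferred over UTF-32BE / UTF-32LE
        default: return "unknown";  // placeholder, not a registered name
    }
}

// The answer for the platform this is compiled for.
constexpr std::string_view this_platform = wide_encoding_name(sizeof(wchar_t) * CHAR_BIT);
```

Under these assumptions, a little-endian platform with a 16-bit UTF-16 wchar_t would report UTF-16 rather than UTF-16LE.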
>
> Yes, that sounds like progress to me in the clarity of specification.
> (People might disagree on whether that particular set of recommendations
> is what they want.)
>
> Further questions: UCS2 says "network byte order".
> Do we want to recommend that "network byte order" be ignored
> here and for UCS4, consistent with the preference for UTF-16
> over the byte-order-dependent variants?
>
> I think you mean this?
>
>> the 2-octet Basic Multilingual Plane, aka Unicode this needs to specify network byte order: the standard does not specify (it is a 16-bit integer space)

Yes.

> To the extent that I can parse that sentence, I am not sure it carries any weight.
>
> We should review the encoding list again to check whether there are any other
> wide encodings that have (possibly implied) byte-order assumptions.
>
>
> From what I understand, only the BE/LE versions of Unicode specify a byte order. We can add that to the wording too.

What other wide encodings are in the list? And what is the expectation
when viewing those as a byte stream?
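A minimal C++ sketch of what "viewing as a byte stream" yields in practice: copying wchar_t code units into bytes exposes the platform's native byte order. Assuming the wide literal encoding maps 'A' to 0x41 (as UTF-16 and UTF-32 do), L"A" comes out with 0x41 first on a little-endian machine and last on a big-endian one. The names as_bytes and native_little_endian are illustrative only.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// View a wide string's code units as the bytes actually stored in memory.
// The order of those bytes is the platform's native byte order.
std::vector<unsigned char> as_bytes(const wchar_t* s, std::size_t n) {
    std::vector<unsigned char> out(n * sizeof(wchar_t));
    if (!out.empty())
        std::memcpy(out.data(), s, out.size());
    return out;
}

// Runtime probe for native byte order (C++20 code could use std::endian::native).
bool native_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```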

Jens

Received on 2021-09-24 09:55:25