Re: [SG16] P1885 polling

From: Corentin <corentin.jabot_at_[hidden]>
Date: Fri, 24 Sep 2021 15:16:00 +0200
On Fri, Sep 24, 2021 at 2:53 PM Corentin <corentin.jabot_at_[hidden]> wrote:

>
>
> On Fri, Sep 24, 2021 at 2:05 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
>> On 24/09/2021 10.17, Corentin wrote:
>> > Jens, Hubert.
>> > Are you satisfied with the added recommended practice sections, and
>> other changes?
>>
>> No.
>>
>> Looking at https://isocpp.org/files/papers/D1885R8.pdf
>>
>> "[ Note: The name of each enumerator of the enumeration text_encoding::id
>> is derived from
>> the alias of each primary name that begins with ”cs”, as follows"
>>
>> "that begins with" refers to the "primary name".
>>
>> Also, the entity we're talking about here is "encoding", not really
>> primary name.
>> Maybe "... derived from the corresponding alias that begins with "cs",
>> ..."
>>
>>
>> csUnicode is renamed text_encoding::id::UCS2
>>
>> "is renamed to"
>>
>> or maybe better "is mapped to"
>>
>
> Sure
>
>
>>
>>
>> I still feel the wording contains insufficient guidance for implementers
>> to do
>> the right thing.
>
>
>> Consider a little-endian platform with UTF-16 wchar_t. What should
>> wide_literal()
>> return? UTF16 or UTF16LE ?
>>
>> Now consider a big-endian platform with UCS-2 wchar_t (because they never
>> caught
>> up to recent Unicode extensions). There's only UCS-2, although maybe
>> something
>> like UCS2BE might be the much more appropriate choice.
>
>
>> Same question for UTF-32 = UCS-4 wchar_t.
>> Should this be UCS4 or UTF32 or UTF32BE/LE?
>
>
>
> UTF-32 and UCS-4 are not exactly the same thing, even if in practice they
> are treated as such (UTF-32 makes code points above 0x10FFFF invalid),
> and in practice everybody uses and expects UTF-32.
>
> UTF32 is an alias for either UTF32BE or UTF32LE; both are correct.
> The same goes for UTF16/UCS2/UTF16LE/UTF16BE.
>
> UCS2BE is completely made up, so it helps neither implementers nor users.
> We could add a recommendation that UTF16/UTF32 are preferred over the
> names that specify an endianness, as endianness-specific names are a
> Unicode peculiarity and users will expect UTF-16. I'm certainly willing
> to do so, but... I'm not sure we want to describe every implementation
> in the standard.
>


To summarize, I think people are asking for a front-matter
"Recommended practice" section.

We have a sentence that says

"How a text_encoding object is determined to be representative of a
character encoding implemented in the translation or execution environment
is implementation-defined."

We could add beneath it:

Recommended Practices

   - Implementations should prefer returning UTF-16 over UTF-16BE or
   UTF-16LE
   - Implementations should prefer returning UTF-32 over UTF-32BE or
   UTF-32LE
   - Implementations should otherwise not consider registered encodings
   interchangeable (Example: Shift_JIS and Windows-31J denote different
   encodings)
   - Implementations should not use a registered encoding to describe
   a similar yet different non-registered encoding, unless there is
   precedent for doing so on that implementation (Example: Big5)
   - Implementations should not use an encoding specified as
   single-byte to describe a wide encoding

Is that reasonable?
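
To make the first three bullets concrete, here is a rough sketch of what
they would mean for an implementation. It is only an illustration: the
encoding_id enum and the preferred_id function are hypothetical names made
up for this mail, not the text_encoding::id enumeration or any interface
from the paper.

```cpp
// Hypothetical stand-in for a handful of encoding identifiers.
// This is NOT the text_encoding::id enumeration from P1885.
enum class encoding_id {
    UTF16, UTF16BE, UTF16LE,
    UTF32, UTF32BE, UTF32LE,
    ShiftJIS, Windows31J,
};

// Hypothetical helper: maps the encoding an implementation detected to the
// identifier it would report under the recommended practice above.
constexpr encoding_id preferred_id(encoding_id detected) {
    switch (detected) {
        // Bullet 1: prefer UTF-16 over the endianness-specific names.
        case encoding_id::UTF16BE:
        case encoding_id::UTF16LE:
            return encoding_id::UTF16;
        // Bullet 2: prefer UTF-32 over the endianness-specific names.
        case encoding_id::UTF32BE:
        case encoding_id::UTF32LE:
            return encoding_id::UTF32;
        // Bullet 3: every other registered encoding is reported as-is and is
        // never swapped for a similar one.
        default:
            return detected;
    }
}

// Compile-time checks of the mapping.
static_assert(preferred_id(encoding_id::UTF16LE) == encoding_id::UTF16,
              "UTF-16LE is reported as UTF-16");
static_assert(preferred_id(encoding_id::ShiftJIS) == encoding_id::ShiftJIS,
              "Shift_JIS is not replaced by Windows-31J");
```

The last two bullets are about which identifier an implementation picks in
the first place, so they are not something a mapping like this can express.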



>
>
>
>>
>> Jens
>>
>

Received on 2021-09-24 08:16:15