Re: [SG16] Structure of EBCDIC MBCS and wide EBCDIC

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Thu, 14 Oct 2021 14:28:26 +0200
On 14/10/2021 13.38, Corentin Jabot wrote:
> On Thu, Oct 14, 2021 at 1:19 PM Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
> > On 14/10/2021 03.05, Tom Honermann wrote:
> > > Thank you, Jens and Hubert for this further discussion.
> > >
> > > I think these are important points for the paper to address. However, I don't think they materially affect the design intent, so I'm not inclined to revisit the SG16 consensus. Please let me know if you feel this is new information that warrants another trip through SG16.
> > We need (preferably normative) text that says that wide_literal()
> > (and possibly wide_environment()) talk about the object representation.
> > The "object representation" part was discussed earlier, but I haven't
> > seen an update with that. (Maybe I've missed it.)
> That was done https://isocpp.org/files/papers/P1885R8.pdf

Ah, thanks.

This looks broken (note extra "Returns"):

Let E be Returns: a text_encoding object representing the encoding scheme of the object
representation of ordinary string literals [lex.charset].

(Same for wide.)

"the encoding scheme of the object representation of wide string literals"

That sounds like the object representation needs an encoding scheme.
Maybe "used by" or "manifested by" instead of the first "of"?


talks about "encoding scheme" (good), but then talks about "encoding".
It should always talk about "encoding scheme".
(I need to check whether POSIX locales associate any encoding schemes
at all.)

"is not affected by calls
to the POSIX functions setenv and other functions which can modify the environment"

"POSIX function" (singular; we're only talking about one function here)

similar for "wide"

"[ Note: This comparison is identical to the ”Charset Alias Matching” algorithm described in the
Unicode Technical Standard 22. — end note ]"

This and the following example need to move to immediately before
the "substitute_utf_encoding" meta-function.
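
For reference, the comparison the note refers to can be sketched as
follows. This is only a minimal illustration of the three steps in
UTS #22 section 1.4 (drop everything except letters and digits, map
upper case to lower case, drop each 0 that is not preceded by a digit).
The function names normalize_charset_name and charset_names_match are
made up for this sketch and are not taken from the paper. Letters are
looked up in a table rather than by range, since ranges such as 'A'-'Z'
are not contiguous in EBCDIC.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: reduce a charset name to its UTS #22 matching key.
std::string normalize_charset_name(std::string_view name) {
    constexpr std::string_view upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    constexpr std::string_view lower = "abcdefghijklmnopqrstuvwxyz";

    std::string key;
    bool prev_is_digit = false;  // is the last retained character a digit?
    for (char c : name) {
        // Step 2: map upper case to lower case via table lookup.
        if (auto pos = upper.find(c); pos != std::string_view::npos) {
            c = lower[pos];
        }
        const bool is_letter = lower.find(c) != std::string_view::npos;
        const bool is_digit = c >= '0' && c <= '9';  // digits are contiguous
        // Step 1: delete everything that is neither a letter nor a digit.
        if (!is_letter && !is_digit) continue;
        // Step 3: delete each 0 that is not preceded by a digit.
        if (c == '0' && !prev_is_digit) continue;
        key.push_back(c);
        prev_is_digit = is_digit;
    }
    return key;
}

// Hypothetical helper: two names match if their keys are equal.
bool charset_names_match(std::string_view a, std::string_view b) {
    return normalize_charset_name(a) == normalize_charset_name(b);
}
```

With this sketch, "UTF-8", "utf8" and "u.t.f-008" all compare equal,
while "utf-80" and "ut8" do not match "utf8".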

"an implementation-defined text_encoding object"

What exactly is implementation-defined here? Can we drop the "implementation-defined"?

SUBSTITUTE_UTF_ENCODING(E) addresses the UTF-16/32 case by blindly
replacing the return value; good. Do we want to say something
special about UCS2 and UCS4, too? Those appear to be big-endian
in IANA, but we probably want them to be returned regardless of
platform endianness.

"The encodings returned from wide_literal and wide_environments should"

encodings -> encoding schemes
This appears multiple times; please use "encoding schemes" everywhere
(if you mean it).

Strictly speaking, ISO 10646 specifies the term "encoding scheme"
only for the UTF-* encodings; we intend to apply the term in
a slightly generalized fashion, meaning "representation of a
character in a sequence of bytes" applicable to any encoding.
I'd suggest putting that into our "Terms and definitions" clause,
with proper cross-references to ISO 10646 and a note that says
this is looking at the object representation for wchar_t.
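
To make "object representation for wchar_t" concrete, here is a small
sketch that copies out the bytes of a wchar_t and interprets them in
both byte orders. It assumes 8-bit bytes and no padding bits in wchar_t.
All names in it (wchar_bytes, object_representation,
value_if_little_endian, value_if_big_endian) are invented for the
illustration and do not come from the paper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// The bytes that make up one wchar_t object (assumes CHAR_BIT == 8).
using wchar_bytes = std::array<unsigned char, sizeof(wchar_t)>;

// Copy out the object representation of a single wchar_t.
wchar_bytes object_representation(wchar_t c) {
    wchar_bytes bytes{};
    std::memcpy(bytes.data(), &c, sizeof c);
    return bytes;
}

// Read the bytes as an integer, least significant byte first.
unsigned long value_if_little_endian(const wchar_bytes& b) {
    unsigned long v = 0;
    for (std::size_t i = b.size(); i-- > 0;) v = (v << 8) | b[i];
    return v;
}

// Read the bytes as an integer, most significant byte first.
unsigned long value_if_big_endian(const wchar_bytes& b) {
    unsigned long v = 0;
    for (unsigned char byte : b) v = (v << 8) | byte;
    return v;
}
```

For a wchar_t holding the value 0x0102, exactly one of the two readers
gives back 0x0102 on a given platform. The code unit value is the same
in both cases and only the byte sequence in memory differs, which is
the distinction the generalized "encoding scheme" term is meant to
capture.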


Received on 2021-10-14 07:28:33