Re: [SG16] Structure of EBCDIC MBCS and wide EBCDIC

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Thu, 14 Oct 2021 09:02:15 +0200

On 14/10/2021 03.05, Tom Honermann wrote:
> Thank you, Jens and Hubert for this further discussion.
>
> I think these are important points for the paper to address. However, I don't think they materially affect the design intent, so I'm not inclined to revisit the SG16 consensus. Please let me know if you feel this is new information that warrants another trip through SG16.

We need (preferably normative) text that says that wide_literal()
(and possibly wide_environment()) talk about the object representation.
The "object representation" part was discussed earlier, but I haven't
seen an update with that. (Maybe I've missed it.)

The current talk about plain encoding leads us to the Core section,
where the object representation is precisely not a concern.

I think we can also enhance wide_literal() and wide_environment()
to say normatively that UTF16BE/LE and UTF32BE/LE are never returned,
but instead UTF-16/UTF-32 with a special deviation from the ISO 10646
definition. (We also discussed this earlier.)

Since the latter changes what is in a normative reference, I think we
need to show this alteration of semantics in normative text.
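
To illustrate the distinction between a byte-serialized encoding scheme
(UTF-16BE/LE) and the object representation of a wide character, here is a
minimal sketch. The helper names (to_utf16be, to_utf16le, object_bytes) are
hypothetical and not part of any proposal; the sketch assumes 8-bit bytes
for the two serialization helpers.

```cpp
#include <array>
#include <cstring>

using Bytes = std::array<unsigned char, 2>;

// Hypothetical helper: UTF-16BE serialization of one code unit.
// The byte order is fixed by the scheme, not by the host.
constexpr Bytes to_utf16be(char16_t u) {
    return Bytes{{static_cast<unsigned char>(u >> 8),
                  static_cast<unsigned char>(u & 0xFF)}};
}

// Hypothetical helper: UTF-16LE serialization of one code unit.
constexpr Bytes to_utf16le(char16_t u) {
    return Bytes{{static_cast<unsigned char>(u & 0xFF),
                  static_cast<unsigned char>(u >> 8)}};
}

// Hypothetical helper: the object representation of a wchar_t, i.e. the
// bytes as the host actually stores them. Its length and byte order are
// implementation-specific, which is why a BE/LE label does not describe it.
inline std::array<unsigned char, sizeof(wchar_t)> object_bytes(wchar_t w) {
    std::array<unsigned char, sizeof(wchar_t)> out{};
    std::memcpy(out.data(), &w, sizeof w);
    return out;
}
```

The two serializers always produce the same bytes on every platform, whereas
object_bytes() yields whatever the implementation stores natively.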

Side note: What's the "environment" in wide_environment() to start with?
Where is it defined? How does it affect the semantics of the rest of the
standard?

Tom, if you believe this doesn't need a re-review in SG16, that's fine,
but of course that risks LEWG sending the paper back to SG16.

> Corentin, I suggest doing the following:
[...]
> * Add guidelines for registering wide encodings with IANA; e.g., recommended naming conventions and native endian encodings (potentially in addition to BE/LE encodings that might be used for octet based interchange).

Nobody has suggested registering anything with IANA, I believe, so I don't
know what you mean here.

> * Add normative encouragement that, e.g., UTF-16 should not be returned for wide_literal() and wide_environment() when sizeof(wchar_t) is other than 1 or 2.

I think we can positively prohibit that (either explicitly or implicitly), because
sizeof(wchar_t) > 2 means the object representation can't ever be UTF-16.
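
The size argument can be sketched as a compile-time check. The helper name
below is hypothetical, and the sketch rests on the assumption that a UTF-16
code unit must occupy exactly 16 bits of the object representation.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: could an object of the given size hold exactly one
// 16-bit UTF-16 code unit as its object representation?
// Assumption: "UTF-16 as object representation" means exactly 16 bits.
constexpr bool object_representation_could_be_utf16(
        std::size_t size_in_bytes,
        std::size_t bits_per_byte = CHAR_BIT) {
    return size_in_bytes * bits_per_byte == 16;
}

// Evaluated for the current implementation's wchar_t.
constexpr bool wchar_t_could_be_utf16 =
    object_representation_could_be_utf16(sizeof(wchar_t));
```

With 8-bit bytes, a 4-byte wchar_t fails the check, so under this assumption
UTF-16 could not be reported for it.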

Jens

Received on 2021-10-14 06:19:10