sg16: Re: [SG16] Agenda for the 2021-10-06 SG16 telecon

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Sun, 3 Oct 2021 11:24:08 +0200

On 03/10/2021 11.03, Corentin Jabot wrote:
> On Sat, Oct 2, 2021 at 12:55 AM Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
> And which ones should wide_literal() return?
[...]
> Now. utf16le/be are always encoding schemes, and a conforming implementation can return that if they want to. Is it useful for users?

After all this discussion, we should simply take away the freedom for (wide_)literal()
to return UTF16LE or UTF16BE, because those are category errors in the IANA table.

If it's UTF-16, an implementation should return UTF16 (modulo the general
non-conformance here; see separate discussion).

And the paper should explain in its prose text (or maybe even in the normative
part) that (wide_)literal() returns an
encoding form, not an encoding scheme, in the Unicode sense.
(With proper cross-references so that people understand we're talking about
a defined term. Except that Unicode appears to define UCS encoding form/scheme
only, which doesn't really help us.)
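
To illustrate the form/scheme distinction for UTF-16, here is a minimal
sketch. The function names to_utf16_form and to_utf16_scheme are made up
for this illustration and are not part of the paper or of any library.
The first produces 16-bit code units (the encoding form, where byte order
does not come up); the second serializes those code units to bytes in a
chosen order (the encoding scheme, i.e. what UTF-16LE/UTF-16BE name).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encoding form: map one code point to a sequence of 16-bit code units.
// No byte order is involved at this level.
// Hypothetical helper; assumes cp is a valid Unicode scalar value.
std::u16string to_utf16_form(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string units;
    if (cp < 0x10000) {
        units.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
        units.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
    return units;
}

// Encoding scheme: serialize code units to bytes in an explicit byte order.
// big_endian == true corresponds to UTF-16BE, false to UTF-16LE.
std::vector<std::uint8_t> to_utf16_scheme(const std::u16string& units,
                                          bool big_endian) {
    std::vector<std::uint8_t> bytes;
    for (char16_t u : units) {
        const auto hi = static_cast<std::uint8_t>(u >> 8);
        const auto lo = static_cast<std::uint8_t>(u & 0xFF);
        if (big_endian) {
            bytes.push_back(hi);
            bytes.push_back(lo);
        } else {
            bytes.push_back(lo);
            bytes.push_back(hi);
        }
    }
    return bytes;
}
```

For U+1F600 the code units are 0xD83D 0xDE00 either way; only the byte
serialization differs (D8 3D DE 00 for BE, 3D D8 00 DE for LE). A string
literal in a program is a sequence of code units, so under this reading
only the form is meaningful for (wide_)literal().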

Tangential question: Some legacy Asian encodings use two bytes per character,
but seem to specify an exact order of these bytes, and are actually
byte-based encodings. Sample quote:

"In Hitachi KEIS[25] (Kanji-processing Extended Information System), the
sequence 0x0A 0x41 switches to single-byte mode and the sequence 0x0A 0x42
switches to double-byte mode."

https://en.wikipedia.org/wiki/Japanese_language_in_EBCDIC

What happens if such an encoding is used with wchar_t?
Or does that simply not happen?

Jens

Received on 2021-10-03 04:24:15