Date: Sun, 3 Oct 2021 11:24:08 +0200
On 03/10/2021 11.03, Corentin Jabot wrote:
> On Sat, Oct 2, 2021 at 12:55 AM Jens Maurer via SG16 <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
> And which ones should wide_literal() return?
[...]
> Now. utf16le/be are always encoding schemes, and a conforming implementation can return that if they want to. Is it useful for users?
After all this discussion, we should simply take away the freedom for (wide_)literal()
to return UTF16LE or UTF16BE, because those are category errors in the IANA table.
If it's UTF-16, an implementation should return UTF16 (modulo the general
non-conformance here; see separate discussion).
And the paper should explain in its prose text (or maybe even in the normative
part) that (wide_)literal() returns an encoding form, not an encoding scheme,
in the Unicode sense.
(With proper cross-references so that people understand we're talking about
a defined term. Except that Unicode appears to define UCS encoding form/scheme
only, which doesn't really help us.)
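
To make the form/scheme distinction concrete, here is a minimal sketch. The
names serialize_utf16 and byte_order are hypothetical and not part of the
paper or of any library. The char16_t sequence is the encoding form (code
units, no byte order); only the serialized byte sequence has an endianness
and so corresponds to an encoding scheme such as UTF-16BE or UTF-16LE.

```cpp
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical illustration only: byte order is a property of the
// serialized bytes (encoding scheme), not of the code units (encoding form).
enum class byte_order { big, little };

// Serialize UTF-16 code units into bytes using the requested byte order.
std::vector<std::uint8_t> serialize_utf16(std::u16string_view units,
                                          byte_order order) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(units.size() * 2);
    for (char16_t u : units) {
        const auto hi = static_cast<std::uint8_t>(u >> 8);
        const auto lo = static_cast<std::uint8_t>(u & 0xFF);
        if (order == byte_order::big) {
            bytes.push_back(hi);  // UTF-16BE: most significant byte first
            bytes.push_back(lo);
        } else {
            bytes.push_back(lo);  // UTF-16LE: least significant byte first
            bytes.push_back(hi);
        }
    }
    return bytes;
}
```

A literal such as u"\u20AC" holds the single code unit 0x20AC regardless of
platform; only the choice made when serializing yields the bytes 20 AC or
AC 20.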
Tangential question: Some legacy Asian encodings use two bytes per character,
but seem to specify an exact order of these bytes, and so are actually
byte-based encodings. Sample quote:
"In Hitachi KEIS[25] (Kanji-processing Extended Information System), the
sequence 0x0A 0x41 switches to single-byte mode and the sequence 0x0A 0x42
switches to double-byte mode."
https://en.wikipedia.org/wiki/Japanese_language_in_EBCDIC
What happens if such an encoding is used with wchar_t?
Or does that simply not happen?
Jens
Received on 2021-10-03 04:24:15