
Re: [SG16] Structure of EBCDIC MBCS and wide EBCDIC

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Thu, 14 Oct 2021 16:46:36 +0200
On 14/10/2021 15.53, Hubert Tong wrote:
> On Thu, Oct 14, 2021 at 8:28 AM Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
>> Strictly speaking, ISO 10646 specifies the term "encoding scheme"
>> only for the UTF-* encodings; we intend to apply the term in
>> a slightly generalized fashion, meaning "representation of a
>> character in a sequence of bytes" applicable to any encoding.
>> I'd suggest to put that into our "Terms and definitions" clause,
>> with proper cross-references to ISO 10646 and a note that says
>> this is looking at the object representation for wchar_t.
>
> ISO/IEC 15445:2000 also defines "character encoding scheme":
> (Source: RFC1866) A function whose domain is the set of sequences of octets, and whose range is the set of sequences of characters from a character repertoire; that is, a sequence of octets and a character encoding scheme determining a sequence of characters.
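To make "looking at the object representation for wchar_t" concrete, here is a minimal C++ sketch that copies a wchar_t into an array of unsigned char and back. The function names object_representation and from_object_representation are hypothetical, chosen for illustration only. Both sizeof(wchar_t) and the byte order it exposes are implementation-defined (for example, wchar_t is 2 bytes on Windows and 4 bytes on typical Linux targets), which is exactly the variation a generalized "encoding scheme" term would have to cover.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <cstring>

// Number of bits in a wchar_t's object representation (implementation-defined).
constexpr std::size_t wchar_bits = sizeof(wchar_t) * CHAR_BIT;

// Hypothetical helper: the object representation of a wchar_t, viewed as
// a sequence of bytes (unsigned char). Byte order is implementation-defined.
std::array<unsigned char, sizeof(wchar_t)> object_representation(wchar_t wc) {
    std::array<unsigned char, sizeof(wchar_t)> bytes{};
    std::memcpy(bytes.data(), &wc, sizeof(wchar_t));  // copy the raw bytes
    return bytes;
}

// Hypothetical inverse: reassemble a wchar_t from its object representation.
wchar_t from_object_representation(
        const std::array<unsigned char, sizeof(wchar_t)>& bytes) {
    wchar_t wc{};
    std::memcpy(&wc, bytes.data(), sizeof(wchar_t));
    return wc;
}
```

On a little-endian target with a 4-byte wchar_t, L'A' would typically come out as the bytes 41 00 00 00, but nothing in the sketch depends on that layout.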

Another quote from RFC 1866, which was obsoleted by RFC 2854:

"NOTE - To support non-western writing systems, HTML
        user agents are encouraged to support
        `ISO-10646-UCS-2' or similar character encoding

UCS-2, really?
Granted, it's from 1995, but maybe that's not an ideal reference
for us, then.

Two other concerns here:

We don't refer to ISO/IEC 15445 (yet), and "sequence of octets" is different
from "sequence of bytes", so if we use that definition, we need to explain
how octets map to bytes, in particular for cases like CHAR_BIT == 16.
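The octet/byte gap can be illustrated with a small C++ sketch. It models a hypothetical implementation with CHAR_BIT == 16 by using std::uint16_t as a stand-in "byte", and shows two equally plausible ways an octet sequence could be mapped onto such bytes. The names Byte16, one_octet_per_byte and two_octets_per_byte are invented for illustration; neither mapping comes from the C++ standard or from ISO/IEC 15445.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: std::uint16_t stands in for a byte on an
// implementation where CHAR_BIT == 16. Illustration only.
using Byte16 = std::uint16_t;
using Octet = std::uint8_t;

// Mapping A (assumption): each octet occupies one 16-bit byte,
// with the high-order 8 bits set to zero.
std::vector<Byte16> one_octet_per_byte(const std::vector<Octet>& octets) {
    std::vector<Byte16> bytes;
    bytes.reserve(octets.size());
    for (Octet o : octets) {
        bytes.push_back(static_cast<Byte16>(o));
    }
    return bytes;
}

// Mapping B (assumption): two consecutive octets are packed into one
// 16-bit byte, the first octet in the high-order bits. A trailing odd
// octet gets a zero low half.
std::vector<Byte16> two_octets_per_byte(const std::vector<Octet>& octets) {
    std::vector<Byte16> bytes;
    bytes.reserve((octets.size() + 1) / 2);
    for (std::size_t i = 0; i < octets.size(); i += 2) {
        unsigned high = octets[i];
        unsigned low = (i + 1 < octets.size()) ? octets[i + 1] : 0u;
        bytes.push_back(static_cast<Byte16>((high << 8) | low));
    }
    return bytes;
}
```

For the octets 0x12 0x34 0x56, mapping A yields three bytes (0x0012, 0x0034, 0x0056) while mapping B yields two (0x1234, 0x5600), so an octet-based definition on its own does not pin down the byte sequence.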


Received on 2021-10-14 09:46:47