Re: [SG16] Agreeing with Corentin's point re: problem with strict use of abstract characters

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Mon, 15 Jun 2020 10:41:31 +0200
On Mon, 15 Jun 2020 at 09:00, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 15/06/2020 00.06, Hubert Tong wrote:
> > The presence of a UCN for a C1 (non-EBCDIC) control character in a
> > supposedly-EBCDIC string is not immediately indicative of an error.
> In this example, is the UCN intending to mean the conventionally mapped
> EBCDIC control character, or something else?
> Beyond EBCDIC control characters, do we know of any other situation
> where input-to-Unicode mapping is not semantics-preserving or lossy?
> It would be good to keep a list in one of the upcoming papers, for
> the permanent record.

There are 3 scenarios I can think of:

   - The control characters of EBCDIC, but also of other encodings that have
   more control characters than ASCII; all of these map to C0/C1 in an
   application-specific manner
   - Some (~20) GB 18030 characters map to the Unicode Private Use Area,
   which also doesn't preserve semantics
   - Some Big5 characters have no Unicode mapping at all (these are
   exclusively place and personal names; this does not, for example,
   affect the Windows Big5 code pages)
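As a minimal sketch (not something proposed in the thread), the three scenarios can be written as a check that a hypothetical transcoding step might run on each decoded character. The names `MappingConcern` and `classify` are illustrative assumptions. An empty `std::optional` stands in for a source character with no Unicode mapping (the Big5 case). The ranges checked are the Unicode control characters (general category Cc: U+0000..U+001F and U+007F..U+009F) and the three Private Use Areas.

```cpp
#include <optional>

// Hypothetical classification of why a decoded code point might not
// preserve the meaning of the original source character.
enum class MappingConcern {
    None,         // ordinary character, nothing to flag
    ControlChar,  // C0/C1 control: meaning depends on an application-specific mapping
    PrivateUse,   // Private Use Area: no standardized semantics
    Unmapped      // source character had no Unicode mapping at all
};

// `cp` is empty when the (hypothetical) decoder found no Unicode mapping.
constexpr MappingConcern classify(std::optional<char32_t> cp) {
    if (!cp) {
        return MappingConcern::Unmapped;
    }
    const char32_t c = *cp;
    // Unicode general category Cc: C0 controls, DEL, and C1 controls.
    if (c <= 0x1F || (c >= 0x7F && c <= 0x9F)) {
        return MappingConcern::ControlChar;
    }
    if ((c >= 0xE000 && c <= 0xF8FF) ||      // BMP Private Use Area
        (c >= 0xF0000 && c <= 0xFFFFD) ||    // Supplementary PUA-A (plane 15)
        (c >= 0x100000 && c <= 0x10FFFD)) {  // Supplementary PUA-B (plane 16)
        return MappingConcern::PrivateUse;
    }
    return MappingConcern::None;
}
```

For example, `classify(char32_t{0x85})` yields `MappingConcern::ControlChar`, and `classify(std::nullopt)` yields `MappingConcern::Unmapped`. A check like this can only flag that meaning *may* have been lost; it cannot recover which EBCDIC control character or which GB 18030 character was originally intended.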

> Jens

Received on 2020-06-15 03:44:52