
Subject: Re: Agreeing with Corentin's point re: problem with strict use of abstract characters
From: Corentin Jabot (corentinjabot_at_[hidden])
Date: 2020-06-15 03:41:31

On Mon, 15 Jun 2020 at 09:00, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 15/06/2020 00.06, Hubert Tong wrote:
> > The presence of a UCN for a C1 (non-EBCDIC) control character in a
> > supposedly-EBCDIC string is not immediately indicative of an error.
> In this example, is the UCN intending to mean the conventionally mapped
> EBCDIC control character, or something else?
> Beyond EBCDIC control characters, do we know of any other situation
> where input-to-Unicode mapping is not semantics-preserving or lossy?
> It would be good to keep a list in one of the upcoming papers, for
> the permanent record.

There are 3 scenarios I can think of:

   - Control characters in EBCDIC, but also in other encodings that have
   more control characters than ASCII does; all of these map to C0/C1 in
   an application-specific manner
   - Some (~20) GB 18030 characters map to the Unicode Private Use Area,
   which also doesn't "preserve semantics"
   - Some Big5 characters have no Unicode mapping at all (these are
   exclusively place and personal names; for example, they do not
   concern the Windows Big5 code pages)

> Jens

SG16 list run by sg16-owner@lists.isocpp.org