Subject: Re: Agreeing with Corentin's point re: problem with strict use of abstract characters
From: Corentin Jabot (corentinjabot_at_[hidden])
Date: 2020-06-15 03:41:31
On Mon, 15 Jun 2020 at 09:00, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> On 15/06/2020 00.06, Hubert Tong wrote:
> > The presence of a UCN for a C1 (non-EBCDIC) control character in a
> > supposedly-EBCDIC string is not immediately indicative of an error.
> In this example, is the UCN intending to mean the conventionally mapped
> EBCDIC control character, or something else?
> Beyond EBCDIC control characters, do we know of any other situation
> where input-to-Unicode mapping is not semantics-preserving or lossy?
> It would be good to keep a list in one of the upcoming papers, for
> the permanent record.
There are 3 scenarios I can think of:
- The control characters of EBCDIC, and of other encodings that have
more control characters than ASCII does; all of these map to C0/C1 in
an application-specific manner
- Some (~20) GB 18030 characters map to the Unicode private use area,
which also doesn't preserve semantics
- Some Big5 characters do not have a Unicode mapping at all (these are
exclusively place and personal names, and this doesn't affect, for
example, the Windows Big5 code pages)
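
As a rough illustration of the first two scenarios, here is a minimal Python sketch. It assumes Python's built-in cp037 codec as one conventional EBCDIC-to-Unicode mapping (other mappings assign the controls differently), and the classify helper is a hypothetical name introduced only for this example. It shows that an EBCDIC control byte can come out as a C1 code point, and how private use code points can be detected after decoding.

```python
import unicodedata


def classify(ch: str) -> str:
    """Roughly classify a single decoded character (hypothetical helper)."""
    cp = ord(ch)
    if cp < 0x20:
        return "C0 control"
    if 0x80 <= cp <= 0x9F:
        return "C1 control"
    if unicodedata.category(ch) == "Co":
        return "private use"
    return "other"


# EBCDIC NL (0x15) and LF (0x25), decoded with Python's cp037 table.
# The choice of C0/C1 code point is a convention of that table, not
# something inherent in the EBCDIC control itself.
for byte in (0x15, 0x25):
    ch = bytes([byte]).decode("cp037")
    print(f"EBCDIC 0x{byte:02X} -> U+{ord(ch):04X} ({classify(ch)})")

# A private use code point carries no agreed meaning in Unicode, so a
# mapping that lands there does not preserve semantics either.
print(f"U+E000 ({classify(chr(0xE000))})")
```

The Big5 case cannot be shown this way: a character with no Unicode mapping has nothing to decode to.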
SG16 list run by firstname.lastname@example.org