On Mon, 15 Jun 2020 at 09:00, Jens Maurer <Jens.Maurer@gmx.net> wrote:

On 15/06/2020 00.06, Hubert Tong wrote:
> The presence of a UCN for a C1 (non-EBCDIC) control character in a supposedly-EBCDIC string is not immediately indicative of an error.
In this example, is the UCN intended to mean the conventionally mapped
EBCDIC control character, or something else?

Beyond EBCDIC control characters, do we know of any other situation
where the input-to-Unicode mapping is lossy or otherwise not semantics-preserving?
It would be good to keep a list in one of the upcoming papers, for
the permanent record.

There are 3 scenarios I can think of:

1. Control characters: EBCDIC, and other encodings that have control characters beyond those in ASCII, map them to C0/C1 in an application-specific manner.
2. Some (~20) GB 18030 characters map to the Unicode private use area, which also doesn't "preserve semantics".
3. Some Big5 characters have no Unicode mapping at all (these are exclusively place and personal names, and this does not affect, for example, the Windows Big5 code pages).
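
To make the first scenario a bit more concrete, here is a minimal Python sketch (my own illustration, not taken from any paper). It classifies code points into C0, C1, or private use, and lists which single bytes of a codec decode into the C1 range. The helper names `classify` and `c1_mappings` are hypothetical, and `cp037` is just Python's name for one EBCDIC code page; its table reflects one conventional mapping, which is exactly the application-specific part.

```python
def classify(cp: int) -> str:
    """Classify a Unicode code point by the range it falls in.

    Only the ranges discussed here are distinguished; everything else,
    including DEL (U+007F), is reported as "other".
    """
    if 0x00 <= cp <= 0x1F:
        return "C0"
    if 0x80 <= cp <= 0x9F:
        return "C1"
    if (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD):
        return "PUA"
    return "other"


def c1_mappings(codec: str = "cp037") -> dict:
    """Return {byte: code point} for single bytes that decode into C1.

    Bytes that do not decode on their own (for example lead bytes of a
    multi-byte encoding) are skipped.
    """
    result = {}
    for b in range(256):
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            continue
        if len(ch) == 1 and classify(ord(ch)) == "C1":
            result[b] = ord(ch)
    return result


if __name__ == "__main__":
    # Assumption: Python's "cp037" table stands in for "an EBCDIC code page".
    for byte, cp in sorted(c1_mappings("cp037").items()):
        print(f"0x{byte:02X} -> U+{cp:04X}")
```

Nothing in the decoded string records which EBCDIC control a given C1 code point was meant to stand for, so a consumer has to know the convention out of band.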