On 6/15/20 11:38 AM, Corentin wrote:
On 6/15/20 7:14 AM, Corentin via SG16 wrote:
Hubert has specifically requested better support for
unmappable characters, so I don't agree with the
parenthetical.
I don't think that's a fair characterisation. Again, there
is a mapping for all characters in EBCDIC. That mapping is
prescriptive rather than semantic, but both Unicode and IBM
agree on that mapping (the code points they map to have no
associated semantics whatsoever and are meant to be used
that way). The wording trick will be to make sure we don't
prevent that mapping.
The claim that Unicode and IBM agree on this mapping seems
overreaching to me. Yes, there is a specification for how EBCDIC
code pages can be mapped to Unicode code points in a way that
preserves round tripping. I don't think that should be read as an
endorsement of conflating the semantic meanings of characters
that represent distinct abstract characters before and after
such a mapping. I believe there have been requests
to be able to differentiate the presence of one of these control
characters in the source input and the mapped Unicode code point
being written as a UCN.
The Unicode characters they map to do not have associated semantics:
There are 65 code points set aside in the Unicode Standard for compatibility with the C0
and C1 control codes defined in the ISO/IEC 2022 framework. The ranges of these code
points are U+0000..U+001F, U+007F, and U+0080..U+009F, which correspond to the
8-bit controls 00₁₆ to 1F₁₆ (C0 controls), 7F₁₆ (delete), and 80₁₆ to 9F₁₆ (C1 controls),
respectively. For example, the 8-bit legacy control code character tabulation (or tab) is the
byte value 09₁₆; the Unicode Standard encodes the corresponding control code at U+0009.
The Unicode Standard provides for the intact interchange of these code points, neither
adding to nor subtracting from their semantics. The semantics of the control codes are
generally determined by the application with which they are used. However, in the absence
of specific application uses, they may be interpreted according to the control function
semantics specified in ISO/IEC 6429:1992.
In general, the use of control codes constitutes a higher-level protocol and is beyond the
scope of the Unicode Standard. For example, the use of ISO/IEC 6429 control sequences
for controlling bidirectional formatting would be a legitimate higher-level protocol layered
on top of the plain text of the Unicode Standard. Higher-level protocols are not specified
by the Unicode Standard; their existence cannot be assumed without a separate agreement
between the parties interchanging such data.
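
To make the ranges in the quoted text concrete, here is a minimal C++ sketch that classifies a code point as one of the 65 control codes. The function names (is_c0_control, is_delete, is_c1_control, is_control_code, count_control_codes) are hypothetical and chosen only for illustration; they are not taken from any proposal or library. The sketch says nothing about which EBCDIC control maps to which code point.

```cpp
// Hypothetical helpers; the names are illustrative assumptions only.
// The ranges are the ones given in the quoted Unicode Standard text.

// C0 controls: U+0000..U+001F
constexpr bool is_c0_control(char32_t cp) { return cp <= 0x1F; }

// Delete: U+007F
constexpr bool is_delete(char32_t cp) { return cp == 0x7F; }

// C1 controls: U+0080..U+009F
constexpr bool is_c1_control(char32_t cp) { return cp >= 0x80 && cp <= 0x9F; }

// Any of the code points set aside for C0/C1 compatibility.
constexpr bool is_control_code(char32_t cp) {
    return is_c0_control(cp) || is_delete(cp) || is_c1_control(cp);
}

// Counts the control codes by scanning U+0000..U+00FF (requires C++14).
constexpr int count_control_codes() {
    int n = 0;
    for (char32_t cp = 0; cp <= 0xFF; ++cp) {
        if (is_control_code(cp)) {
            ++n;
        }
    }
    return n;
}

// 32 C0 controls + delete + 32 C1 controls
static_assert(count_control_codes() == 65, "expected 65 control codes");
```

A check like this only identifies the code points. Whether a control character that appeared literally in the source should be distinguishable from the same code point written as a UCN is a separate question that the sketch does not address.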