[SG16] The Unicode Standard vs 10646 (which is defective)

From: Steve Downey <sdowney_at_[hidden]>
Date: Sat, 6 Nov 2021 00:24:47 -0400
From 24.1 "Character Names List" of the Unicode Standard 14.0 (the upstream
document that seems to be well maintained):

Normative Aliases
A normative character name alias is a formal, unique, and stable alternate
name for a character. In limited circumstances, characters are given
normative character name aliases where there is a defect in the character
name. These normative aliases do not replace the character name, but rather
allow users to refer formally to the character without requiring the use of
a defective name. For more information, see Section 4.8, Name.

Normative aliases which provide information about corrections to defective
character names or which provide alternate names in wide use for a Unicode
format character are printed in the character names list, preceded by a
special symbol ※. Normative aliases serving other purposes, if listed, are
shown by convention in all caps, following an “=”. Normative aliases of
type “figment” for control codes are not listed. Normative aliases which
represent commonly used abbreviations for control codes or format
characters are shown in all caps, enclosed in parentheses. In contrast,
informative aliases are shown in lowercase. For the definitive list of
normative aliases, also including their type and suitable for machine
parsing, see NameAliases.txt in the UCD.


So, according to this, the parts in parentheses are abbreviations and the
ALL CAPS names are normative aliases, which include the ones listed for
control codes.
Some of this is captured in NamesList.txt, and some of it is captured in
the software that normatively (for the Unicode Standard) processes that
file.
I am not going to claim that we can read that out of 10646. I think 10646
is not actually fit for purpose. The description of the code charts is
insufficient, and in any case it is not machine-readable, which is actually
required for fidelity here.
I intend to use the "normative aliases" for control codes, as described in
the Unicode Standard, to produce a table to be included in our standard. I
believe this captures the intent of what we agreed.
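
As a rough illustration of why the machine-readable file matters, here is a
minimal sketch of how such a table could be pulled out of NameAliases.txt.
It assumes the semicolon-separated "code point;alias;type" layout of that
file, with '#' comment lines and blank lines skipped. The names NameAlias,
parse_name_aliases and aliases_of_type are hypothetical, and selecting only
the "control" type is an assumption about which aliases end up in the table,
not a statement of what was agreed.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One record from NameAliases.txt (hypothetical struct for illustration).
struct NameAlias {
    std::uint32_t code_point;  // parsed from the hexadecimal first field
    std::string alias;         // e.g. "LINE FEED"
    std::string type;          // e.g. "control", "abbreviation", "figment"
};

// Parse lines of the form "000A;LINE FEED;control".
// Blank lines and lines starting with '#' are skipped, as are lines that
// do not have three fields. A non-hexadecimal first field makes std::stoul
// throw; this sketch does not try to recover from that.
std::vector<NameAlias> parse_name_aliases(std::istream& in) {
    std::vector<NameAlias> result;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF input
        if (line.empty() || line[0] == '#') continue;

        std::istringstream fields(line);
        std::string cp, alias, type;
        if (!std::getline(fields, cp, ';') ||
            !std::getline(fields, alias, ';') ||
            !std::getline(fields, type)) {
            continue;  // fewer than three fields
        }
        result.push_back({static_cast<std::uint32_t>(std::stoul(cp, nullptr, 16)),
                          alias, type});
    }
    return result;
}

// Keep only the aliases of one type, preserving file order.
std::vector<NameAlias> aliases_of_type(const std::vector<NameAlias>& all,
                                       const std::string& type) {
    std::vector<NameAlias> out;
    for (const auto& a : all) {
        if (a.type == type) out.push_back(a);
    }
    return out;
}
```

Calling parse_name_aliases on a std::ifstream opened on NameAliases.txt and
then aliases_of_type(all, "control") would give the candidate rows; aliases
of type "abbreviation" or "figment" fall out simply by not being selected.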

Pedantically, 10646 normatively references
http://www.unicode.org/versions/Unicode9.0.0/ch04.pdf, which in turn refers
to section 24.1 of the Unicode Standard via an undated reference to
describe character names in the Unicode Database. That is not terribly
sane.

Received on 2021-11-05 23:25:00