Date: Sat, 6 Nov 2021 14:00:57 +0100
On 06/11/2021 05.24, Steve Downey via SG16 wrote:
> From 24.1 "Character Names List" of the Unicode Standard 14.0 (the upstream document that seems to be well maintained)
>
> Normative Aliases
> A normative character name alias is a formal, unique, and stable alternate name for a character. In limited circumstances, characters are given normative character name aliases where there is a defect in the character name. These normative aliases do not replace the character name, but rather allow users to refer formally to the character without requiring the use of a defective name. For more information, see Section 4.8, Name.
>
> Normative aliases which provide information about corrections to defective character names or which provide alternate names in wide use for a Unicode format character are printed in the character names list, preceded by a special symbol ※. Normative aliases serving other purposes, if listed, are shown by convention in all caps, following an “=”. Normative aliases of type “figment” for control codes are not listed. Normative aliases which represent commonly used abbreviations for control codes or format characters are shown in all caps, enclosed in parentheses. In contrast, informative aliases are shown in lowercase. For the definitive list of normative aliases, also including their type and suitable for machine parsing, see NameAliases.txt in the UCD.
>
>
> So, according to this, the parts in parentheses are abbreviations, the ALL CAPS are normative aliases, which includes the ones listed for control codes.
> Some of this is captured in the NamesList.txt, and some of it is captured in the software that normatively (for the Unicode Standard) processes that file.
Apparently.
Unicode 14 NameAliases.txt says
000A;LINE FEED;control
000A;NEW LINE;control
000A;END OF LINE;control
which seems to say that those three aliases are of the same kind.
Yet, Unicode 14 CodeCharts.pdf says
000A <control>
= LINE FEED (LF)
= new line (NL)
= end of line (EOL)
which appears to say that "new line" and "end of line" are second-
class (informative) aliases, because they are lowercase.
We need to decide whether C++ should accept all
three aliases, or just the first one.
One more issue:
Unicode 14 NameAliases.txt says
# Note that no formal name alias for the ISO 6429 "BELL" is
# provided for U+0007, because of the existing name collision
# with U+1F514 BELL.
0007;ALERT;control
0007;BEL;abbreviation
Yet, Unicode 14 CodeCharts.pdf says
0007 <control>
= BELL
and about a thousand pages later
1F514 BELL
→ 0FC4 tibetan symbol dril bu
→ 2407 symbol for bell
→ 1F56D ringing bell
I've sent an e-mail to unicode_at_[hidden]
> I am not going to claim that we can read that out of 10646. I think 10646 is not actually fit for purpose. The description of the code charts is insufficient, and in any case is not machine readable which is actually required for fidelity here.
> I am intending to use the "normative aliases" for control codes as described in the Unicode standard to produce a table to be included in our standard. I believe this captures the intent of what we agreed.
I'd suggest using all "control" aliases from NameAliases.txt.
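For illustration only, here is a minimal sketch of what "take all control aliases" could look like mechanically. It assumes the semicolon-separated record format seen in the NameAliases.txt excerpts above (code point;alias;type, with # comments). The sample data is just those quoted records, and the helper name control_aliases is made up for this example; it is not part of any proposed wording or existing tool.

```python
from collections import defaultdict

# Sample records copied from the Unicode 14 NameAliases.txt excerpts quoted above.
SAMPLE = """\
# Note that no formal name alias for the ISO 6429 "BELL" is
# provided for U+0007, because of the existing name collision
# with U+1F514 BELL.
0007;ALERT;control
0007;BEL;abbreviation
000A;LINE FEED;control
000A;NEW LINE;control
000A;END OF LINE;control
"""


def control_aliases(text):
    """Map each code point to its aliases of type 'control' (hypothetical helper)."""
    table = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code_point, alias, alias_type = line.split(";")
        if alias_type == "control":  # skips e.g. 'abbreviation' records
            table[int(code_point, 16)].append(alias)
    return dict(table)


if __name__ == "__main__":
    for cp, names in sorted(control_aliases(SAMPLE).items()):
        print(f"U+{cp:04X}: {', '.join(names)}")
```

With this filter, U+000A gets all three names and U+0007 gets only ALERT; BEL is dropped because its type is "abbreviation", and BELL never appears for U+0007 because NameAliases.txt does not list it.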
Jens
Received on 2021-11-06 08:01:07