Re: [SG16] The Unicode Standard vs 10646 (which is defective)

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Sat, 6 Nov 2021 13:03:43 +0100
On 06/11/2021 09.46, Corentin Jabot via SG16 wrote:
> I would personally prefer to reference Unicode (in addition to ISO 10646), or another Unicode document describing names and aliases, rather than
> putting the burden of maintaining a list of aliases on the C++ standard.

I don't think there is a burden. We're talking about the aliases for
the control codes (only), i.e. C0 and C1; the typo-correction aliases
are reasonably well specified in ISO 10646.

I'm not seeing any changes in this area that would require maintenance
on our side.
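For illustration only, here is a minimal C++17 sketch of how small that scope is. It assumes the usual ranges: C0 controls occupy U+0000..U+001F and C1 controls U+0080..U+009F. The table is a hypothetical excerpt of a few `control`-type entries as they appear in the UCD file NameAliases.txt, not the complete list. The names `is_c0`, `is_c1`, `kControlAliases`, and `lookup_control_alias` are made up for this example.

```cpp
#include <array>
#include <optional>
#include <string_view>

// C0 control codes: U+0000..U+001F.
constexpr bool is_c0(char32_t c) { return c <= 0x1F; }

// C1 control codes: U+0080..U+009F.
// U+007F DELETE is also a control character but lies between the two ranges.
constexpr bool is_c1(char32_t c) { return c >= 0x80 && c <= 0x9F; }

// One normative alias of type "control" for a code point.
struct ControlAlias {
    char32_t code;
    std::string_view name;
};

// Hypothetical excerpt: a few "control" aliases from NameAliases.txt,
// not the complete list.
constexpr std::array<ControlAlias, 8> kControlAliases{{
    {0x0000, "NULL"},
    {0x0007, "ALERT"},
    {0x0009, "CHARACTER TABULATION"},
    {0x000A, "LINE FEED"},
    {0x000D, "CARRIAGE RETURN"},
    {0x001B, "ESCAPE"},
    {0x007F, "DELETE"},
    {0x0085, "NEXT LINE"},
}};

// Returns the code point whose alias matches `name` exactly, if any.
constexpr std::optional<char32_t> lookup_control_alias(std::string_view name) {
    for (const auto& entry : kControlAliases) {
        if (entry.name == name) return entry.code;
    }
    return std::nullopt;
}
```

Because everything is `constexpr`, a lookup such as `lookup_control_alias("LINE FEED")` can be evaluated at compile time; the matching is exact, so `"LINEFEED"` finds nothing.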

Regardless, the bibliography should refer to Unicode and we should
have a note that explains the provenance of our names.

> Is that something we can consider?

I think ISO is pretty adamant that, given a choice between an
ISO standard and another standard for a given topic, we are
to refer to the ISO standard.


> On Sat, Nov 6, 2021 at 5:25 AM Steve Downey via SG16 <sg16_at_[hidden]> wrote:
> From Section 24.1, "Character Names List", of the Unicode Standard 14.0 (the upstream document, which seems to be well maintained):
> Normative Aliases
> A normative character name alias is a formal, unique, and stable alternate name for a character. In limited circumstances, characters are given normative character name aliases where there is a defect in the character name. These normative aliases do not replace the character name, but rather allow users to refer formally to the character without requiring the use of a defective name. For more information, see Section 4.8, Name.
> Normative aliases which provide information about corrections to defective character names or which provide alternate names in wide use for a Unicode format character are printed in the character names list, preceded by a special symbol ※. Normative aliases serving other purposes, if listed, are shown by convention in all caps, following an “=”. Normative aliases of type “figment” for control codes are not listed. Normative aliases which represent commonly used abbreviations for control codes or format characters are shown in all caps, enclosed in parentheses. In contrast, informative aliases are shown in lowercase. For the definitive list of normative aliases, also including their type and suitable for machine parsing, see NameAliases.txt in the UCD.
> So, according to this, the parts in parentheses are abbreviations and the ALL CAPS names are normative aliases, which include the ones listed for control codes.
> Some of this is captured in NamesList.txt, and some of it is captured in the software that normatively (for the Unicode Standard) processes that file.
> I am not going to claim that we can read that out of 10646. I think 10646 is not actually fit for purpose: the description of the code charts is insufficient, and in any case it is not machine-readable, which is what fidelity here actually requires.
> I intend to use the "normative aliases" for control codes, as described in the Unicode Standard, to produce a table to be included in our standard. I believe this captures the intent of what we agreed on.
> Pedantically, 10646 normatively references http://www.unicode.org/versions/Unicode9.0.0/ch04.pdf, which, via an undated reference, refers to Section 24.1 of the Unicode Standard to describe character names in the Unicode Database. That is not terribly sane.

Received on 2021-11-06 07:03:49