Re: Referencing the Unicode Standard

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Thu, 15 Dec 2022 09:17:18 +0100
On 15/12/2022 02.48, Corentin via SG16 wrote:
> On Thu, Dec 15, 2022, 02:21 Tom Honermann <tom_at_[hidden]> wrote:
> On 12/4/22 9:16 AM, Corentin via SG16 wrote:
>> Hey folks.
>> First draft updating some references to the Unicode standard (and more importantly replacing ISO-10646).
>> I'm hoping to get early feedback :)
>> https://isocpp.org/files/papers/D2736R0.pdf
> Thanks for the paper, Corentin. I'm sorry I failed to notice the link here and to get this paper scheduled.
> I spent some time reading it tonight. It looks good so far.
> I think the "Control code aliases" table <http://eel.is/c++draft/tab:lex.charset.ucn> in [lex.charset]p5 <http://eel.is/c++draft/lex.charset#5> can be (and should be) removed with these changes and p(5.2) <http://eel.is/c++draft/lex.charset#5.2> updated accordingly. Actually, I think p(5.2) <http://eel.is/c++draft/lex.charset#5.2> can be removed and p(5.1) <http://eel.is/c++draft/lex.charset#5.1> merged with p5 <http://eel.is/c++draft/lex.charset#5>. The listed control aliases are present in NameAliases.txt <https://www.unicode.org/Public/15.0.0/ucd/NameAliases.txt> as control names.

Agreed, including NOT showing BELL (because it conflicts),
which is good.

> I considered that, but we would need some wording saying that names in NameAliases.txt with the control label should be supported. We'd get rid of the table but we'd have additional wording.
> And we can't say "just support NameAliases.txt" because figments, abbreviations and alternates are currently not supported and that would be a design change.
> I would support this change but I'm not sure it's in scope for this paper.
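
To make the "names with the control label" idea concrete, here is a minimal sketch of selecting aliases by type from data in the NameAliases.txt format (semicolon-separated code point, alias and type, with `#` comment lines). The names `NameAlias`, `parse_name_aliases`, `with_type` and `sample_excerpt` are hypothetical, and the embedded excerpt is illustrative only; check it against the real file before relying on it. This is not proposed wording, just an illustration of the filtering such wording would have to describe.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One record of a NameAliases.txt-style file: "<hex code point>;<alias>;<type>".
struct NameAlias {
    unsigned long code_point;
    std::string alias;
    std::string type;  // e.g. "control", "abbreviation", "figment", ...
};

// Illustrative excerpt in the NameAliases.txt format (assumption: verify
// every entry against the actual UCD file).
inline std::string sample_excerpt() {
    return R"(# illustrative excerpt, not the full file
0000;NULL;control
0000;NUL;abbreviation
0007;ALERT;control
0007;BEL;abbreviation
000A;LINE FEED;control

0080;PADDING CHARACTER;figment
FEFF;BYTE ORDER MARK;alternate
)";
}

// Parse every data line, skipping blank lines, '#' comments and lines that
// do not have at least three ';'-separated fields.
inline std::vector<NameAlias> parse_name_aliases(std::istream& in) {
    std::vector<NameAlias> out;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty() || line[0] == '#') continue;
        const std::size_t first = line.find(';');
        if (first == std::string::npos) continue;
        const std::size_t second = line.find(';', first + 1);
        if (second == std::string::npos) continue;
        out.push_back({std::stoul(line.substr(0, first), nullptr, 16),
                       line.substr(first + 1, second - first - 1),
                       line.substr(second + 1)});
    }
    return out;
}

// Keep only the aliases whose type field equals `type`.
inline std::vector<NameAlias> with_type(const std::vector<NameAlias>& all,
                                        const std::string& type) {
    std::vector<NameAlias> out;
    for (const NameAlias& a : all) {
        if (a.type == type) out.push_back(a);
    }
    return out;
}
```

Feeding `sample_excerpt()` through `parse_name_aliases` and then `with_type(..., "control")` keeps NULL, ALERT and LINE FEED and drops BEL, which is tagged as an abbreviation in this excerpt. Supporting only some labels means filtering on that third field rather than accepting the whole file.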

The current draft paper says

> The Unicode Standard subclause "Character Names List"

This is not the correct reference; this subclause (I think it is rather a "chapter" or a "section")
shows the general rules around character names presentation, but doesn't
contain the exhaustive list. The exhaustive list is scattered around
the "code charts", which are distributed as separate PDFs here:


It would be good to have a single unified PDF (as ISO 10646 does), but no luck,
it seems.


> Tom.
>> A careful examination of the 3 standards does not reveal anything I think we should be concerned about besides what I've highlighted in the paper, but please let me know if you have specific questions we need to address.
>> I would like to point out the mess that is __STDC_ISO_10646__, whose value currently depends on an ISO-10646 version.
>> In the paper I propose to make that value implementation-defined as it cannot be relied upon except to check if some piece of code has been updated in the past 20+ years.
>> I've also reworded the deprecated codecvt facilities to not mention UCS-2, getting rid of one more reference.
>> I've massaged a few places to improve how we reference Unicode properties.
>> The other thing that is not 100% clear to me is whether we should reference UAX44, the Derived Core properties and UAX 29 (which we do currently),
>> or if referencing the Unicode standard implies all of that (I think it does).
>> I've noticed that the Unicode standard incorrectly references version 14.0 of itself when it means 15.0 but hopefully we understand what is meant.
>> Thanks,
>> Corentin

Received on 2022-12-15 08:17:25