Re: Referencing the Unicode Standard

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 15 Dec 2022 18:09:43 -0500
On 12/15/22 5:44 AM, Corentin wrote:
> Thanks folks.
> I did update the paper to remove the control alias table
> https://isocpp.org/files/papers/P2736R0.pdf

Thanks, Corentin.

A wording nit; I think "control code aliases" should be struck in this note:

    [ /Note/: None of the associated _Unicode_ character names,
    associated _Unicode_ character name aliases, or control code aliases
    have leading or trailing spaces. — /end note/ ]

Suggested edit:

    [ /Note/: None of the associated _Unicode_ character names or
    associated _Unicode_ character name aliases
    have leading or trailing spaces. — /end note/ ]

I think the reference to section 4.8 ("Name") is fine despite the fact
that the section number might change over time.

With regard to Jens's comments about the names being spread across the
code charts, that does seem to be true. However, chapter 24 ("About the
Code Charts") references both NamesList.txt
<https://www.unicode.org/Public/UCD/latest/ucd/NamesList.txt> and
NameAliases.txt
<https://www.unicode.org/Public/UCD/latest/ucd/NameAliases.txt> (section
4.8 also references the latter, but not the former). Perhaps a note
stating that the names are provided in these other files is warranted.
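
To make the NameAliases.txt point concrete, here is a minimal sketch of
reading records in that file's semicolon-separated layout (code point;
alias; type) and selecting one alias type such as "control". The struct
and function names are hypothetical and chosen only for illustration;
this is not wording or an implementation from the paper.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// One record from a NameAliases.txt-style line: "code point;alias;type".
struct NameAlias {
    std::uint32_t code_point;
    std::string alias;
    std::string type;  // for example "control", "figment", "abbreviation"
};

// Hypothetical helper: parse a single line. Returns nothing for comments,
// blank lines, and lines that do not have exactly three fields.
std::optional<NameAlias> parse_alias_line(std::string_view line) {
    if (line.empty() || line.front() == '#') return std::nullopt;
    const auto first = line.find(';');
    if (first == std::string_view::npos) return std::nullopt;
    const auto second = line.find(';', first + 1);
    if (second == std::string_view::npos) return std::nullopt;
    if (line.find(';', second + 1) != std::string_view::npos) return std::nullopt;

    // The first field must be a short hexadecimal code point.
    const std::string hex(line.substr(0, first));
    if (hex.empty() || hex.size() > 6 ||
        hex.find_first_not_of("0123456789ABCDEFabcdef") != std::string::npos) {
        return std::nullopt;
    }

    NameAlias record;
    record.code_point = static_cast<std::uint32_t>(std::stoul(hex, nullptr, 16));
    record.alias = std::string(line.substr(first + 1, second - first - 1));
    record.type = std::string(line.substr(second + 1));
    return record;
}

// Hypothetical helper: keep only the records whose type field matches.
std::vector<NameAlias> aliases_of_type(const std::vector<std::string>& lines,
                                       std::string_view type) {
    std::vector<NameAlias> result;
    for (const auto& line : lines) {
        auto record = parse_alias_line(line);
        if (record && record->type == type) {
            result.push_back(std::move(*record));
        }
    }
    return result;
}
```

Calling aliases_of_type(lines, "control") on the file's lines would
yield only the entries carrying the control label, leaving figments and
abbreviations out.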

> I left __STDC_ISO_10646__ untouched, I'm not sure we reached a
> conclusion about it yesterday.

I agree. We can revisit this in January.
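
For reference while this is open, a minimal sketch of how code can
observe the macro today. It assumes only that __STDC_ISO_10646__, when
defined, expands to an integer literal; the function names and the 0
sentinel are hypothetical choices for this example.

```cpp
#include <string>

// Returns the value of __STDC_ISO_10646__ if the implementation defines it,
// or 0 otherwise (0 is a sentinel chosen for this sketch).
constexpr long stdc_iso_10646_value() {
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;
#endif
}

// Produces a short human-readable description of the macro's state.
std::string describe_stdc_iso_10646() {
    const long value = stdc_iso_10646_value();
    if (value == 0) {
        return "__STDC_ISO_10646__ is not defined";
    }
    return "__STDC_ISO_10646__ is defined as " + std::to_string(value);
}
```

The result differs between implementations, which is the behavior under
discussion.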

Tom.

>
> On Thu, Dec 15, 2022 at 9:17 AM Jens Maurer <jens.maurer_at_[hidden]> wrote:
>
>
>
> On 15/12/2022 02.48, Corentin via SG16 wrote:
> > On Thu, Dec 15, 2022, 02:21 Tom Honermann <tom_at_[hidden]> wrote:
> >
> > On 12/4/22 9:16 AM, Corentin via SG16 wrote:
> >> Hey folks.
> >> First draft updating some references to the Unicode standard
> >> (and more importantly replacing ISO-10646).
> >> I'm hoping to get early feedback :)
> >>
> >> https://isocpp.org/files/papers/D2736R0.pdf
> >
> > Thanks for the paper, Corentin. I'm sorry I failed to notice the
> > link here and to get this paper scheduled.
> >
> > I spent some time reading it tonight. It looks good so far.
> >
> > I think the "Control code aliases" table
> > <http://eel.is/c++draft/tab:lex.charset.ucn> in [lex.charset]p5
> > <http://eel.is/c++draft/lex.charset#5> can be (and should be)
> > removed with these changes and p(5.2)
> > <http://eel.is/c++draft/lex.charset#5.2> updated accordingly.
> > Actually, I think p(5.2) <http://eel.is/c++draft/lex.charset#5.2>
> > can be removed and p(5.1) <http://eel.is/c++draft/lex.charset#5.1>
> > merged with p5 <http://eel.is/c++draft/lex.charset#5>. The listed
> > control aliases are present in NameAliases.txt
> > <https://www.unicode.org/Public/15.0.0/ucd/NameAliases.txt> as
> > control names.
>
> Agreed, including NOT showing BELL (because it conflicts),
> which is good.
>
> > I considered that but we would need some wording that says that
> > names in NameAliases.txt with the control label should be
> > supported. We'd get rid of the table but we'd have additional
> > wording.
> > And we can't say "just support NameAliases.txt" because
> > figments, abbreviations and alternates are currently not supported
> > and that would be a design change.
> > I would support this change but I'm not sure it's in scope for
> > this paper.
>
> The current draft paper says
>
> > The Unicode Standard subclause "Character Names List"
>
> This is not the correct reference; this subclause (I think it is
> rather a "chapter" or a "section") shows the general rules around
> character names presentation, but doesn't contain the exhaustive
> list. The exhaustive list is scattered around the "code charts",
> which are distributed as separate PDFs here:
>
> https://www.unicode.org/charts/
>
> It would be good to have a single unified PDF (as ISO 10646 does),
> but no luck, it seems.
>
> Jens
>
>
> > Tom.
> >
> >>
> >> A careful examination of the 3 standards does not reveal
> >> anything I think we should be concerned about besides what I've
> >> highlighted in the paper but please let me know if you have
> >> specific questions we need to address.
> >>
> >> I would like to point out the mess that is __STDC_ISO_10646__,
> >> whose value currently depends on an ISO-10646 version.
> >> In the paper I propose to make that value implementation-defined
> >> as it cannot be relied upon except to check if some piece of code
> >> has been updated in the past 20+ years.
> >>
> >> I've also reworded the deprecated codecvt facilities to not
> >> mention UCS-2, getting rid of one more reference.
> >>
> >> I've massaged a few places to improve how we reference
> >> Unicode properties.
> >> The other thing that is not 100% clear to me is whether we
> >> should reference UAX 44, the Derived Core properties and UAX 29
> >> (which we do currently),
> >> or if referencing the Unicode standard implies all of that
> >> (I think it does).
> >>
> >> I've noticed that the Unicode standard incorrectly references
> >> version 14.0 of itself when it means 15.0 but hopefully we
> >> understand what is meant.
> >>
> >> Thanks,
> >> Corentin
> >>
> >
>

Received on 2022-12-15 23:09:45