sg16

Re: Referencing the Unicode Standard

From: Corentin <corentin.jabot_at_[hidden]>
Date: Thu, 15 Dec 2022 11:44:17 +0100
Thanks folks.
I did update the paper to remove the control alias table
https://isocpp.org/files/papers/P2736R0.pdf
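As an illustration of the rule discussed in this thread (support only the NameAliases.txt entries labelled "control"), here is a minimal C++17 sketch. It is not proposed wording and not how any compiler actually resolves names. Each data line of NameAliases.txt has the form "code point;alias;type", where the type is one of correction, control, alternate, figment or abbreviation. The helper control_aliases and the sample kSample are hypothetical; the sample is an abridged set of lines modelled on the file, and a real implementation would read the file for the Unicode version it targets and trim whitespace.

```cpp
#include <map>
#include <sstream>
#include <string>

// Abridged, illustrative sample in the NameAliases.txt line format
// (hypothetical subset, not a copy of the real file).
const std::string kSample =
    "# code point;alias;type\n"
    "0007;ALERT;control\n"
    "0007;BEL;abbreviation\n"
    "000A;LINE FEED;control\n"
    "000A;LF;abbreviation\n"
    "0080;PADDING CHARACTER;figment\n";

// Hypothetical helper: parse "code point;alias;type" lines and keep only
// aliases whose type is "control". Corrections, alternates, figments and
// abbreviations are skipped. No whitespace trimming is done here.
std::map<std::string, char32_t> control_aliases(const std::string& text) {
    std::map<std::string, char32_t> result;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;  // blank line or comment
        const auto first = line.find(';');
        if (first == std::string::npos) continue;      // malformed line
        const auto second = line.find(';', first + 1);
        if (second == std::string::npos) continue;     // malformed line
        const std::string code = line.substr(0, first);
        const std::string alias = line.substr(first + 1, second - first - 1);
        const std::string type = line.substr(second + 1);
        if (type != "control") continue;               // only the control label
        // The code point field is hexadecimal, e.g. "000A" -> U+000A.
        result[alias] = static_cast<char32_t>(std::stoul(code, nullptr, 16));
    }
    return result;
}
```

With this sample, only ALERT and LINE FEED are kept; BEL (an abbreviation) and PADDING CHARACTER (a figment) are dropped, which is the distinction any replacement wording for the table would have to spell out.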
I left __STDC_ISO_10646__ untouched; I'm not sure we reached a conclusion
about it yesterday.
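Some background for the __STDC_ISO_10646__ question: when an implementation defines this macro, its value is an integer literal of the form yyyymmL (for example, 199712L) naming the year and month of the ISO/IEC 10646 edition, with amendments, whose characters are guaranteed to keep their code point values when stored in a wchar_t. The C++ sketch below, with hypothetical helper names, simply decodes such a value; it illustrates that the number is essentially a date, which is part of why it is hard to rely on for anything else.

```cpp
#include <cstdio>

// Hypothetical helpers: split a yyyymmL value such as 201706L
// into its year and month components.
constexpr long iso10646_year(long value) { return value / 100; }
constexpr long iso10646_month(long value) { return value % 100; }

static_assert(iso10646_year(201706L) == 2017, "year component");
static_assert(iso10646_month(201706L) == 6, "month component");

// Prints the macro's value if the implementation defines it at all;
// whether it is defined (and to what) varies between implementations.
void report_iso10646() {
#if defined(__STDC_ISO_10646__)
    std::printf("__STDC_ISO_10646__ = %ld (edition %ld-%02ld)\n",
                static_cast<long>(__STDC_ISO_10646__),
                iso10646_year(__STDC_ISO_10646__),
                iso10646_month(__STDC_ISO_10646__));
#else
    std::puts("__STDC_ISO_10646__ is not defined");
#endif
}
```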

On Thu, Dec 15, 2022 at 9:17 AM Jens Maurer <jens.maurer_at_[hidden]> wrote:

>
>
> On 15/12/2022 02.48, Corentin via SG16 wrote:
> > On Thu, Dec 15, 2022, 02:21 Tom Honermann <tom_at_[hidden]> wrote:
> >
> > On 12/4/22 9:16 AM, Corentin via SG16 wrote:
> >> Hey folks.
> >> First draft updating some references to the Unicode Standard (and, more
> >> importantly, replacing ISO/IEC 10646).
> >> I'm hoping to get early feedback :)
> >>
> >> https://isocpp.org/files/papers/D2736R0.pdf
> >
> > Thanks for the paper, Corentin. I'm sorry I failed to notice the
> > link here and to get this paper scheduled.
> >
> > I spent some time reading it tonight. It looks good so far.
> >
> > I think the "Control code aliases" table
> > <http://eel.is/c++draft/tab:lex.charset.ucn> in [lex.charset]p5
> > <http://eel.is/c++draft/lex.charset#5> can (and should) be removed with
> > these changes, and p(5.2) <http://eel.is/c++draft/lex.charset#5.2> updated
> > accordingly. Actually, I think p(5.2) can be removed and p(5.1)
> > <http://eel.is/c++draft/lex.charset#5.1> merged with p5. The listed
> > control aliases are present in NameAliases.txt
> > <https://www.unicode.org/Public/15.0.0/ucd/NameAliases.txt> as control
> > names.
>
> Agreed, including NOT showing BELL (because it conflicts with the name of
> U+1F514 BELL), which is good.
>
> > I considered that, but we would need some wording saying that names in
> > NameAliases.txt with the control label should be supported. We'd get rid
> > of the table, but we'd have additional wording.
> > And we can't say "just support NameAliases.txt", because figments,
> > abbreviations and alternates are currently not supported, and that would
> > be a design change.
> > I would support this change, but I'm not sure it's in scope for this
> > paper.
>
> The current draft paper says:
>
> > The Unicode Standard subclause "Character Names List"
>
> This is not the correct reference; this subclause (I think it is rather a
> "chapter" or a "section") shows the general rules for presenting character
> names, but doesn't contain the exhaustive list. The exhaustive list is
> scattered across the "code charts", which are distributed as separate PDFs
> here:
>
> https://www.unicode.org/charts/
>
> It would be good to have a single unified PDF (as ISO 10646 does), but no
> luck, it seems.
>
> Jens
>
>
> > Tom.
> >
> >>
> >> A careful examination of the three standards does not reveal anything I
> >> think we should be concerned about besides what I've highlighted in the
> >> paper, but please let me know if you have specific questions we need to
> >> address.
> >>
> >> I would like to point out the mess that is __STDC_ISO_10646__, whose
> >> value currently depends on a version of ISO/IEC 10646.
> >> In the paper, I propose making that value implementation-defined, as it
> >> cannot be relied upon except to check whether some piece of code has been
> >> updated in the past 20+ years.
> >>
> >> I've also reworded the deprecated codecvt facilities so that they no
> >> longer mention UCS-2, getting rid of one more reference.
> >>
> >> I've massaged a few places to improve how we reference Unicode
> >> properties.
> >> The other thing that is not 100% clear to me is whether we should
> >> reference UAX #44, the Derived Core Properties and UAX #29 (as we
> >> currently do), or whether referencing the Unicode Standard implies all of
> >> that (I think it does).
> >>
> >> I've noticed that the Unicode Standard incorrectly refers to version 14.0
> >> of itself where it means 15.0, but hopefully we understand what is meant.
> >>
> >> Thanks,
> >> Corentin
> >>
> >
>

Received on 2022-12-15 10:44:30