Re: [SG16] Polls for named unicode escape sequences

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 29 Sep 2021 22:03:15 +0200
On Wed, Sep 29, 2021 at 9:51 PM Steve Downey via SG16 <sg16_at_[hidden]>
wrote:

> In order to make progress on wording I'd like to have some polls taken for
> named unicode escape sequences, since we've learned and changed a few
> things since the papers were first written.
>
> 1)
> In light of progress on D2290 Delimited escape sequences adding the form
> \u{ simple-hexadecimal-digit-sequence } to universal-character-name, named
> escape sequence should be an alternate form of universal-character-name,
> rather than only for literals.
>
> u-char:
> digit
> nondigit
>
> u-char-sequence:
> u-char
> u-char-sequence u-char
>
> universal-character-name:
> add \U{ u-char-sequence }
>

You mean \N, right?
But yes, otherwise this sounds good.


>
> Add text to say that u-char-sequence must match a name or alias in the UCD.
>
>
> 2)
> In light of the implementation experience using the Unicode standard rules
> for matching names producing a compact data form with fast lookup, propose
> that be adopted rather than mandating CAPITAL only exact match rules.
>

I'll go further: loose matching is ever so slightly better for
implementations.
The strings in the database can be stored stripped of whitespace, case,
etc.; at lookup time the same is done to the needle. This makes for more
efficient storage.
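
A minimal sketch of that idea, assuming a simplified form of the Unicode loose matching rule (ignore case, spaces, underscores, and medial hyphens). The names `loose_key` and `lookup` and the two-entry table are hypothetical and only for illustration; the special case the Unicode rule makes for U+1180 HANGUL JUNGSEONG O-E is deliberately left out.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical helper: reduce a character name to its loose-matching key.
// Simplification: the exception for U+1180 HANGUL JUNGSEONG O-E is not handled.
std::string loose_key(std::string_view name) {
    std::string key;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(name[i]);
        if (c == ' ' || c == '_') continue;  // spaces and underscores are ignored
        if (c == '-') {
            // A hyphen is "medial" when it sits directly between two
            // letters or digits; only medial hyphens are dropped.
            const bool medial =
                i > 0 && i + 1 < name.size() &&
                std::isalnum(static_cast<unsigned char>(name[i - 1])) &&
                std::isalnum(static_cast<unsigned char>(name[i + 1]));
            if (medial) continue;
        }
        key.push_back(static_cast<char>(std::toupper(c)));  // fold case
    }
    return key;
}

// Hypothetical two-entry table; the keys are stored already stripped,
// so the stored strings are shorter than the full names.
std::optional<char32_t> lookup(std::string_view name) {
    static const std::unordered_map<std::string, char32_t> table = {
        {"LATINSMALLLETTERA", char32_t{0x0061}},
        {"ZEROWIDTHSPACE", char32_t{0x200B}},
    };
    const auto it = table.find(loose_key(name));
    if (it == table.end()) return std::nullopt;
    return it->second;
}
```

With this, `lookup("zero-width space")` and `lookup("ZERO WIDTH SPACE")` reduce to the same stored key, while a non-medial hyphen such as the one in `TIBETAN MARK TSA -PHRU` is kept.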


>
> Note that many online sources of codepoint names are lax about exact match.
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2021-09-29 15:03:33