On Wed, Sep 29, 2021 at 9:51 PM Steve Downey via SG16 <sg16@lists.isocpp.org> wrote:

In order to make progress on wording, I'd like to have some polls taken on named Unicode escape sequences, since we've learned and changed a few things since the papers were first written.

1)
In light of progress on D2290, "Delimited escape sequences", which adds the form \u{ simple-hexadecimal-digit-sequence } to universal-character-name, a named escape sequence should be an alternate form of universal-character-name rather than being available only in literals.

u-char:
digit
nondigit

u-char-sequence:
u-char
u-char-sequence u-char

universal-character-name:
add \U{ u-char-sequence }

You mean \N, right?

But yes, otherwise sounds good

Add text to say that u-char-sequence must match a name or alias in the UCD.
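
To make the proposed grammar concrete, here is a minimal sketch of a scanner for the \N{ u-char-sequence } form. It follows the productions above literally, assuming nondigit means [A-Za-z_] as in the identifier grammar, so a space is not accepted as a u-char here. The names is_u_char and scan_named_escape are hypothetical, and the check that the sequence matches a name or alias in the UCD is deliberately left out.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// u-char per the grammar quoted above: digit or nondigit.
// Assumption: nondigit is [A-Za-z_], as in the identifier grammar.
bool is_u_char(char c) {
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') || c == '_';
}

// Hypothetical helper: if `src` starts with \N{ u-char-sequence },
// return the u-char-sequence; otherwise return nullopt.
// It does not check the sequence against the UCD.
std::optional<std::string_view> scan_named_escape(std::string_view src) {
    constexpr std::string_view prefix = "\\N{";
    if (src.substr(0, prefix.size()) != prefix) return std::nullopt;
    src.remove_prefix(prefix.size());

    // Consume the longest run of u-chars.
    std::size_t n = 0;
    while (n < src.size() && is_u_char(src[n])) ++n;

    // The sequence must be non-empty and closed by '}'.
    if (n == 0 || n == src.size() || src[n] != '}') return std::nullopt;
    return src.substr(0, n);
}
```

With this sketch, scan_named_escape("\\N{DIGIT_ZERO}") yields DIGIT_ZERO, while an empty or unterminated sequence is rejected.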

2)
Implementation experience shows that the Unicode Standard's rules for matching names produce a compact data form with fast lookup, so I propose adopting those rules rather than mandating capitals-only, exact-match rules.

I'll go further: loose matching is ever so slightly better for implementations.

The names in the database can be stored with whitespace stripped, case folded, etc.; at lookup time the same normalization is applied to the needle. This produces more efficient storage.
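
As a rough sketch of that idea (not what any particular implementation does): normalize both the stored names and the needle to the same key, then binary-search a sorted table. The names loose_key, Entry, kTable and lookup are hypothetical, the three-entry table is a toy stand-in for the UCD, and the key function is a simplification of the Unicode loose-matching rule: it drops every hyphen, whereas the real rule only ignores medial hyphens and special-cases U+1180 HANGUL JUNGSEONG O-E.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Simplified loose-match key: drop spaces, underscores and hyphens,
// then fold ASCII letters to upper case.
// Assumption: this ignores the medial-hyphen and U+1180 special cases.
std::string loose_key(std::string_view name) {
    std::string key;
    for (unsigned char c : name) {
        if (c == ' ' || c == '_' || c == '-') continue;
        key.push_back(static_cast<char>(std::toupper(c)));
    }
    return key;
}

struct Entry {
    std::string_view key;  // name, already normalized with loose_key
    char32_t code_point;
};

// Toy table standing in for the UCD; keys are pre-normalized and sorted.
constexpr std::array<Entry, 3> kTable{{
    {"DIGITZERO", 0x0030},
    {"LATINSMALLLETTERA", 0x0061},
    {"ZEROWIDTHJOINER", 0x200D},
}};

// Normalize the needle the same way, then binary-search the table.
std::optional<char32_t> lookup(std::string_view needle) {
    const std::string key = loose_key(needle);
    const std::string_view kv = key;
    auto it = std::lower_bound(
        kTable.begin(), kTable.end(), kv,
        [](const Entry& e, std::string_view k) { return e.key < k; });
    if (it == kTable.end() || it->key != kv) return std::nullopt;
    return it->code_point;
}
```

Here lookup("latin small letter a") and lookup("LATIN_SMALL_LETTER_A") both find U+0061, because the needle is reduced to the same key as the stored name before the search.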

Note that many online sources of code point names are lax about exact matching.
