On Thu, 9 Apr 2020 at 08:15, Jens Maurer via SG16 <sg16@lists.isocpp.org> wrote:

See attached.

Note that a universal-character-name never
represents a member of the basic source
character set, so we don't have to call out
underscores specifically.

This makes any sequence involving a universal-character-name
a pp-identifier (and thus a preprocessing-token), so that

#define accent(x) x ## \u0300

does the right thing.

Did someone check that UAX #31 really is part of ISO 10646?
There should be a cross-reference to the Annex somewhere in [lex.name].

Further concerns:

We have a generic reference to ISO 10646 in the front matter
of the standard. That means the most recent version applies,
implicitly. That's a bit of a moving target, though: Does
an implementation lose conformance if a new version of ISO 10646
is issued (because more characters are allowed in identifiers in
later versions, maybe)?

Should we maybe require an implementation to document which
revision of ISO 10646 was used for XID_Start and XID_Continue?
This way, programmers can at least find out about a
portability pitfall.

C++23 programs should remain portable across compilers, so in this case it
seems necessary to specify a particular version (compilers can still support
newer versions as an extension).
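
To make the versioning point concrete, here is a minimal sketch of how an
implementation might tie its XID_Start lookup to one documented Unicode
version. Everything named here is hypothetical (kXidTableUnicodeVersion,
kXidStartExcerpt, in_xid_start_excerpt), the version string is only an
illustration, and the table is just the first eight XID_Start ranges, not
the full property:

```cpp
#include <array>
#include <string_view>

// Hypothetical: the Unicode version the table below was generated from.
// An implementation could document this value for its users.
inline constexpr std::string_view kXidTableUnicodeVersion = "13.0.0";

struct CodePointRange {
    char32_t first;
    char32_t last;
};

// Truncated excerpt: only the first eight XID_Start ranges.
// A real table continues well beyond U+02C1.
inline constexpr std::array<CodePointRange, 8> kXidStartExcerpt{{
    {0x0041, 0x005A}, {0x0061, 0x007A}, {0x00AA, 0x00AA}, {0x00B5, 0x00B5},
    {0x00BA, 0x00BA}, {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x02C1},
}};

// Returns true if cp falls in one of the excerpted ranges.
constexpr bool in_xid_start_excerpt(char32_t cp) {
    for (const CodePointRange& r : kXidStartExcerpt) {
        if (cp >= r.first && cp <= r.last) {
            return true;
        }
    }
    return false;
}
```

With a pinned version, a newer revision of ISO 10646 only adds entries to a
table an implementation may adopt as an extension; it does not change what
a conforming implementation has to accept.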

The paper should spend a section on explaining how expensive
(code size; maybe performance) a check for NFC is for the compiler.
Does the compiler need the entire Unicode tables, or are there
shortcuts (e.g. a few ranges of "bad" code points)?

I think both Zach and I should be able to provide data for that.
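
As one example of the kind of shortcut asked about, here is a minimal
sketch of a table-free fast path. It assumes the identifier has already
been decoded to UTF-32, and the names NfcCheck and nfc_fast_path are made
up for illustration. It relies on the fact that every code point below
U+0300 has NFC_Quick_Check=Yes and canonical combining class 0, so a
sequence made only of such code points is already in NFC. Anything else
still needs the real check against the Unicode data:

```cpp
#include <string_view>

// Result of the fast path: either the text is certainly in NFC, or the
// full table-driven check is still required.
enum class NfcCheck { Yes, NeedsFullCheck };

// Fast path only: code points below U+0300 cannot break NFC, so text
// consisting solely of them needs no table lookup at all.
constexpr NfcCheck nfc_fast_path(std::u32string_view text) {
    for (char32_t cp : text) {
        if (cp >= 0x0300) {
            // Might be a combining mark or a code point that NFC would
            // replace; defer to the full NFC_Quick_Check / CCC tables.
            return NfcCheck::NeedsFullCheck;
        }
    }
    return NfcCheck::Yes;
}
```

This does not answer how large the remaining tables are; it only shows that
the common case of ASCII and Latin-1 identifiers can skip them entirely.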

Jens
--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16