[SG16] Wording for UAX #31 identifiers

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Thu, 9 Apr 2020 08:15:12 +0200
See attached.

Note that a universal-character-name never
represents a member of the basic source
character set, so we don't have to call out
underscores specifically.

This makes any sequence involving a universal-character-name
a pp-identifier (and thus a preprocessing-token), so that

#define accent(x) x ## \u0300

does the right thing: \u0300 is a preprocessing-token on its
own, so it can be an operand of ##.
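
To illustrate the intent, here is a rough sketch of a scanner for
such a token. It is only an approximation of the idea and not the
attached wording; the names ucn_length and scan_pp_identifier are
made up for this example, and the sketch assumes that any mix of
basic identifier characters and universal-character-names forms
one token, with no XID_Start / XID_Continue check at this stage.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper: length of a universal-character-name
// (\uXXXX or \UXXXXXXXX) at the start of s, or 0 if there is none.
std::size_t ucn_length(std::string_view s) {
    if (s.size() < 2 || s[0] != '\\') return 0;
    std::size_t digits = s[1] == 'u' ? 4 : s[1] == 'U' ? 8 : 0;
    if (digits == 0 || s.size() < 2 + digits) return 0;
    for (std::size_t i = 0; i < digits; ++i) {
        if (!std::isxdigit(static_cast<unsigned char>(s[2 + i]))) return 0;
    }
    return 2 + digits;
}

// Hypothetical helper: length of the longest prefix of s that is made up
// of letters, underscores, digits (not in first position), and
// universal-character-names. Returns 0 if s does not start such a token.
std::size_t scan_pp_identifier(std::string_view s) {
    std::size_t pos = 0;
    while (pos < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[pos]);
        if (c == '_' || std::isalpha(c) || (pos > 0 && std::isdigit(c))) {
            ++pos;                      // basic identifier character
        } else if (std::size_t n = ucn_length(s.substr(pos))) {
            pos += n;                   // a complete universal-character-name
        } else {
            break;                      // anything else ends the token
        }
    }
    return pos;
}
```

With this, scan_pp_identifier("\\u0300") consumes all six characters,
so a lone \u0300 is a token by itself. Whether the result is a valid
identifier would be decided later, not during tokenization.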

Did someone check that UAX #31 really is part of ISO 10646?
There should be a cross-reference to the Annex somewhere in [lex.name].

Further concerns:

We have a generic reference to ISO 10646 in the front matter
of the standard. That means the most recent version applies,
implicitly. That's a bit of a moving target, though: Does
an implementation lose conformance if a new version of ISO 10646
is issued (because more characters are allowed in identifiers in
later versions, maybe)?

Should we maybe require an implementation to document which
revision of ISO 10646 was used for XID_Start and XID_Continue?
This way, programmers can at least find out about a
portability pitfall.


The paper should devote a section to explaining how expensive
(in code size, and maybe performance) a check for NFC is for the compiler.
Does the compiler need the entire Unicode tables, or are there
shortcuts (e.g. a few ranges of "bad" code points)?
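
To make the "few ranges" idea concrete, here is a minimal sketch of
what such a shortcut could look like. It is only an illustration
under stated assumptions: the single table entry (the Combining
Diacritical Marks block, U+0300..U+036F) is a placeholder and not
the real list of problematic code points, and the names
CodePointRange, kSuspectRanges, is_suspect and needs_full_nfc_check
are made up. The early exit relies on my understanding that all
code points below U+0300 are unaffected by NFC.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Inclusive range of code points.
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// PLACEHOLDER table, sorted by 'first'. A real implementation would
// generate this from the Unicode data files for a specific version.
constexpr std::array<CodePointRange, 1> kSuspectRanges{{
    {0x0300, 0x036F},  // Combining Diacritical Marks block
}};

// True if cp falls into one of the suspect ranges.
bool is_suspect(char32_t cp) {
    if (cp < 0x0300) return false;  // assumed fast path, see lead-in
    // Find the first range that starts after cp, then step back one.
    auto it = std::upper_bound(
        kSuspectRanges.begin(), kSuspectRanges.end(), cp,
        [](char32_t value, const CodePointRange& r) { return value < r.first; });
    if (it == kSuspectRanges.begin()) return false;
    --it;
    return cp <= it->last;
}

// True if the identifier contains any suspect code point, i.e. the cheap
// check cannot rule out that the identifier is not in NFC.
bool needs_full_nfc_check(std::u32string_view identifier) {
    return std::any_of(identifier.begin(), identifier.end(), is_suspect);
}
```

If something like this is sufficient, the cost is one sorted table
plus a binary search per code point at or above U+0300; the open
question for the paper is how large the real table would be.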

Jens

Received on 2020-04-09 01:18:09