[SG16] Comments on P1949R3: C++ Identifier Syntax using Unicode Standard Annex 31

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Tue, 21 Apr 2020 23:10:46 +0200
"Add an entry in clause 2 [intro.refs]:"

There are actually two entries added.

Further, the existing entry for UAX#29 is presented
differently. Is there a reason why the new entries
for UAX#31 and UAX#44 should deviate?


"Preprocessing identifier tokens lexically include all _identifier_s (5.10 [lex.name]) and _keyword_s (5.11 [lex.key])."

The underscores should be italics start/end HTML markers.

5.10 lex.name p1:
"An identifier shall conform to the NFC normalization specified in ISO/IEC 10646."

This sentence should probably come first. In an abstract sense, we want
NFC established before we check XID_Start and XID_Continue.
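
A minimal C++ sketch of that ordering, for illustration only. It is not the paper's wording and not how any particular compiler does it. The names IdentifierRules, is_valid_identifier and ascii_rules are hypothetical, and the checks are passed in as callbacks because the C++ standard library has no Unicode property tables. The demonstration rules cover ASCII only, where things are simple: ASCII-only text is always in NFC, XID_Start is the letters, and XID_Continue adds the digits and the underscore. Allowing the underscore as a first character reflects the C++ nondigit rule rather than XID_Start.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical bundle of checks; a real implementation would back these
// with Unicode data (for example from a library such as ICU).
struct IdentifierRules {
    std::function<bool(const std::u32string&)> is_nfc;
    std::function<bool(char32_t)> is_start;     // XID_Start, plus '_' in C++
    std::function<bool(char32_t)> is_continue;  // XID_Continue
};

// Checks the normalization form first, then the per-code-point properties.
bool is_valid_identifier(const std::u32string& id, const IdentifierRules& rules) {
    assert(rules.is_nfc && rules.is_start && rules.is_continue);
    if (id.empty()) return false;
    if (!rules.is_nfc(id)) return false;            // step 1: NFC
    if (!rules.is_start(id.front())) return false;  // step 2: first code point
    for (std::size_t i = 1; i < id.size(); ++i) {   // step 3: remaining code points
        if (!rules.is_continue(id[i])) return false;
    }
    return true;
}

// ASCII-only demonstration rules (an assumption made for this sketch).
IdentifierRules ascii_rules() {
    auto is_letter = [](char32_t c) {
        return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z');
    };
    IdentifierRules r;
    r.is_nfc = [](const std::u32string& s) {
        for (char32_t c : s) {
            if (c > 0x7F) return false;  // reject what this sketch cannot judge
        }
        return true;  // ASCII-only text is always in NFC
    };
    r.is_start = [is_letter](char32_t c) { return is_letter(c) || c == U'_'; };
    r.is_continue = [is_letter](char32_t c) {
        return is_letter(c) || (c >= U'0' && c <= U'9') || c == U'_';
    };
    return r;
}
```

Calling is_valid_identifier(U"foo_1", ascii_rules()) accepts the name, while U"1foo" fails at step 2 because a digit is not a valid first character.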


"Change: identifiers that were valid before, containing characters not present in UAX #44 properties XID_Start or XID_Continue, or in non-NFC normalization format, are now rejected."

Capital "I" for "Identifiers".
Suggested rephrasing: "Previously valid identifiers containing characters ..."

Rationale: Many confusable identifiers were previously technically allowed but not commonly used. C++23 requires these changes to conform to Unicode Standard UAX #31 recommendations and to prevent confusion between normalization formats causing compile errors.

Effect on original feature: Identifiers are now validated according to Unicode Standard recommended methods. Identifiers that contain invisible characters are not allowed.

This should be shortened a bit:
I'm not sure whether UAX#31 conformance *requires* those changes;
conformance seems to just require that we document what we accept
as an identifier.

Rationale: Prevent confusing characters in identifiers. NFC normalization of names ensures consistent linker behavior.

Effect on original feature: Some identifiers are no longer well-formed.
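
To illustrate the linker point in the suggested rationale: the same visible name can be spelled with different code point sequences, and those spellings encode to different bytes. The sketch below is only an illustration under one assumption, namely that a linker compares symbol names byte for byte (typical in practice, but not something the C++ standard specifies). The names precomposed, decomposed, same_symbol and demo are hypothetical.

```cpp
#include <cassert>
#include <string>

// "café" with the last letter as one precomposed code point, U+00E9.
// This is the NFC spelling; in UTF-8 it is 5 bytes.
const std::string precomposed = "caf\xC3\xA9";

// "café" with the last letter as 'e' followed by U+0301 COMBINING ACUTE
// ACCENT. This is the NFD spelling; in UTF-8 it is 6 bytes.
const std::string decomposed = "cafe\xCC\x81";

// Assumed model of a linker's symbol lookup: plain byte-wise equality.
bool same_symbol(const std::string& a, const std::string& b) {
    return a == b;
}

// The two spellings look alike on screen but do not match as symbols.
void demo() {
    assert(!same_symbol(precomposed, decomposed));
}
```

If one translation unit used the first spelling and another used the second, a byte-wise lookup would treat them as two unrelated names. Requiring NFC in source removes that possibility.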

Annex X

Replace "must" with "is" or "is required to" or similar.

X.2 R1
<Continue> := <Start> + XID_Continue

I think this is just "XID_Continue" (without "Start").

"If an implementation wishes to allow"

Uh, "implementation" is read as "C++ implementation", but that's not what
is meant here.

Suggestion for general rephrasing:

X.3 R2. Immutable Identifiers

An implementation may choose to guarantee that the set of identifiers will never change by fixing the set of codepoints allowed in identifiers forever. C++ does not choose to make this guarantee. As scripts are added to Unicode, additional characters in those scripts may become available for use in identifiers.


C++ does not guarantee that the set of valid identifiers will never change.
As scripts are added to Unicode, additional characters in those scripts may become available for use in identifiers.


Received on 2020-04-21 16:13:45