[SG16] Comments on P1949R3: C++ Identifier Syntax using Unicode Standard Annex 31

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Tue, 21 Apr 2020 23:10:46 +0200

"Add an entry in clause 2 [intro.refs]:"

There are actually two entries added.

Further, the existing entry for UAX#29 is presented
differently. Is there a reason why the new entries
for UAX#31 and UAX#44 should deviate?

5.10new

"Preprocessing identifier tokens lexically include all _identifier_s (5.10 [lex.name]) and _keyword_s (5.11 [lex.key])."

The underscores should be replaced by HTML italics start/end markers.


5.10 lex.name p1:
"An identifier shall conform to the NFC normalization specified in ISO/IEC 10646."

This sentence should probably come first. In an abstract sense, we first want NFC
before we check XID_Start and XID_Continue.
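
The ordering suggested here can be sketched in Python, whose standard library exposes a normalization check. This is only an illustration under stated assumptions, not wording from the paper: `check_identifier` is a hypothetical helper, and `str.isidentifier()` is used as an approximation of the XID_Start / XID_Continue test (it additionally accepts a leading underscore).

```python
import unicodedata

def check_identifier(name: str) -> str:
    """Return "ok" or a rejection reason (hypothetical helper)."""
    # Step 1: require NFC before looking at individual characters.
    if not unicodedata.is_normalized("NFC", name):
        return "not NFC"
    # Step 2: approximate the XID_Start XID_Continue* check.
    # str.isidentifier() follows XID_Start/XID_Continue and also allows '_' first.
    if not name.isidentifier():
        return "not XID_Start XID_Continue*"
    return "ok"
```

With this ordering, a spelling such as "cafe" followed by U+0301 (combining acute accent) is reported as a normalization problem, while a string containing U+200B (zero width space) passes the NFC step and is rejected by the property check.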


diff.cpp20.lex

"Change: identifiers that were valid before, containing characters not present in UAX #44 properties XID_Start or XID_Continue, or in non-NFC normalization format, are now rejected."

Capital "I" for "Identifiers".
Suggested rephrasing: "Previously valid identifiers containing characters ..."


"
Rationale: Many confusable identifiers were previously technically allowed but not commonly used. C++23 requires these changes to conform to Unicode Standard UAX #31 recommendations and to prevent confusion between normalization formats causing compile errors.

Effect on original feature: Identifiers are now validated according to Unicode Standard recommended methods. Identifiers that contain invisible characters are not allowed.
"

This should be shortened a bit:
I'm not sure whether UAX#31 conformance *requires* those changes;
conformance seems to just require that we document what we accept
as an identifier.

Maybe:
"
Rationale: Prevent confusing characters in identifiers. NFC normalization of names ensures consistent linker behavior.

Effect on original feature: Some identifiers are no longer well-formed.
"
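
The linker remark can be illustrated with a small Python sketch. It assumes, purely for illustration, that symbol names are compared byte for byte in UTF-8; that detail is an assumption and is not stated in the paper or in this mail.

```python
import unicodedata

composed = "caf\u00e9"      # 'é' as a single code point (already NFC)
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# The two spellings look the same but have different byte sequences.
same_bytes = composed.encode("utf-8") == decomposed.encode("utf-8")

# After NFC normalization they become the same string.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```

Here `same_bytes` is false and `same_after_nfc` is true, so under the byte-comparison assumption the two spellings would name different symbols unless NFC is required.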


Annex X

Replace "must" with "is" or "is required to" or similar.


X.2 R1
<Continue> := <Start> + XID_Continue

I think this is just "XID_Continue" (without "Start").
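
One possible reading of this comment is that XID_Start characters already have the XID_Continue property, so adding <Start> is redundant. The Python sketch below probes that reading on a sample range. It is an approximation, not a definitive check: `is_start` and `is_continue` are hypothetical helpers built on `str.isidentifier()`, which also treats '_' as a valid start.

```python
def is_start(ch: str) -> bool:
    # Approximates XID_Start; str.isidentifier() additionally accepts '_'.
    return ch.isidentifier()

def is_continue(ch: str) -> bool:
    # Approximates XID_Continue by placing ch after a known-valid start.
    return ("a" + ch).isidentifier()

# Sample range only (an arbitrary choice for this sketch).
sample = [chr(cp) for cp in range(0x0000, 0x3000)]

# Characters usable as a start but not as a continuation.
starts_not_continue = [ch for ch in sample if is_start(ch) and not is_continue(ch)]

# Characters usable only as a continuation, such as digits and combining marks.
continue_only = [ch for ch in sample if is_continue(ch) and not is_start(ch)]
```

On this sample, `starts_not_continue` comes out empty, while `continue_only` contains characters such as "0" and U+0301.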


X.2.1
"If an implementation wishes to allow"

Uh, "implementation" is read as "C++ implementation", but that's not what
is meant here.

Suggestion for general rephrasing:

X.3 R2. Immutable Identifiers

"
An implementation may choose to guarantee that the set of identifers will never change by fixing the set of codepoints allowed in identifers forever. C++ does not choose to make this guarantee. As scripts are added to Unicode, additional characters in those scripts may become available for use in idenfiers.
"

->

"
C++ does not guarantee that the set of valid identifiers will never change.
As scripts are added to Unicode, additional characters in those scripts may become available for use in identifiers.
"



Jens

Received on 2020-04-21 16:13:45