A late question, after looking over the paper:

How does restricting identifiers to follow the Unicode
specification compare with Annex D of the C standard?

I ask because I am wondering whether we want more wording
in C++ Annex C.5 [diff.iso] on which identifiers may (or may
not) be used in code shared between C and C++.

AlisdairM

On Jun 8, 2020, at 19:14, JF Bastien via Ext <ext@lists.isocpp.org> wrote:

Hello Ⓔⓥⓞⓛⓤⓣⓘⓞⓝ,

Next week, on Thursday the 18th at 10AM Pacific, we'll be discussing Unicode identifiers. The paper was on our "tentatively ready" list as of Prague, but it received some feedback and has been updated, as Steve details below. I'd like us to discuss the changes and leave it on the tentatively ready list, so that the next time we can make decisions, we reaffirm that we're forwarding it to Core (as per our process).

Updated paper:

Here's the GitHub issue:

Meeting information:
Zoom Meeting ID 735059607
Zoom Meeting Password template
Zoom Meeting Automatic Phone-In US: +16699006833,,735059607# or +14086380968,,735059607#
Zoom Meeting Phone Number US: +1 669 900 6833
International numbers available https://iso.zoom.us/u/acPWjSNM0

See you then!

JF

P.S. This is valid C++ today:

int Ⓔⓥⓞⓛⓤⓣⓘⓞⓝ;


On Fri, Jun 5, 2020 at 1:37 PM Steve Downey via SG16 <sg16@lists.isocpp.org> wrote:

Last week SG16 (Text) approved forwarding this paper to EWG for consideration. It aims to fix the set of characters allowed in C++ identifiers.

https://isocpp.org/files/papers/P1949R4.html (also attached as d1949.html)

Summary

The Unicode code points currently allowed in identifiers include many that are unassigned or unnecessary, and others that are actually counterproductive. By adopting the recommendations of UAX #31, Unicode Identifier and Pattern Syntax, C++ will be easier to work with in international environments and less prone to accidental problems.

This proposal does not address some potential security concerns, namely so-called homoglyph attacks, in which letters that look the same are treated as distinct. Defenses against such attacks are complex and evolving, and requiring mitigation strategies would impose a substantial implementation burden.

This proposal also recommends adopting Unicode Normalization Form C (NFC) for identifiers, so that identifiers intended to be the same compare equal. Text in legacy encodings is generally already in NFC when converted to Unicode, and most tools produce NFC text by default.
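Why normalization matters for comparison: "é" can be spelled either as the single code point U+00E9 or as "e" followed by U+0301 COMBINING ACUTE ACCENT, and a plain code point comparison treats those two spellings as different identifiers. The toy below is only an illustration; `compose_acute`, `next_nfc` and `toy_nfc_equal` are hypothetical helpers, and the composition table covers just five Latin vowels with U+0301. Real NFC needs the full Unicode decomposition, canonical-ordering and composition data (for example from a library such as ICU).

```cpp
#include <cstddef>
#include <string_view>

// HYPOTHETICAL toy table: the precomposed form of a Latin vowel followed by
// U+0301 COMBINING ACUTE ACCENT, or 0 if this toy has no entry for it.
constexpr char32_t compose_acute(char32_t base) {
    switch (base) {
        case U'a': return 0x00E1;  // a with acute
        case U'e': return 0x00E9;  // e with acute
        case U'i': return 0x00ED;  // i with acute
        case U'o': return 0x00F3;  // o with acute
        case U'u': return 0x00FA;  // u with acute
        default:   return 0;       // not covered by this toy
    }
}

// Reads one code point starting at s[i], composing "vowel + U+0301" into a
// single precomposed code point, and advances i past everything consumed.
constexpr char32_t next_nfc(std::u32string_view s, std::size_t& i) {
    char32_t c = s[i++];
    if (i < s.size() && s[i] == 0x0301) {
        if (char32_t composed = compose_acute(c)) {
            ++i;  // absorb the combining mark
            return composed;
        }
    }
    return c;
}

// Compares two identifiers after toy normalization, so the precomposed and
// decomposed spellings of the same word compare equal.
constexpr bool toy_nfc_equal(std::u32string_view a, std::u32string_view b) {
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (next_nfc(a, i) != next_nfc(b, j)) return false;
    }
    return i == a.size() && j == b.size();  // both fully consumed
}

// Raw code points differ, but the normalized forms match.
static_assert(std::u32string_view(U"caf\u00e9") != std::u32string_view(U"cafe\u0301"));
static_assert(toy_nfc_equal(U"caf\u00e9", U"cafe\u0301"));
```

If every identifier is required to already be in NFC, a compiler can go back to comparing identifiers code point by code point.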

Some unusual scripts require joiner characters that UAX #31 does not allow; identifiers that need them will no longer be available in C++.

As a side-effect of adopting the identifier characters from UAX #31, using emoji in or as identifiers becomes ill-formed.
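To make the UAX #31 rule concrete: its default identifier syntax is one XID_Start code point followed by any number of XID_Continue code points, and C++ also lets `_` start an identifier. The sketch below is a minimal illustration under stated assumptions: `in_xid_start_subset`, `in_xid_continue_subset` and `is_identifier` are hypothetical names, and the tables are a tiny hand-picked subset (ASCII letters, Latin-1 letters, basic Greek letters, ASCII digits, combining diacritical marks) rather than the real Unicode property data. It shows why the circled letters in `Ⓔⓥⓞⓛⓤⓣⓘⓞⓝ` (general category So) and emoji would no longer qualify.

```cpp
#include <cstddef>
#include <string_view>

// HYPOTHETICAL, tiny subset of XID_Start for illustration only:
// ASCII letters, Latin-1 letters (excluding U+00D7 and U+00F7),
// and the basic Greek capital and small letters.
constexpr bool in_xid_start_subset(char32_t c) {
    return (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z') ||
           (c >= 0x00C0 && c <= 0x00D6) || (c >= 0x00D8 && c <= 0x00F6) ||
           (c >= 0x00F8 && c <= 0x00FF) ||
           (c >= 0x0391 && c <= 0x03A1) || (c >= 0x03A3 && c <= 0x03A9) ||
           (c >= 0x03B1 && c <= 0x03C9);
}

// HYPOTHETICAL subset of XID_Continue: the start subset, ASCII digits,
// and U+0300..U+036F (combining diacritical marks).
constexpr bool in_xid_continue_subset(char32_t c) {
    return in_xid_start_subset(c) || (c >= U'0' && c <= U'9') ||
           (c >= 0x0300 && c <= 0x036F);
}

// UAX #31 default identifier shape (XID_Start XID_Continue*), with '_'
// additionally accepted at any position, as C++ does.
constexpr bool is_identifier(std::u32string_view id) {
    if (id.empty()) return false;
    if (id[0] != U'_' && !in_xid_start_subset(id[0])) return false;
    for (std::size_t i = 1; i < id.size(); ++i) {
        if (id[i] != U'_' && !in_xid_continue_subset(id[i])) return false;
    }
    return true;
}

static_assert(is_identifier(U"\u00e9t\u00e9"));   // Latin-1 letters: accepted
static_assert(!is_identifier(U"\u24ba\u24e5"));   // circled letters: rejected
static_assert(!is_identifier(U"\U0001F600"));     // emoji: rejected
```

A real implementation would generate its tables from the Unicode Character Database (the XID_Start and XID_Continue properties in DerivedCoreProperties.txt) rather than hard-coding ranges.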


See also: Unicode® Standard Annex #31, Unicode Identifier and Pattern Syntax
https://unicode.org/reports/tr31/


_______________________________________________
Ext mailing list
Ext@lists.isocpp.org
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/ext
Link to this post: http://lists.isocpp.org/ext/2020/06/14104.php