Here is the WG14 paper that combs through the identifiers and does the analysis: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1518.htm

On Wed, May 15, 2019 at 1:39 PM Tom Honermann <tom@honermann.net> wrote:
Thanks for bringing this to our attention.  I agree there are opportunities for improvement here.  I filed a new SG16 issue to track this.


I encourage anyone interested in this to sign up to write a paper or provide additional background material in the issue (e.g., more history about the current list of ranges, an analysis of UAX#31 and its applicability to C++, etc...).

Tom.

On 5/10/19 12:43 PM, JF Bastien wrote:
Hi C++ પกٱƈѻɗﻉ ḟäṅṡ 👋!

The current list of valid identifier characters is pretty silly (see [lex.name] 5.10 Identifiers or cppreference summary). It allows characters such as zero-width joiner and zero-width space, among other silly things (see how bad this can get, h/t Richard Kogelnig).

I asked where it came from, and IIUC John looked at Unicode and cobbled together the list of valid ranges manually. That ain't great.

Is this group interested in fixing things?

There's already a standard for this; maybe we can adopt it as-is or use it as a starting point:

Further, the tooling group was just talking about module names. I think we should allow any valid identifier as a module name, and look at how this could map to file names for a tooling TR's purposes.

Thanks,

JF

_______________________________________________
SG16 Unicode mailing list
Unicode@isocpp.open-std.org
http://www.open-std.org/mailman/listinfo/unicode

