Hi C++ પกٱƈѻɗﻉ ḟäṅṡ 👋!

The current list of valid identifier characters is pretty silly (see [lex.name] 5.10 Identifiers, or the cppreference summary). Among other oddities, it allows invisible characters such as the zero-width joiner and zero-width space (see how bad this can get, h/t Richard Kogelnig).
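To make that concrete, here's a rough sketch of a range check over a small excerpt of the allowed ranges. The four ranges are my transcription of a few rows from the table in [lex.name], so double-check them against the standard. The names `CodePointRange`, `kAllowedExcerpt` and `in_allowed_excerpt` are made up for illustration, and this is nowhere near the full list.

```cpp
// Inclusive range of Unicode code points.
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// Excerpt only. Assumption: transcribed from the allowed-ranges table in
// [lex.name]; the real table has many more rows.
constexpr CodePointRange kAllowedExcerpt[] = {
    {0x00C0, 0x00D6},  // Latin letters U+00C0..U+00D6
    {0x200B, 0x200D},  // ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER
    {0x202A, 0x202E},  // bidirectional embedding and override controls
    {0x2060, 0x206F},  // WORD JOINER and neighbouring invisible format characters
};

// Returns true if cp falls inside any range of the excerpt above.
// Needs C++14 or later for the loop inside a constexpr function.
constexpr bool in_allowed_excerpt(char32_t cp) {
    for (const CodePointRange& r : kAllowedExcerpt) {
        if (cp >= r.first && cp <= r.last) {
            return true;
        }
    }
    return false;
}

// Invisible characters land inside the allowed ranges.
static_assert(in_allowed_excerpt(U'\u200B'), "ZERO WIDTH SPACE is in the excerpt");
static_assert(in_allowed_excerpt(U'\u200D'), "ZERO WIDTH JOINER is in the excerpt");
```

The point is only that a plain range table like this has no notion of "letter" versus "invisible format character", so two identifiers that render identically can still be distinct names.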

I asked where it came from, and IIUC John looked at Unicode and cobbled together the list of valid ranges manually. That ain't great.

Is this group interested in fixing things?

There's already a standard for this (Unicode TR31, Identifier and Pattern Syntax). Maybe we can adopt it as-is, or use it as a starting point:
https://unicode.org/reports/tr31/

Further, the tooling group was just talking about module names. I think we should allow any valid identifier as a module name, and look at how that could map to file names for the purposes of a tooling TR.

Thanks,

J̙̘̗̘̟͐̀̎F͚̜͈̖͉̗̘̊