sg16: [SG16-Unicode] Identifiers in C++

From: JF Bastien <cxx_at_[hidden]>
Date: Fri, 10 May 2019 09:43:28 -0700

Hi C++ પกٱƈѻɗﻉ ḟäṅṡ 👋!

The current list of valid identifier characters is pretty silly (see [*lex.name*]
5.10 Identifiers or the cppreference summary
<https://en.cppreference.com/w/cpp/language/identifiers>). Among other oddities,
it allows invisible characters such as zero-width joiner and zero-width space
(see how bad this can get <https://godbolt.org/z/sBJk1k>, h/t Richard Kogelnig).

I asked where it came from, and IIUC John looked at Unicode and cobbled together
the list of valid ranges manually. That ain't great.

Is this group interested in fixing things?

There's an existing Unicode standard for this (UAX #31, Unicode Identifier and
Pattern Syntax); maybe we can adopt it as-is or use it as a starting point:

https://unicode.org/reports/tr31/

Further, the tooling group was just talking about module names. I think we
should allow any valid identifier as a module name, and look at how such names
could map to file names for the purposes of a tooling TR.

Thanks,

J̙̘̗̘̟͐̀̎F͚̜͈̖͉̗̘̊

Received on 2019-05-10 18:43:42