We don't exclude any scripts in general, but because we do no script analysis we cannot allow ZWJ and ZWNJ, and their absence makes some words in Indic scripts problematic. The examples in https://unicode.org/reports/tr31/#Layout_and_Format_Control_Characters are relevant: Zero Width Joiner and Zero Width Non-Joiner are used in Farsi, Malayalam, and Sinhala.

Wikipedia (https://en.wikipedia.org/wiki/Zero-width_joiner#Examples) also mentions Devanagari and Kannada, although it appears that recent versions of Unicode may have added explicit characters for Devanagari to alleviate the problem.
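
To make the problem concrete, here is a rough sketch (not anything from the paper) that finds the join controls in a word given as UTF-32. The helper names is_join_control and join_control_positions are made up for illustration, and the sample word is the Persian example I believe UAX #31 uses, spelled out as code points.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER.
constexpr char32_t kZwnj = 0x200C;
constexpr char32_t kZwj = 0x200D;

// Hypothetical helper: true for the two join control characters.
constexpr bool is_join_control(char32_t c) {
    return c == kZwnj || c == kZwj;
}

// Hypothetical helper: indices of every join control in `word`.
// An identifier grammar that allows neither character would have
// to reject the word at each of these positions.
std::vector<std::size_t> join_control_positions(std::u32string_view word) {
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (is_join_control(word[i])) {
            positions.push_back(i);
        }
    }
    return positions;
}

// Persian word with a ZWNJ between U+0647 and U+0627 (assumed to
// match the UAX #31 figure; treat it as illustrative).
constexpr std::u32string_view kPersianSample =
    U"\u0646\u0627\u0645\u0647\u200C\u0627\u06CC";
```

Calling join_control_positions(kPersianSample) reports a single position, index 4. Dropping that character gives a different word, which is why simply stripping ZWNJ is not a fix.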

Script recognition would also be necessary to identify the "emoji" script so that emoji sequences could be allowed. In addition, the repertoire of allowed characters would have to be expanded to include the emoji that are currently explicitly disallowed, namely the ones that were known at the time the allowed character ranges in C++ were put together.
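
As an illustration of why sequences matter, here is a small sketch that splits a UTF-32 sequence at each ZWJ. The function name split_at_zwj is hypothetical, and the sample is the man, woman, girl family sequence. It performs no emoji property lookup at all, which is exactly the part that would need real script recognition.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: splits a code point sequence at each
// U+200D ZERO WIDTH JOINER and returns the pieces in order.
// It does not check that the pieces are actually emoji.
std::vector<std::u32string_view> split_at_zwj(std::u32string_view seq) {
    constexpr char32_t zwj = 0x200D;
    std::vector<std::u32string_view> parts;
    std::size_t start = 0;
    for (std::size_t i = 0; i <= seq.size(); ++i) {
        if (i == seq.size() || seq[i] == zwj) {
            parts.push_back(seq.substr(start, i - start));
            start = i + 1;
        }
    }
    return parts;
}

// U+1F468 MAN, ZWJ, U+1F469 WOMAN, ZWJ, U+1F467 GIRL.
constexpr std::u32string_view kFamilySample =
    U"\U0001F468\u200D\U0001F469\u200D\U0001F467";
```

split_at_zwj(kFamilySample) yields three one-code-point pieces. Without ZWJ in the allowed set, only the three separate emoji could appear in an identifier, not the single joined glyph.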

On Fri, Jun 19, 2020 at 1:26 AM Jens Maurer via SG16 <sg16@lists.isocpp.org> wrote:
On 19/06/2020 00.38, Ville Voutilainen via SG16 wrote:
> I'm confused to the hilt by this:
> "So, it seems we would increase consensus in EWG if we
> added emojis to the valid identifier characters."
> The paper I read didn't seem to go into that direction. That quoted
> bit (which I copy-pasted, it's not a drunken
> transformation) seems like it's a completely new direction.

Yesterday's EWG session had a poll at the end about forwarding P1949
to CWG (tentatively ready), and there were three "against" votes.
Asked about their reasons, the two points raised were:

 - Are we excluding any (possibly fringe) scripts?
(The paper should simply say "no, we don't", despite UAX #31
confusingly containing a table "Excluded Scripts", but that's
just for the opt-in "implementations may want to exclude them
from identifiers" provision.)

 - We should be as inclusive as possible, so we should include
emoji.  (Slides may use them; some people may want to express
themselves by using them.)

Whether adding the latter would turn some "yes" votes into
"no" votes in EWG is unknown. Let's ask.


SG16 mailing list