Subject: Re: Emojis in identifiers
From: Tom Honermann (tom_at_[hidden])
Date: 2020-06-22 11:23:58
On 6/19/20 11:07 AM, Steve Downey via SG16 wrote:
> While we don't exclude scripts generally, by not doing script
> analysis, the lack of ZWJ and ZWNJ makes some words in Indic scripts
> problematic. The examples in
> relevant. Zero Width Joiner and Zero Width Non-Joiner are used in
> Farsi, Malayalam, and Sinhala.
Perhaps a revision of the paper can note such script analysis as a
possible future direction?
> Devanagari and Kannada, although it appears that recent editions of
> Unicode may have added explicit characters in Devanagari to
> alleviate the problem.
It isn't clear to me that we have a good list of scripts that are
(partially) excluded by P1949R4. Is it reasonable to identify that set
and include it in a revision? Perhaps noting that Unicode is evolving
to better handle them such that they will not be excluded in the future?
> Script recognition would also be necessary to identify the "emoji"
> script to allow sequences, as well as expanding the repertoire of
> allowed characters to include the currently explicitly disallowed
> emoji, the ones that were known at the time the allowed character
> ranges in C++ was put together.
And finally, perhaps the next revision can acknowledge this as a
possibility and attempt to qualify the technical impact? I suspect JF
will want to poll inclusion of emoji, so the more we can do to inform
EWG on the consequences of doing so, the better.
> On Fri, Jun 19, 2020 at 1:26 AM Jens Maurer via SG16
> <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
> On 19/06/2020 00.38, Ville Voutilainen via SG16 wrote:
> > I'm confused to the hilt by this:
> > "So, it seems we would increase consensus in EWG if we
> > added emojis to the valid identifier characters."
> > The paper I read didn't seem to go into that direction. That quoted
> > bit (which I copy-pasted, it's not a drunken
> > transformation) seems like it's a completely new direction.
> Yesterday's EWG session had a poll at the end about forwarding P1949
> to CWG (tentatively ready), and there were three "against" votes.
> Asked about their reasons, the two points raised were:
> - Are we excluding any (possibly fringe) scripts?
> (The paper should simply say "no, we don't", despite UAX #31
> confusingly containing a table "Excluded Scripts", but that's
> just for the opt-in "implementations may want to exclude them
> from identifiers" provision.)
> - We should be as inclusive as possible, so we should include
> emoji. (Slides may use them; some people may want to express
> themselves by using them.)
> Whether adding the latter would turn some "yes" votes into
> "no" votes in EWG is unknown. Let's ask.