SG16

Subject: Re: Emojis in identifiers
From: Tom Honermann (tom_at_[hidden])
Date: 2020-06-22 11:23:58


On 6/19/20 11:07 AM, Steve Downey via SG16 wrote:
> While we don't exclude scripts generally, by not doing script
> analysis, the lack of ZWJ and ZWNJ makes some words in Indic scripts
> problematic. The examples in
> https://unicode.org/reports/tr31/#Layout_and_Format_Control_Characters are
> relevant. Zero Width Joiner and Zero Width Non-Joiner are used in
> Farsi, Malayalam, and Sinhala.
Perhaps a revision of the paper can note such script analysis as a
possible future direction?
>
> Wikipedia
> https://en.wikipedia.org/wiki/Zero-width_joiner#Examples mentions
> Devanagari and Kannada, although it appears that recent editions of
> Unicode may have added explicit characters in Devanagari to
> alleviate the problem.

It isn't clear to me that we have a good list of scripts that are
(partially) excluded by P1949R4.  Is it reasonable to identify that set
and include it in a revision?  Perhaps noting that Unicode is evolving
to better handle them such that they will not be excluded in the future?

>
> Script recognition would also be necessary to identify the "emoji"
> script to allow sequences, as well as expanding the repertoire of
> allowed characters to include the currently explicitly disallowed
> emoji, the ones that were known at the time the allowed character
> ranges in C++ were put together.

And finally, perhaps the next revision can acknowledge this as a
possibility and attempt to qualify the technical impact?  I suspect JF
will want to poll inclusion of emoji, so the more we can do to inform
EWG on the consequences of doing so, the better.

Tom.

>
> On Fri, Jun 19, 2020 at 1:26 AM Jens Maurer via SG16
> <sg16_at_[hidden]> wrote:
>
> On 19/06/2020 00.38, Ville Voutilainen via SG16 wrote:
> > I'm confused to the hilt by this:
> >
> > "So, it seems we would increase consensus in EWG if we
> > added emojis to the valid identifier characters."
> >
> > The paper I read didn't seem to go into that direction. That quoted
> > bit (which I copy-pasted, it's not a drunken
> > transformation) seems like it's a completely new direction.
>
> Yesterday's EWG session had a poll at the end about forwarding P1949
> to CWG (tentatively ready), and there were three "against" votes.
> Asked about their reasons, the two points raised were:
>
>  - Are we excluding any (possibly fringe) scripts?
> (The paper should simply say "no, we don't", despite UAX #31
> confusingly containing a table "Excluded Scripts", but that's
> just for the opt-in "implementations may want to exclude them
> from identifiers" provision.)
>
>  - We should be as inclusive as possible, so we should include
> emoji.  (Slides may use them; some people may want to express
> themselves by using them.)
>
> Whether adding the latter would turn some "yes" votes into
> "no" votes in EWG is unknown. Let's ask.
>
> Jens
>



SG16 list run by sg16-owner@lists.isocpp.org