SG16

Subject: Re: Emojis in identifiers
From: Steve Downey (sdowney_at_[hidden])
Date: 2020-06-18 16:18:38


I'll see if I can put together a list showing which characters in the
current list would be removed by adopting UAX 31 with the current Unicode
database.

For emoji, I think it's also probably not clear to people who don't handle
text just how complicated they are. Simply allowing characters with the
Emoji property would be utterly insufficient. The regex from UTS #51 for
checking whether something _might_ be a valid emoji:

\p{RI} \p{RI}
| \p{Emoji}
  ( \p{EMod}
  | \x{FE0F} \x{20E3}?
  | [\x{E0020}-\x{E007E}]+ \x{E007F} )?
  (\x{200D} \p{Emoji}
    ( \p{EMod}
    | \x{FE0F} \x{20E3}?
    | [\x{E0020}-\x{E007E}]+ \x{E007F} )?
  )*

http://www.unicode.org/reports/tr51/#Emoji_Sequences

I believe cutting off all of the extension mechanisms for emoji, such
as those for gender or skin tone, would be unacceptable. However, the
implementation cost in the lexer would be quite high.

On Thu, Jun 18, 2020 at 4:36 PM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:

> On 6/18/20 3:14 PM, Alisdair Meredith via SG16 wrote:
>
> It is not clear we would increase consensus,
> as we got feedback only from those who were
> concerned at the lack of emoji support. We
> don't know how many others might switch
> away from their support if emoji support were
> added.
>
> I would probably switch from in favor to
> against for this, as I find emoji unclear and
> often misleading in communicating meaning,
> although perhaps some smaller subset of the
> emoji space might be clearer?
>
> Note that I’m not saying to NOT do the work
> to clarify the cost/benefit of supporting emoji,
> just that it is not clear whether it will increase,
> reduce, or simply change consensus. More
> information in a paper is usually helpful though.
>
> Agreed with all of the above.
>
> There were quite a few abstentions. My guess is that a number of people
> felt undecided for other reasons. Perhaps ambivalence due to a perception
> that extended characters are not used in practice, or perhaps difficulty
> with appreciating the impact of the change.
>
> It is challenging to get an intuitive sense of what identifiers are in or
> out by comparing the list of code points in [lex.name]p1
> <http://eel.is/c++draft/lex.name#1> vs the list of code points with
> XID_Start/XID_Continue properties listed in the paper
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1949r4.html#appendix-a---xid_start-code-points>.
> Perhaps we can better compare and present how these lists differ? Perhaps
> with a table illustrating included and excluded identifiers?
>
> I think it might help increase confidence as well if we can collect more
> data regarding how extended characters are used in practice.
>
> Tom.
>
> AlisdairM
>
>
> On Jun 18, 2020, at 19:55, Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
>
> So, it seems we would increase consensus in EWG if we
> added emojis to the valid identifier characters.
>
> That also gets us zero-width joiners (ZWJ): https://www.unicode.org/reports/tr51/#gender-neutral
>
> but maybe we can limit the fall-out by allowing ZWJ
> only inside of sequences of emojis, although I hate
> to burden compilers with even more special rules around
> the source code text (beyond NFC).
>
> Jens
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
>



SG16 list run by sg16-owner@lists.isocpp.org