Subject: Re: Emojis in identifiers
From: Martinho Fernandes (rmf_at_[hidden])
Date: 2020-06-22 10:02:45

I was one of the people voting "Against" on the telecon. The discussion
convinced me we didn't give proper consideration to this matter, so I
wanted us to go back and do so.

We did. Enough that I am now convinced that the paper should be adopted
as-is. The paper brings us back to a meaningful, thought-out (not by us,
but still), and conservative state from which to evolve. Everything else
should come in additional papers because they have way too many things to

My reasoning is informed by these three observations:

1. Emoji are not supported

It is now clear to me that supporting emoji is orthogonal to this paper.
Despite appearances, the status quo is not that emoji are supported; the
fact that some emojis work at present is an accident.

There are several examples of emoji that are presently not allowed in
identifiers, and I am not referring to *new* ones. In fact, new emoji
are automatically allowed because everything outside the BMP is allowed. An
example emoji that is not allowed is "frowning face", because it uses
U+2639 WHITE FROWNING FACE and this is outside of the currently allowed

If we want emoji identifiers to actually be supported we're going to need a
paper; the status quo doesn't cut it. P1949 is one of the two options we
have for emoji support: don't. A paper for the other option doesn't exist
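
To make the "accident" concrete, here is a minimal sketch of the status-quo
check. It assumes the ranges of C++17 Annex E.1 as I recall them (adjacent
ranges merged), so verify them against the standard before relying on this.
The names Range, kAllowedBmp and allowed_cxx17 are made up for illustration,
and the basic source character set (ASCII letters, digits, underscore) is
deliberately left out.

```cpp
// Sketch only: ranges transcribed from my reading of C++17 Annex E.1
// (characters allowed in identifiers); not an authoritative copy.
struct Range { char32_t lo, hi; };

// Allowed ranges inside the BMP. Adjacent ranges from the standard's
// list are merged here.
constexpr Range kAllowedBmp[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF}, {0x0100, 0x167F}, {0x1681, 0x180D},
    {0x180F, 0x1FFF}, {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040},
    {0x2054, 0x2054}, {0x2060, 0x218F}, {0x2460, 0x24FF}, {0x2776, 0x2793},
    {0x2C00, 0x2DFF}, {0x2E80, 0x2FFF}, {0x3004, 0x3007}, {0x3021, 0x302F},
    {0x3031, 0xD7FF}, {0xF900, 0xFD3D}, {0xFD40, 0xFDCF}, {0xFDF0, 0xFE44},
    {0xFE47, 0xFFFD},
};

// True if code point c may appear in an identifier under the status quo.
constexpr bool allowed_cxx17(char32_t c) {
    for (const Range& r : kAllowedBmp) {
        if (c >= r.lo && c <= r.hi) return true;
    }
    // Planes 1 through 14: everything except the last two code points
    // of each plane, whether assigned or not.
    return c >= 0x10000 && c <= 0xEFFFD && (c & 0xFFFF) <= 0xFFFD;
}

// U+2639 WHITE FROWNING FACE sits in a gap between BMP ranges.
static_assert(!allowed_cxx17(0x2639), "BMP emoji is rejected");
// U+1F641 SLIGHTLY FROWNING FACE is accepted only because it is outside the BMP.
static_assert(allowed_cxx17(0x1F641), "non-BMP emoji is accepted");
```

Nothing in this check knows what an emoji is; whether a given one works
depends only on which side of U+FFFF it happens to be encoded on.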

2. The status quo is broken

The status quo allows arbitrary use of all sorts of format characters, tag
characters, etc. One could argue that this is ok, but I think it's worth
noting how the status quo already doesn't allow arbitrary use of combining
characters. The standard actually goes out of its way to explicitly
disallow their arbitrary use. I struggle to see how the reasoning for that
does not also apply to format characters, tag characters, etc. To me an
identifier that starts with ZWJ or with an emoji variation selector makes
as much sense as one starting with a combining acute.
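
As a rough illustration of that asymmetry, the sketch below encodes only the
"disallowed initially" ranges, assuming C++17 Annex E.2 as I recall it
(please verify against the standard). The names Range, kDisallowedInitially
and disallowed_initially_cxx17 are made up for illustration.

```cpp
// Sketch only: ranges transcribed from my reading of C++17 Annex E.2
// (combining characters that may not start an identifier).
struct Range { char32_t lo, hi; };

constexpr Range kDisallowedInitially[] = {
    {0x0300, 0x036F}, {0x1DC0, 0x1DFF}, {0x20D0, 0x20FF}, {0xFE20, 0xFE2F},
};

// True if the status quo explicitly forbids c as the first character.
constexpr bool disallowed_initially_cxx17(char32_t c) {
    for (const Range& r : kDisallowedInitially) {
        if (c >= r.lo && c <= r.hi) return true;
    }
    return false;
}

// U+0301 COMBINING ACUTE ACCENT: explicitly barred from the first position.
static_assert(disallowed_initially_cxx17(0x0301), "combining acute");
// These fall in allowed ranges (200B-200D, FDF0-FE44, E0000-EFFFD as I
// recall them) and nothing bars them from the first position:
static_assert(!disallowed_initially_cxx17(0x200D), "ZERO WIDTH JOINER");
static_assert(!disallowed_initially_cxx17(0xFE0F), "VARIATION SELECTOR-16");
static_assert(!disallowed_initially_cxx17(0xE0020), "TAG SPACE");
```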

3. The status quo is broken

I think most people would agree that things like punctuation do not
belong in identifiers. This is also evidenced by the fact that the status
quo forbids, among others, U+2018 LEFT SINGLE QUOTATION MARK. And yet the
status quo, because it allows everything outside the BMP, past, present,
future, allows some punctuation that is already assigned, and will allow
punctuation that is yet to be assigned. You can s/punctuation/similar
groups/ and this argument applies similarly.
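
A small sketch of that inconsistency follows. It assumes the C++17 Annex E.1
ranges as I recall them, and covers only the General Punctuation block and
the supplementary planes. Both function names are made up for illustration;
the example code points are ones I am confident are punctuation.

```cpp
// Sketch only: an excerpt of the status-quo rules, not the full table.

// Allowed code points within the General Punctuation block
// (U+2000..U+206F), per my reading of C++17 Annex E.1. Only meaningful
// for inputs in that block.
constexpr bool allowed_in_general_punctuation(char32_t c) {
    return (c >= 0x200B && c <= 0x200D) || (c >= 0x202A && c <= 0x202E) ||
           (c >= 0x203F && c <= 0x2040) || c == 0x2054 ||
           (c >= 0x2060 && c <= 0x206F);
}

// Outside the BMP, planes 1 through 14 are allowed wholesale, minus the
// last two code points of each plane.
constexpr bool allowed_outside_bmp(char32_t c) {
    return c >= 0x10000 && c <= 0xEFFFD && (c & 0xFFFF) <= 0xFFFD;
}

// U+2018 LEFT SINGLE QUOTATION MARK: punctuation, rejected.
static_assert(!allowed_in_general_punctuation(0x2018), "quotation mark");
// U+10100 AEGEAN WORD SEPARATOR LINE: punctuation, accepted.
static_assert(allowed_outside_bmp(0x10100), "Aegean word separator");
// U+1039F UGARITIC WORD DIVIDER: punctuation, accepted.
static_assert(allowed_outside_bmp(0x1039F), "Ugaritic word divider");
```

The second function never consults any character property, so it cannot
tell punctuation from letters, and it accepts unassigned code points too.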

To sum up: I think this paper should go in as-is, and if we really want
emoji identifiers, it should come in a separate paper.

On Thu, Jun 18, 2020 at 8:55 PM Jens Maurer via SG16 <sg16_at_[hidden]>
wrote:

> So, it seems we would increase consensus in EWG if we
> added emojis to the valid identifier characters.
> That also gets us zero-width joiners (ZWJ):
> https://www.unicode.org/reports/tr51/#gender-neutral
> but maybe we can limit the fall-out by allowing ZWJ
> only inside of sequences of emojis, although I hate
> to burden compilers with even more special rules around
> the source code text (beyond NFC).
> Jens
