I was one of the people voting "Against" on the telecon. The discussion convinced me that we had not given this matter proper consideration, so I wanted us to go back and do so.

We did. Enough that I am now convinced that the paper should be adopted as-is. The paper brings us back to a meaningful, thought-out (not by us, but still), and conservative state from which to evolve. Everything else should come in additional papers, because there are way too many things to consider for each of those extensions.

My reasoning is informed by these three observations:

1. Emoji are not supported

It is now clear to me that supporting emoji is orthogonal to this paper. Despite appearances, the status quo is not that emoji are supported; the fact that some emoji work at present is an accident.

There are several examples of emoji that are presently not allowed in identifiers, and I am not referring to *new* ones. In fact, new emoji are automatically allowed because (nearly) everything outside the BMP is allowed. An example of an emoji that is not allowed is "frowning face", because it uses U+2639 WHITE FROWNING FACE, which is outside the currently allowed ranges.

If we want emoji identifiers to actually be supported we're going to need a paper; the status quo doesn't cut it. P1949 is one of the two options we have for emoji support: don't. A paper for the other option doesn't exist yet.
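
To make the U+2639 example concrete, here is a minimal sketch of the status-quo check. It assumes the range table in Annex E.1 of C++17, which I transcribed by hand, so please verify it against the standard before relying on it. The names `Range`, `allowed`, `in_supplementary_ranges`, and `is_allowed` are made up for this mail, and the code needs C++14 or later for the constexpr loop.

```cpp
// Sketch only: Annex E.1 of C++17, transcribed by hand (an assumption to verify).
// Covers universal-character-names only, not the basic source character set.
struct Range { char32_t first, last; };

constexpr Range allowed[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF}, {0x0100, 0x167F}, {0x1681, 0x180D},
    {0x180F, 0x1FFF}, {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040},
    {0x2054, 0x2054}, {0x2060, 0x206F}, {0x2070, 0x218F}, {0x2460, 0x24FF},
    {0x2776, 0x2793}, {0x2C00, 0x2DFF}, {0x2E80, 0x2FFF}, {0x3004, 0x3007},
    {0x3021, 0x302F}, {0x3031, 0x303F}, {0x3040, 0xD7FF}, {0xF900, 0xFD3D},
    {0xFD40, 0xFDCF}, {0xFDF0, 0xFE44}, {0xFE47, 0xFFFD},
};

// Planes 1 through 14, minus the last two code points of each plane:
// 10000-1FFFD, 20000-2FFFD, ..., E0000-EFFFD.
constexpr bool in_supplementary_ranges(char32_t c) {
    return c >= 0x10000 && c <= 0xEFFFD && (c & 0xFFFF) <= 0xFFFD;
}

constexpr bool is_allowed(char32_t c) {
    for (const Range& r : allowed)
        if (c >= r.first && c <= r.last) return true;
    return in_supplementary_ranges(c);
}

// The BMP emoji falls in a gap of the table; its non-BMP cousin is let through.
static_assert(!is_allowed(0x2639), "U+2639 WHITE FROWNING FACE is rejected");
static_assert(is_allowed(0x1F641), "U+1F641 SLIGHTLY FROWNING FACE is accepted");
```

Nothing in the table mentions emoji at all: U+1F641 passes only because of the blanket rule for the supplementary planes.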

2. The status quo is broken

The status quo allows arbitrary use of all sorts of format characters, tag characters, etc. One could argue that this is OK, but I think it's worth noting that the status quo already doesn't allow arbitrary use of combining characters. The standard actually goes out of its way to explicitly disallow their arbitrary use. I struggle to see how the reasoning for that does not also apply to format characters, tag characters, etc. To me, an identifier that starts with ZWJ or with an emoji variation selector makes as much sense as one starting with a combining acute.
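
The asymmetry can be sketched as follows. This assumes the "disallowed initially" list in Annex E.2 of C++17 (again transcribed by hand, so worth double-checking) plus only the three Annex E.1 ranges that the examples touch. `in_any` and `may_start_identifier` are hypothetical helper names, and C++14 or later is assumed.

```cpp
#include <cstddef>

struct Range { char32_t first, last; };

// Annex E.2 (C++17): not allowed as the first character of an identifier.
// These are all blocks of combining marks.
constexpr Range disallowed_initially[] = {
    {0x0300, 0x036F}, {0x1DC0, 0x1DFF}, {0x20D0, 0x20FF}, {0xFE20, 0xFE2F},
};

// Excerpt of Annex E.1: only the allowed ranges this example needs.
constexpr Range allowed_excerpt[] = {
    {0x0100, 0x167F}, {0x200B, 0x200D}, {0xFDF0, 0xFE44},
};

template <std::size_t N>
constexpr bool in_any(char32_t c, const Range (&ranges)[N]) {
    for (std::size_t i = 0; i < N; ++i)
        if (c >= ranges[i].first && c <= ranges[i].last) return true;
    return false;
}

constexpr bool may_start_identifier(char32_t c) {
    return in_any(c, allowed_excerpt) && !in_any(c, disallowed_initially);
}

static_assert(!may_start_identifier(0x0301), "U+0301 COMBINING ACUTE ACCENT is rejected");
static_assert(may_start_identifier(0x200D), "U+200D ZERO WIDTH JOINER is accepted");
static_assert(may_start_identifier(0xFE0F), "U+FE0F VARIATION SELECTOR-16 is accepted");
```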

3. The status quo is broken (again)

I think most people would agree that things like punctuation do not belong in identifiers. This is also evidenced by the fact that the status quo forbids, among others, U+2018 LEFT SINGLE QUOTATION MARK. And yet the status quo, because it allows everything outside the BMP (past, present, and future), allows some punctuation that is already assigned, and will allow punctuation that is yet to be assigned. You can s/punctuation/similar groups/ and the argument still applies.
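
A small sketch of the inconsistency, assuming the Annex E.1 entries of C++17 that fall inside the General Punctuation block (transcribed by hand) and the blanket rule for planes 1 through 14. The helper names are hypothetical, and `is_allowed_excerpt` is only meaningful for code points in U+2000-U+206F or outside the BMP. C++14 or later is assumed.

```cpp
struct Range { char32_t first, last; };

// Annex E.1 excerpt: the allowed ranges inside General Punctuation (U+2000-U+206F).
constexpr Range general_punctuation_allowed[] = {
    {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040},
    {0x2054, 0x2054}, {0x2060, 0x206F},
};

// Blanket rule: 10000-1FFFD, 20000-2FFFD, ..., E0000-EFFFD.
// It does not look at what a code point is, or whether it is assigned.
constexpr bool outside_bmp_allowed(char32_t c) {
    return c >= 0x10000 && c <= 0xEFFFD && (c & 0xFFFF) <= 0xFFFD;
}

constexpr bool is_allowed_excerpt(char32_t c) {
    for (const Range& r : general_punctuation_allowed)
        if (c >= r.first && c <= r.last) return true;
    return outside_bmp_allowed(c);
}

// BMP punctuation is carefully excluded...
static_assert(!is_allowed_excerpt(0x2018), "U+2018 LEFT SINGLE QUOTATION MARK is rejected");
// ...while punctuation outside the BMP is accepted...
static_assert(is_allowed_excerpt(0x10100), "U+10100 AEGEAN WORD SEPARATOR LINE is accepted");
// ...and so is a code point that is not assigned to anything yet.
static_assert(is_allowed_excerpt(0x50000), "U+50000 (unassigned, plane 5) is accepted");
```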

To sum up: I think this paper should go in as-is, and if we really want emoji identifiers, it should come in a separate paper.

On Thu, Jun 18, 2020 at 8:55 PM Jens Maurer via SG16 <sg16@lists.isocpp.org> wrote:
> So, it seems we would increase consensus in EWG if we
> added emojis to the valid identifier characters.
>
> That also gets us zero-width joiners (ZWJ):
> https://www.unicode.org/reports/tr51/#gender-neutral
>
> but maybe we can limit the fall-out by allowing ZWJ
> only inside of sequences of emojis, although I hate
> to burden compilers with even more special rules around
> the source code text (beyond NFC).
>
> Jens