<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 18 Jun 2020 at 23:19, Steve Downey via SG16 &lt;<a href="mailto:sg16@lists.isocpp.org">sg16@lists.isocpp.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I&#39;ll see if I can put together a list that makes sense of what characters are being removed by UAX 31 and the current Unicode database against the current list. <br><br>For emoji, I think it&#39;s also probably not clear to people who don&#39;t handle text just how complicated they are. Simply allowing class Emoji would be utterly insufficient. The regex for checking if something _might_ be a valid emoji, per the Unicode standard:<br><pre><font color="#000000">\p{RI} \p{RI} 
| \p{Emoji} 
  ( \p{EMod} 
  | \x{FE0F} \x{20E3}? 
  | [\x{E0020}-\x{E007E}]+ \x{E007F} )?
  (\x{200D} \p{Emoji}
    ( \p{EMod} 
    | \x{FE0F} \x{20E3}? 
    | [\x{E0020}-\x{E007E}]+ \x{E007F} )?
  )*



</font><a href="http://www.unicode.org/reports/tr51/#Emoji_Sequences" style="color:rgb(0,0,0)" target="_blank">http://www.unicode.org/reports/tr51/#Emoji_Sequences</a><font color="#000000">

</font><font face="arial, sans-serif">I believe cutting off all of the extension mechanisms for emoji, such as for gender or skin tone, to be unacceptable. However, the implementation cost in the lexer would be quite high. </font></pre></div></blockquote><div><br></div><div>Can we agree that we should support Unicode fully or not at all (precisely because of gender, skin tones, etc.)?</div><div><br></div><div>Parsing emoji is probably non-trivial, and not sufficient on its own. A simpler solution is to stick to the list of emoji supported for interchange, which</div><div>is finite (about 3000 or so elements, I think).</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 18, 2020 at 4:36 PM Tom Honermann via SG16 &lt;<a href="mailto:sg16@lists.isocpp.org" target="_blank">sg16@lists.isocpp.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <div>On 6/18/20 3:14 PM, Alisdair Meredith
      via SG16 wrote:<br>
    </div>
    <blockquote type="cite">
      <pre>It is not clear we would increase consensus,
as we got feedback only from those who were
concerned at the lack of emoji support.  We
don&#39;t know how many others might switch
away from their support if emoji support were
added.

I would probably switch from in favor to
against for this, as I find emoji unclear and
often misleading in communicating meaning,
although perhaps some smaller subset of the
emoji space might be clearer?

Note that I’m not saying to NOT do the work
to clarify the cost/benefit of supporting emoji,
just that it is not clear whether it will increase,
reduce, or simply change consensus.  More
information in a paper is usually helpful though.</pre>
    </blockquote>
    <p>Agreed with all of the above.</p>
    <p>There were quite a few abstentions.  My guess is that a number of
      people felt undecided for other reasons.  Perhaps ambivalence due
      to a perception that extended characters are not used in practice,
      or perhaps difficulty with appreciating the impact of the change.</p>
    <p>It is challenging to get an intuitive sense of what identifiers
      are in or out by comparing the list of code points in <a href="http://eel.is/c++draft/lex.name#1" target="_blank">[lex.name]p1</a>
      vs the list of code points with XID_Start/XID_Continue properties
      <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1949r4.html#appendix-a---xid_start-code-points" target="_blank">listed
        in the paper</a>.  Perhaps we can better compare and present how
      these lists differ?  Perhaps with a table illustrating included
      and excluded identifiers?</p>
    <p>I think it might help increase confidence as well if we can
      collect more data regarding how extended characters are used in
      practice.<br>
    </p>
    <p>Tom.<br>
    </p>
    <blockquote type="cite">
      <pre>AlisdairM

</pre>
      <blockquote type="cite">
        <pre>On Jun 18, 2020, at 19:55, Jens Maurer via SG16 <a href="mailto:sg16@lists.isocpp.org" target="_blank">&lt;sg16@lists.isocpp.org&gt;</a> wrote:

So, it seems we would increase consensus in EWG if we
added emojis to the valid identifier characters.

That also gets us zero-width joiners (ZWJ):
<a href="https://www.unicode.org/reports/tr51/#gender-neutral" target="_blank">https://www.unicode.org/reports/tr51/#gender-neutral</a>

but maybe we can limit the fall-out by allowing ZWJ
only inside of sequences of emojis, although I hate
to burden compilers with even more special rules around
the source code text (beyond NFC).

Jens
-- 
SG16 mailing list
<a href="mailto:SG16@lists.isocpp.org" target="_blank">SG16@lists.isocpp.org</a>
<a href="https://lists.isocpp.org/mailman/listinfo.cgi/sg16" target="_blank">https://lists.isocpp.org/mailman/listinfo.cgi/sg16</a>
</pre>
      </blockquote>
      <pre></pre>
    </blockquote>
    <p><br>
    </p>
  </div>

</blockquote></div>
</blockquote></div></div>
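<div dir="ltr"><div><br></div><div>A rough sketch, not part of the thread: the UTS #51 "possible emoji" regex quoted above, transcribed for Python's standard re module, which has no \p{...} support. The Regional_Indicator and Emoji_Modifier ranges are the real ones. EMOJI is a deliberately tiny, hypothetical stand-in for \p{Emoji}, and is_possible_emoji is an illustrative name; a real lexer would presumably generate that class from the Unicode emoji data files instead.</div><div><br></div>
<pre>
```python
import re

# Exact code point ranges for two of the properties in the UTS #51 regex.
RI = "\U0001F1E6-\U0001F1FF"    # \p{RI}: Regional_Indicator
EMOD = "\U0001F3FB-\U0001F3FF"  # \p{EMod}: Emoji_Modifier (skin tones)

# ASSUMPTION: a tiny, incomplete stand-in for \p{Emoji}, for illustration only.
# It covers '#', '*', the ASCII digits, U+2764, U+1F3F4, U+1F44D, U+1F468,
# U+1F469, U+1F4BB and the Emoticons block U+1F600..U+1F64F.
EMOJI = ("#*0-9\u2764\U0001F3F4\U0001F44D\U0001F468\U0001F469"
         "\U0001F4BB\U0001F600-\U0001F64F")

# Optional suffix after an emoji: a skin tone modifier, a variation selector
# with an optional keycap mark, or a tag sequence ended by CANCEL TAG.
SUFFIX = (f"(?:[{EMOD}]"
          "|\uFE0F\u20E3?"
          "|[\U000E0020-\U000E007E]+\U000E007F)?")

ELEMENT = f"[{EMOJI}]{SUFFIX}"

# Either a pair of regional indicators (a flag), or one element followed by
# any number of ZWJ-joined elements.
POSSIBLE_EMOJI = re.compile(f"[{RI}][{RI}]|{ELEMENT}(?:\u200D{ELEMENT})*")


def is_possible_emoji(text):
    """Return True if the whole string matches the possible-emoji grammar."""
    return POSSIBLE_EMOJI.fullmatch(text) is not None


if __name__ == "__main__":
    samples = {
        "flag (RI RI)": "\U0001F1EB\U0001F1F7",
        "skin tone": "\U0001F44D\U0001F3FD",
        "ZWJ sequence": "\U0001F469\u200D\U0001F4BB",
        "keycap": "1\uFE0F\u20E3",
        "trailing ZWJ": "\U0001F600\u200D",
        "plain letter": "a",
    }
    for label, text in samples.items():
        print(label, is_possible_emoji(text))
```
</pre>
<div>Note that a bare digit such as 1 passes this check, since the ASCII digits carry the Emoji property. That is the sense in which the regex only tells you something might be an emoji, and why a finite list looks simpler to me.</div></div>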

