On Mon, Jun 22, 2020 at 6:24 PM Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:
On 6/19/20 11:07 AM, Steve Downey via SG16 wrote:
Perhaps a revision of the paper can note the possibility of such script analysis as a possible future direction?

While we don't exclude scripts generally, by not doing script analysis, the lack of ZWJ and ZWNJ makes some words in Indic scripts problematic. The examples in https://unicode.org/reports/tr31/#Layout_and_Format_Control_Characters are relevant. Zero Width Joiner and Zero Width Non-Joiner are used in Farsi, Malayalam, and Sinhala.
Wikipedia https://en.wikipedia.org/wiki/Zero-width_joiner#Examples mentions Devanagari and Kannada, although it appears that recent editions of Unicode may have added explicit characters in Devanagari to alleviate the problem.
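To make the ZWJ/ZWNJ concern concrete, here is a minimal sketch in Python. It relies on `str.isidentifier()`, which follows the XID_Start/XID_Continue rules of UAX#31 (plus a leading underscore) with no script analysis. That is only an approximation of what P1949 proposes for C++, not the paper's wording. The helper `first_rejected` and the sample word (a Farsi spelling that uses ZWNJ, in the spirit of the TR31 examples) are illustrative assumptions.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def first_rejected(name):
    """Return the index of the first code point that breaks the
    XID_Start XID_Continue* pattern, or None if the name is accepted.

    Hypothetical helper: it leans on Python's str.isidentifier(), which
    also accepts a leading underscore.
    """
    for i, ch in enumerate(name):
        # Prefixing "a" turns the check into an XID_Continue test.
        ok = ch.isidentifier() if i == 0 else ("a" + ch).isidentifier()
        if not ok:
            return i
    return None


# The same Farsi letters, without and with ZWNJ before the suffix.
without_zwnj = "\u0646\u0627\u0645\u0647\u0627\u06cc"
with_zwnj = "\u0646\u0627\u0645\u0647" + ZWNJ + "\u0627\u06cc"

print(first_rejected(without_zwnj))  # accepted as an identifier
print(first_rejected(with_zwnj))     # index of the ZWNJ
```

Without script analysis, the only available choices are to reject ZWNJ and ZWJ everywhere or to allow them everywhere. The context-sensitive rules discussed in TR31 are what would need the extra analysis.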
It isn't clear to me that we have a good list of scripts that are (partially) excluded by P1949R4. Is it reasonable to identify that set and include it in a revision? Perhaps noting that Unicode is evolving to better handle them such that they will not be excluded in the future?
Script recognition would also be necessary to identify the "emoji" script to allow sequences, as well as to expand the repertoire of allowed characters to include the currently explicitly disallowed emoji, that is, the ones that were known at the time the allowed character ranges in C++ were put together.
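As a rough illustration of why emoji sequences need more than a per-character check, the sketch below breaks one emoji ZWJ sequence into its code points. It again uses Python's `str.isidentifier()` as a stand-in for the XID_Continue test; the helper `describe` and the chosen sequence are assumptions made for this example.

```python
import unicodedata

# WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER ("woman technologist").
sequence = "\U0001F469\u200d\U0001F4BB"


def describe(text):
    """List (code point, general category, passes XID_Continue) per character.

    Hypothetical helper for illustration only.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), ("a" + ch).isidentifier())
        for ch in text
    ]


for row in describe(sequence):
    print(row)
```

Each emoji has general category So and the joiner is Cf, so none of the three code points qualifies under XID_Continue. Accepting the sequence would mean recognizing it as a unit rather than adding individual characters to the allowed ranges.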
And finally, perhaps the next revision can acknowledge this as a possibility and attempt to qualify the technical impact? I suspect JF will want to poll inclusion of emoji, so the more we can do to inform EWG on the consequences of doing so, the better.
Tom.
Since the telecon, like Martinho, I've been closely following the discussion to better understand the points raised by the authors of the paper.

My main concern is not Emoji per se, but other relevant scripts that might fall through the cracks of UAX#31. I like the approach that anyone that wants Emoji should try and change UAX#31, but the limitations that are being suggested should be clear.
Although I still don't think the paper should be accepted as-is, I do agree that the clarifications proposed above by Tom would go a long way to convince me (and maybe others?) that UAX#31 is the only sensible way to go.

-Marcos