Date: Tue, 23 Jun 2020 13:56:24 +0200
On Mon, Jun 22, 2020 at 6:24 PM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:
> On 6/19/20 11:07 AM, Steve Downey via SG16 wrote:
>
> While we don't exclude scripts generally, by not doing script analysis,
> the lack of ZWJ and ZWNJ makes some words in Indic scripts problematic. The
> examples in
> https://unicode.org/reports/tr31/#Layout_and_Format_Control_Characters are
> relevant. Zero Width Joiner and Zero Width Non-Joiner are used in
> Farsi, Malayalam, and Sinhala.
>
> Perhaps a revision of the paper can note the possibility of such script
> analysis as a possible future direction?
>
>
> Wikipedia https://en.wikipedia.org/wiki/Zero-width_joiner#Examples mentions
> Devanagari and Kannada, although it appears that recent editions of Unicode
> may have added explicit characters in Devanagari to alleviate the problem.
>
> It isn't clear to me that we have a good list of scripts that are
> (partially) excluded by P1949R4. Is it reasonable to identify that set and
> include it in a revision? Perhaps noting that Unicode is evolving to
> better handle them such that they will not be excluded in the future?
>
>
> Script recognition would also be necessary to identify the "emoji" script
> to allow sequences, as well as expanding the repertoire of allowed
> characters to include the currently explicitly disallowed emoji, the ones
> that were known at the time the allowed character ranges in C++ was put
> together.
>
> And finally, perhaps the next revision can acknowledge this as a
> possibility and attempt to qualify the technical impact? I suspect JF will
> want to poll inclusion of emoji, so the more we can do to inform EWG on the
> consequences of doing so, the better.
>
> Tom.
>
Since the telecon, like Martinho, I've been closely following the
discussion to better understand the points raised by the authors of the
paper.
My main concern is not Emoji per se, but other relevant scripts that might
fall through the cracks of UAX#31. I like the approach that anyone who
wants Emoji should try to change UAX#31, but the limitations being
proposed should be stated clearly.
I still don't think the paper should be accepted *as-is*, but I do agree
that the clarifications Tom proposes above would go a long way toward
convincing me (and maybe others?) that UAX#31 is the only sensible way to
go.
-Marcos
Received on 2020-06-23 07:00:14