sg16: Re: [SG16-Unicode] On the usage of grapheme clusters

From: Steve Downey <sdowney_at_[hidden]>
Date: Wed, 20 Jun 2018 07:19:24 -0400

Another use case is finding a sequence that fits in a fixed-size buffer.
While truncating at an arbitrary byte is self-evidently wrong, truncating at
a code point boundary can be problematic too. This is related to text
selection, but with the additional difficulty that there is no person in the
decision loop.
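
To make the buffer case concrete, here is a minimal Python sketch. The
function name truncate_utf8 is hypothetical. It first backs off from the byte
limit to a UTF-8 code point boundary. It then backs off further so that a base
character is never kept without the marks that follow it. Python's standard
library has no grapheme cluster segmentation, so the sketch uses a crude
stand-in: any code point with general category Mn, Mc or Me continues the
preceding cluster. Real UAX #29 segmentation also covers ZWJ emoji sequences,
Hangul syllables, regional indicator pairs, CR LF and more. That needs a
proper implementation, such as ICU's character break iterator.

```python
import unicodedata


def _continues_cluster(ch: str) -> bool:
    # Approximation (assumption): combining marks (general categories
    # Mn, Mc, Me) attach to the preceding character. This is NOT the full
    # UAX #29 extended grapheme cluster algorithm.
    return unicodedata.category(ch).startswith("M")


def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Return the longest UTF-8 prefix of `text` that fits in `max_bytes`
    without splitting a code point or an (approximate) grapheme cluster."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return encoded

    # Step 1: back off to a code point boundary. UTF-8 continuation
    # bytes have the bit pattern 10xxxxxx.
    cut = max(max_bytes, 0)
    while cut > 0 and (encoded[cut] & 0xC0) == 0x80:
        cut -= 1
    j = len(encoded[:cut].decode("utf-8"))  # code points kept so far

    # Step 2: text[j] is the first dropped code point. If it is a mark,
    # the cut fell inside a cluster, so also drop that cluster's base and
    # any of its marks that were kept.
    while j > 0 and _continues_cluster(text[j]):
        j -= 1
    return text[:j].encode("utf-8")
```

Take "cafe\u0301" (six bytes in UTF-8) with a five-byte limit. Slicing the
bytes leaves a dangling lead byte (b"cafe\xcc"), which is not valid UTF-8.
Cutting at the code point boundary gives "cafe", which silently drops the
accent. The cluster-aware version returns b"caf".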

On Wed, Jun 20, 2018, 06:05 Martinho Fernandes <rmf_at_[hidden]> wrote:

> On 19.06.18 22:34, Sergey Zubkov wrote:
> >> if you truly want to work with text, you usually need to work on the
> >> layer above code points - grapheme clusters.
> >
> > What makes interactive selection (which uses GCs) more "true" than
> > rendering or collation (to give two examples of work with text that use
> > other kinds of code point sequences)?
> > Perhaps we could use a list of uses.
>
> This was off-topic in the other thread, but I want to mention that
> Unicode 11 finally added a paragraph to UAX#29 describing what grapheme
> clusters are for. I find that extremely important because IMO people are
> constantly mistaken about the usefulness of grapheme clusters. That
> paragraph reads:
>
> "Grapheme clusters can be treated as units, by default, for processes
> such as the formatting of drop caps, as well as the implementation of
> text selection, arrow key movement or backspacing through text, and so
> forth. For example, when a grapheme cluster is represented internally by
> a character sequence consisting of base character + accents, then using
> the right arrow key would skip from the start of the base character to
> the end of the last accent."
>
> This is important to keep in mind because the evolution of grapheme
> cluster rules is driven primarily by these use cases; other properties
> the current rules may have are accidental and should not be relied upon.
>
> --
> Martinho
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden]
> http://www.open-std.org/mailman/listinfo/unicode
>
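
The arrow-key and backspace behavior in the UAX #29 paragraph quoted above can
be sketched as follows. This is a minimal Python illustration, not a UAX #29
implementation. The helper names are hypothetical, and cursor positions are
code point indices into a Python str. Python's standard library offers no
grapheme segmentation, so the cluster rule is approximated: combining marks
(general categories Mn, Mc, Me) are treated as attached to the preceding
character.

```python
import unicodedata
from typing import Tuple


def _continues_cluster(ch: str) -> bool:
    # Approximation (assumption): combining marks (Mn, Mc, Me) attach to the
    # preceding character. Full UAX #29 rules also handle ZWJ sequences,
    # Hangul syllables, regional indicator pairs, CR LF, and more.
    return unicodedata.category(ch).startswith("M")


def next_boundary(text: str, pos: int) -> int:
    """Right arrow: skip from the start of a cluster to just past its last mark."""
    if pos >= len(text):
        return len(text)
    pos += 1  # step over the base character
    while pos < len(text) and _continues_cluster(text[pos]):
        pos += 1
    return pos


def prev_boundary(text: str, pos: int) -> int:
    """Left arrow / backspace: move back to the start of the previous cluster."""
    if pos <= 0:
        return 0
    pos -= 1
    while pos > 0 and _continues_cluster(text[pos]):
        pos -= 1
    return pos


def backspace(text: str, pos: int) -> Tuple[str, int]:
    """Delete the whole cluster before `pos`; return the new text and cursor."""
    start = prev_boundary(text, pos)
    return text[:start] + text[pos:], start
```

Consider "ae\u0301\u0323x" with the cursor before the e. One right-arrow press
moves past both accents, from index 1 to index 4. That is the skip "from the
start of the base character to the end of the last accent" that the quoted
paragraph describes. A backspace from index 4 removes all three code points.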

Received on 2018-06-20 13:19:37