Subject: [SG16-Unicode] On the usage of grapheme clusters
From: Martinho Fernandes (rmf_at_[hidden])
Date: 2018-06-20 05:03:46

On 19.06.18 22:34, Sergey Zubkov wrote:
>> if you truly want to work with text, you usually need to work on the
>> layer above code points - grapheme clusters.
> What makes interactive selection (which uses GCs) more "true" than
> rendering or collation (to give two examples of work with text that use
> other kinds of code point sequences)?
> Perhaps we could use a list of uses.

This was off-topic in the other thread, but I want to mention that
Unicode 11 finally added a paragraph to UAX#29 describing what grapheme
clusters are for. I find that extremely important because IMO people are
constantly mistaken about the usefulness of grapheme clusters. That
paragraph reads:

"Grapheme clusters can be treated as units, by default, for processes
such as the formatting of drop caps, as well as the implementation of
text selection, arrow key movement or backspacing through text, and so
forth. For example, when a grapheme cluster is represented internally by
a character sequence consisting of base character + accents, then using
the right arrow key would skip from the start of the base character to
the end of the last accent."

This is important to keep in mind because the evolution of grapheme
cluster rules is driven primarily by these use cases; other properties
the current rules may have are accidental and should not be relied upon.
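
To make the arrow-key example from the quoted paragraph concrete, here is a
minimal sketch in C++. It is not the UAX#29 algorithm: as a loudly stated
assumption, it treats only code points in the Combining Diacritical Marks
block (U+0300..U+036F) as "accents", and the names is_simple_accent and
right_arrow are hypothetical, made up for illustration. A real
implementation would use the Grapheme_Cluster_Break property and the full
rule set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplifying assumption: only the Combining Diacritical Marks block
// (U+0300..U+036F) counts as an "accent". The actual UAX#29 rules are
// driven by the Grapheme_Cluster_Break property and cover many more cases.
inline bool is_simple_accent(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Hypothetical helper: given a cursor at code point index `pos`, return the
// index where the cursor lands after one press of the right arrow key.
// The cursor steps over one base character and every accent following it.
inline std::size_t right_arrow(const std::u32string& text, std::size_t pos) {
    assert(pos <= text.size());
    if (pos == text.size()) {
        return pos;  // already at the end of the text
    }
    ++pos;  // step over the base character
    while (pos < text.size() && is_simple_accent(text[pos])) {
        ++pos;  // step over each accent attached to that base
    }
    return pos;
}
```

For the sequence "e", U+0301, U+0323, "x", a cursor at index 0 lands at
index 3: it skips from the start of the base character to the end of the
last accent, as the quoted paragraph describes.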


SG16 list run by herb.sutter at gmail.com