
Re: [SG16-Unicode] code_unit_sequence and code_point_sequence

From: Sergey Zubkov <cubbi_at_[hidden]>
Date: Tue, 19 Jun 2018 16:34:30 -0400
> if you truly want to work with text, you usually need to work on the
> layer above code points - grapheme clusters.

What makes interactive selection (which uses grapheme clusters) more
"true" than rendering or collation (to give two examples of work with
text that use other kinds of code point sequences)?
Perhaps it would help to compile a list of use cases.

On Tue, Jun 19, 2018, 4:19 PM Lyberta <lyberta_at_[hidden]> wrote:

> keld_at_[hidden]:
> > Is your code point advisory the same as codepoints in 10646/Unicode, also
> > called characters in 10646?
>
> Yes. A code point is an integer in the range 0 to 10FFFF (hexadecimal),
> so any code point fits in an unsigned 32-bit integer. Modern C and C++
> have the type char32_t, which is the most suitable type for holding
> code points.
>
> > And why not just treat these as 32-bit wchar_t?
> > I believe this is what we do in C.
>
> Because the wide execution character set is implementation-defined. So
> far nobody has expressed an opinion on changing that, and Windows
> violates the standard by having a 16-bit wchar_t.
>
> > Then you can have functions converting to and from wchar_t.
>
> Yes, except that if you convert text to UTF-32 before processing it,
> you waste memory (every code point takes four bytes, even ASCII), and
> many interfaces still expect char*. More importantly, if you truly want
> to work with text, you usually need to work on the layer above code
> points - grapheme clusters.
>
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden]
> http://www.open-std.org/mailman/listinfo/unicode
>

Received on 2018-06-19 22:34:44