sg16: Re: [SG16-Unicode] D1628R0 (Unicode character properties)

From: Lyberta <lyberta_at_[hidden]>
Date: Thu, 28 Mar 2019 07:42:00 +0000

Corentin:
> As requested by Tom, please find attached D1628R0 which will be discussed
> during today's meeting ❕
>
> Feedback welcome :)

Do we really want std::uni? std::unicode seems much better.

Unicode always uses the term "code point", not "codepoint":
https://www.unicode.org/glossary/#code_point

So the name should be std::uni[code]::code_point.

In my experience, I never need code points because surrogates are not
allowed in valid UTF. I only ever need Unicode scalar values:
https://www.unicode.org/glossary/#unicode_scalar_value

Hence I think using code point interfaces should be discouraged.
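
To make the distinction concrete, here is a minimal sketch of the two
definitions from the glossary. The helper names (is_code_point,
is_surrogate, is_scalar_value) are made up for illustration and do not
come from D1628R0 or any library.

```cpp
// Hypothetical helpers for illustration; none of these names come from
// D1628R0 or the standard library.

// Any value in the Unicode codespace, U+0000..U+10FFFF.
constexpr bool is_code_point(char32_t c) noexcept {
    return c <= 0x10FFFF;
}

// Surrogate code points, U+D800..U+DFFF.
constexpr bool is_surrogate(char32_t c) noexcept {
    return c >= 0xD800 && c <= 0xDFFF;
}

// A scalar value is any code point that is not a surrogate.
constexpr bool is_scalar_value(char32_t c) noexcept {
    return is_code_point(c) && !is_surrogate(c);
}

static_assert(is_scalar_value(U'A'), "U+0041 is a scalar value");
static_assert(is_code_point(char32_t{0xD800}), "U+D800 is a code point");
static_assert(!is_scalar_value(char32_t{0xD800}), "but not a scalar value");
```

Since well-formed UTF-8, UTF-16 and UTF-32 can only encode scalar
values, the third predicate is the one that matters after decoding.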

I think constructing code points or scalar values from char8_t or
char16_t makes no sense. They are at different levels: those types hold
code units, not code points.
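
As a rough illustration of the level mismatch: a single char16_t may be
only half of a surrogate pair, so it takes two code units to produce one
scalar value. The function name decode_surrogate_pair is hypothetical,
and the sketch assumes its inputs are already a valid high/low pair.

```cpp
// Hypothetical helper for illustration only.
// Combines a UTF-16 surrogate pair into the scalar value it encodes.
// Assumes `high` is in 0xD800..0xDBFF and `low` is in 0xDC00..0xDFFF;
// no validation is performed.
constexpr char32_t decode_surrogate_pair(char16_t high, char16_t low) noexcept {
    return 0x10000
         + ((static_cast<char32_t>(high) - 0xD800) << 10)
         + (static_cast<char32_t>(low) - 0xDC00);
}

// U+1F600 is encoded in UTF-16 as the two code units D83D DE00.
static_assert(decode_surrogate_pair(0xD83D, 0xDE00) == 0x1F600,
              "two code units, one scalar value");
```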

I'm writing a competing proposal where I want to propose
std::unicode_code_point and std::unicode_scalar_value that have explicit
constructors from char32_t and explicit member function .value() to get
char32_t back. I think this is the only way forward. char8_t, char16_t
and char32_t are dumb types that have horrible names; we should only
use them as a transition mechanism.
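
A minimal sketch of what such types could look like, assuming C++14 or
later. The class names follow the ones above but are placed in the
global namespace; the assert-based precondition checks are an assumption
of this sketch, not a decided part of the design.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical sketch: the names follow the email, but these types are
// in the global namespace and are not taken from a published proposal.

class unicode_code_point {
public:
    // Precondition (an assumption of this sketch): v <= 0x10FFFF.
    constexpr explicit unicode_code_point(char32_t v) noexcept : value_(v) {
        assert(v <= 0x10FFFF);
    }
    constexpr char32_t value() const noexcept { return value_; }

private:
    char32_t value_;
};

class unicode_scalar_value {
public:
    // Precondition (an assumption of this sketch): v is a code point
    // outside the surrogate range 0xD800..0xDFFF.
    constexpr explicit unicode_scalar_value(char32_t v) noexcept : value_(v) {
        assert(v <= 0x10FFFF && !(v >= 0xD800 && v <= 0xDFFF));
    }
    constexpr char32_t value() const noexcept { return value_; }

private:
    char32_t value_;
};

// The constructors are explicit, so a bare char32_t never converts silently.
static_assert(!std::is_convertible<char32_t, unicode_code_point>::value,
              "no implicit conversion from char32_t");
static_assert(!std::is_convertible<char32_t, unicode_scalar_value>::value,
              "no implicit conversion from char32_t");
static_assert(std::is_constructible<unicode_scalar_value, char32_t>::value,
              "explicit construction from char32_t is allowed");
```

With this shape, unicode_scalar_value{U'A'}.value() gives back U'A',
while passing a plain char32_t where a unicode_scalar_value is expected
fails to compile.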

I'm gonna try to finish the early draft of my proposal and after the
release of GCC 9 I'm gonna port my entire code base to its design so I
will have usage experience with it.

Received on 2019-03-28 08:49:49