sg16: Re: [SG16-Unicode] It's Time to Stop Adding New Features for Non-Unicode Execution Encodings in C++

From: keld_at <keld_at_[hidden]>
Date: Sat, 27 Apr 2019 17:41:38 +0200

On Sat, Apr 27, 2019 at 01:53:00PM +0000, Lyberta wrote:
> > posix advocates using utf-8 as the external charset and 32-bit 10646 as the internal widechar encoding
>
> But doesn't require it. That's the point.

yes, that is the point. posix does not require unicode; it is character set independent.
and so is c++, and that is the way it should be and remain.
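
As a rough illustration of that character-set-independent model, the sketch below converts a string from the locale's external multibyte encoding to wchar_t with std::mbsrtowcs. The helper name to_wide is made up for this example. Which external encoding is decoded depends entirely on the active locale, and whether wchar_t holds 32-bit 10646 values is a platform choice (glibc makes that choice), so treat this as a sketch under those assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: convert a multibyte string in the current locale's
// encoding to a wide string. Returns an empty string if the input is not
// valid in that encoding (which is indistinguishable from empty input here).
std::wstring to_wide(const std::string& external) {
    std::mbstate_t state = std::mbstate_t();
    const char* src = external.c_str();

    // With a null destination, mbsrtowcs only counts the wide characters needed.
    const std::size_t n = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }

    std::wstring wide(n, L'\0');
    state = std::mbstate_t();
    src = external.c_str();
    const std::size_t written = std::mbsrtowcs(&wide[0], &src, n, &state);
    assert(written == n);
    (void)written;  // silence unused-variable warnings when NDEBUG is defined
    return wide;
}
```

Without a prior call to std::setlocale the program runs in the "C" locale, where only the basic character set is guaranteed to convert; calling std::setlocale(LC_ALL, "") first selects the user's encoding instead.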
>
> >
> >> As for C, C only has char16_t, char32_t, minimalistic literal support...
> >> I'm not sure about standard library. That's about 2% of Unicode support.
> >
> > also look at iso 14651, 14652
>
> I don't see that in C standard library.

correct.
much of it is implemented in glibc.

> >
> > what is missing? i know: quite a lot, but please exemplify
> >
>
> A full set of std::vector operations on sequences of scalar values.

will normal widechar functionality suffice?
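
To make the question concrete, here is a small sketch of container-style operations on a sequence of scalar values. It assumes each char32_t element holds exactly one scalar value and does no validation; the function name demo_scalar_ops is illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative only: each char32_t element is assumed to hold one scalar value.
std::vector<char32_t> demo_scalar_ops() {
    std::vector<char32_t> text = {U'n', U'a', U'i', U'v', U'e'};

    // Find 'i' and replace it in place with U+00EF (i with diaeresis).
    auto it = std::find(text.begin(), text.end(), U'i');
    assert(it != text.end());
    *it = U'\u00EF';

    // Insert and erase work per element, as with any std::vector.
    text.insert(text.begin(), U'\U0001F600');  // prepend a non-BMP scalar value
    text.erase(text.begin());                  // and remove it again

    text.push_back(U'!');
    return text;
}
```

The same code works with wchar_t only where wchar_t is wide enough for every scalar value; on platforms with a 16-bit wchar_t, one element is not always one scalar value.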

> A full set of std::vector operations on sequences of grapheme clusters.

yes, that is probably new unicode-specific functionality

> Querying properties of scalar values.

some of it is there, isalpha etc.
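
A minimal sketch of what "isalpha etc." looks like for wide characters, using std::iswalpha from <cwctype>. The function name count_alpha is hypothetical, and the classification of anything outside the basic character set depends on the active locale.

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>
#include <string>

// Count the characters that the current locale classifies as alphabetic.
std::size_t count_alpha(const std::wstring& s) {
    std::size_t count = 0;
    for (wchar_t c : s) {
        if (std::iswalpha(static_cast<std::wint_t>(c))) {
            ++count;
        }
    }
    return count;
}

// Usage sketch with ASCII input, which is classified the same way in the
// default "C" locale.
inline void count_alpha_example() {
    assert(count_alpha(L"abc123") == 3);
}
```

This covers only the fixed set of classes from <cwctype> (alpha, digit, space, and so on), not the wider set of Unicode character properties.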

> Normalization.

not needed, but could be added.

> Case conversion.

toupper, tolower etc.
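
For wide strings, the toupper family can be applied per character as in the sketch below, which uses std::towupper. The helper name to_upper is made up for this example, and the mapping applied to non-ASCII characters depends on the active locale.

```cpp
#include <algorithm>
#include <cassert>
#include <cwctype>
#include <string>

// Upper-case a wide string one character at a time using the current locale.
std::wstring to_upper(std::wstring s) {
    std::transform(s.begin(), s.end(), s.begin(), [](wchar_t c) {
        return static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(c)));
    });
    return s;
}

// Usage sketch with ASCII input, which maps the same way in the default
// "C" locale.
inline void to_upper_example() {
    assert(to_upper(L"abc1") == L"ABC1");
}
```

Because the mapping is applied per character, the result always has the same length as the input.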

> Regular expressions.

strcmp and friends
14651 has the fundamentals for all of 10646.
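
As a sketch of how locale-aware ordering is reached from the string functions, the example below sorts with std::strcoll, which compares according to the LC_COLLATE category of the current locale. Treating strcoll as one of the "friends" is an assumption on my part, as is the helper name sort_by_locale; strcmp itself compares byte values only. Whether the resulting order follows 14651 depends on the platform's locale data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Sort strings by the collation order of the current LC_COLLATE locale.
std::vector<std::string> sort_by_locale(std::vector<std::string> words) {
    std::sort(words.begin(), words.end(),
              [](const std::string& a, const std::string& b) {
                  return std::strcoll(a.c_str(), b.c_str()) < 0;
              });
    return words;
}

// Usage sketch: in the default "C" locale strcoll orders like strcmp,
// so the upper-case 'A' sorts before the lower-case letters.
inline void sort_by_locale_example() {
    const std::vector<std::string> sorted =
        sort_by_locale({"pear", "Apple", "banana"});
    assert(sorted.front() == "Apple");
    assert(sorted.back() == "pear");
}
```

Calling std::setlocale(LC_COLLATE, "") before sorting switches to the user's collation rules, if the platform provides them.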

keld

Received on 2019-04-27 17:41:39