sg16: Re: [SG16-Unicode] code_unit_sequence and code_point

From: keld_at <keld_at_[hidden]>
Date: Wed, 20 Jun 2018 11:34:03 +0200

On Tue, Jun 19, 2018 at 09:52:05PM -0400, Tom Honermann wrote:
> On 06/19/2018 04:19 PM, Lyberta wrote:
> > keld_at_[hidden]:
> >> Is your code point advisory the same as codepoints in 10646/Unicode, also
> >> called characters in 10646?
> > Yes. A code point is an unsigned 32-bit integer with values in the
> > range of 0-10FFFF. Modern C and C++ have type char32_t which is most
> > suitable for holding code points.
> >
> >> And why not just treat these as 32-bit wchar-t?
> >> I believe this is what we do in C.
> > Because wide execution character set is implementation defined. So far
> > nobody has expressed opinion of changing that and Windows violates the
> > standard by having 16 bit wchar_t.
>
> Technically, Windows doesn't violate the standard by having a 16-bit
> wchar_t. It violates the standard by using a wide execution character
> set that defines code points that do not fit in its (16-bit) wchar_t
> type. We have an issue (https://github.com/sg16-unicode/sg16/issues/9)
> to track modifying the standard to enable Microsoft's implementation to
> be conforming.

I believe that using a 16-bit wchar_t to handle UCS characters in UTF-16
form is a violation of the C++ standard. You need to do some processing of
surrogates, which is not portable to other platforms and is against the
specification of wchar_t.
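
To illustrate the kind of surrogate processing I mean, here is a minimal
sketch of combining a UTF-16 surrogate pair into one code point. The function
names are hypothetical, and I use char16_t and char32_t rather than wchar_t
so that the sketch does not depend on any platform's wchar_t width. It assumes
the caller has already checked that the pair is well formed.

```cpp
// Hypothetical helpers, named for illustration only.
constexpr bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Combine a high/low surrogate pair into a single code point.
// Precondition (not checked here): is_high_surrogate(hi) && is_low_surrogate(lo).
constexpr char32_t combine_surrogates(char16_t hi, char16_t lo) {
    return 0x10000
         + ((static_cast<char32_t>(hi) - 0xD800) << 10)
         + (static_cast<char32_t>(lo) - 0xDC00);
}
```

Code that stores one character per wchar_t never needs this step, which is
why I consider it outside what wchar_t was specified for.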

I do not think this makes wchar_t obsolete; the fact that some people use
it wrongly should not lead to its obsolescence.

Using a 16-bit wchar_t is OK if you restrict yourself to a 16-bit subset of UCS.
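
As a rough sketch of that restriction, a program could check each code point
before storing it in a 16-bit wchar_t. The function name is hypothetical, and
I assume here that the 16-bit subset is everything up to U+FFFF, excluding
the surrogate range, which does not encode characters on its own.

```cpp
// Hypothetical check: can this code point be stored as a single 16-bit unit
// without any surrogate processing?
constexpr bool fits_in_16_bit_subset(char32_t cp) {
    return cp <= 0xFFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```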

I am happy to have a specific type to handle code points that are defined to have
UCS code point values. I just note that I think APIs to handle such a type would need to
have exactly the same functionality as for handling wchar_t entities.

Best regards
Keld

Received on 2018-06-20 11:34:03