sg16: Re: [SG16-Unicode] Strong code unit types

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 5 Dec 2018 01:40:27 -0500

On 12/4/18 11:17 PM, Lyberta wrote:
> This is something that hit me recently. Why are we using fundamental
> types for code units? CppCon 2018 is full of people saying that we
> should migrate to strong types, that std::size_t should have been a
> struct, etc.
The primary reason for using fundamental types for code units is that
those are the types used for character and string literals.
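
To illustrate the point about literals, here is a small sketch that checks the types the language assigns to character and string literals. It sticks to the `u` and `U` prefixes so that it compiles as C++11 or later; `code_unit_count` is a hypothetical helper added only for illustration and is not something proposed in this thread.

```cpp
#include <cstddef>
#include <type_traits>

// Character literals already have fundamental code unit types.
static_assert(std::is_same<decltype(u'a'), char16_t>::value,
              "u'' literals are char16_t");
static_assert(std::is_same<decltype(U'a'), char32_t>::value,
              "U'' literals are char32_t");

// String literals are arrays of those same types. A string literal is an
// lvalue, so decltype yields a reference to the array type.
static_assert(std::is_same<decltype(u"ab"), const char16_t(&)[3]>::value,
              "u\"\" literals are arrays of const char16_t");
static_assert(std::is_same<decltype(U"ab"), const char32_t(&)[3]>::value,
              "U\"\" literals are arrays of const char32_t");

// Hypothetical helper: number of code units in a literal, excluding the
// terminating null code unit.
template <typename CharT, std::size_t N>
constexpr std::size_t code_unit_count(const CharT (&)[N]) {
    return N - 1;
}
```

Any strong code unit type would have to be constructed from values of these fundamental types, since literals cannot produce anything else directly.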
>
> I propose we add strong types for code units:
>
> * utf8_code_unit
> * utf16_code_unit
> * utf32_code_unit
>
> These will hold char8,16,32_t inside of them respectively but will not
> allow the invalid values such as >245 for UTF-8, surrogates and values
> above 0x10FFFF for UTF-32, etc.
> This will guarantee that all code units are valid and will allow us to
> write much faster code because we will never need to check for invalid
> values.

The downside of such validating types is the validation overhead.
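
A minimal sketch of where that overhead shows up, assuming a design in which the check happens once at construction. The name `utf32_code_unit` is taken from the proposal above; the predicate `is_valid_utf32_code_unit`, the `value()` accessor, and the choice to throw `std::out_of_range` are assumptions made for illustration only.

```cpp
#include <stdexcept>

// Hypothetical predicate: a UTF-32 code unit is valid when it is at most
// 0x10FFFF and does not fall in the surrogate range 0xD800..0xDFFF.
constexpr bool is_valid_utf32_code_unit(char32_t v) {
    return v <= 0x10FFFF && !(v >= 0xD800 && v <= 0xDFFF);
}

// Sketch of a validating strong type wrapping char32_t.
class utf32_code_unit {
public:
    // The validation cost is paid here, on every construction from a raw
    // value. Written as a conditional expression so it stays constexpr in
    // C++11.
    constexpr explicit utf32_code_unit(char32_t v)
        : value_(is_valid_utf32_code_unit(v)
                     ? v
                     : throw std::out_of_range("invalid UTF-32 code unit")) {}

    // Code holding a utf32_code_unit can read the value without re-checking.
    constexpr char32_t value() const { return value_; }

private:
    char32_t value_;
};
```

Under these assumptions, constructing from a literal in a constant expression costs nothing at run time, but converting a buffer of raw `char32_t` read from a file or socket pays for one range check per element.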

I am in favor of introducing strong types for code points.

Tom.

Received on 2018-12-05 07:40:32