Date: Wed, 5 Dec 2018 01:40:27 -0500
On 12/4/18 11:17 PM, Lyberta wrote:
> This is something that hit me recently. Why are we using fundamental
> types for code units? CppCon 2018 is full of people saying that we
> should migrate to strong types, that std::size_t should have been a
> struct, etc.
The primary reason for using fundamental types for code units is that
those are the types used for character and string literals.
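As a small illustration of that point, here is a sketch showing that the
literals already have fundamental types. The variable names are
hypothetical and chosen only for this example.

```cpp
#include <type_traits>

// Character literals already carry fundamental code unit types.
constexpr bool u_literal_is_char16 =
    std::is_same<decltype(u'a'), char16_t>::value;
constexpr bool U_literal_is_char32 =
    std::is_same<decltype(U'a'), char32_t>::value;

// String literals are arrays of the same types, so these initializations
// compile without any conversion.
constexpr const char16_t* utf16_text = u"text";
constexpr const char32_t* utf32_text = U"text";

// Note: u8"text" is an array of char in C++17 and of char8_t in C++20.
```

A strong code unit type would need some conversion or construction step
to get from these literals to the wrapper type.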
>
> I propose we add strong types for code units:
>
> * utf8_code_unit
> * utf16_code_unit
> * utf32_code_unit
>
> These will hold char8,16,32_t inside of them respectively but will not
> allow the invalid values such as >245 for UTF-8, surrogates and values
> above 0x10FFFF for UTF-32, etc.
> This will guarantee that all code units are valid and will allow us to
> write much faster code because we will never need to check for invalid
> values.
The downside of such validating types is the validation overhead.
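To make the overhead concrete, here is a minimal sketch of what such a
validating type might look like. The class name comes from the proposal
quoted above. The interface, the use of unsigned char as a stand-in for
char8_t, and the choice to throw on invalid input are all assumptions
made for illustration. The check relies on the fact that the bytes 0xC0,
0xC1, and 0xF5 through 0xFF never appear in well-formed UTF-8.

```cpp
#include <stdexcept>

// Hypothetical strong type; not an existing or proposed library interface.
class utf8_code_unit {
public:
    // 0xC0, 0xC1 and 0xF5..0xFF never occur in well-formed UTF-8.
    static constexpr bool is_valid(unsigned char v) noexcept {
        return v != 0xC0 && v != 0xC1 && v < 0xF5;
    }

    // The check runs on every construction from a raw value;
    // this is the validation overhead.
    explicit utf8_code_unit(unsigned char v) : value_(v) {
        if (!is_valid(v)) {
            throw std::invalid_argument("invalid UTF-8 code unit");
        }
    }

    unsigned char value() const noexcept { return value_; }

private:
    unsigned char value_;  // stand-in for char8_t (assumption)
};
```

Every code unit read from a buffer pays for this check once. Note also
that valid individual code units do not imply a well-formed sequence: a
lone continuation byte such as 0x80 passes the check above, so code that
decodes sequences would still need its own validation.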
I am in favor of introducing strong types for code points.
Tom.
Received on 2018-12-05 07:40:32