Re: [SG16-Unicode] Strong code unit types

From: Steve Downey <sdowney_at_[hidden]>
Date: Wed, 5 Dec 2018 08:05:01 -0500
`codepoint` also, which is probably "just" a char32_t?
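
For illustration, a minimal sketch of what a strong `code_point` type wrapping a char32_t might look like. The class name, its members, and the choice to accept surrogates as code points (while reporting that they are not scalar values) are assumptions made for this example, not part of any proposal in this thread.

```cpp
#include <stdexcept>

// Hypothetical strong type around char32_t; the interface is illustrative only.
class code_point {
public:
    // Unicode code points span 0 through 0x10FFFF.
    static constexpr bool is_valid(char32_t v) noexcept {
        return v <= 0x10FFFF;
    }

    // Explicit, so a bare char32_t or int never converts silently.
    // Throws std::out_of_range for values above 0x10FFFF.
    constexpr explicit code_point(char32_t v)
        : value_(is_valid(v) ? v
                             : throw std::out_of_range("code point above U+10FFFF")) {}

    constexpr char32_t value() const noexcept { return value_; }

    // Surrogates (0xD800..0xDFFF) are code points but not scalar values.
    constexpr bool is_scalar_value() const noexcept {
        return value_ < 0xD800 || value_ > 0xDFFF;
    }

private:
    char32_t value_;
};

constexpr bool operator==(code_point a, code_point b) noexcept {
    return a.value() == b.value();
}
```

Because the constructor is explicit, the type cannot be mixed up with plain integers by accident, which is the main thing a bare char32_t does not give you.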

On Wed, Dec 5, 2018, 01:40 Tom Honermann <tom_at_[hidden]> wrote:

> On 12/4/18 11:17 PM, Lyberta wrote:
> > This is something that hit me recently. Why are we using fundamental
> > types for code units? CppCon 2018 is full of people saying that we
> > should migrate to strong types, that std::size_t should have been a
> > struct, etc.
> The primary reason for using fundamental types for code units is that
> those are the types used for character and string literals.
> >
> > I propose we add strong types for code units:
> >
> > * utf8_code_unit
> > * utf16_code_unit
> > * utf32_code_unit
> >
> > These will hold char8,16,32_t inside of them respectively but will not
> > allow the invalid values such as >245 for UTF-8, surrogates and
> > >0x10FFFF for UTF-32, etc.
> > This will guarantee that all code units are valid and will allow us to
> > write much faster code because we will never need to check for invalid
> > values.
>
> The downside of such validating types is the validation overhead.
>
> I am in favor of introducing strong types for code points.
>
> Tom.
>
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden]
> http://www.open-std.org/mailman/listinfo/unicode
>
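
To make the trade-off discussed in the quoted exchange concrete, here is a rough sketch of a validating `utf8_code_unit`. The name comes from the proposal quoted above; everything else (the `is_valid` helper, the `unchecked_t` tag, the use of exceptions) is an assumption for illustration. `unsigned char` stands in for char8_t so the sketch also compiles before C++20.

```cpp
#include <stdexcept>

// Hypothetical validating wrapper for a single UTF-8 code unit.
class utf8_code_unit {
public:
    // Bytes 0xC0, 0xC1 and 0xF5..0xFF never occur in well-formed UTF-8.
    static constexpr bool is_valid(unsigned char b) noexcept {
        return b != 0xC0 && b != 0xC1 && b <= 0xF4;
    }

    // Checked construction: one validation per code unit, which is the
    // overhead raised in the reply.
    constexpr explicit utf8_code_unit(unsigned char b)
        : value_(is_valid(b) ? b
                             : throw std::invalid_argument("invalid UTF-8 code unit")) {}

    // Tag type selecting the unchecked constructor.
    struct unchecked_t {};

    // Unchecked construction for input the caller has already validated;
    // skips the check and trusts the caller.
    constexpr utf8_code_unit(unchecked_t, unsigned char b) noexcept : value_(b) {}

    constexpr unsigned char value() const noexcept { return value_; }

private:
    unsigned char value_;
};
```

Note that a per-unit check like this does not by itself make a sequence well-formed: a lone continuation byte such as 0x80 passes `is_valid`, so a decoder would presumably still need sequence-level checks.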

Received on 2018-12-05 14:05:16