Subject: Re: [SG16-Unicode] String views with strong code unit types
From: Steve Downey (sdowney_at_[hidden])
Date: 2019-06-04 06:11:59


That literals aren't required to be well formed is a subset of the problem
that char8_t data may have come from anywhere and can't be assumed to be
well formed. Real world text is frequently broken.
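
As a rough illustration of why well-formedness has to be checked rather than assumed, here is a minimal sketch of a compile-time UTF-8 check applied to a few `u8` literals. The names `is_well_formed_utf8` and `literal_is_well_formed_utf8` are made up for this example and are not part of any standard or proposed library; the byte ranges are taken from the well-formed UTF-8 byte sequence table in the Unicode Standard. The code is templated on the code unit type so it accepts `char8_t` (C++20) as well as `char` (earlier standards).

```cpp
#include <cstddef>

// Hypothetical helper: true if [data, data + size) is well-formed UTF-8.
// Rejects overlong forms, encoded surrogates, truncated sequences, and
// values above U+10FFFF.
template <class CodeUnit>
constexpr bool is_well_formed_utf8(const CodeUnit* data, std::size_t size) {
    std::size_t i = 0;
    while (i < size) {
        const unsigned lead = static_cast<unsigned char>(data[i]);
        std::size_t trail = 0;  // number of continuation bytes expected
        unsigned lo = 0x80;     // allowed range for the FIRST continuation byte
        unsigned hi = 0xBF;
        if (lead <= 0x7F) {
            trail = 0;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            trail = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            trail = 2;
            if (lead == 0xE0) lo = 0xA0;  // excludes overlong 3-byte forms
            if (lead == 0xED) hi = 0x9F;  // excludes surrogates U+D800..U+DFFF
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            trail = 3;
            if (lead == 0xF0) lo = 0x90;  // excludes overlong 4-byte forms
            if (lead == 0xF4) hi = 0x8F;  // excludes values above U+10FFFF
        } else {
            return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence
        }
        if (size - i <= trail) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= trail; ++k) {
            const unsigned b = static_cast<unsigned char>(data[i + k]);
            if (b < lo || b > hi) return false;
            lo = 0x80;  // later continuation bytes use the full range
            hi = 0xBF;
        }
        i += trail + 1;
    }
    return true;
}

// Hypothetical convenience overload for string literals; drops the
// terminating null code unit before checking.
template <class CodeUnit, std::size_t N>
constexpr bool literal_is_well_formed_utf8(const CodeUnit (&literal)[N]) {
    return is_well_formed_utf8(literal, N - 1);
}

// "café" spelled with explicit code units: well formed.
static_assert(literal_is_well_formed_utf8(u8"caf\xC3\xA9"), "");
// A surrogate encoded as three bytes (as in CESU-8/WTF-8): not well formed.
static_assert(!literal_is_well_formed_utf8(u8"\xED\xA0\x80"), "");
// A three-byte sequence cut short: not well formed.
static_assert(!literal_is_well_formed_utf8(u8"\xE2\x82"), "");
```

The same function works on data that arrived at run time, which is the more common case: the type of the buffer says nothing about whether its contents pass the check.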

On Tue, Jun 4, 2019, 06:27 JeanHeyd Meneide <phdofthehouse_at_[hidden]> wrote:

> On Tue, Jun 4, 2019 at 5:39 AM Lyberta <lyberta_at_[hidden]> wrote:
>
>> We can always modify the standard so that we get strong types via
>> compiler magic. I was thinking:
>>
>> utf8'a' -> std::unicode::utf8_code_unit
>> utf16'a' -> std::unicode::utf16_code_unit
>> utf32'a' -> std::unicode::utf32_code_unit
>> utf8"a" -> std::unicode::utf8_code_unit_sequence_view
>> utf16"a" -> std::unicode::utf16_code_unit_sequence_view
>> utf32"a" -> std::unicode::utf32_code_unit_sequence_view
>>
>> Well, that's future. I want something I can use now.
>>
>> Also, does the standard require well formed sequences in literals?
>>
>
> No, we lobbied specifically so that you can insert "ill-formed" sequences
> (i.e., sequences that do not encode valid Unicode Scalar Values) into string
> literals. This is specifically to enable people who need literals that are
> not exactly conformant for various reasons (testing, or specifically
> creating WTF-8/CESU-8/etc. literals, and more).
>
> Granted, the only way you can do this is by writing `\x` values
> specifically in the string literal: it's a very strong signal that someone
> is doing something non-standard. That doesn't mean you can't assume
> char8_t, char16_t, and char32_t literals are well-formed: if someone's
> shoving in direct code unit values with backslash-X syntax, you have to
> assume they are a Very Smart Person Who Knows What They Are Getting
> Themselves Into.
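
To make the `\x` point concrete, here is a small sketch showing that hexadecimal escapes place the exact code unit values into the literal, with no validation applied. The variable names are illustrative only. In C++20 the `u8` literal has element type `char8_t`; in earlier standards it is `char`, which is why the value is read back through `unsigned char`.

```cpp
#include <cstddef>

// Number of code units in the u8 literal, not counting the terminating null.
constexpr std::size_t utf8_units = sizeof(u8"\xED\xA0\x80") - 1;

// The code units come back exactly as written, even though these three
// bytes are an encoded surrogate and therefore not well-formed UTF-8.
constexpr unsigned utf8_first  = static_cast<unsigned char>(u8"\xED\xA0\x80"[0]);
constexpr unsigned utf8_second = static_cast<unsigned char>(u8"\xED\xA0\x80"[1]);
constexpr unsigned utf8_third  = static_cast<unsigned char>(u8"\xED\xA0\x80"[2]);

// The same applies to UTF-16: a lone surrogate code unit is accepted.
constexpr unsigned utf16_first = u"\xD800"[0];

static_assert(utf8_units == 3, "three code units were written");
static_assert(utf8_first == 0xED && utf8_second == 0xA0 && utf8_third == 0x80,
              "code units are stored verbatim");
static_assert(utf16_first == 0xD800, "lone surrogate stored verbatim");
```

By contrast, spelling the same code point as `\uD800` is rejected by the compiler, because a universal-character-name may not name a surrogate. The `\x` form is the only spelling that gets such values into a literal.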