sg16

Re: [SG16-Unicode] String views with strong code unit types

From: JeanHeyd Meneide <phdofthehouse_at_[hidden]>
Date: Tue, 4 Jun 2019 06:27:40 -0400
On Tue, Jun 4, 2019 at 5:39 AM Lyberta <lyberta_at_[hidden]> wrote:

> We can always modify the standard so that we get strong types via
> compiler magic. I was thinking:
>
> utf8'a' -> std::unicode::utf8_code_unit
> utf16'a' -> std::unicode::utf16_code_unit
> utf32'a' -> std::unicode::utf32_code_unit
> utf8"a" -> std::unicode::utf8_code_unit_sequence_view
> utf16"a" -> std::unicode::utf16_code_unit_sequence_view
> utf32"a" -> std::unicode::utf32_code_unit_sequence_view
>
> Well, that's the future. I want something I can use now.
>
> Also, does the standard require well formed sequences in literals?
>

No. We specifically lobbied to allow "ill-formed" sequences (e.g., code
unit sequences that do not encode valid Unicode scalar values) in string
literals. This is meant to support people who need literal contents that
are not strictly conformant for various reasons (testing, deliberately
creating WTF-8/CESU-8/etc. literals, and more).
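
To make this concrete, here is a minimal sketch (not from the original
thread) that spells a CESU-8 literal with `\x` escapes and checks it against
the well-formed UTF-8 byte sequences listed in Table 3-7 of the Unicode
Standard. The helper name `is_well_formed_utf8` is hypothetical; it is a
constexpr template so it compiles whether `u8` literals are `const char[]`
(C++17) or `const char8_t[]` (C++20).

```cpp
#include <cstddef>

// Reads element i of a code unit array as a raw byte. Works whether u8""
// literals are const char[] (C++17) or const char8_t[] (C++20).
template <typename CharT>
constexpr unsigned char byte_at(const CharT* s, std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}

// Hypothetical helper: strict check against the well-formed UTF-8 table
// (Unicode Standard, Table 3-7). Rejects overlongs, surrogates, values
// above U+10FFFF, stray continuation bytes, and truncated sequences.
template <typename CharT>
constexpr bool is_well_formed_utf8(const CharT* s, std::size_t n) {
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = byte_at(s, i);
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the second byte
        if (b <= 0x7F) { ++i; continue; }
        else if (b >= 0xC2 && b <= 0xDF) len = 2;
        else if (b == 0xE0) { len = 3; lo = 0xA0; }  // no overlongs
        else if (b == 0xED) { len = 3; hi = 0x9F; }  // no surrogates D800..DFFF
        else if (b >= 0xE1 && b <= 0xEF) len = 3;
        else if (b == 0xF0) { len = 4; lo = 0x90; }  // no overlongs
        else if (b == 0xF4) { len = 4; hi = 0x8F; }  // nothing above U+10FFFF
        else if (b >= 0xF1 && b <= 0xF3) len = 4;
        else return false;  // C0, C1, F5..FF, or a stray continuation byte
        if (n - i < len) return false;  // truncated sequence
        const unsigned char b1 = byte_at(s, i + 1);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char bk = byte_at(s, i + k);
            if (bk < 0x80 || bk > 0xBF) return false;
        }
        i += len;
    }
    return true;
}

// Convenience overload for string literals: ignores the terminating NUL.
template <typename CharT, std::size_t N>
constexpr bool is_well_formed_utf8(const CharT (&lit)[N]) {
    return is_well_formed_utf8(lit, N - 1);
}

// U+1F600 spelled with a universal-character-name: encoded as F0 9F 98 80.
static_assert(is_well_formed_utf8(u8"\U0001F600"), "proper UTF-8");
// The same code point in CESU-8, spelled with \x escapes: each UTF-16
// surrogate (D83D, DE00) becomes its own 3-byte sequence. It compiles.
static_assert(!is_well_formed_utf8(u8"\xED\xA0\xBD\xED\xB8\x80"),
              "CESU-8 is not UTF-8");
// Modified UTF-8 style NUL: an overlong 2-byte encoding of U+0000.
static_assert(!is_well_formed_utf8(u8"\xC0\x80"), "overlong NUL is not UTF-8");
```

All three literals are accepted by the compiler; only the first one is
well-formed UTF-8.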

Granted, the only way you can do this is by writing numeric escape
sequences (`\x` values, or octal ones) directly in the string literal,
which is a very strong signal that someone is doing something non-standard.
That doesn't mean you can't assume char8_t, char16_t, and char32_t strings
are well-formed: if someone is shoving raw code unit values in with
backslash escapes, you have to assume they are a Very Smart Person Who
Knows What They Are Getting Themselves Into.
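
The same latitude applies to char16_t literals. A hedged sketch, using a
hypothetical `is_well_formed_utf16` helper: a lone surrogate written as
`\xD800` compiles, whereas spelling it `\uD800` is rejected by the compiler,
because a universal-character-name may not designate a surrogate code point.

```cpp
#include <cstddef>

// Hypothetical helper: true if the UTF-16 sequence has no unpaired surrogates.
constexpr bool is_well_formed_utf16(const char16_t* s, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const char16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {        // high (leading) surrogate
            if (i + 1 == n) return false;         // nothing follows it
            const char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF) return false;  // no low surrogate
            ++i;                                  // skip the low surrogate
        } else if (u >= 0xDC00 && u <= 0xDFFF) { // low surrogate on its own
            return false;
        }
    }
    return true;
}

// Convenience overload for string literals: ignores the terminating NUL.
template <std::size_t N>
constexpr bool is_well_formed_utf16(const char16_t (&lit)[N]) {
    return is_well_formed_utf16(lit, N - 1);
}

static_assert(is_well_formed_utf16(u"\U0001F600"), "encoded as D83D DE00");
static_assert(is_well_formed_utf16(u"\xD83D\xDE00"), "same pair via \\x escapes");
static_assert(!is_well_formed_utf16(u"\xD800"), "lone surrogate compiles anyway");
```

A check like this only catches what someone deliberately wrote with numeric
escapes; literals spelled with ordinary characters or universal-character-names
are already encoded as well-formed UTF-16 by the compiler.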

Received on 2019-06-04 12:27:53