On Tue, Jun 4, 2019 at 5:39 AM Lyberta <lyberta@lyberta.net> wrote:

We can always modify the standard so that we get strong types via
compiler magic. I was thinking:

utf8'a' -> std::unicode::utf8_code_unit
utf16'a' -> std::unicode::utf16_code_unit
utf32'a' -> std::unicode::utf32_code_unit
utf8"a" -> std::unicode::utf8_code_unit_sequence_view
utf16"a" -> std::unicode::utf16_code_unit_sequence_view
utf32"a" -> std::unicode::utf32_code_unit_sequence_view

Well, that's the future. I want something I can use now.
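
For illustration, here is a minimal sketch of what "usable now" strong types could look like without any compiler magic. Everything in it is hypothetical: the `sketch` namespace, the `utf16_code_unit` / `utf32_code_unit` names, and the `_u16` / `_u32` suffixes are assumptions made for this example, not anything that exists in the standard or in a proposal.

```cpp
#include <type_traits>

namespace sketch {  // hypothetical namespace, for illustration only

// Hypothetical strong code unit types: distinct from char16_t / char32_t,
// with no implicit conversion from the underlying character type.
enum class utf16_code_unit : char16_t {};
enum class utf32_code_unit : char32_t {};

inline namespace literals {

// u'a'_u16 yields a utf16_code_unit (the suffix name is an assumption).
constexpr utf16_code_unit operator""_u16(char16_t c) noexcept {
    return static_cast<utf16_code_unit>(c);
}

// U'a'_u32 yields a utf32_code_unit (the suffix name is an assumption).
constexpr utf32_code_unit operator""_u32(char32_t c) noexcept {
    return static_cast<utf32_code_unit>(c);
}

}  // namespace literals
}  // namespace sketch

// The strong types are genuinely different types from the built-in ones.
static_assert(!std::is_same<sketch::utf16_code_unit, char16_t>::value,
              "utf16_code_unit must be a distinct type");
static_assert(!std::is_convertible<char16_t, sketch::utf16_code_unit>::value,
              "no implicit conversion from char16_t");
```

With `using namespace sketch::literals;` in scope, `u'a'_u16` has type `sketch::utf16_code_unit`, and getting back to a plain `char16_t` takes an explicit `static_cast`. It is clunkier than a dedicated `utf16'a'` spelling, but it compiles today.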

Also, does the standard require well formed sequences in literals?

No. We lobbied specifically so that you can insert "ill-formed" sequences (i.e., code units that do not encode well-formed Unicode scalar values) into string literals. This is to support people who need literals that are not strictly conformant for various reasons: testing, creating WTF-8/CESU-8/etc. literals, and more.

Granted, the only way you can do this is by writing `\x` escapes explicitly in the string literal, which is a very strong signal that someone is doing something non-standard. That doesn't mean you can't assume char8_t, char16_t, and char32_t strings are well-formed: if someone is shoving in raw code unit values with backslash-x syntax, you have to assume they are a Very Smart Person Who Knows What They Are Getting Themselves Into.
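
To make that concrete, here is a small sketch using UTF-16. The helper names (`is_high_surrogate`, `is_low_surrogate`, `is_well_formed_utf16`) are made up for this example and only check surrogate pairing; they are not a library API.

```cpp
#include <cstddef>

// A lone high surrogate written with \x. The compiler accepts it even though
// the resulting array is not well-formed UTF-16. (Spelling it as the
// universal-character-name "\uD800" would be rejected instead.)
constexpr char16_t lone_surrogate[] = u"\xD800";

// Well-formed: the compiler encodes U+1F600 as a surrogate pair.
constexpr char16_t emoji[] = u"\U0001F600";

// Hypothetical helpers, for illustration only.
constexpr bool is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Returns true if every high surrogate in [s, s + n) is immediately followed
// by a low surrogate and no low surrogate appears on its own.
inline bool is_well_formed_utf16(const char16_t* s, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (is_high_surrogate(s[i])) {
            if (i + 1 == n || !is_low_surrogate(s[i + 1])) {
                return false;  // unpaired high surrogate
            }
            ++i;  // skip the low surrogate that completes the pair
        } else if (is_low_surrogate(s[i])) {
            return false;  // low surrogate with no preceding high surrogate
        }
    }
    return true;
}
```

Calling `is_well_formed_utf16(lone_surrogate, 1)` reports the `\x` literal as ill-formed, while `is_well_formed_utf16(emoji, 2)` passes. Nothing written with ordinary characters or `\u` / `\U` escapes ends up in the first category.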