That literals aren't required to be well-formed is just one instance of a broader problem: char8_t data may have come from anywhere and can't be assumed to be well-formed. Real-world text is frequently broken.
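
To make that concrete, here is a minimal sketch of the kind of check that code receiving char8_t data would need to run before trusting it. `is_well_formed_utf8` is a hypothetical helper, not a standard library function. It is a template on the code unit type only so that it compiles with char (before C++20) as well as char8_t, and it assumes C++14 or later for the constexpr loop.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the standard library).
// Returns true if [data, data + size) is well-formed UTF-8: no stray
// continuation bytes, no truncated or overlong sequences, no surrogates,
// and nothing above U+10FFFF.
template <class CodeUnit>
constexpr bool is_well_formed_utf8(const CodeUnit* data, std::size_t size) {
    std::size_t i = 0;
    while (i < size) {
        const unsigned char lead = static_cast<unsigned char>(data[i]);
        if (lead < 0x80) { ++i; continue; }  // single-byte (ASCII) sequence

        std::size_t trail = 0;   // continuation bytes expected after the lead
        std::uint32_t cp = 0;    // scalar value being decoded
        std::uint32_t min = 0;   // smallest value this length may encode
        if ((lead & 0xE0) == 0xC0)      { trail = 1; cp = lead & 0x1Fu; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { trail = 2; cp = lead & 0x0Fu; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { trail = 3; cp = lead & 0x07u; min = 0x10000; }
        else return false;       // stray continuation byte or invalid lead byte

        if (size - i <= trail) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= trail; ++k) {
            const unsigned char b = static_cast<unsigned char>(data[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3Fu);
        }
        if (cp < min) return false;                       // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range
        i += trail + 1;
    }
    return true;
}

// Usage: well-formed text passes, an overlong encoding of U+0000 does not.
static_assert(is_well_formed_utf8("abc", 3), "ASCII is well-formed");
static_assert(!is_well_formed_utf8("\xC0\x80", 2), "overlong form is rejected");
```

The same check applies whether the bytes came from a file, a socket, or a literal; the type of the data does not say which.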
 


On Tue, Jun 4, 2019, 06:27 JeanHeyd Meneide <phdofthehouse@gmail.com> wrote:
On Tue, Jun 4, 2019 at 5:39 AM Lyberta <lyberta@lyberta.net> wrote:
We can always modify the standard so that we get strong types via
compiler magic. I was thinking:

utf8'a' -> std::unicode::utf8_code_unit
utf16'a' -> std::unicode::utf16_code_unit
utf32'a' -> std::unicode::utf32_code_unit
utf8"a" -> std::unicode::utf8_code_unit_sequence_view
utf16"a" -> std::unicode::utf16_code_unit_sequence_view
utf32"a" -> std::unicode::utf32_code_unit_sequence_view

Well, that's the future. I want something I can use now.

Also, does the standard require well-formed sequences in literals?

No, we lobbied specifically so that you can insert "ill-formed" sequences (i.e., sequences that are not well-formed encodings of Unicode scalar values) into string literals. This is to support people who need literals that are not strictly conformant, for various reasons: testing, deliberately creating WTF-8/CESU-8/etc. literals, and more.

Granted, the only way you can do this is by writing `\x` escapes explicitly in the string literal, which is a very strong signal that someone is doing something non-standard. That doesn't mean you can't assume that char8_t, char16_t, and char32_t data is well-formed: if someone's shoving in direct code unit values with backslash-X syntax, you have to assume they are a Very Smart Person Who Knows What They Are Getting Themselves Into.
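
As a rough illustration of what such literals look like, the sketch below writes two ill-formed sequences with `\x` escapes and checks that the code units are stored exactly as written. `code_unit_at` is a hypothetical helper introduced only for this example; it is a template so the snippet compiles whether u8 literals are arrays of char (before C++20) or of char8_t (C++20 and later).

```cpp
#include <cstddef>

// Hypothetical helper: reads code unit i of a literal as an unsigned value.
template <class CodeUnit, std::size_t N>
constexpr unsigned code_unit_at(const CodeUnit (&literal)[N], std::size_t i) {
    return static_cast<unsigned char>(literal[i]);
}

// Overlong two-byte encoding of U+0000: ill-formed UTF-8, yet a valid literal.
static_assert(sizeof(u8"\xC0\x80") == 3, "two code units plus the terminator");
static_assert(code_unit_at(u8"\xC0\x80", 0) == 0xC0, "stored as written");
static_assert(code_unit_at(u8"\xC0\x80", 1) == 0x80, "stored as written");

// WTF-8 style encoding of the lone surrogate U+D800: also ill-formed UTF-8.
static_assert(sizeof(u8"\xED\xA0\x80") == 4, "three code units plus the terminator");
static_assert(code_unit_at(u8"\xED\xA0\x80", 0) == 0xED, "stored as written");
```

Nothing in the type of either literal records that its contents are not well-formed UTF-8; the escapes in the source are the only hint.
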
_______________________________________________
SG16 Unicode mailing list
Unicode@isocpp.open-std.org
http://www.open-std.org/mailman/listinfo/unicode