Date: Mon, 09 Sep 2019 17:42:00 +0000
Tony V E:
> Do we have / could we have / should we have
> a clear long term (20 years) direction for text in C++?
>
> ie the long term direction is unicode.
> and/or specifically the long term direction is UTF8.
> We expect everyone to use char8_t then? Or we expect char to become utf8
> someday?
> What do we want the long term future to look like?
> deprecate std::string?
Here's how I imagine perfect text APIs in C++:
Remove char and wchar_t and remove all library APIs that depend on them.
That includes iostreams, std::char_traits, std::basic_string,
std::regex, std::format, the entire C library for text, etc.
Replace char8_t with std::unicode::utf8_code_unit.
Replace char16_t with std::unicode::utf16_code_unit.
Replace char32_t with std::unicode::utf32_code_unit.
Add new literals:
utf8'a' -> std::unicode::utf8_code_unit.
utf16'a' -> std::unicode::utf16_code_unit.
utf32'a' -> std::unicode::utf32_code_unit.
utf8"foo" -> const std::unicode::utf8_code_unit[3].
utf16"foo" -> const std::unicode::utf16_code_unit[3].
utf32"foo" -> const std::unicode::utf32_code_unit[3].
Note the lack of NUL-termination.
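To make the literal rule concrete, here is a rough sketch of what it could look like when approximated in today's C++. Everything in it is hypothetical: the `sketch` namespace, the `utf8_literal` wrapper and the `make_utf8` helper are names invented for illustration, and the helper assumes the ordinary literal contains only ASCII characters in an ASCII-compatible execution character set.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical strong type standing in for std::unicode::utf8_code_unit.
// It is distinct from char and from every integer type.
enum class utf8_code_unit : std::uint8_t {};

// Fixed-size storage for a literal: exactly N code units, no terminator.
template <std::size_t N>
struct utf8_literal {
    utf8_code_unit units[N];
    constexpr std::size_t size() const { return N; }
};

// Builds the storage from an ordinary literal and drops the trailing NUL,
// so "foo" (a const char[4]) becomes three code units.
// Assumption: the input is non-empty and ASCII-only.
template <std::size_t N>
constexpr utf8_literal<N - 1> make_utf8(const char (&s)[N]) {
    static_assert(N > 1, "empty literals are not handled in this sketch");
    utf8_literal<N - 1> out{};
    for (std::size_t i = 0; i + 1 < N; ++i) {
        out.units[i] =
            static_cast<utf8_code_unit>(static_cast<unsigned char>(s[i]));
    }
    return out;
}

}  // namespace sketch
```

With this, `sketch::make_utf8("foo").size()` is 3 rather than 4, which is the point of the note above: the length is carried by the type, not by a sentinel.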
Add Unicode libraries:
Storage:
std::unicode::code_unit_sequence
std::unicode::scalar_value_sequence
std::unicode::grapheme_cluster_sequence
std::unicode::text
View versions of those.
New Unicode streams, new Unicode regex, new Unicode formatting, etc.
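As a small illustration of why the storage types are layered, the sketch below counts the same text at two of those levels. The `sketch::code_unit_sequence` alias and `scalar_value_count` function are hypothetical stand-ins, not proposed names, and the counting trick assumes the input is already well-formed UTF-8 (it does no validation).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

// Hypothetical stand-in for std::unicode::code_unit_sequence (UTF-8 only).
using code_unit_sequence = std::vector<std::uint8_t>;

// Counts Unicode scalar values in well-formed UTF-8.
// Every scalar value starts with exactly one byte that is not a
// continuation byte (continuation bytes have the bit pattern 10xxxxxx),
// so counting the non-continuation bytes gives the scalar value count.
inline std::size_t scalar_value_count(const code_unit_sequence& units) {
    std::size_t count = 0;
    for (std::uint8_t u : units) {
        if ((u & 0xC0u) != 0x80u) {
            ++count;
        }
    }
    return count;
}

}  // namespace sketch
```

For example, "e" followed by U+0301 COMBINING ACUTE ACCENT is the byte sequence 0x65 0xCC 0x81: three code units, two scalar values, and (per the Unicode segmentation rules, which this sketch does not implement) one grapheme cluster. A single length() cannot answer all three questions, which is why separate sequence types make sense.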
Received on 2019-09-09 19:42:23