Re: [SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

From: Lyberta <lyberta_at_[hidden]>
Date: Mon, 09 Sep 2019 17:42:00 +0000
Tony V E:
> Do we have / could we have / should we have
> a clear long term (20 years) direction for text in C++?
>
> ie the long term direction is unicode.
> and/or specifically the long term direction is UTF8.
> We expect everyone to use char8_t then? Or we expect char to become utf8
> someday?
> What do we want the long term future to look like?
> deprecate std::string?

Here's how I imagine perfect text APIs in C++:

Remove char and wchar_t and remove all library APIs that depend on them.
That includes iostreams, std::char_traits, std::basic_string,
std::regex, std::format, the entire C library for text, etc.

Replace char8_t with std::unicode::utf8_code_unit.
Replace char16_t with std::unicode::utf16_code_unit.
Replace char32_t with std::unicode::utf32_code_unit.
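The message does not say how these types would be defined. As a hedged sketch, today's C++ can approximate distinct, non-character code unit types with scoped enumerations. The namespace `unicode_sketch` and the choice of `enum class` are assumptions for illustration, not part of the proposal.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for the proposed std::unicode code unit types.
// Each is a distinct scoped enumeration with a fixed-width underlying type,
// so units of different encodings cannot be mixed implicitly.
namespace unicode_sketch {
enum class utf8_code_unit : std::uint8_t {};
enum class utf16_code_unit : std::uint16_t {};
enum class utf32_code_unit : std::uint32_t {};
}  // namespace unicode_sketch

// A UTF-8 unit is not a UTF-16 unit, and neither converts to or from char.
static_assert(!std::is_same_v<unicode_sketch::utf8_code_unit,
                              unicode_sketch::utf16_code_unit>);
static_assert(!std::is_convertible_v<char, unicode_sketch::utf8_code_unit>);
static_assert(!std::is_convertible_v<unicode_sketch::utf8_code_unit, unsigned char>);

// Sizes match the code unit widths of each encoding form.
static_assert(sizeof(unicode_sketch::utf8_code_unit) == 1);
static_assert(sizeof(unicode_sketch::utf16_code_unit) == 2);
static_assert(sizeof(unicode_sketch::utf32_code_unit) == 4);
```

Because scoped enumerations never convert implicitly, passing a plain `char` where a UTF-8 unit is expected becomes a compile-time error. Getting at the raw value requires an explicit `static_cast`.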

Add new literals:
utf8'a' -> std::unicode::utf8_code_unit.
utf16'a' -> std::unicode::utf16_code_unit.
utf32'a' -> std::unicode::utf32_code_unit.

utf8"foo" -> const std::unicode::utf8_code_unit[3].
utf16"foo" -> const std::unicode::utf16_code_unit[3].
utf32"foo" -> const std::unicode::utf32_code_unit[3].

Note the lack of NUL-termination.

Add Unicode libraries:
Storage:
std::unicode::code_unit_sequence
std::unicode::scalar_value_sequence
std::unicode::grapheme_cluster_sequence
std::unicode::text

Non-owning view versions of each of those.

New Unicode streams, new Unicode regex, new Unicode formatting, etc.


Received on 2019-09-09 19:42:23