Date: Sat, 28 Dec 2019 10:26:00 +0000
Thiago Macieira via Std-Discussion:
> std::char_traits is used by std::string, which is not a text container. The
> ones being created for std::text is a different story.
>
The latest SG16 proposal for std::text from ThePhD still uses
std::basic_string as the default container for code units. I find that
short-sighted.
Looking at the design of std::basic_string, it looks like the Traits
template parameter is meant to hold all the metadata about the encoding,
while CharT is just a C compatibility thing. So to have a proper text
encoding you would provide your own traits, like this:
using ascii_string = std::basic_string<char, ascii_traits>;
using ebcdic_string = std::basic_string<char, ebcdic_traits>;
And "char" is this polymorphic type with wild semantics that can't be
operated on without a locale and the like. Basically, an insane
type-unsafe design inherited from C and char*.
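To make the traits approach concrete, here is a minimal sketch. The names
ascii_traits and ebcdic_traits come from the aliases above, but their bodies
are an assumption: they simply inherit std::char_traits<char> and act as
distinct tags, without any real encoding-specific comparison or conversion.

```cpp
#include <string>
#include <type_traits>

// Hypothetical traits types: they reuse the std::char_traits<char>
// operations and differ only in type identity. Real encoding-aware traits
// would have to override members such as lt() or compare(); that part is
// not shown here.
struct ascii_traits : std::char_traits<char> {};
struct ebcdic_traits : std::char_traits<char> {};

using ascii_string = std::basic_string<char, ascii_traits>;
using ebcdic_string = std::basic_string<char, ebcdic_traits>;

// The encoding is now part of the string type, so the three types below
// are all different even though they share the same CharT.
static_assert(!std::is_same<ascii_string, ebcdic_string>::value,
              "different traits give different string types");
static_assert(!std::is_same<ascii_string, std::string>::value,
              "a custom traits type is not std::string");
```

Even in this form, an ascii_string cannot be assigned to an ebcdic_string
by accident, but the elements are still plain char, which is the complaint
made above.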
Of course, a proper design would take the text encoding as the template
parameter, like this:
using ascii_string = std::basic_string<ascii>;
using ebcdic_string = std::basic_string<ebcdic>;
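std::basic_string cannot be instantiated that way today, so the following
is only a sketch of the idea using a separate class. Everything in it is
hypothetical: the encoded_string template, the ascii and ebcdic tag types,
and their code_unit member are invented for illustration and are not taken
from any SG16 proposal.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical encoding tags. Each one names the code unit type it uses.
struct ascii  { using code_unit = char; };
struct ebcdic { using code_unit = char; };

// Hypothetical container parameterized on the encoding alone. The code
// unit type is derived from the encoding instead of being passed
// separately as CharT.
template <class Encoding>
class encoded_string {
public:
    using encoding = Encoding;
    using code_unit = typename Encoding::code_unit;

    encoded_string() = default;
    explicit encoded_string(const code_unit* s) : units_(s) {}

    std::size_t size() const { return units_.size(); }
    const code_unit* data() const { return units_.data(); }

private:
    std::basic_string<code_unit> units_;  // storage for the code units
};

using ascii_string  = encoded_string<ascii>;
using ebcdic_string = encoded_string<ebcdic>;

static_assert(!std::is_same<ascii_string, ebcdic_string>::value,
              "the encoding alone distinguishes the two types");
```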
And then coming back to the topic of regex:
using ascii_regex = std::basic_regex<ascii>;
using ebcdic_regex = std::basic_regex<ebcdic>;
Which means that the current design of regex from C++11 (and of text
processing in general) is broken, and requires a new design and
deprecation of the old one.
Received on 2019-12-28 04:29:05