Subject: Re: [SG16-Unicode] code_unit_sequence and code_point_sequence
From: Lyberta (lyberta_at_[hidden])
Date: 2018-06-19 10:09:00
> [mjz] This is one approach. Another is Zach's opinionated "there is only one storage container" approach.
Zach's approach is exactly what I don't want to see in the standard. His
type only supports UTF-8.
As we see with std::chrono, the encoding form should be a template
parameter. Nothing prevents us from standardizing
std::dynamic_encoding_form, where the code unit type is fixed at compile
time while its meaning is determined at runtime.
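
To make the idea concrete, here is a rough sketch of an encoding form used as a
template parameter, much like the Period parameter of std::chrono::duration.
Every name here (utf8_encoding_form, utf16_encoding_form, byte_encoding,
dynamic_encoding_form, encoded_text) is made up for illustration and is not
proposed wording.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical static encoding forms: the meaning of the code units is
// fixed at compile time.
struct utf8_encoding_form {
    using code_unit_type = char;  // would likely be char8_t in C++20
};

struct utf16_encoding_form {
    using code_unit_type = char16_t;
};

// Hypothetical set of encodings a dynamic form could select at runtime.
enum class byte_encoding { utf8, latin1, windows1252 };

// Hypothetical dynamic encoding form: the code unit type is a compile-time
// parameter, while the meaning of those units is a runtime value.
template <typename CodeUnit>
struct dynamic_encoding_form {
    using code_unit_type = CodeUnit;
    byte_encoding current = byte_encoding::utf8;
};

// A container parameterized on the encoding form. It only stores code units
// and remembers the form; decoding would live in a higher layer.
template <typename EncodingForm>
class encoded_text {
public:
    using code_unit_type = typename EncodingForm::code_unit_type;

    explicit encoded_text(EncodingForm form = EncodingForm{}) : form_(form) {}

    void push_back(code_unit_type unit) { units_.push_back(unit); }
    std::size_t size() const { return units_.size(); }
    const EncodingForm& encoding_form() const { return form_; }

private:
    EncodingForm form_;
    std::vector<code_unit_type> units_;
};

// The code unit type follows from the encoding form at compile time.
static_assert(
    std::is_same_v<encoded_text<utf16_encoding_form>::code_unit_type, char16_t>);
```

With this shape, encoded_text<dynamic_encoding_form<unsigned char>> keeps the
same storage type no matter which byte encoding is selected at runtime.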
The only advantage of std::basic_string over std::vector is the Small
Buffer Optimization. Perhaps we can work with LEWG to standardize something
like sbo_vector. Then code_unit_sequence could just take it as a template
parameter but require that its value_type be std::byte.
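
A minimal sketch of that constraint, assuming code_unit_sequence takes an
encoding form plus a byte container. The interface below, the utf16_units
trait, and the choice to store code units in native byte order are my
assumptions for illustration only; std::vector<std::byte> stands in for the
sbo_vector that does not exist yet.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical encoding form trait, only here to supply a code unit type.
struct utf16_units {
    using code_unit_type = char16_t;
};

// Sketch: storage is any contiguous container of std::byte; code units are
// copied in and out of it in native byte order.
template <typename EncodingForm, typename Container = std::vector<std::byte>>
class code_unit_sequence {
    static_assert(std::is_same_v<typename Container::value_type, std::byte>,
                  "the underlying container must store std::byte");

public:
    using code_unit_type = typename EncodingForm::code_unit_type;

    void push_back(code_unit_type unit) {
        std::byte raw[sizeof(code_unit_type)];
        std::memcpy(raw, &unit, sizeof unit);
        storage_.insert(storage_.end(), raw, raw + sizeof unit);
    }

    // Reads back the i-th code unit from the byte storage.
    code_unit_type operator[](std::size_t i) const {
        code_unit_type unit;
        std::memcpy(&unit, storage_.data() + i * sizeof unit, sizeof unit);
        return unit;
    }

    std::size_t size() const { return storage_.size() / sizeof(code_unit_type); }
    const Container& bytes() const { return storage_; }

private:
    Container storage_;
};
```

Instantiating code_unit_sequence<utf16_units, std::vector<char>> fails the
static_assert, which is the restriction described above.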
The hierarchy would then be, from bottom to top:
Where each template will use the previous one in its implementation. Of
course, this is just the default hierarchy. A user can manually opt-in for:
* std::vector<char16_t> // For UTF-16 case, for example.
* std::code_point_sequence // Basically no-ops on this layer. This case
is typical for TMP.
I'm a bit baffled by Zach's design. He goes 100% templates above the
code point level, so there was no need to restrict his "string layer" to
UTF-8, especially since implementing code point iteration is much easier
than grapheme cluster iteration and the higher levels, which he did implement.
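
To illustrate how little code point iteration needs for a non-UTF-8 encoding
form, here is a rough UTF-16 decoding sketch. The function name decode_utf16
is hypothetical, and replacing unpaired surrogates with U+FFFD is just one
possible error policy that I picked for the example.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Decodes UTF-16 code units into code points. A high surrogate followed by
// a low surrogate becomes one supplementary code point; any unpaired
// surrogate is replaced with U+FFFD.
inline std::vector<char32_t> decode_utf16(std::u16string_view units) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        const char32_t unit = units[i];
        const bool high = unit >= 0xD800 && unit <= 0xDBFF;
        const bool low = unit >= 0xDC00 && unit <= 0xDFFF;

        if (high && i + 1 < units.size()) {
            const char32_t next = units[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                // Combine the two 10-bit halves and add the 0x10000 offset.
                out.push_back(0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
                ++i;  // the low surrogate has been consumed
                continue;
            }
        }
        out.push_back((high || low) ? char32_t{0xFFFD} : unit);
    }
    return out;
}
```

Grapheme cluster segmentation, by comparison, needs the Unicode property
tables and the segmentation rules on top of a loop like this.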