Date: Tue, 19 Jun 2018 14:34:18 -0400
On 06/19/2018 11:09 AM, Lyberta wrote:
> Mark Zeren:
>> [mjz] This is one approach. Another is Zach's opinionated "there is only one storage container" approach.
> Zach's approach is exactly what I don't want to see in the standard. His
> type only supports UTF-8.
I also don't want to see a UTF-8-only text type; GB18030 is important in
China and UTF-16 isn't going away any time soon. And uses for Modified
UTF-8, CESU-8, Shift-JIS, etc. will remain long into the future.
>
> As we see with std::chrono, the encoding form should be a template
> parameter. Nothing prevents us from standardizing
> std::dynamic_encoding_form, whose code unit type is fixed at compile
> time while its meaning is determined at runtime.
Agreed.
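A minimal C++17 sketch of what "encoding form as a template parameter" could look like, including a runtime-selected form. The names utf32_encoding_form, utf16_encoding_form, dynamic_encoding_form, code_point_length and count_code_points are hypothetical, invented here for illustration; they are not a proposed or standard API. The dynamic form keeps char as its code unit type at compile time but decides at runtime whether those units are interpreted as ASCII or as UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical static encoding forms: code unit type and decoding rules
// are both fixed at compile time.
struct utf32_encoding_form {
    using code_unit_type = char32_t;
    // Every UTF-32 code point is exactly one code unit.
    static std::size_t code_point_length(const code_unit_type*) { return 1; }
};

struct utf16_encoding_form {
    using code_unit_type = char16_t;
    // A high surrogate (U+D800..U+DBFF) starts a two-unit sequence.
    static std::size_t code_point_length(const code_unit_type* first) {
        return (*first >= 0xD800 && *first <= 0xDBFF) ? 2 : 1;
    }
};

// Hypothetical dynamic encoding form: the code unit type (char) is
// compile-time, but how the units are interpreted is chosen at runtime.
class dynamic_encoding_form {
public:
    using code_unit_type = char;
    enum class kind { ascii, utf8 };
    explicit dynamic_encoding_form(kind k) : kind_(k) {}

    std::size_t code_point_length(const code_unit_type* first) const {
        if (kind_ == kind::ascii) return 1;
        const auto lead = static_cast<unsigned char>(*first);
        if (lead < 0x80) return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        if ((lead & 0xF8) == 0xF0) return 4;
        throw std::runtime_error("invalid UTF-8 lead byte");
    }

private:
    kind kind_;
};

// One algorithm serves both kinds: static forms are passed as empty
// objects, the dynamic form carries its runtime state.
template <typename EncodingForm>
std::size_t count_code_points(
    const std::basic_string<typename EncodingForm::code_unit_type>& s,
    const EncodingForm& ef) {
    std::size_t count = 0, i = 0;
    while (i < s.size()) {
        i += ef.code_point_length(s.data() + i);
        ++count;
    }
    // A well-formed sequence ends exactly on a code point boundary.
    assert(i == s.size() && "truncated final code point");
    return count;
}
```

For example, the two bytes "\xC3\xA9" count as one code point under the UTF-8 interpretation and as two under ASCII, with no change to the string's type.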
>
> The only advantage of std::basic_string over std::vector is the Small
> Buffer Optimization (SBO). Perhaps we can work with LEWG to standardize
> something like sbo_vector.
There have been attempts. See P0274:
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0274r0.pdf
> Then code_unit_sequence could simply take it as a template
> parameter but require its value_type to be std::byte.
>
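For readers unfamiliar with the idea, a deliberately simplified sketch of an sbo_vector follows. The class name, its N parameter and uses_heap() are hypothetical. It supports only trivially copyable element types such as std::byte, and, unlike a real implementation (which would overlay the inline buffer and the heap pointer), it keeps both a std::array and a std::vector as members, so it illustrates avoiding allocations rather than saving space.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical sbo_vector: stores up to N elements inline and only
// allocates (via std::vector) once that capacity is exceeded.
template <typename T, std::size_t N>
class sbo_vector {
    static_assert(std::is_trivially_copyable<T>::value,
                  "this sketch supports trivially copyable T only");

public:
    void push_back(const T& value) {
        if (on_heap_) { heap_.push_back(value); return; }
        if (size_ < N) { inline_[size_++] = value; return; }
        // Inline buffer is full: move existing elements to the heap.
        heap_.assign(inline_.begin(),
                     inline_.begin() + static_cast<std::ptrdiff_t>(size_));
        heap_.push_back(value);
        on_heap_ = true;
    }

    std::size_t size() const { return on_heap_ ? heap_.size() : size_; }
    bool uses_heap() const { return on_heap_; }
    const T* data() const { return on_heap_ ? heap_.data() : inline_.data(); }

    const T& operator[](std::size_t i) const {
        assert(i < size());
        return data()[i];
    }

private:
    std::array<T, N> inline_{};  // small buffer, used until it overflows
    std::size_t size_ = 0;       // number of inline elements in use
    std::vector<T> heap_;        // fallback storage after overflow
    bool on_heap_ = false;
};
```

With N = 4, the first four push_back calls never allocate; the fifth copies the contents to the heap.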
> The hierarchy would then be, from bottom to top:
>
> * std::sbo_vector<std::byte>
> * std::code_unit_sequence
> * std::code_point_sequence
> * std::text
This is overspecification in my opinion. And like Martinho, I don't see
the point of code_unit_sequence (or code_point_sequence); that is a
concept, not a container.
>
> Each template would use the previous one in its implementation. Of
> course, this is just the default hierarchy; a user could manually opt in to:
>
> * std::vector<char16_t> // For the UTF-16 case, for example.
> * std::code_point_sequence
> * std::text
>
> Or:
>
> * std::vector<char32_t>
> * std::text
>
> Or even:
>
> * std::vector<char32_t>
> * std::code_point_sequence // Basically a no-op at this layer. This case
> is typical for TMP.
> * std::text
Why do you think it is important to specify an underlying storage
container type for std::text?
>
> I'm a bit baffled by Zach's design. He goes 100% templates above the
> code point level, so there was no need to restrict his "string layer" to
> UTF-8, especially since implementing code point iteration is much easier
> than the grapheme cluster and higher layers, which he did implement.
I'll let Zach speak for himself if he wishes to.
Tom.
Received on 2018-06-19 20:41:48