Date: Fri, 12 Apr 2019 19:00:00 +0000
JeanHeyd Meneide:
> On Fri, Apr 12, 2019 at 6:45 AM Lyberta <lyberta_at_[hidden]> wrote:
>
>> I guess at least teachability and clean structure. The guidance would
>> be: "stuff in std is old and unusable for text, stuff in std::text is
>> new and usable".
It was previously suggested to focus on Unicode, so I no longer propose
a std::text namespace, but I think we should put Unicode facilities
into std::unicode. It was also suggested to provide a proper API only
for Unicode and to support other character sets via transcoding. Hence,
instead of std::text, maybe we should have std::unicode::text.
In particular, I don't think it is a good idea to support holding ECS
(execution character set) or WECS (wide execution character set) data
inside std::[unicode::]text.
> The sub-namespace isn't really necessary here because we are not in
> competition for certain names or algorithms, save for the 3 names I want to
> specifically name `std::text_decode`, `std::text_encode`,
> `std::text_transcode`, and similar because it internally implies other
> semantics and I do not want to steal the names `decode` and `encode` when
> those are much more broad terms.
Is there a proposal for those?
>
> Other names such as `std::utf8`, `std::utf32`, `std::utf16`,
> `std::wide_execution`, and `std::narrow_execution` are fairly specific to
> the text domain and I don't see them clashing.
Maybe a bit off-topic, but I don't think std::narrow_execution and
std::wide_execution are good names. I think appending _character_set
would make them less ambiguous.
> Regarding earlier points on what the standard does provide: the standard
> needs to provide encodings for all the encoding types that are (currently)
> pushed out by the standard, and nothing more. This includes: std::utf8,
> std::utf16, std::utf32, std::wide_execution, and std::narrow_execution.
I agree, but I want to stress that it would be a good idea to provide
only minimal support for ECS and WECS (i.e. transcoding only) and just
let users migrate to Unicode.
> The
> standard should not vend any other encodings, but the Encoding and Decoding
> interfaces should be standard -- much like Allocator -- that allows a user
> to swap in their own class type and object that replaces the use of an
> encoding in any interface / function standard templates provide. (Similar
> to char_traits, except not as useless.) This means users can employ
> whatever encoding or power they have under the hood and enjoy fast and
> correct text processing so long as they follow the required semantics.
Again, it has been suggested to provide a full-fledged API for Unicode only.
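
To make the "swap in your own encoding" idea concrete, here is a rough
sketch of how a user-supplied encoding could plug into a generic
transcoding function. Everything in it is hypothetical: the namespace
`sketch`, the `decode_one` / `encode_one` member names, and the
`transcode` template are invented for illustration and are not taken
from any proposal. Latin-1 stands in for "an encoding the standard does
not vend".

```cpp
#include <cstddef>
#include <string>
#include <string_view>

namespace sketch {  // hypothetical; not a proposed namespace

// A user-defined encoding object: ISO-8859-1, where every code unit
// maps directly to the scalar value with the same number.
struct latin1 {
    using code_unit = char;

    // Decode one scalar value starting at in[pos] and advance pos.
    static char32_t decode_one(std::string_view in, std::size_t& pos) {
        return static_cast<unsigned char>(in[pos++]);
    }

    // Append one scalar value; anything outside Latin-1 becomes '?'.
    static void encode_one(char32_t c, std::string& out) {
        out.push_back(c <= 0xFF ? static_cast<char>(c) : '?');
    }
};

// UTF-32, assuming already-valid input for brevity.
struct utf32 {
    using code_unit = char32_t;

    static char32_t decode_one(std::u32string_view in, std::size_t& pos) {
        return in[pos++];
    }

    static void encode_one(char32_t c, std::u32string& out) {
        out.push_back(c);
    }
};

// Generic transcoding: decode to scalar values with From, re-encode
// with To. Any type that follows the same informal interface works.
template <class From, class To>
std::basic_string<typename To::code_unit>
transcode(std::basic_string_view<typename From::code_unit> in) {
    std::basic_string<typename To::code_unit> out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        To::encode_one(From::decode_one(in, pos), out);
    }
    return out;
}

}  // namespace sketch
```

With this shape, `sketch::transcode<sketch::latin1, sketch::utf32>(bytes)`
turns legacy data into scalar values, which is the "transcoding only"
level of support for non-Unicode character sets discussed in this thread.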
> Note that we cannot only ship utf8 as an encoding, because the standard
> already ships and acknowledges more than utf8 as one of the encodings for
> string literals. It would be highly dysfunctional to have utf16 string
> literals that the standard library itself cannot process in a reasonable
> manner.
Yes, supporting UTF-16 and UTF-32 is trivial if algorithms work in terms
of scalar values (which they should).
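
A minimal sketch of what I mean, assuming hypothetical helper names
(`decode_utf16`, `decode_utf32`, `count_scalar`) and a policy of
replacing ill-formed input with U+FFFD. Once both encodings decode to
scalar values, the algorithm itself is written only once.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: decode UTF-16 code units into scalar values.
// Unpaired surrogates are replaced with U+FFFD.
inline std::u32string decode_utf16(std::u16string_view in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t u = in[i];
        bool high = u >= 0xD800 && u <= 0xDBFF;
        bool low_follows = i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF;
        if (high && low_follows) {
            // Combine the surrogate pair into one scalar value.
            char32_t lo = in[i + 1];
            out.push_back(static_cast<char32_t>(
                0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00)));
            ++i;
        } else if (u >= 0xD800 && u <= 0xDFFF) {
            out.push_back(char32_t{0xFFFD});  // unpaired surrogate
        } else {
            out.push_back(u);
        }
    }
    return out;
}

// Hypothetical helper: UTF-32 code units are scalar values already,
// so decoding is only validation.
inline std::u32string decode_utf32(std::u32string_view in) {
    std::u32string out;
    for (char32_t u : in) {
        bool invalid = u > 0x10FFFF || (u >= 0xD800 && u <= 0xDFFF);
        out.push_back(invalid ? char32_t{0xFFFD} : u);
    }
    return out;
}

// An algorithm written once, in terms of scalar values only.
inline std::size_t count_scalar(std::u32string_view text, char32_t what) {
    std::size_t n = 0;
    for (char32_t c : text) {
        if (c == what) ++n;
    }
    return n;
}
```

The same `count_scalar` then serves both
`count_scalar(decode_utf16(s16), c)` and
`count_scalar(decode_utf32(s32), c)` without any encoding-specific code.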
Received on 2019-04-12 21:00:36