I'm sure many people agree that UTF-16 was a mistake. I'm not sure how many people agree that it deserves deprecation or removal.

On Fri, Apr 12, 2019 at 4:46 PM Steve Downey <sdowney@gmail.com> wrote:
I'm not placing ECS and WECS?

Presumably, Execution Character Set and Wide Execution Character Set.

On Fri, Apr 12, 2019 at 3:00 PM Lyberta <lyberta@lyberta.net> wrote:
Previously it was suggested to focus on Unicode, so I no longer propose
a std::text namespace, but I think we should put Unicode into std::unicode.

I ultimately don't have a horse in the race: I'll stick the code wherever the final bikeshed is built.
 
Is there a proposal for those?

I am working on a proposal; I believe someone else might be working on a proposal for it as well. There is also an in-progress implementation.
WIP Proposal: https://thephd.github.io/vendor/future_cxx/papers/d1629.html
WIP Implementation (will be moved to separate repository in a few months): https://github.com/ThePhD/phd/tree/master/include/phd/text
 
Maybe a bit off-topic, but I don't think std::narrow_execution and
std::wide_execution are good names. I think appending _character_set
would make them less ambiguous.

I took a vote on narrow/wide vs. narrow_execution/wide_execution, but not narrow_character_set/wide_character_set or narrow_execution_character_set/wide_execution_character_set. I'm down for making these names as ugly, unpalatable, and unspellable as possible, because nobody should ever be using them without a compelling reason (e.g., interop with old code).
 
> Regarding earlier points on what the standard does provide: the standard
> needs to provide encodings for all the encoding types that are (currently)
> pushed out by the standard, and nothing more. This includes: std::utf8,
> std::utf16, std::utf32, std::wide_execution, and std::narrow_execution.

I agree, but I want to stress that it would be a good idea to provide
only minimal support for ECS and WECS (i.e., transcoding only) and just
let users migrate to Unicode.


I agree. The entire Unicode library will only work with unicode_code_point/scalar_value (char32_t or a strong typedef, whatever people decide). However, to compensate for the fact that the stored text sequences in many places will not be able to use this library directly, we need robust transcoding (encode/decode) support. The default flow for such encodings is:

1. pipe things from code_unit_t -> unicode_code_point_t;
2. (do all your work here);
3. then, pipe things from unicode_code_point_t -> code_unit_t.

If you specify the inner bit to be something other than Unicode, the library should (and will) loudly and noisily fail you for not providing Unicode it can use. But maybe someone just wants ebcdic -> wide_ebcdic with some strange non-Unicode intermediary encoding. That's fine too; it just won't work with all of the Standard, because it is Sufficiently Weird. Your encode/decode will work, and so will transcoding within that boundary, but not transcoding outside of it without some way to get from what you have to Unicode.
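To make that three-step shape concrete, here's a tiny self-contained sketch in plain C++, with no proposed API in it at all. Latin-1 stands in for a narrow execution encoding (every byte maps 1:1 to a code point), and decode_latin1/encode_latin1 are helper names I made up for illustration; they are not names from D1629.

    #include <cctype>
    #include <cstdio>
    #include <string>
    #include <string_view>

    // 1. code_unit_t -> unicode_code_point_t (Latin-1: each byte is a code point)
    std::u32string decode_latin1(std::string_view bytes) {
        std::u32string out;
        out.reserve(bytes.size());
        for (unsigned char b : bytes)
            out.push_back(static_cast<char32_t>(b));
        return out;
    }

    // 3. unicode_code_point_t -> code_unit_t (only valid for cp <= U+00FF)
    std::string encode_latin1(std::u32string_view cps) {
        std::string out;
        out.reserve(cps.size());
        for (char32_t cp : cps)
            out.push_back(static_cast<char>(cp));
        return out;
    }

    int main() {
        std::string narrow = "K\xF6ln";              // "Köln" in Latin-1
        std::u32string cps = decode_latin1(narrow);  // 1. decode
        for (char32_t& cp : cps)                     // 2. do all your work here
            if (cp < 0x80)
                cp = static_cast<char32_t>(std::toupper(static_cast<int>(cp)));
        std::string back = encode_latin1(cps);       // 3. encode back
        std::printf("%zu code points\n", cps.size());
        (void)back;
    }

The real library would hide those loops behind encoding objects; the point is only that the "work" happens on code points, and the encoding appears solely at the two ends.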
 
> The standard should not vend any other encodings...

Again, it's been suggested to provide a full-fledged API for Unicode only.

I agree; the core of the library will be built on Unicode and Unicode Algorithms that work on Unicode Code Points/Unicode Scalar Values. However, there are one too many text encodings in the wild serving up production data -- including obscene amounts of Financial and Government data -- that is not in a Unicode Format of any kind. Telling these industries that they will not be a part of the new world does not sound like a useful business proposition; therefore, they will pay the cost of (lazy or eager) transcoding as described above, and then use the Unicode Algorithms once they have transcoded. (They can then optionally translate back down to whatever they want, e.g. when they're sending it out of their program.)
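As a rough sketch of the lazy-versus-eager distinction -- again with Latin-1 standing in for the legacy encoding, and using only standard C++20 ranges rather than anything from the proposal:

    #include <ranges>
    #include <string>

    int main() {
        std::string legacy = "caf\xE9";  // "café" in Latin-1

        // Lazy: a view that yields code points on demand; no second buffer is kept.
        auto code_points = legacy | std::views::transform([](char c) {
            return static_cast<char32_t>(static_cast<unsigned char>(c));
        });

        // Eager: pay the transcoding cost once up front and keep a Unicode buffer.
        std::u32string owned(code_points.begin(), code_points.end());

        // Either form can now feed a code-point-based algorithm; only callers who
        // kept non-Unicode data around paid for the conversion.
        (void)owned;
    }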

Note that only the people who do not keep Unicode around will need to pay the cost of transcoding. If your data is already Unicode-friendly, then the standard and the interfaces we provide will support you fully. This means that any hard-coded algorithms that are not templated on encoding/decoding must take a range of Unicode code points to work on (or straight up take char8_t, char16_t, and char32_t, all of which are assumed by compile-time convention to be valid Unicode).
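For illustration only, a hard-coded, non-templated algorithm under that contract could look like this toy function (not from any proposal), which takes code points as a std::u32string_view. Callers who already hold valid char32_t data pass it straight in; everyone else transcodes first.

    #include <string_view>

    // A hard-coded (non-templated) algorithm: it only understands Unicode code
    // points, handed to it here as a std::u32string_view.
    bool all_ascii(std::u32string_view code_points) {
        for (char32_t cp : code_points)
            if (cp > 0x7F)
                return false;
        return true;
    }

    int main() {
        // Data that is already valid char32_t pays no transcoding cost:
        bool a = all_ascii(U"hello");        // true
        bool b = all_ascii(U"h\u00E9llo");   // false: contains U+00E9
        return (a && !b) ? 0 : 1;
    }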
 
ECS and WECS must be transcoded. (Or cast/handled in some similar manner.)

_______________________________________________
SG16 Unicode mailing list
Unicode@isocpp.open-std.org
http://www.open-std.org/mailman/listinfo/unicode