ISOCPP sg16 List: [isocpp-sg16] P2728R7 Announcement

From: Eddie Nolan <eddiejnolan_at_[hidden]>
Date: Tue, 8 Oct 2024 10:12:00 -0400

Hi SG16,

I’ve published R7 of P2728, “Unicode in the Library, Part 1: UTF
Transcoding.” It’s available at https://isocpp.org/files/papers/P2728R7.html

Here are the main bullet points from the R7 changelog. Some of these
changes were made based on previous comments on the paper from telecon
minutes and mailing list feedback; others will be new to SG16.

   - Rename as_utfN to to_utfN to emphasize that a conversion is taking place
     and to contrast with the code unit views, which remain named as_charN_t.
   - Refactor utf_view into an exposition-only @*utf-view-impl*@ class used
     as an implementation detail of separate to_utf8_view, to_utf16_view, and
     to_utf32_view classes, addressing broken deduction guides in the
     previous revision.
   - Change utf_iterator to an exposition-only member class of
     @*utf-view-impl*@.
   - Eliminate the iterator unpacking mechanism and replace it with an
     alternative solution to the problem of transcoding ranges wrapping other
     transcoding ranges. This simplifies the API at the expense of removing
     the transcoding iterator’s begin() and end() member functions and losing
     the ability to implement unpacking for user-defined UTF iterators.
   - Remove std::uc::format.
   - Remove the transcoding_error_handler mechanism.
   - Introduce a new error handling mechanism based on a new
     transcoding_error enumeration, which is returned by a success() member
     function of the transcoding view’s iterator.
   - Provide a reference implementation.

The new error handling mechanism can be used to implement all the desired
error handling approaches that raised concerns in previous revisions. This
includes choosing an alternative code point instead of U+FFFD as a
replacement character; dropping invalid code unit sequences instead of
inserting replacement characters; collecting statistics about transcoding
errors; and throwing exceptions with descriptive messages on invalid
sequences.
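
As a rough illustration of how these four policies can all hang off a
per-code-point error signal, here is a self-contained sketch. It does not use
the paper's API: decode_error, decoded, decode_one, decode, and the four
wrapper functions are hypothetical names invented for this example, and the
hand-rolled UTF-8 decoder is simplified so that an ill-formed sequence always
consumes exactly one code unit.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical stand-in for an error enumeration; not the paper's definition.
enum class decode_error { none, invalid };

struct decoded {
    char32_t code_point;  // U+FFFD whenever error != none
    decode_error error;
    std::size_t length;   // number of UTF-8 code units consumed
};

// Decode one code point starting at s[i] (requires i < s.size()).
// Simplification: an ill-formed sequence always consumes one code unit.
inline decoded decode_one(std::string_view s, std::size_t i) {
    const decoded bad{0xFFFD, decode_error::invalid, 1};
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    const unsigned char b0 = byte(i);
    if (b0 < 0x80) return {b0, decode_error::none, 1};
    std::size_t len = 0;
    char32_t cp = 0, lowest = 0;
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; lowest = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; lowest = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; lowest = 0x10000; }
    else return bad;                     // stray continuation or invalid lead
    if (i + len > s.size()) return bad;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        if ((byte(i + k) & 0xC0) != 0x80) return bad;
        cp = (cp << 6) | (byte(i + k) & 0x3F);
    }
    // Reject overlong encodings, surrogates, and values above U+10FFFF.
    if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return bad;
    return {cp, decode_error::none, len};
}

using policy_result = std::optional<char32_t>;

// Decode all of s, asking on_error(offset) what to emit for each error:
// a code point to insert, or std::nullopt to emit nothing.
template <class OnError>
std::u32string decode(std::string_view s, OnError on_error) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size();) {
        const decoded d = decode_one(s, i);
        if (d.error == decode_error::none) {
            out.push_back(d.code_point);
        } else if (policy_result r = on_error(i)) {
            out.push_back(*r);
        }
        i += d.length;
    }
    return out;
}

// 1. Substitute a caller-chosen code point instead of U+FFFD.
inline std::u32string decode_replacing(std::string_view s, char32_t replacement) {
    return decode(s, [=](std::size_t) -> policy_result { return replacement; });
}

// 2. Drop ill-formed code units instead of inserting a replacement.
inline std::u32string decode_dropping(std::string_view s) {
    return decode(s, [](std::size_t) -> policy_result { return std::nullopt; });
}

// 3. Collect statistics: here, just the number of errors reported.
inline std::size_t count_errors(std::string_view s) {
    std::size_t n = 0;
    decode(s, [&](std::size_t) -> policy_result { ++n; return std::nullopt; });
    return n;
}

// 4. Throw an exception with a descriptive message on the first error.
inline std::u32string decode_or_throw(std::string_view s) {
    return decode(s, [](std::size_t i) -> policy_result {
        throw std::runtime_error("ill-formed UTF-8 at offset " + std::to_string(i));
    });
}
```

With this sketch, decode_replacing("a\xFFz", U'?') yields U"a?z",
decode_dropping("a\xFFz") yields U"az", count_errors("a\xFFz") returns 1, and
decode_or_throw("a\xFFz") throws std::runtime_error. The point is only that
once the decoder reports an error per code point, the choice of policy can
live entirely in the caller.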

The new revision also adds an appendix that provides proof-of-concept
implementations of some existing UTF transcoding APIs from ICU, iconv, and
others, including their error handling functionality, implemented using the
transcoding views in the paper.

The reference implementation is currently available at
https://github.com/ednolan/UtfView, although I’m planning to migrate it
over to the Beman organization soon.

Any feedback is much appreciated, and I look forward to continuing the
review process for the paper.

Thanks,

Eddie

Received on 2024-10-08 14:12:13