Re: Thoughts on P2728R6: Unicode in the Library, Part 1: UTF Transcoding

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Wed, 13 Sep 2023 21:30:40 +0200
On 13/09/2023 18.58, Tom Honermann via SG16 wrote:
> The following reflects some of my personal thoughts regarding this paper

Same here.

In general, I find the paper seriously lacking rationale.

I notice that we still have the "Format" enum, despite suggestions
in earlier teleconferences that it can be replaced with a type
parameter taking a choice of char8_t, char16_t, or char32_t instead.
(There is a lot of back-and-forth mapping going on between Format
and those types; that would be avoided.)

That means we might need a

template<class T>
concept is_unicode_char = /* T is one of char8_t, char16_t, or char32_t */;

but we can directly use the corresponding type in a lot of circumstances.



I notice we have

template<format Format, class R>
  utf_view(R &&) -> utf_view<Format, views::all_t<R>>;

How is "Format" going to be deduced here?


The deduction guides for utf8_view and friends are not shown in the synopses.


Why is there a project_view? The existing transform_view seems to work quite
nicely in its place.


as_utfN should be "to_utfN", in my view (it's doing actual work, not just
casting values).


How can I configure error handling for the utf_view? That seems to be missing.


    constexpr auto begin() {
      constexpr format from_format = format-of<ranges::range_value_t<V>>();
      if constexpr(is-charn-view<V>) {
        return make_begin<from_format>(base_.impl_.begin().base(), base_.impl_.end().base());
      } else {
        return make_begin<from_format>(ranges::begin(base_), ranges::end(base_));
      }
    }


This apparently wants to do some unpacking in some special cases, in addition to
unpacking utfN -> utfX -> utfY chains. It would be better to state that this unpacking
is done by the range adaptor object; see (for example) [range.drop.overview].

And, of course, instead of talking about as_utfN, we should talk about as_utf<T>
where T is one of char8_t, char16_t, or char32_t. as_utfN can still exist,
e.g. as references to the corresponding variable template specializations.


"5.4.2 Why utf_iterator is not a nested type within utf_view"

I disagree with the rationale. You can get at the iterator if you have
a view, or you can construct a view type and ask for its iterator if you
feel the need.

Jens

Received on 2023-09-13 19:30:46