Re: Thoughts on P2728R6: Unicode in the Library, Part 1: UTF Transcoding

From: Zach Laine <whatwasthataddress_at_[hidden]>
Date: Thu, 14 Sep 2023 10:59:19 -0500
On Wed, Sep 13, 2023 at 2:30 PM Jens Maurer via SG16
<sg16_at_[hidden]> wrote:
>
> On 13/09/2023 18.58, Tom Honermann via SG16 wrote:
> > The following reflects some of my personal thoughts regarding this paper
>
> Same here.
>
> In general, I find the paper seriously lacking rationale.

Please indicate which design decisions you think lack rationale.

> I notice that we still have the "Format" enum, despite suggestions
> in earlier teleconferences that it can be replaced with a type
> parameter taking a choice of char8_t, char16_t, or char32_t instead.
> (There is a lot of back-and-forth mapping going on between Format
> and those types; that would be avoided.)
>
> That means we might need a
>
> template<class T>
> concept is_unicode_char = /* T is one of char8_t, char16_t, or char32_t */
>
> but we can directly use the corresponding type in a lot of circumstances.

Right. I'll try to remove format. I expect the removal to go smoothly.

> I notice we have
>
> template<format Format, class R>
> utf_view(R &&) -> utf_view<Format, views::all_t<R>>;
>
> How is "Format" going to be deduced here?
>
>
> The deduction guides for utf8_view and friends are not shown in the synopses.

I find this confusing too, but apparently this is just how deduction
guides interact with alias templates. Tomasz showed me how to get it
to work. If you look down a bit from that declaration, you'll see the
aliases for the utfN_views. The combination of the guide for utf_view
and the alias for utfN_view makes this work.

> Why is there a project_view? The existing transform_view seems to work quite
> nicely in its place.

It does not, because transform_view cannot be a borrowed_range.
project_view was an attempt to remedy that, suggested in an SG-9
meeting. Since then, SG-9 has decided they'd rather see a solution
for making transform_view conditionally borrowed. I'm going to write
a separate paper with Barry for that. More explicitly, project_view
will be gone from the next revision.

> as_utfN should be "to_utfN", in my view (it's doing actual work, not just
> casting values).

This seems to be 100% an LEWG issue.

> How can I configure error handling for the utf_view ? That seems to be missing.

It is indeed missing. As Corentin pointed out, there's not really a
mechanism in the ranges work for early termination on error.

> constexpr auto begin() {
>   constexpr format from_format = format-of<ranges::range_value_t<V>>();
>   if constexpr(is-charn-view<V>) {
>     return make_begin<from_format>(base_.impl_.begin().base(), base_.impl_.end().base());
>   } else {
>     return make_begin<from_format>(ranges::begin(base_), ranges::end(base_));
>   }
> }
>
>
> This apparently wants to do some unpacking in some special cases, in addition / beyond
> unpacking utfN -> utfX -> utfY chains. It would be better to state that this unpacking
> is done by the range adaptor object; see (for example) [range.drop.overview].

That would be problematic, because it is inconvenient -- when you
create a utf_view<format::utf8, char8_view<V>> foo (whether directly
or via adaptors), you want foo.base() to return a char8_view<V>, not a
V. Giving you a V would seem to silently convert foo.base() from
char8_t to whatever it was adapted from (say char).

> And, of course, instead of talking about as_utfN, we should talk about as_utf<T>
> where T is one of char8_t, char16_t, or char32_t. as_utfN can still exist,
> e.g. as references to the corresponding variable template specializations.

That seems odd to me. Do you have a use case for someone wanting to
write code that converts to "some UTF" generically? I'm all about
genericity, but I'm having a hard time coming up with a realistic
example. Usually you need to convert to exactly UTF-8, or UTF-32, or
whatever. Coming *from* a generic input is likely to be a fairly
common case, but converting to a generic output does not seem
likely to be.

> "5.4.2 Why utf_iterator is not a nested type within utf_view"
>
> I disagree with the rationale. You can get at the iterator if you have
> a view, or you can construct a view type and ask for its iterator if you
> feel the need.

SG-9 voted on this and found weak consensus for having a separate iterator.

Zach

Received on 2023-09-14 15:59:33