On 4/16/23 2:54 PM, Zach Laine via SG16 wrote:
We again talked about utfN_view at the last meeting.  I was trying to
justify their existence, and again I could not remember the salient
point during the discussion.  Now I have.  Here is one of them:

template<utf8_iter I, sentinel_for<I> S = I>
  struct utf8_view : view_interface<utf8_view<I, S>> {
    using iterator = I;
    using sentinel = S;

    constexpr utf8_view() {}
    constexpr utf8_view(iterator first, sentinel last);

    constexpr iterator begin() const;
    constexpr sentinel end() const;

    friend constexpr bool operator==(utf8_view lhs, utf8_view rhs)
      { return lhs.begin() == rhs.begin() && lhs.end() == rhs.end(); }

    template<class CharT, class Traits>
      friend basic_ostream<CharT, Traits>&
        operator<<(basic_ostream<CharT, Traits>& os, utf8_view v);

  private:
    using iterator_t = unspecified;          // exposition only
    using sentinel_t = unspecified;          // exposition only

    iterator_t first_;                       // exposition only
    [[no_unique_address]] sentinel_t last_;  // exposition only
  };

Note the operator<<.  I don't know how to provide a general-purpose
way to stream out a subrange<I, S>, when we know that it happens to
contain UTF-8, so I created utf8_view, and added an operator<<.  I
have a similar concern about adding support for
std::format-/std::print-ing ranges of UTF.

I don't think the operator<< above works as a general-purpose method regardless. What does it do when CharT is wchar_t?

Streaming or printing a utfN_view "just works", and this convenience
is used throughout Boost.Text and the examples in the papers I'm
proposing.

I suspect this is not actually true. The paper doesn't explain what operator<< actually does at present. Does it "just work" on Windows to stream to stdout if the user hasn't changed the console encoding to UTF-8 and is not using Microsoft's new Terminal? What would it do if stdout is directed to a terminal in an EBCDIC environment? What if it were directed to a text file in that same environment?

There are some hard questions here that I think need to be (separately) answered before we can start supplying such operators.

I think the value of this convenience is evident in the
examples.  If someone has a reasonable alternative, I'm happy to
replace utfN_view with something that works more like a typical
std::ranges view.  Without such an alternative, I want to keep the
current design.

For the case where UTF text is held in char or wchar_t based storage, the solution I prefer is to give the programmer a tool for presenting that data through an interface that exposes it as char8_t, char16_t, or char32_t. Then, we can just rely on the type system to infer the right encoding to use. Something like the following, where the unspecified iterator converts the value type of the supplied iterator to char8_t:

template<std::input_iterator I, std::sentinel_for<I> S>
requires std::convertible_to<std::iter_value_t<I>, char8_t>
struct as_utf8_view : std::ranges::view_base {
    using iterator = /* unspecified */;
    using sentinel = /* unspecified */;

    constexpr as_utf8_view();
    constexpr as_utf8_view(I, S);

    constexpr iterator begin() const;
    constexpr sentinel end() const;
};
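
// Take the range by reference: taking it by value would return a view
// holding iterators into a copy that is destroyed on return.
template<std::ranges::borrowed_range R>
requires std::convertible_to<std::ranges::range_value_t<R>, char8_t>
auto as_utf8(R&& r) {
  return as_utf8_view(std::ranges::begin(r), std::ranges::end(r));
}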

That suffices to adapt a range whose value type is convertible to char8_t into a view of char8_t values that can be used with any interface that accepts a range of char8_t.

(Feel free to substitute CTAD as desired)

Tom.