On Wed, Feb 1, 2023 at 4:22 AM Corentin via SG16 <sg16@lists.isocpp.org> wrote:

I guess I should because I think it's basically the one original trick I came up with.

For any given view, we have the view itself and its adaptor, such that foo_view(V) and V | view are equivalent.

V | view is defined by an operator|() taking a range (which need not be a view, though it needs to be viewable).
And most of the time, that is the interface people are going to use when using views; it's just much more convenient.

So, instead of defining a single operator|(range_of_char32_t) on that range adaptor, we can define additional overloads that take ranges of char8_t/char16_t and produce a unicode_algo_view<codepoint_view<charN_t>>
instead of unicode_algo_view<all_t<...>>.

The benefits are:
We can support implicit decode/encode for the sake of ergonomics, in the place where ergonomics matter most.
The only places where we need to specify decode/encode steps and their error policies are these range adaptor overloads, which we could probably do once for all Unicode algorithms (both in terms of spec and implementation), without the views themselves having to take on double duty.
We can elide these implicit decode/encode steps when chaining algorithms (r | normalize | word_break). Because these pipe operators basically construct a graph of algorithms, we can remove the implicit nodes when they do redundant work, i.e. decode | normalize | encode | decode | word_break | encode should just be decode | normalize | word_break | encode.
Does that clarify?

I don't want to get too hung up on an off-the-cuff example, but I do have concerns about the word_break | encode sequence. If the word_break adaptor produces a sequence of words, decode will produce gibberish run-on text. If it produces non-word sequences interspersed between the words, then it's more difficult to use, because each range needs to be examined to implement things like text selection or filling.
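
To illustrate the two shapes I have in mind, here is a small sketch using plain ASCII std::string segments. The helpers flatten and words_only are hypothetical names made up for this example, and treating "all spaces" as the only kind of non-word segment is a deliberate simplification, not what a real word-breaking algorithm does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: concatenates segments the way a flattening step would.
std::string flatten(const std::vector<std::string>& segments) {
  std::string out;
  for (const std::string& s : segments) out += s;
  return out;
}

// Hypothetical helper: the filtering a consumer has to do when non-word
// segments are interspersed. A segment made only of spaces counts as non-word.
std::vector<std::string> words_only(const std::vector<std::string>& segments) {
  std::vector<std::string> words;
  for (const std::string& s : segments) {
    if (s.find_first_not_of(' ') != std::string::npos) words.push_back(s);
  }
  return words;
}
```

Flattening {"hello", "world"} gives "helloworld", so the boundaries are gone. Flattening {"hello", " ", "world"} round-trips to "hello world", but anything that wants just the words has to examine every segment, as words_only does.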