Re: Considerations for Unicode algorithms

From: Steve Downey <sdowney_at_[hidden]>
Date: Wed, 1 Feb 2023 10:57:27 -0500
On Wed, Feb 1, 2023 at 4:22 AM Corentin via SG16 <sg16_at_[hidden]>
wrote:

> I guess I should because I think it's basically the one original trick I
> came up with.
>
> For any given view, we have the view itself and its adaptor, i.e. foo_view(V)
> and V | view.
>
> V | view is defined by an operator|() taking a range (which need not be a
> view, though it needs to be viewable).
> And most of the time, that is the interface people are going to use when
> using views; it's just much more convenient.
>
> So, instead of defining a single operator|(range_of_char32_t) on that
> range adaptor, we can define additional overloads that take char8/16_t and
> produce a unicode_algo_view<codepoint_view<charN_t>>
> instead of unicode_algo_view<all_t<...>>.
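The overload idea described above can be sketched roughly as follows. This is only an illustration under several assumptions: `all_view`, `codepoint_view`, `unicode_algo_view`, `algo_fn` and `algo` are hypothetical names rather than a proposed interface, `all_view` is a simplified stand-in for `std::views::all_t`, no decoding is actually implemented, and only `char16_t` is handled so that the sketch compiles as C++17 (a `char8_t` overload would look the same in C++20).

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

namespace sketch {

// Element (code unit) type of a range, found through std::begin.
template <class R>
using unit_t = std::decay_t<decltype(*std::begin(std::declval<const R&>()))>;

// Hypothetical stand-in for std::views::all_t: refers to the range unchanged.
template <class R>
struct all_view { const R* range; };

// Hypothetical decoding view over UTF-N code units (decoding logic omitted).
template <class CharT>
struct codepoint_view { std::basic_string_view<CharT> units; };

// Hypothetical algorithm view; it only ever sees a range of code points.
template <class Underlying>
struct unicode_algo_view { Underlying base; };

// The range adaptor object used on the right-hand side of operator|.
struct algo_fn {};
inline constexpr algo_fn algo{};

// Overload 1: the input is already a range of char32_t code points.
template <class R, std::enable_if_t<std::is_same_v<unit_t<R>, char32_t>, int> = 0>
auto operator|(const R& r, algo_fn) {
    return unicode_algo_view<all_view<R>>{all_view<R>{&r}};
}

// Overload 2: the input is UTF-16 code units, so a decode step is inserted
// implicitly by the adaptor, not by the algorithm view itself.
template <class R, std::enable_if_t<std::is_same_v<unit_t<R>, char16_t>, int> = 0>
auto operator|(const R& r, algo_fn) {
    using view = codepoint_view<char16_t>;
    return unicode_algo_view<view>{view{std::u16string_view(std::data(r), std::size(r))}};
}

}  // namespace sketch
```

With these definitions, piping a `std::u32string` into `sketch::algo` wraps the range as-is, while piping a `std::u16string` wraps it in the decoding view first; the error policy for decoding would live in the second overload only.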
>
> The benefits are:
>
> - We can support implicit decode/encode for the sake of ergonomics, in
> the place where ergonomics is most desirable
> - The only places we need to specify decode/encode steps and their
> error policies are these range adaptor overloads, which we could
> probably do once for all unicode algorithms (both in terms of spec and
> implementation), without the views themselves having to take on double duties
> - We can elide these implicit decode/encode steps when chaining
> algorithms (r | normalize | word_break). As these pipe operators
> basically construct a graph of algorithms, we can remove these implicit
> nodes when they do redundant work, i.e. decode | normalize | encode |
> decode | word_break | encode should just be decode | normalize |
> word_break | encode
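The elision in the last bullet can be illustrated with a small runtime model. This is a sketch only: it represents a pipeline as a list of step names, and the function name `elide_redundant` is hypothetical. A real implementation would more likely do this at the type level inside the operator| overloads, and it assumes the adjacent encode and decode use the same encoding so that the pair is a no-op.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Drop every "encode" that is immediately followed by "decode":
// under the stated assumption the pair round-trips and does no useful work.
std::vector<std::string> elide_redundant(const std::vector<std::string>& steps) {
    std::vector<std::string> out;
    for (const auto& step : steps) {
        if (step == "decode" && !out.empty() && out.back() == "encode") {
            out.pop_back();  // remove the encode and skip this decode
        } else {
            out.push_back(step);
        }
    }
    return out;
}
```

Applied to the six-step chain from the bullet, this leaves decode, normalize, word_break, encode.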
>
> Does that clarify?
>
>
I don't want to get too hung up on an off-the-cuff example, but I do have
concerns about the word_break | encode sequence. If the word_break adaptor
produces a sequence of words, decode will produce gibberish run-on text. If
it produces non-word sequences interspersed between the words, then it's
more difficult to use because each range needs to be examined to implement
things like text selection or filling.
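A toy model of the two output shapes shows the trade-off. It is a sketch under a strong simplifying assumption: segmentation happens only at ASCII spaces, which is not the UAX #29 word-break algorithm, and the helper names `segments`, `words_only` and `join` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split text into maximal runs of spaces and non-spaces, keeping every run.
std::vector<std::u32string> segments(const std::u32string& text) {
    std::vector<std::u32string> out;
    for (char32_t c : text) {
        const bool is_space = (c == U' ');
        // Start a new segment whenever the space/non-space class changes.
        if (out.empty() || (out.back().front() == U' ') != is_space) {
            out.emplace_back();
        }
        out.back().push_back(c);
    }
    return out;
}

// Keep only the segments that are not runs of spaces.
std::vector<std::u32string> words_only(const std::vector<std::u32string>& segs) {
    std::vector<std::u32string> out;
    for (const auto& s : segs) {
        if (s.front() != U' ') out.push_back(s);
    }
    return out;
}

// Flatten segments back into one string, as a trailing encode step would.
std::u32string join(const std::vector<std::u32string>& segs) {
    std::u32string out;
    for (const auto& s : segs) out += s;
    return out;
}
```

Flattening the words-only form loses the separators and yields run-on text, while the all-segments form round-trips but forces the caller to classify each segment before doing selection or filling.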

Received on 2023-02-01 15:57:40