sg16

Re: Considerations for Unicode algorithms

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Tue, 31 Jan 2023 15:49:53 +0100

Hi Corentin,

On 30/01/2023 14.36, Corentin via SG16 wrote:
> As promised eons ago, I put some of my thoughts on Unicode algorithms in a paper.
> I'll try to improve the form when I have time, but I wanted to give Zach and everyone else time to look at it before Issaquah, if we want to have something to discuss in the corridor track.
>
> https://isocpp.org/files/papers/D2773R0.pdf

Thanks for that. In general, views can be less efficient than
(say) writing a loop by hand, because the view often needs to
store its current state in data members in preparation for the
next iteration, and that is sometimes less friendly for the
optimizer. For example, a "drop_view" needs to check in
every iteration whether it's still in the "dropping" phase or
not. Written explicitly, one would simply skip the first n
elements in a separate loop and then make another loop that
processes the remaining elements (if any).
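
A minimal sketch of that contrast, using two hypothetical helper functions that sum everything after the first n elements. The first carries the "still dropping?" state through one loop and tests it on every iteration; the second skips in a separate loop and then runs a plain loop. The function names are made up for illustration, and this is not meant to describe how any particular standard library implements its drop view.

```cpp
#include <cstddef>
#include <vector>

// Single loop that carries the "still dropping?" state along and checks it
// on every iteration (hypothetical helper, for illustration only).
long long sum_after_drop_stateful(const std::vector<int>& v, std::size_t n) {
    long long sum = 0;
    std::size_t to_drop = n;
    for (int x : v) {
        if (to_drop > 0) {  // tested on every iteration, even after dropping ends
            --to_drop;
            continue;
        }
        sum += x;
    }
    return sum;
}

// Hand-written form: skip the first n elements in a separate loop, then
// process the remaining elements (if any) in a loop with no extra state.
long long sum_after_drop_two_loops(const std::vector<int>& v, std::size_t n) {
    std::size_t i = 0;
    while (i < v.size() && i < n) {
        ++i;  // skipping phase
    }
    long long sum = 0;
    for (; i < v.size(); ++i) {
        sum += v[i];  // processing phase
    }
    return sum;
}
```

Both functions return the same result for any input; only the second has a main loop body free of the phase check, which tends to be the easier shape for an optimizer to vectorize.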

What's the situation with the Unicode algorithms?
Is there a performance benefit for integrating (say) the
UTF-8 -> code point decoding stage into (some of) the
algorithms themselves, for example because that would
allow more seamless application of SIMD?
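
To make the question concrete, here is a small sketch with a deliberately simple stand-in algorithm (counting code points), not one of the Unicode algorithms from the paper. The first hypothetical function steps through the input one encoded code point at a time, as a separate decoding stage would; the second works directly on the UTF-8 bytes by counting every byte that is not a continuation byte (10xxxxxx). Both assume well-formed UTF-8, and both function names are invented for this illustration.

```cpp
#include <cstddef>
#include <string_view>

// Staged approach: determine each sequence length from its lead byte and
// advance one code point at a time. Assumes well-formed UTF-8.
std::size_t count_code_points_decoding(std::string_view s) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        const std::size_t len = b < 0x80        ? 1   // 0xxxxxxx
                              : (b >> 5) == 0x6 ? 2   // 110xxxxx
                              : (b >> 4) == 0xE ? 3   // 1110xxxx
                                                : 4;  // 11110xxx
        i += len;  // data-dependent stride
        ++count;
    }
    return count;
}

// Integrated approach: operate on the bytes directly. Every code point
// contributes exactly one byte that is not a continuation byte, so the loop
// has a fixed stride and no data-dependent control flow.
// Assumes well-formed UTF-8.
std::size_t count_code_points_fused(std::string_view s) {
    std::size_t count = 0;
    for (unsigned char b : s) {
        count += (b & 0xC0) != 0x80;
    }
    return count;
}
```

The second loop is the kind of shape compilers can usually auto-vectorize, whereas the first has a stride that depends on the data. Whether a similar fusion pays off for normalization, segmentation, or the other algorithms under discussion is exactly the open question.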

I agree with leaving locale-based tailoring out of the
picture for now.

Jens

Received on 2023-01-31 14:49:58