Thank you very much for this paper, Corentin!

Personal (non-chair) comments follow.

First, I agree with nearly everything in it, and the parts I disagree with are not the most important ones. Very nice work!

With respect to the comments in the "char32_t as code point type" section and the design choices I made in P0244 (Text_view), I think the use cases are different. In P0244, "code point" is used in the generic non-Unicode sense and therefore requires, given a generic code point, a way to identify the associated character set. For Unicode-specific algorithms, I agree that there is no need for such abstraction and that char32_t now suffices, following adoption of P1041 (Make char16_t/char32_t string literals be UTF-16/32). Even if we were to approve some form of generic code point type or concept in the future, I see no need for Unicode-specific algorithms to work in terms of it.

These are the items where I still need some convincing:

code unit sequences should be validated by default.

The only way I know of to do this well (without contracts) is for validation to produce a wrapper type that statically indicates that validation has been performed. Validation is fast, but fast operations add up if repeated many times. I favor specifying preconditions that can be specified as contracts in the future.
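
To make the wrapper idea concrete, here is a minimal sketch of what I have in mind. The names validated_utf8_view and count_code_points are hypothetical (they come from neither the paper nor P0244), and the sketch assumes UTF-8 input held in a std::string_view. Validation runs once in validate(); anything that accepts the wrapper type can then skip the check.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Checks well-formedness per the Unicode definition of UTF-8: rejects overlong
// forms, surrogates, values above U+10FFFF, and truncated sequences.
inline bool is_well_formed_utf8(std::string_view s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 <= 0x7F) { ++i; continue; }       // single code unit (ASCII)
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;      // allowed range for the 2nd code unit
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;           // excludes overlong encodings
            if (b0 == 0xED) hi = 0x9F;           // excludes surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;           // excludes overlong encodings
            if (b0 == 0xF4) hi = 0x8F;           // excludes values above U+10FFFF
        } else {
            return false;                        // stray continuation or invalid lead
        }
        if (n - i < len) return false;           // truncated sequence
        const unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if (b < 0x80 || b > 0xBF) return false;
        }
        i += len;
    }
    return true;
}

// Hypothetical wrapper: holding a value of this type is static evidence that
// the viewed code units have already been validated.
class validated_utf8_view {
public:
    // The only way to obtain an instance; returns nullopt for ill-formed input.
    static std::optional<validated_utf8_view> validate(std::string_view s) {
        if (!is_well_formed_utf8(s)) return std::nullopt;
        return validated_utf8_view(s);
    }
    std::string_view code_units() const { return data_; }

private:
    explicit validated_utf8_view(std::string_view s) : data_(s) {}
    std::string_view data_;
};

// An algorithm that takes the wrapper does not need to re-validate: every
// code unit that is not a continuation byte starts exactly one code point.
inline std::size_t count_code_points(validated_utf8_view v) {
    std::size_t count = 0;
    for (unsigned char b : v.code_units()) {
        if ((b & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

A caller would call validated_utf8_view::validate(input) once, handle the empty optional, and then pass the wrapper to as many algorithms as needed without paying for validation again. An interface taking a raw std::string_view could instead state well-formedness as a precondition, which is the part I would like to be expressible as a contract later.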

On text containers

The continued absence of text containers means that programmers will continue to use std::string and other string-like types to build text, including their associated functions for inserting or deleting at arbitrary locations. I think we can provide better builders, but I don't think we are ready to do so yet; we need rope-like containers first, and I think that means we need segmented data structures in ranges before that. I agree there is no need for implementations of Unicode algorithms to depend on such containers, though.

Tom.

On 1/30/23 8:36 AM, Corentin via SG16 wrote:
Hey folks.
As promised eons ago, I put some of my thoughts on Unicode algorithms in a paper.
I'll try to improve the form when I have time, but I wanted to give Zach and everyone else time to look at it before Issaquah, if we want to have something to discuss in the corridor track.

https://isocpp.org/files/papers/D2773R0.pdf 

Thanks, 

Corentin