Re: Considerations for Unicode algorithms

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 31 Jan 2023 17:35:09 -0500
Thank you very much for this paper, Corentin!

Personal (non-chair) comments follow.

First, I agree with nearly everything in it, and the parts I don't agree
with are not the most important ones. Very nice work!

With respect to the comments in the "char32_t as code point type" section
and the design choices I made in P0244 (Text_view) <https://wg21.link/p0244>, I
think the use cases are different. In P0244, "code point" is used in the
generic non-Unicode sense and therefore requires, given a generic code
point, a way to identify the associated character set. For Unicode
specific algorithms, I agree that there is no need for such abstraction
and that char32_t now suffices (following adoption of P1041 (Make
char16_t/char32_t string literals be UTF-16/32)
<https://wg21.link/p1041>). Even if we were to approve some form of
generic code point type or concept in the future, I see no need for
Unicode specific algorithms to work in terms of it.
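
To illustrate the point about char32_t, here is a minimal sketch of a
Unicode-specific check written directly in terms of char32_t, with no
generic code point abstraction. The names is_scalar_value and
all_scalar_values are hypothetical and chosen only for this example; they
do not come from the paper or from P0244.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: a Unicode scalar value is any code point in
// [0, 0x10FFFF] that is not a surrogate (0xD800..0xDFFF).
constexpr bool is_scalar_value(char32_t cp) noexcept {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical algorithm: each element of a UTF-32 sequence is a code
// point, so char32_t can be used as the code point type directly.
constexpr bool all_scalar_values(std::u32string_view text) noexcept {
    for (char32_t cp : text) {
        if (!is_scalar_value(cp)) return false;
    }
    return true;
}

// U"" literals are UTF-32, so this can be checked at compile time.
static_assert(all_scalar_values(U"caf\u00E9 \U0001F600"));
static_assert(!is_scalar_value(char32_t{0xD800}));
```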

These are the items where I still need some convincing:

*code unit sequences should be validated by default.*

The only way I know of to do this well (without contracts) is for
validation to produce a wrapper type that statically indicates that
validation has been performed. Validation is fast, but fast operations
add up if repeated many times. I favor stating preconditions that can
be expressed as contracts in the future.
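
As a rough sketch of the wrapper-type approach described above (one possible
shape, not a proposed design), validation can be the only way to obtain a
value of a dedicated type, and algorithms can then take that type so the
check is not repeated. The names validated_utf8, validate_utf8, and
count_code_points are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical wrapper: holding one is static evidence that the code units
// were checked. Only validate_utf8() can construct it.
class validated_utf8 {
public:
    std::string_view code_units() const noexcept { return units_; }
private:
    explicit validated_utf8(std::string_view s) noexcept : units_(s) {}
    friend std::optional<validated_utf8> validate_utf8(std::string_view) noexcept;
    std::string_view units_;
};

// Returns the wrapper if s is well-formed UTF-8, std::nullopt otherwise.
std::optional<validated_utf8> validate_utf8(std::string_view s) noexcept {
    auto byte = [&](std::size_t i) { return static_cast<unsigned char>(s[i]); };
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = byte(i);
        if (b < 0x80) { ++i; continue; }           // ASCII
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;        // allowed range of 2nd byte
        if (b >= 0xC2 && b <= 0xDF) {
            len = 2;
        } else if (b >= 0xE0 && b <= 0xEF) {
            len = 3;
            if (b == 0xE0) lo = 0xA0;              // reject overlong forms
            if (b == 0xED) hi = 0x9F;              // reject surrogates
        } else if (b >= 0xF0 && b <= 0xF4) {
            len = 4;
            if (b == 0xF0) lo = 0x90;              // reject overlong forms
            if (b == 0xF4) hi = 0x8F;              // reject > U+10FFFF
        } else {
            return std::nullopt;                   // invalid lead byte
        }
        if (s.size() - i < len) return std::nullopt;          // truncated
        if (byte(i + 1) < lo || byte(i + 1) > hi) return std::nullopt;
        for (std::size_t k = 2; k < len; ++k) {
            if ((byte(i + k) & 0xC0) != 0x80) return std::nullopt;
        }
        i += len;
    }
    return validated_utf8(s);
}

// Hypothetical algorithm: the well-formedness precondition is carried by
// the parameter type, so no re-validation happens here.
std::size_t count_code_points(validated_utf8 text) noexcept {
    std::size_t n = 0;
    for (unsigned char c : text.code_units()) {
        if ((c & 0xC0) != 0x80) ++n;               // skip continuation bytes
    }
    return n;
}
```

A caller validates once, for example with validate_utf8(input), and can then
pass the resulting wrapper to any number of algorithms. With contracts, the
same requirement could instead be stated as a precondition on a function
taking plain code units.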

*On text containers*

The continued absence of text containers means that programmers will
continue to use std::string and other string-like types to build text,
including their associated insert or delete functions at arbitrary
locations. I think we can provide better builders, but I don't think we
are ready to do so yet; we need rope-like containers first, and I think that
means we need segmented data structures in ranges before that. I agree
there is no need for implementations of Unicode algorithms to depend on
such containers though.
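
One hazard of editing text at arbitrary locations through std::string is
that positions are code unit offsets, so an insertion can land inside a
multi-byte UTF-8 sequence. The sketch below shows the kind of boundary check
a caller has to remember to do by hand; is_code_point_boundary is a
hypothetical helper and assumes the string is already well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if byte offset pos is at the start of a code
// point (or at the end) of s. Assumes s holds well-formed UTF-8.
bool is_code_point_boundary(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) return pos == s.size();
    // Continuation bytes have the bit pattern 10xxxxxx.
    return (static_cast<unsigned char>(s[pos]) & 0xC0) != 0x80;
}
```

For the string "caf\xC3\xA9" (the word "café" in UTF-8), offset 3 is a
boundary but offset 4 is not, and nothing in std::string::insert prevents
inserting at offset 4.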


On 1/30/23 8:36 AM, Corentin via SG16 wrote:
> Hey folks.
> As promised eons ago, I put some of my thoughts on Unicode algorithms
> in a paper.
> I'll try to improve the form when I have time, but I wanted to give
> Zach and everyone else time to look at it before Issaquah, if we want
> to have something to discuss in the corridor track.
> https://isocpp.org/files/papers/D2773R0.pdf
> Thanks,
> Corentin

Received on 2023-01-31 22:35:11