Re: Considerations for Unicode algorithms

From: Steve Downey <sdowney_at_[hidden]>
Date: Mon, 30 Jan 2023 16:51:32 -0500
On Mon, Jan 30, 2023 at 4:33 PM Zach Laine via SG16 <sg16_at_[hidden]>
wrote:

>
> Also, I think the algorithms should be generic. They should not work
> only with char32_t, or only with int, etc. Users should be free to
> use char8_t, char, unsigned char, etc., for UTF-8. std::byte if
> you're nasty.
>
>
> I promise to actually read the whole paper before having more comments.

The algorithms that Unicode defines are all in terms of code points, not
encoded forms. I would generally prefer to see the algorithms deal with
code points only, perhaps generic on anything convertible to char32_t,
rather than have to worry about decoding issues in the middle of an
algorithm. Replacement character policy should live in a small number of
places, for example in the decoders for UTF-16BE etc.
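
A rough sketch of the split I have in mind, not a proposal. All of the names
here (decode_utf16be, count_supplementary, replacement_character) are made up
for illustration and come from neither the paper nor any existing library.
The decoder is the only place that knows about UTF-16BE and the U+FFFD
policy; the algorithm only sees values convertible to char32_t.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical name: the single substitution value used by the decoder.
constexpr char32_t replacement_character = U'\uFFFD';

// Hypothetical decoder: UTF-16BE bytes in, code points out.
// Ill-formed input (lone surrogates, a trailing odd byte) becomes U+FFFD here,
// so nothing downstream has to think about it.
inline std::vector<char32_t> decode_utf16be(const std::vector<unsigned char>& bytes) {
    std::vector<char32_t> out;
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        if (n - i < 2) {                      // truncated code unit
            out.push_back(replacement_character);
            break;
        }
        const char32_t unit = static_cast<char32_t>((bytes[i] << 8) | bytes[i + 1]);
        i += 2;
        if (unit >= 0xD800 && unit <= 0xDBFF) {           // high surrogate
            if (n - i >= 2) {
                const char32_t next =
                    static_cast<char32_t>((bytes[i] << 8) | bytes[i + 1]);
                if (next >= 0xDC00 && next <= 0xDFFF) {   // well-formed pair
                    i += 2;
                    const char32_t cp =
                        0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00);
                    assert(cp <= 0x10FFFF);               // always a scalar value
                    out.push_back(cp);
                    continue;
                }
            }
            out.push_back(replacement_character);         // unpaired high surrogate
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            out.push_back(replacement_character);         // unpaired low surrogate
        } else {
            out.push_back(unit);                          // BMP scalar value
        }
    }
    return out;
}

// Hypothetical stand-in for a Unicode algorithm: counts code points outside
// the BMP. It is generic on any element type convertible to char32_t and
// contains no decoding or error handling of its own.
template <typename Iterator>
std::size_t count_supplementary(Iterator first, Iterator last) {
    using value_type = typename std::iterator_traits<Iterator>::value_type;
    static_assert(std::is_convertible<value_type, char32_t>::value,
                  "elements must be convertible to char32_t");
    std::size_t count = 0;
    for (; first != last; ++first) {
        const char32_t cp = static_cast<char32_t>(*first);
        if (cp > 0xFFFF) {
            ++count;
        }
    }
    return count;
}
```

The same count_supplementary works on the decoder's std::vector<char32_t> and
on, say, a std::vector<int> of code points, and in neither case does it need
to know which encoding the data arrived in.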

Received on 2023-01-30 21:51:46