
sg16


Re: Considerations for Unicode algorithms

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Tue, 31 Jan 2023 19:49:43 +0100
On 31/01/2023 19.38, Corentin via SG16 wrote:
> On Tue, Jan 31, 2023 at 7:01 PM Zach Laine <whatwasthataddress_at_[hidden]> wrote:
> > The right question is not "Who cares about vector<byte>?" The right
> > question is "If the algorithm can process iterators from vector<byte>
> > with no code change to the algorithm, why shouldn't it?" That is, why
> > should that weirdo using vector<byte> have to copy her weird data?
>
>
> Because forcing users to confirm intent, especially for something that requires more domain-specific
> knowledge or confidence than most people have, reduces the opportunities for bugs.
>
> This is especially true on platforms where UTF-8 is not the default, and on which codepoint_view(char*) is likely to produce mojibake, fail, or otherwise not behave correctly,
> and to leave users confused (if only because it will work on some platforms/environments but not others).
>
> Just because we can accept anything (I agree with that) does not mean we should!

I think there is a difference between "we require users to copy their
data to a vector of a type we like" and "we require users to use a
type-converting or code point-constructing view for odd types".

The first option is right out, but the second one we should consider
seriously. Those that use std::byte to store UTF-8 should probably
be required to explicitly view-convert that to char8_t
(with all that syntax fluff getting optimized away, of course).

Jens

Received on 2023-01-31 18:49:47