Re: Considerations for Unicode algorithms

From: Zach Laine <whatwasthataddress_at_[hidden]>
Date: Tue, 31 Jan 2023 12:01:42 -0600
Even though we discussed this offline, I'll repeat it here for everyone else.

The idea is that users should not be required to copy data to make the
data conform to a particular type in the interface. For instance,
making char-users copy their data to char8_t, or making char8_t users
copy data to char. Since the algorithm does not care what the input
type is, that should be reflected in the interface; it should accept
any 8-bit integral type.

The right question is not "Who cares about vector<byte>?" The right
question is "If the algorithm can process iterators from vector<byte>
with no code change to the algorithm, why shouldn't it?" That is, why
should that weirdo using vector<byte> have to copy her weird data?

Zach

On Tue, Jan 31, 2023 at 2:25 AM Peter Brett <pbrett_at_[hidden]> wrote:
>
> Hi Zach,
>
> Doesn't this add a lot of complexity? I really would like to understand the rationale/motivation for this level of generality, with some examples of code that is significantly improved by them.
>
> For example, I am struggling to envisage a situation in which I'd find it useful to do sentence break iteration on a std::vector<byte> without any intermediate decoding step.
>
> Best regards,
>
> Peter
>
> -----Original Message-----
> From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Zach Laine via SG16
> Sent: 30 January 2023 21:33
> To: Corentin <corentin.jabot_at_[hidden]>
> Cc: Zach Laine <whatwasthataddress_at_[hidden]>; SG16 <sg16_at_[hidden]>
> Subject: Re: [SG16] Considerations for Unicode algorithms
>
> Also, I think the algorithms should be generic. They should not work
> only with char32_t, or only with int, etc. Users should be free to
> use char8_t, char, unsigned char, etc., for UTF-8. std::byte if
> you're nasty.
>

Received on 2023-01-31 18:01:56