Re: [SG16-Unicode] code_unit_sequence and code_point_sequence

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 19 Jun 2018 21:42:21 -0400
On 06/19/2018 08:40 AM, Lyberta wrote:
> Martinho Fernandes:
>> But this just makes the example convince me that code_unit_sequence is
>> even less useful. If I understood correctly you wanted to show that
>> supporting utf32be in code_point_sequence makes things more complicated
>> for the user. Correct me if my understanding of the interface is wrong,
>> but, roughly, I don't think I can be convinced that:
>>
>> code_unit_sequence<utf16, big_endian> cus(std::move(source));
>> code_point_sequence<utf16> cps(std::move(cus));
> code_point_sequence takes container as the first parameter so the second
> line of code will be just:
>
> code_point_sequence cps(std::move(cus));
>
> We can provide deduction guide, for example:
>
> code_point_sequence sps{u"Hello"};
>
> Would deduce to:
>
> code_point_sequence<code_unit_sequence<utf16, std::endian::native,
> std::allocator<std::byte>>>

I mentioned this in another reply, but I think it is an important enough
issue to repeat here. It isn't possible to portably access the elements
of char16_t arrays as arrays of std::byte (or [unsigned] char) with the
expectation that each element of the byte/char array corresponds to a
(high or low) octet of a char16_t object. On systems that have a byte
size of 16 bits or larger with sizeof(char16_t)==1, the individual
octets are not addressable. And yes, such systems do exist, are
actively maintained, and are actively developed for.
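
To make the point concrete, here is a rough sketch of an approach that
does not depend on byte addressing: compute the octets arithmetically
from each code unit's value instead of reinterpreting the char16_t
array as bytes. The helper names (to_utf16be_octets,
from_utf16be_octets) are hypothetical and are not part of any proposed
interface; std::uint_least8_t is used because an exact 8-bit type need
not exist on the systems described above.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <vector>

// sizeof counts bytes, not octets: a char16_t occupies
// sizeof(char16_t) * CHAR_BIT bits, which is at least 16 but need not
// be exactly two 8-bit bytes.
static_assert(sizeof(char16_t) * CHAR_BIT >= 16,
              "char16_t holds at least 16 bits");

// Hypothetical helper: produce big-endian octets from UTF-16 code units
// using shifts and masks, so the result does not depend on the object
// representation of char16_t or on the size of a byte.
std::vector<std::uint_least8_t>
to_utf16be_octets(const char16_t* units, std::size_t count) {
    std::vector<std::uint_least8_t> octets;
    octets.reserve(count * 2);
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint_least16_t u = units[i];
        octets.push_back(static_cast<std::uint_least8_t>((u >> 8) & 0xFF)); // high octet
        octets.push_back(static_cast<std::uint_least8_t>(u & 0xFF));        // low octet
    }
    return octets;
}

// Hypothetical helper: rebuild one code unit from its two octets.
char16_t from_utf16be_octets(std::uint_least8_t high,
                             std::uint_least8_t low) {
    assert(high <= 0xFF && low <= 0xFF); // each value must fit in an octet
    // Shift in unsigned arithmetic; unsigned int has at least 16 bits.
    return static_cast<char16_t>((static_cast<unsigned>(high) << 8) | low);
}
```

Each octet is stored in its own vector element here, so on a system
with 16-bit bytes this trades space for portability.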

>
> Remember there are tons of code unit sequence types out there:
> std::basic_string, Microsoft's CString, Qt's QString, wxWidgets's
> wxString. We want to support all those types if people provide their
> encoding form traits.

Indeed. This was precisely the motivation for the text_view design.

Tom.

Received on 2018-06-20 03:42:23