sg16: Re: [SG16-Unicode] code_unit_sequence and code_point

From: R. Martinho Fernandes <rmf_at_[hidden]>
Date: Tue, 19 Jun 2018 20:27:37 +0200

On June 19, 2018 8:16:00 PM GMT+02:00, Lyberta <lyberta_at_[hidden]> wrote:
>Tom Honermann:
>> On 06/19/2018 04:11 AM, Lyberta wrote:
>> UTF-16 and UTF-32 are convenient for views over u"text" and U"text"
>> respectively. And the BE/LE variants are useful as views over (byte
>> oriented) network and file I/O (without having to first convert from
>> encoding scheme to encoding form).
>
>Yes, but I think those views should not share the same class template.

Why not, though? I can believe you when you say you think that, but that doesn't make it any more convincing than me saying "I think they should be the same class template". There are compelling reasons to make them the same template, mainly that making them different templates would just duplicate the same code and the same interface.
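The "same code, same interface" argument can be made concrete with a minimal sketch in which byte order is a template parameter of a single class template. The names `byte_order`, `utf16_encoding`, `utf16be` and `utf16le` are hypothetical and chosen for illustration only. They are not taken from text_view or any proposal discussed in this thread.

```cpp
#include <string>

// Hypothetical byte-order tag (C++20 code could use std::endian from <bit>).
enum class byte_order { big, little };

// One class template covers UTF-16BE and UTF-16LE: only the step that
// assembles a 16-bit code unit from two bytes depends on the parameter.
template <byte_order Order>
struct utf16_encoding {
    // Assemble one 16-bit code unit from two bytes in the given byte order.
    static char16_t read_unit(const unsigned char* p) {
        if constexpr (Order == byte_order::big)
            return static_cast<char16_t>((p[0] << 8) | p[1]);
        else
            return static_cast<char16_t>((p[1] << 8) | p[0]);
    }

    // Decode a byte sequence into code points. Unpaired surrogates and a
    // trailing odd byte are replaced with U+FFFD.
    static std::u32string decode(const unsigned char* first,
                                 const unsigned char* last) {
        std::u32string out;
        while (last - first >= 2) {
            char16_t u = read_unit(first);
            first += 2;
            // High surrogate followed by a low surrogate: combine the pair.
            if (u >= 0xD800 && u <= 0xDBFF && last - first >= 2) {
                char16_t v = read_unit(first);
                if (v >= 0xDC00 && v <= 0xDFFF) {
                    first += 2;
                    out.push_back(static_cast<char32_t>(
                        0x10000 + ((char32_t(u) - 0xD800) << 10) + (v - 0xDC00)));
                    continue;
                }
            }
            // Any surrogate left here is unpaired.
            if (u >= 0xD800 && u <= 0xDFFF)
                out.push_back(char32_t(0xFFFD));
            else
                out.push_back(u);
        }
        if (first != last)  // odd number of input bytes
            out.push_back(char32_t(0xFFFD));
        return out;
    }
};

// The "different encodings" are just two instantiations of one template.
using utf16be = utf16_encoding<byte_order::big>;
using utf16le = utf16_encoding<byte_order::little>;
```

The surrogate handling, error handling and interface are written once; adding a byte-order variant costs one alias rather than a second class template.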

>> Following the thread further, it seems you would like to have a simple
>> codec for translating BE/LE data (e.g., to load BE/LE byte oriented data
>> into native endian larger-than-byte types). That sounds reasonable, but
>> I don't see why it should be part of text interfaces.
>
>I agree and I don't see why BE/LE variants of encodings should be
>parameters to text_view. Maybe we should defer endianness and BOM issues
>for later.

Why not? Why treat endianness specially? There's a clear reason to treat BOMs specially: they make the decoding/encoding process stateful. There's no clear reason to do the same for endianness. In the end, UTF-16BE, UTF-8, GB18030, and a bunch of others are just mappings between code points and byte sequences, and I see no compelling reason to single one of them out.
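The claim that these encodings are all just mappings between code points and byte sequences can be illustrated with a minimal sketch: two encodings expose the same hypothetical `decode_one` interface, and one generic `decode_all` function consumes either of them. The interface and every name in it are assumptions made for illustration, not an API from text_view or the thread. GB18030 is left out only to keep the example short.

```cpp
#include <string>

constexpr char32_t replacement_char = 0xFFFD;

// Hypothetical interface: decode_one reads one code point starting at p,
// advances p past the bytes it consumed (always at least one), and returns
// U+FFFD for ill-formed input.
struct utf8 {
    static char32_t decode_one(const unsigned char*& p, const unsigned char* last) {
        unsigned char b = *p++;
        if (b < 0x80) return b;
        int extra;
        char32_t cp, min;
        if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return replacement_char;  // stray continuation or invalid lead byte
        for (int i = 0; i < extra; ++i) {
            if (p == last || (*p & 0xC0) != 0x80) return replacement_char;
            cp = (cp << 6) | (*p++ & 0x3F);
        }
        // Reject overlong forms, out-of-range values and encoded surrogates.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return replacement_char;
        return cp;
    }
};

struct utf16be {
    static char32_t decode_one(const unsigned char*& p, const unsigned char* last) {
        if (last - p < 2) { p = last; return replacement_char; }  // odd trailing byte
        char32_t u = (char32_t(p[0]) << 8) | p[1];  // big-endian code unit
        p += 2;
        if (u < 0xD800 || u > 0xDFFF) return u;
        if (u >= 0xDC00 || last - p < 2) return replacement_char;  // lone low or truncated
        char32_t v = (char32_t(p[0]) << 8) | p[1];
        if (v < 0xDC00 || v > 0xDFFF) return replacement_char;     // unpaired high
        p += 2;
        return 0x10000 + ((u - 0xD800) << 10) + (v - 0xDC00);
    }
};

// Generic over any type modelling the decode_one interface above; it knows
// nothing about byte order, code unit width or multi-byte sequences.
template <typename Encoding>
std::u32string decode_all(const unsigned char* first, const unsigned char* last) {
    std::u32string out;
    while (first != last)
        out.push_back(Encoding::decode_one(first, last));
    return out;
}
```

In this sketch, endianness appears only inside `utf16be::decode_one`, exactly where UTF-8's multi-byte rules live inside `utf8::decode_one`. The generic layer needs no special parameter for it.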

Received on 2018-06-19 20:28:10