Re: [SG16-Unicode] code_unit_sequence and code_point_sequence

From: R. Martinho Fernandes <rmf_at_[hidden]>
Date: Tue, 19 Jun 2018 11:35:26 +0200
I don't quite understand why deserializing into 16-bit units is useful, though. I would expect code that deserializes text either to transcode it into a buffer in an encoding suitable for some external API, or otherwise to need the decoded text, not the code units. I might be missing something people do with code units, but in my experience they're either decoded or treated as opaque blobs to pass elsewhere.

More importantly, though, I don't understand what needs to get complicated in the code point interface. The interface you have is already enough as is (any reservations that some might have about adding new sequence container types notwithstanding). It requires no change at all to support UTF-16BE, etc.; the implementation can use std::string as its buffer just fine (remember, the code units for UTF-16BE are just bytes). It will probably work fine once you finish the implementation; all it needs is implementations of the encoding schemes.
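
To illustrate the point that the code units of UTF-16BE are just bytes, here is a minimal sketch of decoding a std::string byte buffer into code points. The name decode_utf16be and the choice of std::u32string as the output are assumptions made for this illustration; they are not part of the proposed code_point_sequence interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper, for illustration only: decode UTF-16BE bytes
// stored in a std::string into a sequence of code points.
std::u32string decode_utf16be(const std::string& bytes) {
    if (bytes.size() % 2 != 0)
        throw std::invalid_argument("truncated UTF-16BE input");

    // Assemble one 16-bit value from two bytes, most significant first.
    // This is the only place where byte order appears.
    auto read16 = [&bytes](std::size_t i) -> std::uint32_t {
        auto hi = static_cast<unsigned char>(bytes[i]);
        auto lo = static_cast<unsigned char>(bytes[i + 1]);
        return (static_cast<std::uint32_t>(hi) << 8) | lo;
    };

    std::u32string out;
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        std::uint32_t u = read16(i);
        if (u >= 0xD800 && u <= 0xDBFF) {
            // High surrogate: a low surrogate must follow.
            if (i + 4 > bytes.size())
                throw std::invalid_argument("unpaired high surrogate");
            std::uint32_t low = read16(i + 2);
            if (low < 0xDC00 || low > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            u = 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00);
            i += 2;  // skip the low surrogate just consumed
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        out.push_back(static_cast<char32_t>(u));
    }
    return out;
}
```

In this sketch the byte order is confined to read16, so nothing that consumes the resulting code points needs to know about endianness, and the buffer itself is an ordinary std::string.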

On June 19, 2018 10:54:00 AM GMT+02:00, Lyberta <lyberta_at_[hidden]> wrote:
>R. Martinho Fernandes:
>> Maybe you know a use case for this that isn't the implementation nor
>transcoding?
>
>What about [de]serialization? I just really don't want to complicate
>code_point_sequence with endianness. code_point_sequence should be able
>to use std::basic_string as buffer and std::basic_string supports only
>native endianness.
>
>code_unit_sequence is for working with encoding schemes. All other code
>should only work with encoding forms and be oblivious to endianness.

Received on 2018-06-19 11:35:38