Re: [SG16-Unicode] code_unit_sequence and code_point_sequence

From: Lyberta <lyberta_at_[hidden]>
Date: Wed, 20 Jun 2018 09:47:00 +0000
R. Martinho Fernandes:
> On June 20, 2018 7:52:00 AM GMT+02:00, Lyberta <lyberta_at_[hidden]> wrote:
>> The idea is that programmers won't need to.
>> std::text t = u8"Hello";
>> Type of text will be
>> std::text<std::code_point_sequence<std::code_unit_sequence<std::utf8,
>> std::endian::native, std::no_bom>>>;
>> Here the standard library has chosen native endianness and no reading or
>> writing of BOM - a sane default. Then we provide helpers such as:
>> auto t = std::make_text<std::endian::big, std::bom>(u8"Hello");
>> Type of text will be
>> std::text<std::code_point_sequence<std::code_unit_sequence<std::utf8,
>> std::endian::big, std::bom>>>;
>> Here the programmer has explicitly requested BE with reading and writing
>> of BOM. std::bom and std::no_bom are just placeholders, this should be
>> an enum class.
> I'm sorry, these examples are bonkers again. They are not convincing because you used UTF-8. What does big endian UTF-8 even mean?

Another proof that Unicode is hard. Of course, endianness is skipped for
UTF-8. I'm starting to be a bit overwhelmed.

> Can you write the same with e.g. the UTF-16 variants instead? That would make much better examples. I've been trying but I don't understand what e.g. this should mean:
> auto t = std::make_text<std::endian::big, std::bom>(u"Hello");

This would mean that the bytes in memory are stored in big-endian order
and that a BOM is read and written. If, during deserialization, such an
instance of std::text does not find the BOM in LE, an exception would be
thrown (or std::error_code alternative if we want to follow the path of
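
To make the BOM behaviour above concrete, here is a rough sketch of the
serialization step for big-endian UTF-16. The function names
read_utf16be_with_bom and write_utf16be_with_bom are hypothetical and
not part of any proposal text. I assume here that both a missing BOM
and a little-endian BOM are errors, and that the error is reported with
an exception rather than std::error_code.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: writes a BOM followed by the code units,
// each serialized high byte first (big endian).
std::vector<std::uint8_t> write_utf16be_with_bom(const std::u16string& units)
{
    // U+FEFF in big-endian byte order is FE FF.
    std::vector<std::uint8_t> bytes{0xFE, 0xFF};
    for (char16_t u : units) {
        bytes.push_back(static_cast<std::uint8_t>(u >> 8));
        bytes.push_back(static_cast<std::uint8_t>(u & 0xFF));
    }
    return bytes;
}

// Hypothetical helper: requires a big-endian BOM and throws otherwise.
// Only the BOM is checked; surrogate pairs are not validated.
std::u16string read_utf16be_with_bom(const std::vector<std::uint8_t>& bytes)
{
    if (bytes.size() % 2 != 0)
        throw std::runtime_error("odd number of bytes");
    if (bytes.size() < 2)
        throw std::runtime_error("missing BOM");
    // FF FE is the little-endian form of the BOM (assumed to be an error).
    if (bytes[0] == 0xFF && bytes[1] == 0xFE)
        throw std::runtime_error("little-endian BOM, expected big-endian");
    if (!(bytes[0] == 0xFE && bytes[1] == 0xFF))
        throw std::runtime_error("missing BOM");

    std::u16string units;
    for (std::size_t i = 2; i < bytes.size(); i += 2)
        units.push_back(static_cast<char16_t>((bytes[i] << 8) | bytes[i + 1]));
    return units;
}
```

With this sketch, writing u"Hi" produces the bytes FE FF 00 48 00 69,
and reading a buffer that starts with FF FE throws.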

In my proposal, std::text takes something that satisfies
CodePointSequence, so std::text<std::vector<char32_t>> will compile and
work as expected, just without BOM and endianness handling inside
std::vector.
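
A very rough sketch of what I mean, with everything in a made-up
namespace sketch. The trait is_code_point_sequence is only a stand-in
for the real CodePointSequence requirement (here I assume it just means
"a container whose value_type is char32_t"), and this text class is a
placeholder, not the proposed std::text.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Stand-in for CodePointSequence (an assumption for illustration):
// any container whose elements are char32_t code points.
template <typename Seq>
struct is_code_point_sequence
    : std::is_same<typename Seq::value_type, char32_t> {};

// Placeholder for the proposed text type: it only wraps the sequence.
// No BOM or endianness handling happens here; that would belong to a
// code unit layer underneath, which std::vector<char32_t> does not have.
template <typename Seq>
class text {
    static_assert(is_code_point_sequence<Seq>::value,
                  "Seq must be a sequence of char32_t code points");

public:
    explicit text(Seq seq) : seq_(std::move(seq)) {}

    std::size_t size() const { return seq_.size(); }
    auto begin() const { return seq_.begin(); }
    auto end() const { return seq_.end(); }

private:
    Seq seq_;
};

} // namespace sketch
```

Something like sketch::text<std::vector<char32_t>> then compiles and
can be iterated code point by code point, while
sketch::text<std::vector<char>> is rejected by the static_assert.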

At this point I feel the need to implement my proposal first, so that I
can cover all the blank spots and you can see the code.

Received on 2018-06-20 11:47:37