sg16: Re: [SG16-Unicode] code_unit_sequence and code_point

From: Martinho Fernandes <rmf_at_[hidden]>
Date: Wed, 20 Jun 2018 12:39:58 +0200

On 20.06.18 11:47, Lyberta wrote:
> In my proposal std::text takes something that satisfies
> CodePointSequence and std::text<std::vector<char32_t>> will compile and
> work as expected, just no BOM and endianness handling inside std::vector.

So how do I get "endianness handling" if my data is in a vector? That's
a rhetorical question. The real question is: why do I need to ask that
question at all? You'll notice that simpler designs don't have this
problem. They can work with any source and do UTF-16BE/LE just
fine, with the same interface that handles any other encoding, nothing
special. The interface that handles one case handles all the cases, all
in the same fashion.
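
A minimal sketch of such a simpler design follows. All names in it (utf16, latin1, decode_one, decode) are hypothetical and not taken from any proposal, and malformed input is simply replaced with U+FFFD to keep it short. Byte order is a property of the encoding, so data sitting in a plain vector of bytes needs nothing special:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical encoding type: byte order is part of the encoding,
// not of the container that happens to hold the bytes.
template <bool BigEndian>
struct utf16 {
    static std::uint32_t read_unit(const std::uint8_t* p) {
        return BigEndian ? (std::uint32_t(p[0]) << 8) | p[1]
                         : (std::uint32_t(p[1]) << 8) | p[0];
    }

    // Decodes one code point from p (n >= 1 bytes available).
    // Stores it in cp and returns the number of bytes consumed.
    // Truncated units and lone surrogates yield U+FFFD.
    static std::size_t decode_one(const std::uint8_t* p, std::size_t n,
                                  char32_t& cp) {
        cp = U'\uFFFD';
        if (n < 2) return n;  // truncated code unit
        std::uint32_t hi = read_unit(p);
        if (hi < 0xD800 || hi > 0xDFFF) {
            cp = static_cast<char32_t>(hi);
            return 2;
        }
        if (hi <= 0xDBFF && n >= 4) {
            std::uint32_t lo = read_unit(p + 2);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = static_cast<char32_t>(
                    0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
                return 4;
            }
        }
        return 2;  // lone surrogate, already mapped to U+FFFD
    }
};
using utf16be = utf16<true>;
using utf16le = utf16<false>;

// A second hypothetical encoding, to show the interface is the same.
struct latin1 {
    static std::size_t decode_one(const std::uint8_t* p, std::size_t,
                                  char32_t& cp) {
        cp = static_cast<char32_t>(*p);
        return 1;
    }
};

// One function handles every encoding in the same fashion.
template <class Encoding>
std::u32string decode(const std::vector<std::uint8_t>& in) {
    std::u32string out;
    const std::uint8_t* p = in.data();
    std::size_t n = in.size();
    while (n > 0) {
        char32_t cp;
        std::size_t used = Encoding::decode_one(p, n, cp);
        out.push_back(cp);
        p += used;
        n -= used;
    }
    return out;
}
```

With this shape, decode<utf16be>(v), decode<utf16le>(v) and decode<latin1>(v) all take the same std::vector of bytes; the container never needs to know about byte order.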

> At this point I feel the need to implement my proposal first so I would
> cover all the blank spots and you will see the code.

With all due respect, I think that the proposal really needs to get
*proper use cases* sorted out first. All this time we've been asking
"what do I use this for" and getting poorly thought-out examples that
actually demonstrate flaws instead of demonstrating usage (why is it
possible to have "big endian utf-8" and "little endian utf-8" as
separate types at all?).

I don't need to see an implementation; I alone have implemented this
sort of thing four times over already, and I know for a fact that others
on this list have done the same. I (we?) readily believe this can be
implemented because it *has been* implemented.

The thing that trips me up is that I still don't know what kind of usage
this enables that a simpler design wouldn't enable. A simpler design
would be one that doesn't have three specialized containers, one that
doesn't have a "bytes to code units" adapter of dubious value, one that
doesn't leak byte order concerns everywhere, one that isn't built on the
assumption that we want basic_string to be removed.

-- 
Martinho

Received on 2018-06-20 12:41:22