Re: [SG16-Unicode] code_unit_sequence

From: Lyberta <lyberta_at_[hidden]>
Date: Tue, 16 Jul 2019 18:31:00 +0000
Niall Douglas:
> For me personally, any new string type which doesn't look a lot like the
> LLVM string type doesn't offer a compelling reason to break with
> std::string.

Which one is that exactly? Can you give a link? I've found llvm::StringRef
and a lot of functions that work in terms of "char". This is a
non-starter for me. We have std::byte for bytes and std::[u]int8_t
for small integers. Type "char" should die already.

> There is some good stuff in your proposal. But I want a much more
> sensible string object more than I want the other stuff.

What do you want with it? Code units are so low level that I don't see
any meaningful algorithm that would require bloating the API.

scalar_value_sequence is the layer above, and it might make sense
to add more member functions there, such as .replace, because they would
preserve the well-formedness of Unicode. That is the case in my library,
but I intend to tweak it before writing a proposal.
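
To illustrate the point about .replace, here is a minimal sketch, not the
actual API of my library or of any proposal. All names and signatures below
(scalar_value::from, the replace overload) are assumptions made for the
example. The idea is that if the element type can only hold Unicode scalar
values, member functions of the sequence cannot produce ill-formed text:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sketch: a value that is always a Unicode scalar value,
// i.e. in U+0000..U+10FFFF and not a surrogate (U+D800..U+DFFF).
class scalar_value {
public:
    // Returns std::nullopt for surrogates and out-of-range values.
    static constexpr std::optional<scalar_value> from(std::uint32_t v) noexcept {
        if (v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF)) {
            return std::nullopt;
        }
        return scalar_value{v};
    }

    constexpr std::uint32_t value() const noexcept { return value_; }

    friend constexpr bool operator==(scalar_value a, scalar_value b) noexcept {
        return a.value_ == b.value_;
    }

private:
    constexpr explicit scalar_value(std::uint32_t v) noexcept : value_{v} {}
    std::uint32_t value_;
};

// Hypothetical sketch: every element is a scalar_value, so no member
// function can introduce a surrogate or an out-of-range value.
class scalar_value_sequence {
public:
    void push_back(scalar_value s) { data_.push_back(s); }

    std::size_t size() const noexcept { return data_.size(); }

    scalar_value operator[](std::size_t i) const {
        assert(i < data_.size());
        return data_[i];
    }

    // Replaces every occurrence of `from` with `to`; returns the count.
    std::size_t replace(scalar_value from, scalar_value to) {
        std::size_t replaced = 0;
        for (auto& s : data_) {
            if (s == from) {
                s = to;
                ++replaced;
            }
        }
        return replaced;
    }

private:
    std::vector<scalar_value> data_;
};
```

A code unit sequence cannot give the same guarantee, because overwriting a
single code unit may split a multi-unit encoded sequence.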

But before that I need to write a proposal for encoding forms, and that
requires ranges. I intend to implement some form of ranges myself so I
can test the code. Right now in my own library I use iterator pairs, but
that's too C++17.

> Also, you really ought to review all the other Unicode string object
> proposals current and historical, and do a matrix of comparison between
> yours and all the others. No harm to throw in other language's string
> objects too, if they are informative.

The only good Unicode proposal I've found is N3572, which I read back
in 2013 when it was proposed. Obviously, at that time we didn't have
concepts and ranges.

It has some very bad design decisions, such as using dumb types and
performing transcoding on the fly. I like the list of members in
codepoint_properties, but I don't like the incorrect name (it's "code point",
not "codepoint"). Unless the Unicode Character Database (why a horrible name
again?) provides properties for surrogates, I intend to allow querying
properties only of scalar values, as member functions of
std::unicode::scalar_value. Obviously in constexpr. I'm looking forward
to writing a C++ code generator that will parse the UCD and write a
million-line header of constexpr C++ :)
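
A rough sketch of what such a generated header could look like, assuming
C++17. Everything here is hypothetical: the names general_category,
category_range, category_table and lookup_category are made up for the
example, and the table holds only three ASCII ranges by hand where a real
generator would emit every range from the UCD:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical subset of General_Category values.
enum class general_category : std::uint8_t { Lu, Ll, Nd, not_in_table };

struct category_range {
    std::uint32_t first;
    std::uint32_t last;
    general_category category;
};

// Stand-in for generated data, sorted by `first`. Only three ASCII
// ranges are listed: digits, uppercase letters, lowercase letters.
inline constexpr std::array<category_range, 3> category_table{{
    {0x0030, 0x0039, general_category::Nd},
    {0x0041, 0x005A, general_category::Lu},
    {0x0061, 0x007A, general_category::Ll},
}};

// Binary search over the sorted ranges, usable in constant expressions.
constexpr general_category lookup_category(std::uint32_t v) noexcept {
    std::size_t lo = 0;
    std::size_t hi = category_table.size();
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        const category_range& r = category_table[mid];
        if (v < r.first) {
            hi = mid;
        } else if (v > r.last) {
            lo = mid + 1;
        } else {
            return r.category;
        }
    }
    return general_category::not_in_table;
}

// Hypothetical scalar value type exposing the property as a member.
class scalar_value {
public:
    // Precondition: v is a Unicode scalar value (not a surrogate).
    constexpr explicit scalar_value(std::uint32_t v) noexcept : value_{v} {
        assert(v <= 0x10FFFF && (v < 0xD800 || v > 0xDFFF));
    }

    constexpr general_category category() const noexcept {
        return lookup_category(value_);
    }

private:
    std::uint32_t value_;
};

// The query is evaluated entirely at compile time.
static_assert(scalar_value{0x41}.category() == general_category::Lu);
static_assert(scalar_value{0x37}.category() == general_category::Nd);
```

Since only scalar values can be constructed, the table never needs entries
for surrogates.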

Received on 2019-07-16 20:31:16