
Re: [SG16-Unicode] Fundamental Unicode types

From: Steve Downey <sdowney_at_[hidden]>
Date: Thu, 28 Mar 2019 08:41:19 -0400
I'm not sure we have consensus on text being a Container. While I think
grapheme clusters are useful chunks to iterate over, I'm not convinced they
are primitive types. Operationally they look like a range of codepoints,
identical to what you might get from any of the breaking algorithms.
I share Corentin's concern about throwing operations on the base types.
Text in the wild is broken far too often for that to be exceptional, but on the other
hand, once checked, the invariant is easy to maintain, and adding checks is
costly. I wonder whether Contracts might be more appropriate.
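A minimal sketch of "check once, then maintain the invariant" without throwing, assuming a factory that returns std::optional instead of a throwing constructor. The names `is_well_formed_utf8` and `valid_utf8` are hypothetical illustrations, not from any proposal. The validator follows the Unicode well-formed UTF-8 byte ranges (no overlong forms, no surrogates, nothing above U+10FFFF).

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Returns true if s is well-formed UTF-8: rejects overlong forms,
// surrogates (U+D800..U+DFFF) and values above U+10FFFF.
bool is_well_formed_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the 2nd byte
        if (b0 <= 0x7F) { ++i; continue; }
        else if (b0 >= 0xC2 && b0 <= 0xDF) len = 2;
        else if (b0 == 0xE0) { len = 3; lo = 0xA0; }          // no overlongs
        else if ((b0 >= 0xE1 && b0 <= 0xEC) || b0 == 0xEE || b0 == 0xEF) len = 3;
        else if (b0 == 0xED) { len = 3; hi = 0x9F; }          // no surrogates
        else if (b0 == 0xF0) { len = 4; lo = 0x90; }          // no overlongs
        else if (b0 >= 0xF1 && b0 <= 0xF3) len = 4;
        else if (b0 == 0xF4) { len = 4; hi = 0x8F; }          // max U+10FFFF
        else return false;
        if (s.size() - i < len) return false;                 // truncated
        unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(s[i + k]);
            if (b < 0x80 || b > 0xBF) return false;
        }
        i += len;
    }
    return true;
}

// Hypothetical wrapper: the invariant is established once, at construction.
class valid_utf8 {
public:
    // Non-throwing factory: malformed input yields std::nullopt.
    static std::optional<valid_utf8> make(std::string s) {
        if (!is_well_formed_utf8(s)) return std::nullopt;
        return valid_utf8(std::move(s));
    }
    std::string_view view() const noexcept { return data_; }
    // Concatenating two well-formed strings stays well-formed, so no re-check.
    valid_utf8 operator+(const valid_utf8& other) const {
        return valid_utf8(data_ + other.data_);
    }
private:
    explicit valid_utf8(std::string s) : data_(std::move(s)) {}
    std::string data_;
};
```

Operations that provably preserve the invariant (like concatenation here) skip the check entirely; with Contracts, the validity check would instead become a precondition on the trusting constructor.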


On Thu, Mar 28, 2019, 05:32 Lyberta <lyberta_at_[hidden]> wrote:

> > charX_t have a requirement to be code units in C++20.
>
> But they are not required to hold only valid values, so
> they are still dangerous.
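To make the point concrete: a char32_t can hold values such as 0xD800 (a surrogate) or 0x110000 (beyond the code space) that are not Unicode scalar values. A minimal sketch of a strong type whose only constructor checks this, assuming std::optional as the error channel; `is_scalar_value` and `scalar_value` are hypothetical names.

```cpp
#include <optional>

// A Unicode scalar value is any code point in U+0000..U+10FFFF
// except the surrogate range U+D800..U+DFFF.
constexpr bool is_scalar_value(char32_t c) noexcept {
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// Hypothetical strong type: unlike a raw char32_t, every instance is valid.
class scalar_value {
public:
    // Non-throwing factory: invalid input yields std::nullopt.
    static std::optional<scalar_value> from(char32_t c) noexcept {
        if (!is_scalar_value(c)) return std::nullopt;
        return scalar_value(c);
    }
    char32_t value() const noexcept { return value_; }
private:
    explicit scalar_value(char32_t c) noexcept : value_(c) {}
    char32_t value_;
};

// All three values fit in a char32_t, but only the first is a scalar value.
static_assert(is_scalar_value(U'A'), "ASCII letter is a scalar value");
static_assert(!is_scalar_value(char32_t{0xD800}), "surrogate");
static_assert(!is_scalar_value(char32_t{0x110000}), "beyond U+10FFFF");
```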
>
> >
> > We also really do not want to have a code unit API, because you cannot do
> > anything useful with it.
> > In particular, iterating over code units or querying the properties of code
> > units is probably never useful (and has a propensity to be misused).
>
> This is a transition mechanism from std::basic_string and many other
> legacy string classes. Besides, all those functions are required when
> implementing encoding forms, so I decided to expose them. I don't think
> they are harmful.
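As an illustration of why an encoding form needs code-unit-level operations, here is a minimal sketch of encoding one scalar value into UTF-8 and UTF-16 code units. The functions `append_utf8` and `append_utf16` are hypothetical, and they assume (without checking) that the input is already a valid scalar value.

```cpp
#include <string>

// Appends the UTF-8 code units of cp to out.
// Precondition (assumed, not checked): cp <= U+10FFFF and is not a surrogate.
void append_utf8(char32_t cp, std::string& out) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Appends the UTF-16 code units of cp to out (same precondition).
// Supplementary-plane values become a high/low surrogate pair.
void append_utf16(char32_t cp, std::u16string& out) {
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        char32_t v = cp - 0x10000;                         // 20 significant bits
        out += static_cast<char16_t>(0xD800 + (v >> 10));  // high surrogate
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF)); // low surrogate
    }
}
```

For example, U+20AC encodes to the three UTF-8 code units E2 82 AC, and U+1F600 to the UTF-16 pair D83D DE00.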
>
> >
> > Scalar value and grapheme views are indeed useful, IMO. Text is useful, but
> > it's basically something with storage, high-level invariants, and state
> > that can spawn a scalar value or grapheme view.
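A minimal sketch of the "spawn a scalar value view" idea: a lazy view that decodes UTF-8 storage into scalar values on iteration. The names `scalar_iterator` and `scalar_view` are hypothetical. It assumes the storage was already validated once by the owning text type, which is why the decoder does no error handling.

```cpp
#include <cstddef>
#include <iterator>
#include <string_view>

// Forward iterator decoding UTF-8 lazily.
// Precondition (assumed): the underlying bytes are well-formed UTF-8.
class scalar_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = char32_t;

    scalar_iterator() = default;
    explicit scalar_iterator(const char* p) : p_(p) {}

    char32_t operator*() const {
        auto b0 = static_cast<unsigned char>(p_[0]);
        std::size_t len = seq_len(b0);
        if (len == 1) return b0;
        char32_t cp = b0 & (0x7F >> len);  // payload bits of the lead byte
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(p_[k]) & 0x3F);
        return cp;
    }
    scalar_iterator& operator++() {
        p_ += seq_len(static_cast<unsigned char>(*p_));
        return *this;
    }
    scalar_iterator operator++(int) { auto tmp = *this; ++*this; return tmp; }
    friend bool operator==(scalar_iterator a, scalar_iterator b) { return a.p_ == b.p_; }
    friend bool operator!=(scalar_iterator a, scalar_iterator b) { return !(a == b); }

private:
    // Sequence length from the lead byte (valid input assumed).
    static std::size_t seq_len(unsigned char b0) {
        return b0 < 0x80 ? 1 : b0 >= 0xF0 ? 4 : b0 >= 0xE0 ? 3 : 2;
    }
    const char* p_ = nullptr;
};

// Non-owning view over already-validated UTF-8 storage.
class scalar_view {
public:
    explicit scalar_view(std::string_view utf8) : s_(utf8) {}
    scalar_iterator begin() const { return scalar_iterator(s_.data()); }
    scalar_iterator end() const { return scalar_iterator(s_.data() + s_.size()); }
private:
    std::string_view s_;
};
```

A grapheme view would sit one level higher, grouping these scalar values according to the segmentation rules.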
>
> Current consensus is that calling std::begin on std::text will return a
> grapheme cluster iterator; I'd personally use .to_graphemes() for that.
> So for now I plan to implement it as a distinct type. I haven't yet
> implemented the grapheme cluster level, so I don't have any insights yet.
>
> > Lastly, I am very concerned about a design that would throw by default,
> > especially with something like domain_error. It basically means I wouldn't
> > use any standard Unicode facilities, and neither would people in a lot of
> > industries (games, embedded, etc.).
>
> I do plan to rebase my proposal on top of P0709. That way those people
> won't have any excuse any longer :P
>
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden]
> http://www.open-std.org/mailman/listinfo/unicode
>

Received on 2019-03-28 13:41:33