Re: [SG16-Unicode] Opinion: std::text should have no begin/end

From: Corentin <corentin.jabot_at_[hidden]>
Date: Thu, 31 May 2018 22:35:39 +0200
> There is no compelling reason for those things to be members. Boost.Text
> will have all that functionality (and mostly does already), separated out
> as free-function algorithms. I find that this works quite well.


> Moreover, a text-type is a sequence of something. Or rather, I can't get
> much use out of it if it isn't. So, what is it a sequence of? I think
> that graphemes, code points, and code units are the most essential units of
> work that one might want to use. However, a sequence of exactly *one* of
> these should be the kind of range that a text type models.
> I think most users that work with text will want graphemes as the
> essential view of text. That may prove to be untrue.

That's too many assumptions and conditionals for my taste.
If a default does not immediately make sense to an overwhelming majority, I
don't think that default can be justified.
People will bring their own set of assumptions and I think it's probably
better to teach them the choices they can make and let them make that
choice for their use case.

And yeah, research will need to be involved. Is that unreasonable? See
Kate Gregory's 'It's complicated' talk.

And grapheme vs. codepoint is, I think, very use-case dependent.
I don't like the idea of baking in "a default use case".

Most text/locale issues inherited from the 80s boil down to bad defaults and
baked-in assumptions which are, I think, at the basis of people
I know it framed mine for a long time.

If we make something a default, it will then again frame people's
understanding of what Unicode is and whatever we do will be, at best,

I actually quite like the idea of text being an opaque thing that you need
to feed to a view to operate on.
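
To make the "opaque text plus views" idea concrete, here is a minimal sketch.
Everything in it is an assumption made for illustration: the class name
`text`, the free functions `code_units` and `code_points`, UTF-8 storage, and
the eager `std::vector` return are all hypothetical, and none of this is the
Boost.Text or proposed `std::text` interface. It also assumes well-formed
UTF-8 and does no validation beyond a truncation check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical opaque text type: it owns UTF-8 storage but deliberately
// exposes no begin()/end(), so no unit of iteration is chosen by default.
class text {
public:
    explicit text(std::string utf8) : bytes_(std::move(utf8)) {}

private:
    std::string bytes_;
    // The free-function views below are the only way to look inside.
    friend std::string_view code_units(const text& t);
};

// View 1: the raw UTF-8 code units (bytes).
std::string_view code_units(const text& t) { return t.bytes_; }

// View 2: the decoded code points. For brevity this returns a vector
// instead of a lazy range, and it assumes the input is well-formed UTF-8.
std::vector<char32_t> code_points(const text& t) {
    std::vector<char32_t> out;
    const std::string_view u = code_units(t);
    std::size_t i = 0;
    while (i < u.size()) {
        const unsigned char lead = static_cast<unsigned char>(u[i]);
        // The lead byte determines the length of the sequence.
        const std::size_t len =
            lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        assert(i + len <= u.size() && "truncated UTF-8 sequence");
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(u[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For a `text` built from "e" followed by U+0301 (bytes `65 CC 81`),
`code_units` yields 3 elements and `code_points` yields 2. A grapheme view,
not sketched here because it needs the Unicode segmentation rules, would
yield 1. The caller has to name the unit they want at every use site.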

If we definitely need a default, I would rather it be codepoints; I think
it's the least "opinionated".

>> At the very least, it means that we don't arm the wrong gun for the user
>> by accident / by default. And at some level, the user will have to read
>> through even the synopsis of what each of these functions will bring to
>> them. Some are obvious (words, sentences, etc. text segmentation
>> algorithms), but others might require some thinking (codepoints, graphemes
>> in particular).

> Not picking one is a lot worse. One of the many problems with Unicode is
> that it is too damn complicated for experienced users, much less new ones.
> We need types with the right default so that people can just pick up a new
> version of their compiler and get to work without taking a week first to do
> research. To the extent possible, it should "just work."
>> I can understand that not having defaults could be a frustrating
>> experience to start with, however. I feel like it's justifiable given that
>> there's no 100% right answer. Maybe we can get 80% but it will only make
>> the 20% more surprising. I feel like having docs / tables that describe the
>> various segmentation algorithms, what they bring to table, and what the
>> user can get out of it might be more worth while.
>> Is this a viable path?
I think it is :)
I usually read the manual when I buy a power tool I'm not familiar with.

> I don't think it is. Again, the future reactions to Boost.Text will bear
> that out (or not!).
> Zach

Corentin

> _______________________________________________
> Unicode mailing list
> Unicode_at_[hidden]
> http://www.open-std.org/mailman/listinfo/unicode

Received on 2018-05-31 22:35:53