Subject: Re: [SG16-Unicode] Opinion: std::text should have no begin/end
From: Corentin (corentin.jabot_at_[hidden])
Date: 2018-05-31 15:35:39
> There is no compelling reason for those things to be members. Boost.Text
> will have all that functionality (and mostly does already), separated out
> as free-function algorithms. I find that this works quite well.
> Moreover, a text-type is a sequence of something. Or rather, I can't get
> much use out of it if it isn't. So, what is it a sequence of? I think
> that graphemes, code points, and code units are the most essential units of
> work that one might want to use. However, a sequence of exactly *one* of
> these should be the kind of range that a text type models.
> I think most users that work with text will want graphemes as the
> essential view of text. That may prove to be untrue.
That's too many assumptions and conditionals for my taste.
If a default does not immediately make sense to an overwhelming majority, I
don't think that default can be justified.
People will bring their own set of assumptions and I think it's probably
better to teach them the choices they can make and let them make that
choice for their use case.
And yeah, research will need to be involved - is that unreasonable? See
Kate Gregory's 'It's complicated' talk.
And grapheme vs. code point is, I think, very use-case dependent.
I don't like the idea of baking-in "a default use case".
Most text/locale issues inherited from the 80s boil down to bad defaults and
baked-in assumptions which are, I think, at the basis of people
I know it framed mine for a long time.
If we make something a default, it will then again frame people's
understanding of what Unicode is and whatever we do will be, at best,
I actually quite like the idea of text being an opaque thing that you need
to feed to a view to operate on.
If we definitely need a default, I would rather it be code points; I think
it's the least "opinionated".
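To make the "opaque text plus views" idea concrete, here is a minimal
sketch. All names (text, code_units, code_points) are hypothetical and are
not taken from Boost.Text or any proposal. It assumes well-formed UTF-8
input and returns eager vectors, where a real design would presumably use
lazy ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical opaque text type: owns UTF-8 bytes and deliberately
// exposes no begin()/end(), so there is no default unit of iteration.
class text {
public:
    explicit text(std::string utf8) : bytes_(std::move(utf8)) {}

private:
    std::string bytes_;
    friend std::vector<unsigned char> code_units(const text&);
    friend std::vector<char32_t> code_points(const text&);
};

// View 1 (hypothetical): the raw UTF-8 code units.
std::vector<unsigned char> code_units(const text& t) {
    return std::vector<unsigned char>(t.bytes_.begin(), t.bytes_.end());
}

// View 2 (hypothetical): decoded code points.
// Assumption: the input is well-formed UTF-8; nothing is validated here.
std::vector<char32_t> code_points(const text& t) {
    std::vector<char32_t> out;
    const std::string& s = t.bytes_;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        // Sequence length is determined by the lead byte.
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        assert(i + len <= s.size());
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

With this, text("e\xCC\x81") (an 'e' followed by U+0301 COMBINING ACUTE
ACCENT) yields 3 elements through code_units and 2 through code_points,
while a user would perceive a single grapheme. A grapheme view is left out
of the sketch because it needs the Unicode segmentation data. The caller
has to name the view, so the choice is always visible at the call site.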
>> At the very least, it means that we don't arm the wrong gun for the user
>> by accident / by default. And at some level, the user will have to read
>> through even the synopsis of what each of these functions will bring to
>> them. Some are obvious (words, sentences, etc. text segmentation
>> algorithms), but others might require some thinking (codepoints, graphemes
>> in particular).
> Not picking one is a lot worse. One of the many problems with Unicode is
> that it is too damn complicated for experienced users, much less new ones.
> We need types with the right default so that people can just pick up a new
> version of their compiler and get to work without taking a week first to do
> research. To the extent possible, it should "just work."
>> I can understand that not having defaults could be a frustrating
>> experience to start with, however. I feel like it's justifiable given that
>> there's no 100% right answer. Maybe we can get 80% but it will only make
>> the 20% more surprising. I feel like having docs / tables that describe the
>> various segmentation algorithms, what they bring to the table, and what the
>> user can get out of it might be more worthwhile.
>> Is this a viable path?
I think it is :)
I usually read the manual when I buy a power tool I'm not familiar with.
> I don't think it is. Again, the future reactions to Boost.Text will bear
> that out (or not!).