Re: [SG16-Unicode] Opinion: std::text should have no begin/end

From: Tony V E <tvaneerd_at_[hidden]>
Date: Fri, 1 Jun 2018 11:28:04 -0400
On Wed, May 30, 2018 at 8:36 PM, ThePhD <phdofthehouse_at_[hidden]> wrote:

> I've been bikeshedding and looking at a wide variety of languages and
> implementations of text. Many still use code units and provide iterators to
> code units as their default level of abstraction. Others provide code points,
> and a few newer languages and libraries provide grapheme clusters.
> I'm beginning to think `std::text` -- in whatever form it takes -- should
> include no defaults and instead just pack itself with member functions such
> as `.codepoints()`, `.graphemes(/*options*/)`, `.words(/*options*/)` and
> let the user decide at what level they want to be working. I think encoding
> and normalization should be part of the type name, because those are the
> two very important properties, but after that we should simply be handing out views
> and letting people pick whatever abstraction level they want.
> At the very least, it means that we don't arm the wrong gun for the user
> by accident / by default. And at some level, the user will have to read
> through even the synopsis of what each of these functions will bring to
> them. Some are obvious (words, sentences, etc. text segmentation
> algorithms), but others might require some thinking (codepoints, graphemes
> in particular).
> I can understand that not having defaults could be a frustrating
> experience to start with, however. I feel like it's justifiable given that
> there's no 100% right answer. Maybe we can get 80% but it will only make
> the 20% more surprising. I feel like having docs / tables that describe the
> various segmentation algorithms, what they bring to the table, and what the
> user can get out of them might be more worthwhile.
> Is this a viable path?

Rule #6 on Naming: _not_ understanding is better than _mis_understanding.

begin() and end() will be misunderstood, i.e., assumed to mean X when they
mean Y.
.codepoints(), etc., will not be guessed/assumed. They will be looked up,
then understood.
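
To make that concrete, here is a rough sketch of the shape being discussed. Everything in it is hypothetical: `utf8_text`, `code_units()`, and `usage_example()` are names I made up for illustration, not a proposed std::text interface. It assumes the input is already valid UTF-8 and decodes eagerly, where a real design would presumably hand out lazy views. There is no begin()/end(), so the caller has to name a level.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for the std::text idea; not a real or proposed API.
// Assumes the stored bytes are valid UTF-8 (no validation is performed).
class utf8_text {
public:
    explicit utf8_text(std::string bytes) : bytes_(std::move(bytes)) {}

    // Deliberately no begin()/end(): callers must say which level they mean.

    // Code-unit level: one element per UTF-8 byte.
    std::string_view code_units() const { return bytes_; }

    // Code-point level. Decoded eagerly to keep the sketch short.
    std::u32string codepoints() const {
        std::u32string out;
        std::size_t i = 0;
        while (i < bytes_.size()) {
            const unsigned char lead = static_cast<unsigned char>(bytes_[i]);
            // Sequence length is determined by the lead byte.
            const std::size_t len =
                lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
            // Keep only the payload bits of the lead byte.
            char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
            // Each continuation byte contributes six more bits.
            for (std::size_t k = 1; k < len && i + k < bytes_.size(); ++k) {
                cp = (cp << 6) |
                     (static_cast<unsigned char>(bytes_[i + k]) & 0x3F);
            }
            out.push_back(cp);
            i += len;
        }
        return out;
    }

private:
    std::string bytes_;
};

// "e" followed by U+0301 COMBINING ACUTE ACCENT: the two levels disagree
// about the length, and a user-perceived character count would differ again.
inline void usage_example() {
    const utf8_text t{"e\xCC\x81"};
    assert(t.code_units().size() == 3);
    assert(t.codepoints().size() == 2);
}
```

With begin()/end() on the type itself, a reader would have to guess which of those lengths a range-for walks over. With named views the answer is spelled out at the call site.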

Be seeing you,

Received on 2018-06-01 17:28:06