On Wed, May 30, 2018 at 8:36 PM, ThePhD <phdofthehouse@gmail.com> wrote:
I've been bikeshedding and looking at a wide variety of languages and their implementations of text. Many still use code units and provide iterators over code units as their default level of abstraction. Others provide code points, and a few newer languages and libraries provide grapheme clusters.

I'm beginning to think `std::text` -- in whatever form it takes -- should include no defaults and instead just pack itself with member functions such as `.codepoints()`, `.graphemes(/*options*/)`, `.words(/*options*/)` and let the user decide at what level they want to be working. I think encoding and normalization should be part of the type name, because those are the two most important properties, but after that we should simply be handing out views and letting people pick whatever abstraction level they want.
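
A rough sketch of the shape I have in mind, just to make it concrete. Everything here is hypothetical: `basic_text`, the `utf8` / `nfc` tag types and `.code_units()` are names made up for illustration, the decoder assumes well-formed UTF-8 and does no validation, the normalization tag is carried in the type but not enforced, and `.graphemes()` / `.words()` are left out because they need the Unicode segmentation tables.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical tag types; not taken from any actual proposal.
struct utf8 {};
struct nfc {};

// Encoding and normalization are part of the type name.
template <typename Encoding, typename Normalization>
class basic_text;

template <>
class basic_text<utf8, nfc> {
public:
    explicit basic_text(std::string bytes) : bytes_(std::move(bytes)) {}

    // Explicit view over the raw UTF-8 code units.
    std::string_view code_units() const { return bytes_; }

    // Decoded code points. Assumes well-formed UTF-8 (no validation).
    std::vector<char32_t> codepoints() const {
        std::vector<char32_t> out;
        std::size_t i = 0;
        while (i < bytes_.size()) {
            unsigned char lead = static_cast<unsigned char>(bytes_[i]);
            // Sequence length from the lead byte: 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx.
            std::size_t len = lead < 0x80 ? 1
                            : (lead >> 5) == 0x6 ? 2
                            : (lead >> 4) == 0xE ? 3 : 4;
            // Keep only the payload bits of the lead byte.
            char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
            // Append six payload bits from each continuation byte.
            for (std::size_t k = 1; k < len && i + k < bytes_.size(); ++k)
                cp = (cp << 6) | (static_cast<unsigned char>(bytes_[i + k]) & 0x3F);
            out.push_back(cp);
            i += len;
        }
        return out;
    }

    // Deliberately no begin()/end(): there is no default iteration level.

private:
    std::string bytes_;
};
```

With this, `for (auto c : t)` simply does not compile for a `basic_text<utf8, nfc> t`; the user has to write `t.code_units()` or `t.codepoints()` and so has to pick a level on purpose.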

At the very least, it means that we don't arm the wrong gun for the user by accident / by default. And at some level, the user will have to read through even the synopsis of what each of these functions gives them. Some are obvious (words, sentences, and the other text segmentation algorithms), but others might require some thinking (codepoints and graphemes in particular).

I can understand that not having defaults could be a frustrating experience to start with, however. I feel like it's justifiable given that there's no 100% right answer. Maybe we can get 80% right, but that will only make the remaining 20% more surprising. I feel like having docs / tables that describe the various segmentation algorithms, what they bring to the table, and what the user can get out of them might be more worthwhile.

Is this a viable path?

_______________________________________________
Unicode mailing list
Unicode@isocpp.open-std.org
http://www.open-std.org/mailman/listinfo/unicode



Rule #6 on Naming: _not_ understanding is better than _mis_understanding.

begin() and end() will be misunderstood, i.e. assumed to mean X when they mean Y.
.codepoints(), etc., will not be guessed/assumed. They will be looked up, then understood.


--
Be seeing you,
Tony