On Wed, 27 Mar 2019 at 20:34 Markus Scherer <markus.icu@gmail.com> wrote:
On Wed, Mar 27, 2019 at 9:44 AM JF Bastien <cxx@jfbastien.com> wrote:
Are you worried that implementations wouldn't be able to use ICU directly?

No, but it feels like an unnecessary addition to the language when lots of people have competent existing libraries for the purpose.

As such, it looks like spec and library bloat.

Many users will want to be able to use the latest version of Unicode, which will tend to be newer than what their compiler provides.

How so?

I don't know what you mean by "How so?".
Lots of people routinely use a compiler and/or a standard library that's not the latest version, and even the latest version may be some months behind the latest Unicode version.

A library like ICU follows Unicode very closely, and for a developer, it's easier to update a third-party library than to guarantee that the language runtime on all of their users' machines is up to date.

If data structures need to be changed, it's not only a maintenance burden but potentially a specification burden. Or do you mean rebuilding tries, or writing new parsers for the Unicode Character Database?
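For reference, the UCD property files in question (PropList.txt, DerivedCoreProperties.txt and similar) share a simple line format: a code point or `first..last` range, a `;`, a property name, and a trailing `#` comment. A minimal parser sketch, assuming only that two-field format (files with extra fields would leave them in `property`); `PropLine`, `trim` and `parse_prop_line` are hypothetical names:

```cpp
#include <optional>
#include <string>
#include <string_view>

// One data line of a UCD property file such as PropList.txt.
struct PropLine {
    char32_t first;        // first code point of the range
    char32_t last;         // last code point (== first for a single code point)
    std::string property;  // e.g. "White_Space"
};

// Strips ASCII spaces and tabs from both ends.
std::string_view trim(std::string_view s) {
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string_view::npos) return {};
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Parses lines of the form
//   "2000..200A    ; White_Space # Zs  [11] EN QUAD..HAIR SPACE"
// Returns std::nullopt for blank and comment-only lines.
// std::stoul throws std::invalid_argument on a malformed hex field.
std::optional<PropLine> parse_prop_line(std::string_view line) {
    line = line.substr(0, line.find('#'));  // drop the trailing comment
    const auto semi = line.find(';');
    if (semi == std::string_view::npos) return std::nullopt;

    const std::string_view cps = trim(line.substr(0, semi));
    const std::string_view prop = trim(line.substr(semi + 1));
    const auto dots = cps.find("..");

    const std::string lo(cps.substr(0, dots));
    const std::string hi = (dots == std::string_view::npos)
                               ? lo
                               : std::string(cps.substr(dots + 2));
    return PropLine{static_cast<char32_t>(std::stoul(lo, nullptr, 16)),
                    static_cast<char32_t>(std::stoul(hi, nullptr, 16)),
                    std::string(prop)};
}
```

The parsing itself rarely breaks; the adjustments tend to come from what the parsed values are then mapped into.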

Unicode data file parsers sometimes need adjustment for new values.
Sometimes derivation of properties changes; if you use some derivations yourself, you need to carefully review spec changes to follow.
In data structures, it is common to encode many values as ints with few bits; sometimes the addition of values will require widening fields and tables.
Numeric values are essentially open-ended, but data storage isn't. If you want to store large values and fractions, sometimes you need to adjust your data structures.
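To make the point about narrow fields concrete, here is a hypothetical packed 16-bit property word of the kind often stored as a trie value. The layout, names and widths are assumptions for illustration only, not ICU's or the proposal's actual layout. General_Category currently has 30 values, so it fits in 5 bits with 2 to spare; the `static_assert`s make the build fail, rather than silently truncate, when a new Unicode version adds values that no longer fit.

```cpp
#include <cstdint>

// General_Category values (30 of them).
enum class GeneralCategory : std::uint8_t {
    Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No,
    Pc, Pd, Ps, Pe, Pi, Pf, Po, Sm, Sc, Sk, So,
    Zs, Zl, Zp, Cc, Cf, Cs, Co, Cn,
    Count
};

// East_Asian_Width values (6 of them).
enum class EastAsianWidth : std::uint8_t { A, F, H, N, Na, W, Count };

// Hypothetical layout: bits 0-4 General_Category, bits 5-7 East_Asian_Width.
constexpr unsigned kGcBits = 5;
constexpr unsigned kGcShift = 0;
constexpr unsigned kEawBits = 3;
constexpr unsigned kEawShift = kGcShift + kGcBits;

// Adding values beyond the field capacity breaks the build here,
// which is the signal that the field (and the tables) must be widened.
static_assert(static_cast<unsigned>(GeneralCategory::Count) <= (1u << kGcBits),
              "General_Category no longer fits; widen the field");
static_assert(static_cast<unsigned>(EastAsianWidth::Count) <= (1u << kEawBits),
              "East_Asian_Width no longer fits; widen the field");
static_assert(kEawShift + kEawBits <= 16, "layout exceeds 16 bits");

constexpr std::uint16_t pack(GeneralCategory gc, EastAsianWidth eaw) {
    return static_cast<std::uint16_t>(
        (static_cast<unsigned>(gc) << kGcShift) |
        (static_cast<unsigned>(eaw) << kEawShift));
}

constexpr GeneralCategory unpack_gc(std::uint16_t word) {
    return static_cast<GeneralCategory>((word >> kGcShift) & ((1u << kGcBits) - 1));
}

constexpr EastAsianWidth unpack_eaw(std::uint16_t word) {
    return static_cast<EastAsianWidth>((word >> kEawShift) & ((1u << kEawBits) - 1));
}
```

Numeric_Value is harder still: it includes fractions and very large values, so a fixed-width integer field like the ones above cannot hold it directly.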

Then there are contributory properties like Other_Alphabetic, which should not be exposed via an API, although that is not generally obvious; I didn't check whether they're in the proposal.
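As an illustration of why contributory properties are derivation inputs rather than useful API, the comments in DerivedCoreProperties.txt give Lowercase = Ll + Other_Lowercase, Uppercase = Lu + Other_Uppercase, and Alphabetic = Uppercase + Lowercase + Lt + Lm + Lo + Nl + Other_Alphabetic. A sketch of that derivation, where `RawProps` is a hypothetical stand-in for per-code-point data read from the UCD:

```cpp
#include <cstdint>

enum class GeneralCategory : std::uint8_t {
    Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No,
    Pc, Pd, Ps, Pe, Pi, Pf, Po, Sm, Sc, Sk, So,
    Zs, Zl, Zp, Cc, Cf, Cs, Co, Cn
};

// Hypothetical raw inputs: the Other_* flags only feed the derivations below.
struct RawProps {
    GeneralCategory gc;
    bool other_alphabetic;
    bool other_lowercase;
    bool other_uppercase;
};

// Lowercase = Ll + Other_Lowercase
constexpr bool is_lowercase(const RawProps& p) {
    return p.gc == GeneralCategory::Ll || p.other_lowercase;
}

// Uppercase = Lu + Other_Uppercase
constexpr bool is_uppercase(const RawProps& p) {
    return p.gc == GeneralCategory::Lu || p.other_uppercase;
}

// Alphabetic = Uppercase + Lowercase + Lt + Lm + Lo + Nl + Other_Alphabetic
constexpr bool is_alphabetic(const RawProps& p) {
    switch (p.gc) {
        case GeneralCategory::Lt:
        case GeneralCategory::Lm:
        case GeneralCategory::Lo:
        case GeneralCategory::Nl:
            return true;
        default:
            return is_uppercase(p) || is_lowercase(p) || p.other_alphabetic;
    }
}
```

Only `is_alphabetic`, `is_lowercase` and `is_uppercase` would be public; if a future Unicode version changes a derivation, the public answers stay correct while the Other_* inputs shift underneath.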


I specifically didn't expose the properties used for derivations in the API because, as you say, they're not useful, and TR#44 explicitly recommends against doing so.
I've also excluded properties that are deprecated or superseded.
There are a number of properties that are only useful to specific algorithms or domains (like the BiDi properties), which we could exclude. On the other hand, TR#18 calls for them to be supported in regexes, and in general I think that if we are to have Unicode support in the standard, it might be misguided to cherry-pick things, especially the binary properties, which have a low implementation burden.
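To illustrate the low burden of a binary property, one common approach is a sorted table of code point ranges searched with `std::upper_bound`. The table below holds the White_Space ranges from PropList.txt (adjacent entries merged; a real implementation would generate it from the UCD for the targeted Unicode version), and `Range`, `in_ranges` and `is_white_space` are hypothetical names:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Inclusive code point range [first, last].
struct Range {
    char32_t first;
    char32_t last;
};

// White_Space, sorted and non-overlapping.
constexpr std::array<Range, 10> kWhiteSpace{{
    {0x0009, 0x000D}, {0x0020, 0x0020}, {0x0085, 0x0085}, {0x00A0, 0x00A0},
    {0x1680, 0x1680}, {0x2000, 0x200A}, {0x2028, 0x2029}, {0x202F, 0x202F},
    {0x205F, 0x205F}, {0x3000, 0x3000},
}};

// O(log n) membership test over sorted, non-overlapping ranges.
template <std::size_t N>
bool in_ranges(const std::array<Range, N>& table, char32_t cp) {
    // Find the first range that starts after cp...
    auto it = std::upper_bound(table.begin(), table.end(), cp,
                               [](char32_t c, const Range& r) { return c < r.first; });
    if (it == table.begin()) return false;
    --it;  // ...so only the preceding range can contain cp.
    return cp <= it->last;
}

bool is_white_space(char32_t cp) { return in_ranges(kWhiteSpace, cp); }
```

Every binary property can reuse the same `in_ranges` with its own generated table, so each additional property costs little more than its data.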
 

markus