Re: [SG16-Unicode] D1628R0 (Unicode character properties)

From: Corentin <corentin.jabot_at_[hidden]>
Date: Wed, 27 Mar 2019 23:12:02 +0100
On Wed, 27 Mar 2019 at 20:34 Markus Scherer <markus.icu_at_[hidden]> wrote:

> On Wed, Mar 27, 2019 at 9:44 AM JF Bastien <cxx_at_[hidden]> wrote:
>> Are you worried that implementations wouldn't be able to use ICU directly?
> No, but it feels like an unnecessary addition to the language when lots of
> people have competent existing libraries for the purpose.
> As such, it looks like spec and library bloat.
> Many users will want to be able to use the latest version of Unicode,
>>> which will tend to be newer than what their compiler provides.
>> How so?
> I don't know what you mean with "How so?".
> Lots of people routinely use a compiler and/or a standard library that's
> not the latest version, and even the latest version may be some months
> behind the latest Unicode version.
> A library like ICU follows Unicode very closely, and for a developer, it's
> easier to update a third-party library than to guarantee that the language
> runtime on all of their users' machines are up to date.
> If data structures need to be changed, it's not only a maintenance burden
>> but potentially a specification burden. Or do you mean rebuilding tries, or
>> new parsers of the unicode database?
> Unicode data file parsers sometimes need adjustment for new values.
> Sometimes derivation of properties changes; if you use some derivations
> yourself, you need to carefully review spec changes to follow.
> In data structures, it is common to encode many values as ints with few
> bits; sometimes the addition of values will require widening fields and
> tables.
> Numeric values are essentially open-ended, but data storage isn't. If you
> want to store large values and fractions, sometimes you need to adjust your
> data structures.
> Then there are contributory properties like Other_Alphabetic that are not
> generally obvious as something that should not be exposed via API; I didn't
> check if that's in the proposal.

I specifically didn't expose the properties used for derivations in the API
because, as you say, it's not useful, and TR#44 explicitly recommends not
doing so.
I've also excluded properties that are deprecated or superseded.
There are a number of properties that are only useful to specific algorithms
or domains (like the BiDi properties), which we could exclude. On the
other hand, TR#18 calls for them to be supported in regexes, and in general I
think that if we are to have Unicode support in the standard, it might be
misguided to cherry-pick things, especially the binary properties, which
have a low implementation burden.
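
To make the "low implementation burden" point concrete, here is a minimal
sketch of how a binary property lookup could be implemented, assuming a
sorted table of code point ranges generated from the Unicode data files.
The names (range, white_space_ranges, is_white_space) are hypothetical and
are not taken from the proposal or from ICU; the table holds the
White_Space ranges from PropList.txt.

```cpp
#include <algorithm>
#include <array>

// Hypothetical representation: one inclusive [first, last] code point range.
struct range {
    char32_t first;
    char32_t last;
};

// White_Space ranges as listed in PropList.txt, sorted by first code point.
// In a real implementation this table would be generated, not hand-written.
constexpr std::array<range, 10> white_space_ranges{{
    {0x0009, 0x000D}, {0x0020, 0x0020}, {0x0085, 0x0085}, {0x00A0, 0x00A0},
    {0x1680, 0x1680}, {0x2000, 0x200A}, {0x2028, 0x2029}, {0x202F, 0x202F},
    {0x205F, 0x205F}, {0x3000, 0x3000},
}};

// Binary search for the last range starting at or before cp,
// then check that cp does not lie past the end of that range.
inline bool is_white_space(char32_t cp) {
    auto it = std::upper_bound(
        white_space_ranges.begin(), white_space_ranges.end(), cp,
        [](char32_t value, const range& r) { return value < r.first; });
    if (it == white_space_ranges.begin()) {
        return false;  // cp is below the first range
    }
    --it;
    return cp <= it->last;
}
```

Each binary property then costs one small sorted table and a shared lookup
routine. A production implementation would more likely pack several
properties into a trie, which is where the maintenance concerns you
describe come in.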

> markus

Received on 2019-03-27 23:12:16