Re: [SG16-Unicode] D1628R0 (Unicode character properties)

From: Markus Scherer <markus.icu_at_[hidden]>
Date: Wed, 27 Mar 2019 12:33:57 -0700
On Wed, Mar 27, 2019 at 9:44 AM JF Bastien <cxx_at_[hidden]> wrote:

> Are you worried that implementations wouldn't be able to use ICU directly?
>

No, but it feels like an unnecessary addition to the language when lots of
people have competent existing libraries for the purpose.

As such, it looks like spec and library bloat.

>> Many users will want to be able to use the latest version of Unicode, which
>> will tend to be newer than what their compiler provides.
>>
>
> How so?
>

I don't know what you mean by "How so?".
Lots of people routinely use a compiler and/or a standard library that's
not the latest version, and even the latest version may be some months
behind the latest Unicode version.

A library like ICU follows Unicode very closely, and for a developer, it's
easier to update a third-party library than to guarantee that the language
runtime on all of their users' machines is up to date.

> If data structures need to be changed, it's not only a maintenance burden
> but potentially a specification burden. Or do you mean rebuilding tries, or
> new parsers of the unicode database?


Unicode data file parsers sometimes need adjustment for new values.
Sometimes the derivation of a property changes; if you use some derivations
yourself, you need to review spec changes carefully to keep up.
In data structures, it is common to encode many values as ints with few
bits; sometimes the addition of values will require widening fields and
tables.
Numeric values are essentially open-ended, but data storage isn't. If you
want to store large values and fractions, sometimes you need to adjust your
data structures.
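
To make the field-widening point concrete, here is a minimal sketch of a packed per-code-point record. The struct, the field names, and the widths (5 bits for a general category, 8 bits for a script code) are assumptions made up for illustration; they are not taken from ICU or from D1628R0.

```cpp
#include <cstdint>

// Hypothetical packed record: several small property values share one
// 32-bit word. Layout and widths are illustrative assumptions only.
struct PackedProps {
    std::uint32_t bits;
};

constexpr unsigned kCategoryBits = 5;  // room for values 0..31
constexpr unsigned kScriptBits   = 8;  // room for values 0..255
constexpr unsigned kScriptShift  = kCategoryBits;

// Mask with the low `width` bits set (width must be less than 32).
constexpr std::uint32_t mask(unsigned width) {
    return (std::uint32_t{1} << width) - 1;
}

// True if `value` can be stored in a field of `width` bits.
constexpr bool fits(std::uint32_t value, unsigned width) {
    return value <= mask(width);
}

// Packs both fields; a value that does not fit is silently truncated,
// which is the failure mode a data generator has to guard against.
constexpr PackedProps pack(std::uint32_t category, std::uint32_t script) {
    return PackedProps{(category & mask(kCategoryBits)) |
                       ((script & mask(kScriptBits)) << kScriptShift)};
}

constexpr std::uint32_t category_of(PackedProps p) {
    return p.bits & mask(kCategoryBits);
}

constexpr std::uint32_t script_of(PackedProps p) {
    return (p.bits >> kScriptShift) & mask(kScriptBits);
}

// Once a new Unicode version pushes a property past its field's range,
// the field (and every table built on it) has to be widened.
static_assert(fits(255, kScriptBits), "255 still fits in 8 bits");
static_assert(!fits(256, kScriptBits), "256 would need a wider field");
```

A data generator along these lines would typically call something like `fits` on every value and fail the build rather than truncate, so that a new Unicode version forces a deliberate layout change.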

Then there are contributory properties like Other_Alphabetic, where it is
not generally obvious that they should not be exposed via API; I didn't
check whether that's in the proposal.
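
As a sketch of what "contributory" means here: in the Unicode Character Database, Alphabetic is derived from Lowercase + Uppercase + Lt + Lm + Lo + Nl + Other_Alphabetic, and Lowercase and Uppercase are in turn derived from Ll and Lu plus Other_Lowercase and Other_Uppercase. The record type, the trimmed-down category enum, and the function names below are hypothetical, chosen only to show the contributory flags staying internal while the derived property is the public query.

```cpp
#include <cstdint>

// Trimmed-down general category list, for illustration only.
enum class GeneralCategory : std::uint8_t { Lu, Ll, Lt, Lm, Lo, Nl, Mn, Other };

// Hypothetical raw record as a data generator might see it.
struct RawProps {
    GeneralCategory gc;
    bool other_alphabetic;  // contributory: input to a derivation only
    bool other_lowercase;   // contributory
    bool other_uppercase;   // contributory
};

// Lowercase = Ll + Other_Lowercase
constexpr bool is_lowercase(const RawProps& p) {
    return p.gc == GeneralCategory::Ll || p.other_lowercase;
}

// Uppercase = Lu + Other_Uppercase
constexpr bool is_uppercase(const RawProps& p) {
    return p.gc == GeneralCategory::Lu || p.other_uppercase;
}

// Alphabetic = Lowercase + Uppercase + Lt + Lm + Lo + Nl + Other_Alphabetic
constexpr bool is_alphabetic(const RawProps& p) {
    return is_lowercase(p) || is_uppercase(p) ||
           p.gc == GeneralCategory::Lt || p.gc == GeneralCategory::Lm ||
           p.gc == GeneralCategory::Lo || p.gc == GeneralCategory::Nl ||
           p.other_alphabetic;
}
```

In this sketch a nonspacing mark (Mn) with Other_Alphabetic set reports as alphabetic, while the same mark without the flag does not. A caller only ever needs `is_alphabetic`; exposing the `other_*` flags directly would leak an implementation detail of the derivation.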

markus

Received on 2019-03-27 20:34:11