ISOCPP sg16 List: Re: Undated reference to Unicode Standard and UAX #29

From: Robin Leroy <eggrobin_at_[hidden]>
Date: Sun, 7 Jan 2024 13:05:54 +0100

On Sun, 7 Jan 2024 at 09:49, JF Bastien via SG16 <sg16_at_[hidden]>
wrote:

>> Has someone reached out to them (or is one listening now?) to understand
>> if they’ve considered the specific issue in front of us?
>>
>> A liaison is on this reflector, Robin Leroy.
𒇻 𒊭𒀠𒈠𒋫

On Sun, 7 Jan 2024 at 03:14, Tom Honermann via SG16 <sg16_at_[hidden]>
wrote:

> […]

> As a user, I would like and expect newer compiler versions to provide
> support for newer Unicode versions independent of whatever standard mode I
> happen to compile my code with.

[…]

> I think it makes sense to specify a minimum Unicode version for each C++
> standard and I would not be opposed to adding such specification. […]

The points made by Tom seem persuasive to me.
In particular, from the perspective of text interoperability: the pace of
Unicode updates means one always has to deal with slightly different
versions of Unicode, but having C++ conformance mandate sticking to, and
shipping, very old versions of Unicode seems problematic, since (as you all
know all too well and lament) C++ versions get used well past their
withdrawal date. ICU itself is only now switching from C++11 to C++17, even
though it is built with modern compilers.

As Unicode algorithms are made available as part of the C++ standard
library in future versions (which I see as a good thing in principle,
especially for fundamental ones like normalization), a dated reference with
no allowance for newer versions would in practice mean that part of my job
advising my colleagues on best practices in internationalization would be
to discourage any use of them in favour of libraries that are kept up to
date with Unicode.

On Sun, 7 Jan 2024 at 09:31, Jens Maurer via SG16 <sg16_at_[hidden]>
wrote:

> On 07/01/2024 09.19, Jens Maurer via SG16 wrote:

> That option feels at odds with how normative references work in the
> formal ISO world.
>> Please read the intro text in [intro.refs]; […]
>
> "I'm not seeing liberty to have a normative reference where the
> implementation can choose."

I have cited, in an earlier email, the example of another language standard
under the ægis of ISO/IEC JTC 1/SC 22 that has done so (with ISO/IEC
10646) since 2012: ISO/IEC 8652 (Ada).
That standard was most recently revised in May 2023, so this practice still
appears to be acceptable.

On the specific point mentioned in the title of this email thread:

On Fri, Jan 5, 2024 at 5:54 PM Steve Downey <sdowney_at_[hidden]> wrote:

> Did we actually specify something for C++23 that depends on or provides
> the breaking algorithms?
>
On Fri, 5 Jan 2024 at 17:59, Corentin via SG16 <sg16_at_[hidden]>
wrote:

> std::format width estimation requires clustering
>

[format.string.std], paragraph 13
<https://eel.is/c++draft/format.string.std#13> reads

> For a sequence of characters in UTF-8, UTF-16, or UTF-32, an
> implementation should use as its field width the sum of the field widths of
> the first code point of each extended grapheme cluster. Extended grapheme
> clusters are defined by UAX #29 of the Unicode Standard. […]
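For illustration only, here is a rough C++17 sketch of the shape of that rule, assuming UTF-32 input: each extended grapheme cluster contributes the width of its first code point, and code points that merely extend a cluster contribute nothing. The names `extends_cluster`, `code_point_width`, and `estimated_width` are hypothetical. Both the segmentation (only U+0300..U+036F treated as extending) and the width table (a handful of wide ranges) are deliberate simplifications, not UAX #29 and not the exact list in [format.string.std].

```cpp
#include <cstddef>
#include <string_view>

// HYPOTHETICAL simplification of UAX #29: only code points in the Combining
// Diacritical Marks block (U+0300..U+036F) are treated as extending the
// preceding cluster. Real segmentation needs the Grapheme_Cluster_Break
// property data and the full set of rules.
constexpr bool extends_cluster(char32_t c) {
    return c >= U'\u0300' && c <= U'\u036F';
}

// HYPOTHETICAL width table: a few ranges whose code points are (mostly)
// East_Asian_Width=W get width 2; every other code point gets width 1.
constexpr std::size_t code_point_width(char32_t c) {
    if ((c >= U'\u1100' && c <= U'\u115F') ||  // Hangul Jamo leading consonants
        (c >= U'\u3040' && c <= U'\u30FF') ||  // Hiragana and Katakana
        (c >= U'\u4E00' && c <= U'\u9FFF') ||  // CJK Unified Ideographs
        (c >= U'\uAC00' && c <= U'\uD7A3'))    // Hangul Syllables
        return 2;
    return 1;
}

// Sums the width of the first code point of each (approximate) extended
// grapheme cluster; code points that extend a cluster add nothing.
constexpr std::size_t estimated_width(std::u32string_view s) {
    std::size_t width = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // A cluster starts at the beginning of the text and at any code
        // point that does not extend the preceding one.
        if (i == 0 || !extends_cluster(s[i]))
            width += code_point_width(s[i]);
    }
    return width;
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT is a single cluster.
static_assert(estimated_width(U"e\u0301") == 1);
// Two wide CJK ideographs.
static_assert(estimated_width(U"\u4E2D\u6587") == 4);
static_assert(estimated_width(U"abc") == 3);
```

A real implementation would also have to decode UTF-8 or UTF-16 and would need to pick a Unicode version for its segmentation and width data, which is precisely the question raised in this thread.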

The paragraph quoted above says *should*, not *shall*, so is there really a
conformance requirement here, regardless of the version meant by the phrase
“the Unicode Standard”?
See also this discussion from the 2022-11-02 meeting of SG 16
<https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2022.md#november-2nd-2022>:

   - Hubert asked why the reference for extended grapheme cluster is
   non-normative.
   - Jens replied that he thinks UAX #29 <https://unicode.org/reports/tr29> is
   only referenced to satisfy normative encouragement for an implementation
   direction.
   - Charlie expressed agreement with Jens' recollection.

Best regards,

Robin Leroy

Received on 2024-01-07 12:06:14