On Sun, 7 Jan 2024 at 13:45, Jens Maurer <jens.maurer@gmx.net> wrote:

Well, the normative references of that standard refer to ISO/IEC 10646:2020
specifically.

https://www.iso.org/obp/ui/en/#iso:std:iso-iec:8652:ed-4:v1:en

The text you linked to does say
http://www.ada-auth.org/standards/22aarm/html/AA-2-1.html#p17

"The categories defined above, as well as case mapping and folding, may be based
on an implementation-defined version of ISO/IEC 10646 (2003 edition or later)."

Note, however, that WG 9 has approved, for a future Corrigendum, the addition of a versionless reference to the Unicode Character Database, changing paragraph 2.1(17) as follows (insertions in braces, deletions in square brackets):

The categories defined above, as well as case mapping and folding, may be based on an implementation-defined version of {the Unicode Character Database (4.0 or later)}[ISO/IEC 10646 (2003 edition or later)].

Note also that this AI has the class “binding interpretation”, which is the Ada equivalent of a C++ defect report.

That limits the freedom to character classifications and case folding,
but nothing else (in particular, if we were to follow that lead, it's
not obvious that the named-universal-character repertoire can be extended
by an implementation).

That would be because those are the only properties Ada uses from the UCD.

I should note that the General_Category property assignments do not have stability guarantees, whereas the names do; the names are less problematic here.

The expansion of the répertoire is covered by the reference to the General_Category property (characters move out of General_Category=Unassigned).

Maybe providing volatile Unicode algorithms in the C++
standard library isn't such a good idea, after all.
(UTF-8 to UTF-16 is stable, but apparently some grapheme clustering isn't.)

Note that the word “stable” can mean many things, from completely immutable (encoding forms), to evolving while being immutable on the assigned répertoire (normalization, character names), to evolving while being backward compatible (identifiers).

These latter kinds of backward compatible stability policies are designed to facilitate the use of versionless references to the Unicode Standard and frequent implementation upgrades, improving the interoperability across implementations of text interchanged using an expanding répertoire.
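For contrast with those evolving policies, the strictest sense (an immutable encoding form) can be sketched in a few lines. The conversion below is pure arithmetic on the code point value and consults no Unicode Character Database data, which is why it cannot change from one Unicode version to the next. The names `Utf16Units` and `encode_utf16` are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>

// Hypothetical helper type: the one or two UTF-16 code units of a code point.
struct Utf16Units {
    char16_t units[2] = {0, 0};
    std::size_t size = 0;  // 0 means "not a Unicode scalar value"
};

// Encodes one Unicode scalar value as UTF-16.
// No property data is consulted, only the numeric value of the code point.
constexpr Utf16Units encode_utf16(char32_t cp) {
    Utf16Units out{};
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
        return out;  // surrogates and out-of-range values are rejected
    }
    if (cp < 0x10000) {
        out.units[0] = static_cast<char16_t>(cp);
        out.size = 1;
    } else {
        const char32_t v = cp - 0x10000;  // 20 bits, split 10/10
        out.units[0] = static_cast<char16_t>(0xD800 + (v >> 10));    // lead surrogate
        out.units[1] = static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // trail surrogate
        out.size = 2;
    }
    return out;
}
```

A code point that is unassigned today encodes the same way once it is assigned, so an implementation of this never needs a data update.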

We also try to move carefully even where we have no formal stability guarantees. In particular we are aware that grapheme cluster segmentation affects many implementers out there (Swift also has it deep in its standard libraries), especially when it comes to the state machine (the property assignments can change more freely).

The decision to change the grapheme cluster breaking state machine in Unicode Version 15.1 came after the change had been tested in the wild for four years as the ICU default; see L2/23-079, Section 5.5.

Though again it seems to me that there is no conformance requirement in C++ to use any version of UAX #29 grapheme cluster breaking.

On Sun, 7 Jan 2024 at 13:12, Jonathan Wakely <cxx@kayari.org> wrote:

If I use the field width of the first code point in <some cluster that bears a resemblance to an extended grapheme cluster as described by Unicode> then that's still conforming.

In particular, and perhaps usefully for implementers, that reading means a conformant implementation could rely on an ICU implementation that has tailorings “from the future”, as ICU’s grapheme cluster breaking did from 2019 to 2023.
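As a rough sketch of what that reading amounts to: once some segmentation has delimited a cluster, only its first code point contributes to the field width. Everything here is an assumption made for illustration. `estimated_width` and `cluster_width` are hypothetical names, the ranges are a small sample rather than the table any real implementation uses, and the segmentation step itself (where the implementation freedom lies) is deliberately left out.

```cpp
#include <cstddef>

// Illustrative only: a few ranges that typically occupy two columns.
// This is NOT a complete width table.
constexpr int estimated_width(char32_t cp) {
    return ((cp >= 0x1100 && cp <= 0x115F)       // Hangul Jamo leading consonants
         || (cp >= 0x4E00 && cp <= 0x9FFF)       // CJK Unified Ideographs
         || (cp >= 0xAC00 && cp <= 0xD7A3)       // Hangul Syllables
         || (cp >= 0x1F300 && cp <= 0x1F64F))    // pictographs and emoticons
        ? 2 : 1;
}

// Width of one already-delimited cluster: that of its first code point.
// Later code points (combining marks, ZWJ, joined emoji) add nothing.
constexpr int cluster_width(const char32_t* cluster, std::size_t length) {
    return length == 0 ? 0 : estimated_width(cluster[0]);
}
```

With this shape, swapping in a different segmenter (for instance one with tailorings ahead of the published UAX #29) changes where clusters begin and end, but not how a given cluster is measured.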

Best regards,

Robin Leroy