sg16

Re: Undated reference to Unicode Standard and UAX #29

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Sun, 7 Jan 2024 13:45:10 +0100
On 07/01/2024 13.05, Robin Leroy wrote:
> On Sun, 7 Jan 2024 at 09:49, JF Bastien via SG16 <sg16_at_[hidden]> wrote:
>
> > Has someone reached out to them (or is one listening now?) to understand if they’ve considered the specific issue in front of us?
>
> A liaison is on this reflector, Robin Leroy.
>
> 𒇻 𒊭𒀠𒈠𒋫
>
> On Sun, 7 Jan 2024 at 03:14, Tom Honermann via SG16 <sg16_at_[hidden]> wrote:
>
> […]
>
> As a user, I would like and expect newer compiler versions to provide support for newer Unicode versions independent of whatever standard mode I happen to compile my code with.
>
> […]
>
> I think it makes sense to specify a minimum Unicode version for each C++ standard and I would not be opposed to adding such specification. […]
>
> The points made by Tom seem persuasive to me.
> In particular, from the perspective of ensuring the interoperability of text, while the pace of Unicode updates means one always has to deal with slightly different versions of Unicode, having C++ conformance mandate sticking to and shipping very old versions of Unicode seems problematic, since—as you all know all too well and lament—C++ versions get used well past their withdrawal date. ICU itself is just switching from C++11 to C++17, while it uses modern compilers.
>
> As Unicode algorithms are made available as part of the C++ standard library in future versions—which I see as a good thing in principle, especially for fundamental ones like normalization—, a dated reference with no allowance for newer versions would in practice mean that part of my job advising my colleagues on best practices in internationalization would be to discourage any use of them in favour of libraries that are kept up to date with Unicode.

That's a good point. Maybe providing volatile Unicode algorithms in the C++
standard library isn't such a good idea, after all.
(UTF-8 to UTF-16 is stable, but apparently some grapheme clustering isn't.)

> On Sun, 7 Jan 2024 at 09:31, Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
>
> On 07/01/2024 09.19, Jens Maurer via SG16 wrote:
>
> > That option feels at odds with how normative references work in the formal ISO world.
> > Please read the intro text in [intro.refs]; […]
>
> "I'm not seeing liberty to have a normative reference where the implementation
> can choose."
>
> I have cited in an earlier email the example of another language standard under the ægis of ISO/IEC JTC 1/SC 22 which has done so (with ISO/IEC 10646) since 2012, namely ISO/IEC 8652.
> That standard was most recently revised in May of 2023, so this seems to still be OK.

Well, the normative references of that standard refer to ISO/IEC 10646:2020
specifically.

https://www.iso.org/obp/ui/en/#iso:std:iso-iec:8652:ed-4:v1:en

The text you linked to does say
http://www.ada-auth.org/standards/22aarm/html/AA-2-1.html#p17

"The categories defined above, as well as case mapping and folding, may be based
on an implementation-defined version of ISO/IEC 10646 (2003 edition or later)."

That limits the freedom to character classifications and case folding,
but nothing else (in particular, if we were to follow that lead, it's
not obvious that the named-universal-character repertoire can be extended
by an implementation).

And the "ancient version of Unicode with modern C++" scenario isn't an issue,
in any case, because we would always refer to the then-current version of
Unicode with every release of C++.

Jens

Received on 2024-01-07 12:45:17