> Has someone reached out to them (or is one listening now?) to understand if they’ve considered the specific issue in front of us?
> A liaison, Robin Leroy, is on this reflector.
𒇻 𒊭𒀠𒈠𒋫
> […]
> As a user, I would like and expect newer compiler versions to provide support for newer Unicode versions independent of whatever standard mode I happen to compile my code with.
> […]
> I think it makes sense to specify a minimum Unicode version for each C++ standard and I would not be opposed to adding such specification. […]
The points made by Tom seem persuasive to me.
In particular, from the perspective of ensuring the interoperability of text: while the pace of Unicode updates means one always has to deal with slightly different versions of Unicode, having C++ conformance mandate sticking to and shipping very old versions of Unicode seems problematic, since, as you all know all too well and lament, C++ versions get used well past their withdrawal date. ICU itself is only now switching from C++11 to C++17, even though it is built with modern compilers.
As Unicode algorithms become available in the C++ standard library in future versions (which I see as a good thing in principle, especially for fundamental ones like normalization), a dated reference with no allowance for newer versions would in practice mean that part of my job, advising my colleagues on best practices in internationalization, would be to discourage any use of them in favour of libraries that are kept up to date with Unicode.
On 07/01/2024 09.19, Jens Maurer via SG16 wrote:
> That option feels at odds with how normative references work in the formal ISO world.
> Please read the intro text in [intro.refs]; […]
> I'm not seeing liberty to have a normative reference where the implementation can choose.
In an earlier email I cited the example of another language standard under the ægis of ISO/IEC JTC 1/SC 22 that has done exactly this (with ISO/IEC 10646) since 2012, namely ISO/IEC 8652 (Ada).
That standard was most recently revised in May 2023, so this still seems to be OK.
On the specific point mentioned in the title of this email thread:
Did we actually specify something for C++23 that depends on or provides the breaking algorithms?
std::format width estimation requires extended grapheme clustering:
> For a sequence of characters in UTF-8, UTF-16, or UTF-32, an implementation should use as its field width the sum of the field widths of the first code point of each extended grapheme cluster. Extended grapheme clusters are defined by UAX #29 of the Unicode Standard. […]
That is a “should”, not a “shall”, so is there really a conformance requirement here, regardless of which version the phrase “the Unicode Standard” refers to?
- Hubert asked why the reference for extended grapheme clusters is non-normative.
- Jens replied that he thinks UAX #29 is referenced only in support of normative encouragement for an implementation direction.
- Charlie expressed agreement with Jens' recollection.
Best regards,
Robin Leroy