Indeed, this is why I stated that the set of code points that can be specified via universal-character-name doesn't change.
On 07/01/2024 03.14, Tom Honermann wrote:
The code points that can be specified via /universal-character-name <http://eel.is/c++draft/lex.charset#nt:universal-character-name>/ don't change, but additional names may become available for use in /named-universal-character <http://eel.is/c++draft/lex.charset#nt:named-universal-character>/.
That's a technically incorrect statement, because /universal-character-name/ includes /named-universal-character/ per the grammar. The set of code points that can be specified via hex digits doesn't change depending on the Unicode version; agreed.
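To make the distinction concrete, here is a minimal sketch of the two spellings under discussion. The variable names are made up for illustration, and the named form is guarded by the __cpp_named_character_escapes feature-test macro so that the snippet still compiles where the C++23 feature is unavailable.

```cpp
// Hex-digit spelling: which code points can be written this way does not
// depend on the Unicode version the compiler was built against.
constexpr char32_t e_acute_hex = U'\u00E9';

#if defined(__cpp_named_character_escapes)
// Named spelling (C++23): whether a given name is accepted depends on the
// character-name data of the Unicode version the implementation uses.
constexpr char32_t e_acute_named = U'\N{LATIN SMALL LETTER E WITH ACUTE}';

// Both spellings designate the same code point, U+00E9.
static_assert(e_acute_hex == e_acute_named, "both spellings name U+00E9");
#endif
```

A name introduced in a later Unicode version would be rejected by an implementation that ships older name data, which is the portability question raised in this thread.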
The C++ standard includes in its bibliography an undated reference to the IANA Time Zone Database with a linked reference in [time.zone.general]p1. I grant that is a non-normative reference and the use of it differs somewhat from the situation we face with referencing the Unicode Standard, but it is an example of specified behavior that is intended to change at points that are not aligned with the release of C++ standard revisions.
It is a fact that parts of the Unicode Standard will necessarily change as a byproduct of continually adding and improving support for the evolving collection of human languages. While we can choose to evolve C++ in some lockstep form with the Unicode Standard, users will nevertheless be exposed to differences in behavior at some point. It is far from clear to me that implementors and programmers benefit by having those changes happen at discrete points.
For any other feature added to C++, we have expressly bought in to a model where such evolution (and exposure of differences) happens at discrete points, namely when a new C++ revision is released every three years.
Why are features added to Unicode any different, conceptually?
It is desirable that programs written and compiled for a particular C++ standard revision be able to correctly consume text produced in accordance with newer Unicode standards subject to limitations imposed by the interfaces that we specify. Requiring that programmers migrate their code to newer C++ standards in order to take advantage of corrections in newer Unicode standards would impose an unnecessary hindrance.
I don't think I've seen such a claim.
From an implementation perspective, having C++23 mode use one Unicode version and C++26 mode use another version seems problematic, at least for implementations that don't provide distinct standard library implementations for each standard mode (as is the case for all major implementors).
We've heard another implementer claim otherwise.
#ifdef's in standard library implementations triggering on the desired standard mode seem quite common.
They certainly are common, but they are also not without cost.
Some standard library implementations assume or require the
availability of language features from newer standard revisions in
older standard modes so as to avoid unwanted #ifdef directives.
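As a rough illustration of what tying the Unicode version to the standard mode could look like inside a library, consider the sketch below. Everything in it is hypothetical: the example namespace, the names, and the version numbers are placeholders and are not taken from any real implementation or from either standard revision.

```cpp
namespace example {

// Hypothetical: select Unicode data according to the requested standard mode.
// The numbers 15 and 16 are placeholders chosen only for illustration.
#if __cplusplus > 202302L
constexpr int unicode_major = 16;  // modes newer than C++23
#else
constexpr int unicode_major = 15;  // C++23 and earlier modes
#endif

// Hypothetical helper: does this build know data up to the given version?
constexpr bool supports_unicode(int major) { return major <= unicode_major; }

}  // namespace example
```

Each such conditional means another set of data tables to ship and another configuration to test, which is the cost being pointed out above.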
I think these concerns are adequately addressed by specifying a minimum Unicode version. Note that implementations are always free to accept additional character names as a conforming extension (a diagnostic for use of such names can be issued).
As a user, I would like and expect newer compiler versions to provide support for newer Unicode versions independent of whatever standard mode I happen to compile my code with.
I disagree, from a user perspective. As a user, I foremost want portability: A program working with compiler X claiming conformance to C++ZZ should work unchanged on a different compiler Y also claiming conformance to C++ZZ. That portability argument is the only reason we have WG21 to start with. If compiler X gives me newer Unicode than compiler Y, I may have used newer named-universal-characters or relied on newer Unicode algorithm behavior when developing my program, just to see it break down when moving to compiler Y that hasn't gotten around to upgrading to the new Unicode version yet.
That's bad, and in my view much worse than having the users of compiler X wait three years until they get the new feature. Again, compiler vendors have options to offer post-standard features to their audience if they so choose; everybody opting in to such options is aware that their code might be non-portable.
I think the attention placed on backward compatibility by the
Unicode Consortium suffices here; I think their efforts are at
least on par with WG21.
I view the change in behavior that spawned this email thread as
more of a bug fix than a new feature.
Agreed.
ABI concerns are just as relevant for minor compiler upgrades as they are for major upgrades these days. Going forward, we should strive to ensure that Unicode features that don't have a strong stability policy are adequately hidden behind an ABI boundary. I don't recall having discussed use of the grapheme breaking algorithm in std::format from an ABI perspective.
That applies regardless of release cadence of changed Unicode features, but is more of a pain point with mid-term Unicode updates. C++ standard versions are susceptible to ABI breaks anyway, as much as we sometimes strive to avoid them.
I think it makes sense to specify a minimum Unicode version for each C++ standard and I would not be opposed to adding such specification. However, it is possible that the choice of Unicode version might not always remain a choice that implementors make. As we add additional Unicode features to the C++ standard, implementors might find it desirable to rely on system-provided Unicode services (e.g., by an OS-provided build of ICU), at least for some features. I think we might be best off having the choice of Unicode version be implementation-defined and use of a recent version a QoI matter.
That option feels at odds with how normative references work in the formal ISO world. Please read the intro text in [intro.refs]; I'm not seeing liberty to have a normative r
The real question is whether Unicode behavior will differ for -std=c++23 mode for gcc 14.1 vs gcc 19.1. I sure hope that it would!
And I sure hope it doesn't, given the discussion we've had so far. (This sentiment is quite strong at this point.)
Thank you for filing the CWG issue. There are clearly nuances and
perspectives that warrant additional discussion. I'm going to add
this topic as one of the agenda items for this week's SG16
meeting.
Tom.
Jens