ISOCPP sg16 List: Re: Undated reference to Unicode Standard and UAX #29

From: JF Bastien <cxx_at_[hidden]>
Date: Sun, 7 Jan 2024 17:30:38 +0900

On Sun, Jan 7, 2024 at 5:19 PM Jens Maurer via SG16 <sg16_at_[hidden]> wrote:

>
>
> On 07/01/2024 03.14, Tom Honermann wrote:
> > The code points that can be specified via /universal-character-name <
> http://eel.is/c++draft/lex.charset#nt:universal-character-name>/ don't
> change, but additional names may become available for use in
> /named-universal-character <
> http://eel.is/c++draft/lex.charset#nt:named-universal-character>/.
>
> That's a technically incorrect statement, because
> /universal-character-name/
> includes /named-universal-character/ per the grammar.
>
> The set of code points that can be specified via hex digits doesn't change
> depending on the Unicode version; agreed.
>
> > It is a fact that parts of the Unicode Standard will necessarily change
> as a byproduct of continually adding and improving support for the evolving
> collection of human languages. While we can choose to evolve C++ in some
> lockstep form with the Unicode Standard, users will nevertheless be exposed
> to differences in behavior at some point. It is far from clear to me that
> implementors and programmers benefit by having those changes happen at
> discrete points.
>
> For any other feature added to C++, we have expressly bought into a model
> where such evolution (and exposure of differences) happens at discrete
> points, namely when a new C++ revision is released every three years.
>
> Why are features added to Unicode any different, conceptually?

My recollection from our discussion was that the Unicode Consortium itself
strongly recommends a floating reference. We therefore followed this
recommendation.

Has someone reached out to them (or is one listening now?) to understand if
they’ve considered the specific issue in front of us?

>
> > From an implementation perspective, having C++23 mode use one Unicode
> version and C++26 mode use another version seems problematic, at least for
> implementations that don't provide distinct standard library
> implementations for each standard mode (as is the case for all major
> implementors).
>
> We've heard another implementer claim otherwise.
> #ifdefs in standard library implementations triggering on the desired
> standard mode seem quite common.
>
> > As a user, I would like and expect newer compiler versions to provide
> support for newer Unicode versions independent of whatever standard mode I
> happen to compile my code with.
>
> I disagree, from a user perspective.
>
> As a user, I foremost want portability: A program working with compiler X
> claiming conformance to C++ZZ should work unchanged on a different compiler
> Y also claiming conformance to C++ZZ. That portability argument is the only
> reason we have WG21 to start with. If compiler X gives me newer Unicode than
> compiler Y, I may have used newer named-universal-characters or relied on
> newer Unicode algorithm behavior when developing my program, just to see it
> break down when moving to compiler Y that hasn't gotten around to upgrading
> to the new Unicode version, yet.
>
> That's bad, and in my view much worse than having the users of compiler X
> wait three years until they get the new feature. Again, compiler vendors
> have options to offer post-standard features to their audience if they so
> choose; everybody opting in to such options is aware that their code might
> be non-portable.
>
> > ABI concerns are just as relevant for minor compiler upgrades as they are
> for major upgrades these days. Going forward, we should strive to ensure
> that Unicode features that don't have a strong stability policy are
> adequately hidden behind an ABI boundary. I don't recall having discussed
> use of the grapheme breaking algorithm in std::format from an ABI
> perspective.
>
> That applies regardless of the release cadence of changed Unicode features,
> but is more of a pain point with mid-term Unicode updates. C++ standard
> versions are susceptible to ABI breaks anyway, as much as we sometimes
> strive to avoid them.
>
> > I think it makes sense to specify a minimum Unicode version for each C++
> standard and I would not be opposed to adding such specification. However,
> it is possible that the choice of Unicode version might not always remain a
> choice that implementors make. As we add additional Unicode features to the
> C++ standard, implementors might find it desirable to rely on system
> provided Unicode services (e.g., by an OS provided build of ICU), at least
> for some features. I think we might be best off having the choice of
> Unicode version be implementation-defined and use of a recent version a QoI
> matter.
>
> That option feels at odds with how normative references work in the formal
> ISO world.
> Please read the intro text in [intro.refs]; I'm not seeing liberty to have
> a normative r
> > The real question is whether Unicode behavior will differ for
> -std=c++23 mode for gcc 14.1 vs gcc 19.1. I sure hope that it would!
>
> And I sure hope it doesn't, given the discussion we've had so far.
> (This sentiment is quite strong at this point.)
>
> Jens
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2024-01-07 08:30:50