ISOCPP sg16 List: Re: Undated reference to Unicode Standard and UAX #29

From: Corentin <corentin.jabot_at_[hidden]>
Date: Sun, 7 Jan 2024 11:51:43 +0100

On Sun, Jan 7, 2024 at 9:30 AM JF Bastien <cxx_at_[hidden]> wrote:

>
>
> On Sun, Jan 7, 2024 at 5:19 PM Jens Maurer via SG16 <sg16_at_[hidden]>
> wrote:
>
>>
>>
>> On 07/01/2024 03.14, Tom Honermann wrote:
>> > The code points that can be specified via /universal-character-name <
>> http://eel.is/c++draft/lex.charset#nt:universal-character-name>/ don't
>> change, but additional names may become available for use in
>> /named-universal-character <
>> http://eel.is/c++draft/lex.charset#nt:named-universal-character>/.
>>
>> That's a technically incorrect statement, because
>> /universal-character-name/ includes /named-universal-character/ per the
>> grammar.
>>
>> The set of code points that can be specified via hex digits doesn't change
>> depending on the Unicode version; agreed.
>>
>> > It is a fact that parts of the Unicode Standard will necessarily change
>> as a byproduct of continually adding and improving support for the evolving
>> collection of human languages. While we can choose to evolve C++ in some
>> lockstep form with the Unicode Standard, users will nevertheless be exposed
>> to differences in behavior at some point. It is far from clear to me that
>> implementors and programmers benefit by having those changes happen at
>> discrete points.
>>
>> For any other feature added to C++, we have expressly bought in to a
>> model where such evolution (and exposure of differences) happens at
>> discrete points, namely when a new C++ revision is released every three
>> years.
>>
>> Why are features added to Unicode any different, conceptually?
>
>
>
> My recollection from our discussion was that the Unicode Consortium itself
> strongly recommends a floating reference. We therefore followed this
> recommendation.
>
> Has someone reached out to them (or is one listening now?) to understand
> if they’ve considered the specific issue in front of us?
>

That is my recollection as well.
My understanding is that SG16 would be happy with the standard
setting a minimum version (i.e., we do not want a C++26 compiler to keep
using Unicode 2.0 forever), but that we do not want to prevent
implementers from using a newer version.
I agree that we might want to add wording that makes that intent
clearer.
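
To illustrate the "minimum version, newer allowed" intent, here is a minimal
sketch. Everything in it is hypothetical: `unicode_version`, `minimum_for`,
`conforms`, and `shipped` are names made up for illustration, and the version
numbers are placeholders, not values taken from any standard or
implementation.

```cpp
#include <cassert>

// Hypothetical type: a (major, minor) Unicode version pair.
struct unicode_version {
    int major;
    int minor;
};

// True when version a is at least version b.
constexpr bool operator>=(unicode_version a, unicode_version b) {
    return a.major > b.major || (a.major == b.major && a.minor >= b.minor);
}

// Assumed minimum Unicode version for a given __cplusplus value.
// The numbers are placeholders chosen only to make the example concrete.
constexpr unicode_version minimum_for(long cplusplus) {
    return cplusplus > 202302L ? unicode_version{15, 1}
                               : unicode_version{15, 0};
}

// An implementation conforms if what it ships is at least the minimum;
// shipping something newer is always allowed.
constexpr bool conforms(unicode_version shipped, long cplusplus) {
    return shipped >= minimum_for(cplusplus);
}

// What this imaginary implementation ships, independent of standard mode.
constexpr unicode_version shipped{16, 0};

static_assert(conforms(shipped, __cplusplus),
              "shipped Unicode data is older than the assumed minimum");
```

The point of the sketch is that `shipped` does not depend on the standard
mode; only the lower bound does.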

>
>
>>
>> > From an implementation perspective, having C++23 mode use one Unicode
>> version and C++26 mode use another version seems problematic, at least for
>> implementations that don't provide distinct standard library
>> implementations for each standard mode (as is the case for all major
>> implementors).
>>
>> We've heard another implementer claim otherwise.
>> #ifdef's in standard library implementations triggering on the desired
>> standard mode seem quite common.
>>
>> > As a user, I would like and expect newer compiler versions to provide
>> support for newer Unicode versions independent of whatever standard mode I
>> happen to compile my code with.
>>
>> I disagree, from a user perspective.
>>
>> As a user, I foremost want portability: A program working with compiler X
>> claiming conformance to C++ZZ should work unchanged on a different
>> compiler Y also claiming conformance to C++ZZ. That portability argument
>> is the only reason we have WG21 to start with. If compiler X gives me
>> newer Unicode than compiler Y, I may have used newer
>> named-universal-characters or relied on newer Unicode algorithm behavior
>> when developing my program, just to see it break down when moving to
>> compiler Y that hasn't gotten around to upgrading to the new Unicode
>> version, yet.
>>
>> That's bad, and in my view much worse than having the users of compiler X
>> wait three years until they get the new feature. Again, compiler vendors
>> have options to offer post-standard features to their audience if they so
>> choose; everybody opting in to such options is aware that their code might
>> be non-portable.
>>
>> > ABI concerns are just as relevant for minor compiler upgrades as it is
>> for major upgrades these days. Going forward, we should strive to ensure
>> that Unicode features that don't have a strong stability policy are
>> adequately hidden behind an ABI boundary. I don't recall having discussed
>> use of the grapheme breaking algorithm in std::format from an ABI
>> perspective.
>>
>> That applies regardless of release cadence of changed Unicode features,
>> but is more of a pain point with mid-term Unicode updates. C++ standard
>> versions are susceptible to ABI breaks anyway, as much as we sometimes
>> strive to avoid them.
>>
>> > I think it makes sense to specify a minimum Unicode version for each
>> C++ standard and I would not be opposed to adding such specification.
>> However, it is possible that the choice of Unicode version might not always
>> remain a choice that implementors make. As we add additional Unicode
>> features to the C++ standard, implementors might find it desirable to rely
>> on system provided Unicode services (e.g., by an OS provided build of ICU),
>> at least for some features. I think we might be best off having the choice
>> of Unicode version be implementation-defined and use of a recent version a
>> QoI matter.
>>
>> That option feels at odds with how normative references work in the
>> formal ISO world.
>> Please read the intro text in [intro.refs]; I'm not seeing liberty to have
>> a normative r
>> > The real question is whether Unicode behavior will differ for
>> -std=c++23 mode for gcc 14.1 vs gcc 19.1. I sure hope that it would!
>>
>> And I sure hope it doesn't, given the discussion we've had so far.
>> (This sentiment is quite strong at this point.)
>>
>> Jens
>>
>>
>

Received on 2024-01-07 10:52:01