ISOCPP sg16 List: Re: Undated reference to Unicode Standard and UAX #29

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Sun, 7 Jan 2024 22:55:49 +0100

On 07/01/2024 21.27, Tom Honermann wrote:
> The C++ standard includes in its bibliography <http://eel.is/c++draft/bibliography> an undated reference to the IANA Time Zone Database <https://www.iana.org/time-zones> with a linked reference in [time.zone.general]p1 <http://eel.is/c++draft/time#zone.general-1>. I grant that is a non-normative reference and the use of it differs somewhat from the situation we face with referencing the Unicode Standard, but it is an example of specified behavior that is intended to change at points that are not aligned with the release of C++ standard revisions.

I don't think we specify anything at all regarding the details of timezone values,
and there's quite a strong differentiation between the timezone data (which is
IANA) and the "algorithms" on top of that data (which are C++-specified).

>> Why are features added to Unicode any different, conceptually?
>
> It is desirable that programs written and compiled for a particular C++ standard revision be able to correctly consume text produced in accordance with newer Unicode standards subject to limitations imposed by the interfaces that we specify. Requiring that programmers migrate their code to newer C++ standards in order to take advantage of corrections in newer Unicode standards would impose an unnecessary hindrance.

So, we don't have a "C++23" mode for compilers; we have a "C++23-with-Unicode-15" mode, then?
And maybe compiler vendors will opt not to support older Unicode modes when moving forward.

>> As a user, I foremost want portability: A program working with compiler X claiming
>> conformance to C++ZZ should work unchanged on a different compiler Y also claiming
>> conformance to C++ZZ. That portability argument is the only reason we have WG21
>> to start with. If compiler X gives me newer Unicode than compiler Y, I may have
>> used newer named-universal-characters or relied on newer Unicode algorithm behavior
>> when developing my program, just to see it break down when moving to compiler Y
>> that hasn't gotten around to upgrading to the new Unicode version, yet.
> I think these concerns are adequately addressed by specifying a minimum Unicode version. Note that implementations are always free to accept additional character names as a conforming extension (a diagnostic for use of such names can be issued).

There's no such thing as a "conforming extension" in C++.

[lex.charset] p5 requires that a program be flagged as ill-formed
if it spells a named-universal-character that doesn't exist (yet).
So, it's not that a "diagnostic can be issued"; it must be issued.
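
To illustrate the [lex.charset] point, here is a minimal sketch (not taken
from the thread). The helper name e_acute is hypothetical, and the sketch
assumes the implementation advertises \N{...} support through the
__cpp_named_character_escapes feature-test macro:

```cpp
// Hypothetical helper (not from the thread): returns U+00E9.
// LATIN SMALL LETTER E WITH ACUTE is a long-established Unicode character
// name, so any implementation that supports \N{...} should accept it.
constexpr char32_t e_acute() {
#if defined(__cpp_named_character_escapes)
    return U'\N{LATIN SMALL LETTER E WITH ACUTE}';
#else
    // Fallback for implementations without named-universal-character support.
    return U'\u00E9';
#endif
}

// The named spelling and the hexadecimal spelling designate the same code point.
static_assert(e_acute() == U'\u00E9', "name and hex spelling must agree");

// By contrast, a name the implementation's Unicode version does not know,
// such as U'\N{NO SUCH CHARACTER NAME}', makes the program ill-formed and
// must be diagnosed. It is left in a comment so this sketch still compiles.
```

The portability concern is the second case: a name added in a newer Unicode
version would compile on one implementation and be rejected on another that
still ships the older character-name tables.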

>> That's bad, and in my view much worse than having the users of compiler X wait
>> three years until they get the new feature. Again, compiler vendors have options
>> to offer post-standard features to their audience if they so choose; everybody
>> opting in to such options is aware that their code might be non-portable.
>
> I think the attention placed on backward compatibility by the Unicode Consortium suffices here; I think their efforts are at least on par with WG21.

I don't think Unicode will (or can) consider possible ABI breaks in C++ implementations
of their algorithms, should we ever get there. Note that the availability of templates
in C++ might establish ABI boundaries at surprising locations in the view of
implementations in other programming languages (or in less template-heavy C++).

> I view the change in behavior that spawned this email thread as more of a bug fix than a new feature.

We've expressly refrained from fixing bugs in std::regex because of
ABI break concerns, if I remember correctly. Are we delegating that
choice to Unicode in some areas?

> Thank you for filing the CWG issue. There are clearly nuances and perspectives that warrant additional discussion. I'm going to add this topic as one of the agenda items for this week's SG16 meeting.

Thanks,
Jens

Received on 2024-01-07 21:55:59