ISOCPP sg16 List: Re: Undated reference to Unicode Standard and UAX #29

From: Corentin <corentin.jabot_at_[hidden]>
Date: Wed, 10 Jan 2024 09:07:31 +0100

On Wed, Jan 10, 2024 at 4:24 AM Tom Honermann <tom_at_[hidden]> wrote:

> On 1/7/24 4:55 PM, Jens Maurer wrote:
> > On 07/01/2024 21.27, Tom Honermann wrote:
> >> The C++ standard includes in its bibliography
> >> <http://eel.is/c++draft/bibliography> an undated reference to the IANA
> >> Time Zone Database <https://www.iana.org/time-zones> with a linked
> >> reference in [time.zone.general]p1
> >> <http://eel.is/c++draft/time#zone.general-1>. I grant that is a
> >> non-normative reference and the use of it differs somewhat from the
> >> situation we face with referencing the Unicode Standard, but it is an
> >> example of specified behavior that is intended to change at points that
> >> are not aligned with the release of C++ standard revisions.
> > I don't think we specify anything at all regarding the details of
> > timezone values, and there's quite a strong differentiation between the
> > timezone data (which is IANA) and the "algorithms" on top of that data
> > (which are C++-specified).
> I agree and acknowledged that there are differences. But there are
> similarities as well. std::format may produce different output for the
> same chrono time_point value for different implementations when timezone
> information is included if the implementations have different versions
> of the timezone DB. That isn't so different from different output being
> produced for the same code point based on Unicode version.
> >
> >>> Why are features added to Unicode any different, conceptually?
> >> It is desirable that programs written and compiled for a particular C++
> >> standard revision be able to correctly consume text produced in
> >> accordance with newer Unicode standards subject to limitations imposed
> >> by the interfaces that we specify. Requiring that programmers migrate
> >> their code to newer C++ standards in order to take advantage of
> >> corrections in newer Unicode standards would impose an unnecessary
> >> hindrance.
> > So, we don't have a "C++23" mode for compilers, we have a
> > "C++23-with-Unicode-15" mode, then?
> I would say we have a "C++23" mode for compilers and that the Unicode
> version is implementation-defined.
> > And maybe compiler vendors opt not to support older Unicode modes when
> > moving forward.
> Yes.
> >
> >>> As a user, I foremost want portability: A program working with
> >>> compiler X claiming conformance to C++ZZ should work unchanged on a
> >>> different compiler Y also claiming conformance to C++ZZ. That
> >>> portability argument is the only reason we have WG21 to start with.
> >>> If compiler X gives me newer Unicode than compiler Y, I may have used
> >>> newer named-universal-characters or relied on newer Unicode algorithm
> >>> behavior when developing my program, just to see it break down when
> >>> moving to compiler Y that hasn't gotten around to upgrading to the new
> >>> Unicode version, yet.
> >> I think these concerns are adequately addressed by specifying a minimum
> >> Unicode version. Note that implementations are always free to accept
> >> additional character names as a conforming extension (a diagnostic for
> >> use of such names can be issued).
> > There's no such thing as a "conforming extension" in C++.
> Not one that is recognized by the standard, but that is the terminology
> commonly used when an implementation gives meaning to code that is
> ill-formed according to the standard.
> >
> > [lex.charset] p5 requires that a program be flagged as ill-formed
> > if a named-universal-character is spelled that doesn't exist (yet).
> > So, it's not that a "diagnostic can be issued"; it must be issued.
> Yes.
> >
> >>> That's bad, and in my view much worse than having the users of
> >>> compiler X wait three years until they get the new feature. Again,
> >>> compiler vendors have options to offer post-standard features to their
> >>> audience if they so choose; everybody opting in to such options is
> >>> aware that their code might be non-portable.
> >> I think the attention placed on backward compatibility by the Unicode
> >> Consortium suffices here; I think their efforts are at least on par
> >> with WG21.
> > I don't think Unicode will (or can) consider possible ABI breaks in C++
> > implementations of their algorithms, should we ever get there. Note that
> > the availability of templates in C++ might establish ABI boundaries at
> > surprising locations in the view of implementations in other programming
> > languages (or in less template-heavy C++).
> Agreed. That is why I previously suggested we might want to be careful
> to ensure that Unicode features that don't have a strong stability
> policy are isolated behind an ABI boundary. For the case Jonathan
> reported, it looks like we are in the clear. Someone please correct me
> if I'm mistaken.
> >
> >> I view the change in behavior that spawned this email thread as more of
> >> a bug fix than a new feature.
> > We've expressly refrained from fixing bugs in std::regex because of
> > ABI break concerns, if I remember correctly. Are we delegating that
> > choice to Unicode in some areas?
>
> I think we should strive not to do so. std::regex is a good example of a
> failure to isolate ABI concerns.
>

Right.
We know that Unicode can change in some bounded ways, and I think Unicode
is doing a pretty good job of describing those on a per-algorithm basis.
Given that, implementers can (and should) hide implementation details from
the ABI, or not extend ABI promises to Unicode algorithms.
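
To make "hide implementation details" concrete, here is a minimal sketch of
one way to keep version-dependent Unicode data behind an ABI boundary. Every
name in it (textlib, count_break_opportunities, unicode_major_version) is
hypothetical, and the break rule is a toy stand-in, not the UAX #29
algorithm. The point is only the shape: the public declarations are
non-inline and non-template, so nothing that depends on the Unicode version
is compiled into user code.

```cpp
#include <cstddef>
#include <string_view>

// --- What would live in the public header (hypothetical names) ---
namespace textlib {
// Declared only: no inline body and no template, so callers never
// instantiate code that depends on a particular Unicode data version.
std::size_t count_break_opportunities(std::u32string_view text) noexcept;

// Reports the Unicode version the compiled library was built against.
int unicode_major_version() noexcept;
}  // namespace textlib

// --- What would live in the library's own translation unit ---
namespace textlib {
namespace {
// Toy stand-in for real Unicode property tables (an assumption for
// illustration). Replacing this changes behavior, not the ABI.
bool is_break_after(char32_t c) noexcept {
    return c == U' ' || c == U'\n';
}
}  // namespace

std::size_t count_break_opportunities(std::u32string_view text) noexcept {
    std::size_t count = 0;
    for (char32_t c : text) {
        if (is_break_after(c)) {
            ++count;
        }
    }
    return count;
}

// Hypothetical value; a real library would take it from its data files.
int unicode_major_version() noexcept { return 15; }
}  // namespace textlib
```

With this layout, shipping a library built against newer Unicode data can
change what count_break_opportunities returns for some inputs, but it does
not change any symbol or type layout that user code was compiled against.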

>
> Tom.
>
> >
> >> Thank you for filing the CWG issue. There are clearly nuances and
> >> perspectives that warrant additional discussion. I'm going to add this
> >> topic as one of the agenda items for this week's SG16 meeting.
> > Thanks,
> > Jens
> >
>

Received on 2024-01-10 08:07:50