Re: Undated reference to Unicode Standard and UAX #29

From: JF Bastien <cxx_at_[hidden]>
Date: Mon, 8 Jan 2024 07:18:26 +0900

On Mon, Jan 8, 2024 at 6:56 AM Jens Maurer via SG16 <sg16_at_[hidden]> wrote:

>
> On 07/01/2024 21.27, Tom Honermann wrote:
> > The C++ standard includes in its bibliography
> > <http://eel.is/c++draft/bibliography> an undated reference to the IANA
> > Time Zone Database <https://www.iana.org/time-zones> with a linked
> > reference in [time.zone.general]p1
> > <http://eel.is/c++draft/time#zone.general-1>. I grant that is a
> > non-normative reference and the use of it differs somewhat from the
> > situation we face with referencing the Unicode Standard, but it is an
> > example of specified behavior that is intended to change at points that
> > are not aligned with the release of C++ standard revisions.
>
> I don't think we specify anything at all regarding the details of timezone
> values, and there's quite a strong differentiation between the timezone
> data (which is IANA) and the "algorithms" on top of that data (which are
> C++-specified).
>
> >> Why are features added to Unicode any different, conceptually?
> >
> > It is desirable that programs written and compiled for a particular C++
> > standard revision be able to correctly consume text produced in accordance
> > with newer Unicode standards subject to limitations imposed by the
> > interfaces that we specify. Requiring that programmers migrate their code
> > to newer C++ standards in order to take advantage of corrections in newer
> > Unicode standards would impose an unnecessary hindrance.
>
> So, we don't have a "C++23" mode for compilers, we have a
> "C++23-with-Unicode-15" mode, then?
> And maybe compiler vendors opt not to support older Unicode modes when
> moving forward.

There’s a compile-time as well as a runtime component to this, so you’d
need two flags. For some features, the behavior might also change based not
on how the program was compiled, but on which dynamic library is loaded at
runtime.

I think we need a paper working through this complexity.
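
To make the two components concrete, here is a minimal C++ sketch of how they can disagree. Every name in it is hypothetical (no real compiler flag or library API is implied), and the version numbers are assumed purely for illustration.

```cpp
// All names below are hypothetical; this is an illustration, not a real API.

struct unicode_version {
    int major_number;
    int minor_number;
};

constexpr bool operator==(unicode_version a, unicode_version b) {
    return a.major_number == b.major_number &&
           a.minor_number == b.minor_number;
}

// Compile-time component: fixed when the translation unit is built, for
// example by a (hypothetical) compiler flag that defines these macros.
// It would govern things decided during translation, such as which
// named-universal-characters are accepted.
#ifndef SKETCH_COMPILED_UNICODE_MAJOR
#define SKETCH_COMPILED_UNICODE_MAJOR 15
#endif
#ifndef SKETCH_COMPILED_UNICODE_MINOR
#define SKETCH_COMPILED_UNICODE_MINOR 0
#endif

constexpr unicode_version compiled_unicode_version{
    SKETCH_COMPILED_UNICODE_MAJOR, SKETCH_COMPILED_UNICODE_MINOR};

// Runtime component: a stand-in for a query answered by whichever dynamic
// library happens to be loaded. The value is hard-coded here (assumed).
inline unicode_version runtime_unicode_version() {
    return unicode_version{15, 1};
}

// True only when behavior decided at build time and behavior decided at
// run time follow the same Unicode version.
inline bool unicode_versions_agree() {
    return compiled_unicode_version == runtime_unicode_version();
}
```

With the assumed values, unicode_versions_agree() returns false: the program was built against one Unicode version but runs against another, which is the situation a single "C++23-with-Unicode-15" mode cannot describe.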

>
> >> As a user, I foremost want portability: A program working with compiler X
> >> claiming conformance to C++ZZ should work unchanged on a different
> >> compiler Y also claiming conformance to C++ZZ. That portability argument
> >> is the only reason we have WG21 to start with. If compiler X gives me
> >> newer Unicode than compiler Y, I may have used newer
> >> named-universal-characters or relied on newer Unicode algorithm behavior
> >> when developing my program, just to see it break down when moving to
> >> compiler Y that hasn't gotten around to upgrading to the new Unicode
> >> version, yet.
> > I think these concerns are adequately addressed by specifying a minimum
> > Unicode version. Note that implementations are always free to accept
> > additional character names as a conforming extension (a diagnostic for use
> > of such names can be issued).
>
> There's no such thing as a "conforming extension" in C++.
>
> [lex.charset] p5 requires that a program be flagged as ill-formed
> if a named-universal-character is spelled that doesn't exist (yet).
> So, it's not that a "diagnostic can be issued"; it must be issued.
>
> >> That's bad, and in my view much worse than having the users of compiler X
> >> wait three years until they get the new feature. Again, compiler vendors
> >> have options to offer post-standard features to their audience if they so
> >> choose; everybody opting in to such options is aware that their code
> >> might be non-portable.
> >
> > I think the attention placed on backward compatibility by the Unicode
> > Consortium suffices here; I think their efforts are at least on par with
> > WG21.
>
> I don't think Unicode will (or can) consider possible ABI breaks in C++
> implementations of their algorithms, should we ever get there. Note that
> the availability of templates in C++ might establish ABI boundaries at
> surprising locations in the view of implementations in other programming
> languages (or in less template-heavy C++).
>
> > I view the change in behavior that spawned this email thread as more of
> > a bug fix than a new feature.
>
> We've expressly refrained from fixing bugs in std::regex because of
> ABI break concerns, if I remember correctly. Are we delegating that
> choice to Unicode in some areas?
>
> > Thank you for filing the CWG issue. There are clearly nuances and
> > perspectives that warrant additional discussion. I'm going to add this
> > topic as one of the agenda items for this week's SG16 meeting.
>
> Thanks,
> Jens

Received on 2024-01-07 22:18:38