
sg16


Re: Undated reference to Unicode Standard and UAX #29

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 9 Jan 2024 22:24:16 -0500
On 1/7/24 4:55 PM, Jens Maurer wrote:
> On 07/01/2024 21.27, Tom Honermann wrote:
>> The C++ standard includes in its bibliography <http://eel.is/c++draft/bibliography> an undated reference to the IANA Time Zone Database <https://www.iana.org/time-zones> with a linked reference in [time.zone.general]p1 <http://eel.is/c++draft/time#zone.general-1>. I grant that this is a non-normative reference and that its use differs somewhat from the situation we face with referencing the Unicode Standard, but it is an example of specified behavior that is intended to change at points that are not aligned with the release of C++ standard revisions.
> I don't think we specify anything at all regarding the details of timezone values,
> and there's quite a strong differentiation between the timezone data (which is
> IANA) and the "algorithms" on top of that data (which are C++-specified).
I agree, and I acknowledged that there are differences. But there are
similarities as well. When time zone information is included, std::format
may produce different output for the same chrono time_point value on two
implementations if they ship different versions of the time zone database.
That isn't so different from different output being produced for the same
code point depending on the Unicode version.
>
>>> Why are features added to Unicode any different, conceptually?
>> It is desirable that programs written and compiled for a particular C++ standard revision be able to correctly consume text produced in accordance with newer Unicode standards subject to limitations imposed by the interfaces that we specify. Requiring that programmers migrate their code to newer C++ standards in order to take advantage of corrections in newer Unicode standards would impose an unnecessary hindrance.
> So, we don't have a "C++23" mode for compilers, we have a "C++23-with-Unicode-15" mode, then?
I would say we have a "C++23" mode for compilers and that the Unicode
version is implementation-defined.
> And maybe compiler vendors opt not to support older Unicode modes when moving forward.
Yes.
>
>>> As a user, I foremost want portability: A program working with compiler X claiming
>>> conformance to C++ZZ should work unchanged on a different compiler Y also claiming
>>> conformance to C++ZZ. That portability argument is the only reason we have WG21
>>> to start with. If compiler X gives me newer Unicode than compiler Y, I may have
>>> used newer named-universal-characters or relied on newer Unicode algorithm behavior
>>> when developing my program, just to see it break down when moving to compiler Y
>>> that hasn't gotten around to upgrading to the new Unicode version, yet.
>> I think these concerns are adequately addressed by specifying a minimum Unicode version. Note that implementations are always free to accept additional character names as a conforming extension (a diagnostic for use of such names can be issued).
> There's no such thing as a "conforming extension" in C++.
Not one that is recognized by the standard, but that is the terminology
commonly used when an implementation gives meaning to code that is
ill-formed according to the standard.
>
> [lex.charset] p5 requires that a program be flagged as ill-formed
> if a named-universal-character is spelled that doesn't exist (yet).
> So, it's not that a "diagnostic can be issued"; it must be issued.
Yes.
>
>>> That's bad, and in my view much worse than having the users of compiler X wait
>>> three years until they get the new feature. Again, compiler vendors have options
>>> to offer post-standard features to their audience if they so choose; everybody
>>> opting in to such options is aware that their code might be non-portable.
>> I think the attention placed on backward compatibility by the Unicode Consortium suffices here; I think their efforts are at least on par with WG21.
> I don't think Unicode will (or can) consider possible ABI breaks in C++ implementations
> of their algorithms, should we ever get there. Note that the availability of templates
> in C++ might establish ABI boundaries at surprising locations in the view of
> implementations in other programming languages (or in less template-heavy C++).
Agreed. That is why I previously suggested that we might want to be careful
to ensure that Unicode features that don't have a strong stability
policy are isolated behind an ABI boundary. For the case Jonathan
reported, it looks like we are in the clear. Someone please correct me
if I'm mistaken.
>
>> I view the change in behavior that spawned this email thread as more of a bug fix than a new feature.
> We've expressly refrained from fixing bugs in std::regex because of
> ABI break concerns, if I remember correctly. Are we delegating that
> choice to Unicode in some areas?

I think we should strive not to do so. std::regex is a good example of a
failure to isolate ABI concerns.

Tom.

>
>> Thank you for filing the CWG issue. There are clearly nuances and perspectives that warrant additional discussion. I'm going to add this topic as one of the agenda items for this week's SG16 meeting.
> Thanks,
> Jens
>

Received on 2024-01-10 03:24:18