ISOCPP sg16 List: Re: Undated reference to Unicode Standard and UAX #29

From: Eddie Nolan <eddiejnolan_at_[hidden]>
Date: Wed, 10 Jan 2024 14:24:19 -0500

It seems problematic to allow constexpr implementations of any Unicode
functionality for which the Unicode standard hasn’t guaranteed stability.
While careful implementations can use the technique of deferring to a
non-inline function to hide those details from ABI, constexpr requires that
the entire implementation is inline, which means that every Unicode update
can cause an ABI break.
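
To make the distinction concrete, here is a minimal sketch of the two styles.
All names are hypothetical and the table data is a placeholder, not real
Unicode property values; it is only meant to show where the version-dependent
data ends up, not how any standard library actually implements this.

```cpp
// Hypothetical illustration; the ranges below are placeholder data.
namespace sketch {

// Style A: callers see only this declaration. The definition, together with
// its version-dependent table, lives in a single translation unit of the
// library, so a Unicode update only requires shipping a new library binary.
int toy_property_out_of_line(char32_t cp) noexcept;

// Style B: constexpr requires the body (and therefore the table) to be
// visible in the header. Each caller may inline its own copy, frozen at the
// Unicode version that was current when that caller was compiled.
constexpr int toy_property_constexpr(char32_t cp) noexcept {
    return (cp >= 0x0300 && cp <= 0x036F) ? 1 : 0;
}

} // namespace sketch

// In a real library this definition would sit in a .cpp file rather than in
// the header; it appears here only to keep the sketch self-contained.
int sketch::toy_property_out_of_line(char32_t cp) noexcept {
    return (cp >= 0x0300 && cp <= 0x036F) ? 1 : 0;
}
```

With style B, two object files built against headers from different Unicode
versions can disagree about the result for the same code point, which is the
ABI hazard described above.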

Also, it seems like a misuse of constexpr to apply it to functionality that
happens to be constant at the time of compilation but which can change over
time. If Unicode updates can change the result, then it shouldn’t be
constexpr for the same reason we wouldn’t want to make the time zone
database accessible via constexpr.

If we apply the proposed resolution to CWG2843, fixing the Unicode
version referenced by the standard at version 15.0.0, and we also continue
to have Unicode functionality in fully inline constexpr functions in
standard library implementations, then we might invite the outcome that
updates to the standard’s Unicode version start being blocked because of
ABI concerns.

On Wed, Jan 10, 2024 at 3:07 AM Corentin via SG16 <sg16_at_[hidden]>
wrote:

>
>
> On Wed, Jan 10, 2024 at 4:24 AM Tom Honermann <tom_at_[hidden]> wrote:
>
>> On 1/7/24 4:55 PM, Jens Maurer wrote:
>> > On 07/01/2024 21.27, Tom Honermann wrote:
>> >> The C++ standard includes in its bibliography <
>> http://eel.is/c++draft/bibliography> an undated reference to the IANA
>> Time Zone Database <https://www.iana.org/time-zones> with a linked
>> reference in [time.zone.general]p1 <
>> http://eel.is/c++draft/time#zone.general-1>. I grant that is a
>> non-normative reference and the use of it differs somewhat from the
>> situation we face with referencing the Unicode Standard, but it is an
>> example of specified behavior that is intended to change at points that are
>> not aligned with the release of C++ standard revisions.
>> > I don't think we specify anything at all regarding the details of
>> timezone values,
>> > and there's quite a strong differentiation between the timezone data
>> (which is
>> > IANA) and the "algorithms" on top of that data (which are
>> C++-specified).
>> I agree and acknowledged that there are differences. But there are
>> similarities as well. std::format may produce different output for the
>> same chrono time_point value for different implementations when timezone
>> information is included if the implementations have different versions
>> of the timezone DB. That isn't so different from different output being
>> produced for the same code point based on Unicode version.
>> >
>> >>> Why are features added to Unicode any different, conceptually?
>> >> It is desirable that programs written and compiled for a particular
>> C++ standard revision be able to correctly consume text produced in
>> accordance with newer Unicode standards subject to limitations imposed by
>> the interfaces that we specify. Requiring that programmers migrate their
>> code to newer C++ standards in order to take advantage of corrections in
>> newer Unicode standards would impose an unnecessary hindrance.
>> > So, we don't have a "C++23" mode for compilers, we have a
>> "C++23-with-Unicode-15" mode, then?
>> I would say we have a "C++23" mode for compilers and that the Unicode
>> version is implementation-defined.
>> > And maybe compiler vendors opt not to support older Unicode modes when
>> moving forward.
>> Yes.
>> >
>> >>> As a user, I foremost want portability: A program working with
>> compiler X claiming
>> >>> conformance to C++ZZ should work unchanged on a different compiler Y
>> also claiming
>> >>> conformance to C++ZZ. That portability argument is the only reason
>> we have WG21
>> >>> to start with. If compiler X gives me newer Unicode than compiler Y,
>> I may have
>> >>> used newer named-universal-characters or relied on newer Unicode
>> algorithm behavior
>> >>> when developing my program, just to see it break down when moving to
>> compiler Y
>> >>> that hasn't gotten around to upgrading to the new Unicode version,
>> yet.
>> >> I think these concerns are adequately addressed by specifying a
>> minimum Unicode version. Note that implementations are always free to
>> accept additional character names as a conforming extension (a diagnostic
>> for use of such names can be issued).
>> > There's no such thing as a "conforming extension" in C++.
>> Not that is recognized by the standard, but that is the terminology
>> commonly used when an implementation gives meaning to code that is
>> ill-formed according to the standard.
>> >
>> > [lex.charset] p5 requires that a program be flagged as ill-formed
>> > if a named-universal-character is spelled that doesn't exist (yet).
>> > So, it's not that a "diagnostic can be issued"; it must be issued.
>> Yes.
>> >
>> >>> That's bad, and in my view much worse than having the users of
>> compiler X wait
>> >>> three years until they get the new feature. Again, compiler vendors
>> have options
>> >>> to offer post-standard features to their audience if they so choose;
>> everybody
>> >>> opting in to such options is aware that their code might be
>> non-portable.
>> >> I think the attention placed on backward compatibility by the Unicode
>> Consortium suffices here; I think their efforts are at least on par with
>> WG21.
>> > I don't think Unicode will (or can) consider possible ABI breaks in C++
>> implementations
>> > of their algorithms, should we ever get there. Note that the
>> availability of templates
>> > in C++ might establish ABI boundaries at surprising locations in the
>> view of
>> > implementations in other programming languages (or in less
>> template-heavy C++).
>> Agreed. That is why I previously suggested we might want to be careful
>> to ensure that Unicode features that don't have a strong stability
>> policy are isolated behind an ABI boundary. For the case Jonathan
>> reported, it looks like we are in the clear. Someone please correct me
>> if I'm mistaken.
>> >
>> >> I view the change in behavior that spawned this email thread as more
>> of a bug fix than a new feature.
>> > We've expressly refrained from fixing bugs in std::regex because of
>> > ABI break concerns, if I remember correctly. Are we delegating that
>> > choice to Unicode in some areas?
>>
>> I think we should strive not to do so. std::regex is a good example of a
>> failure to isolate ABI concerns.
>>
>
> Right.
> We know that Unicode can change, in some bounded ways, which I think
> Unicode is doing a pretty good job describing on a per-algorithm basis.
> Given that, implementers can (and should) hide implementation details from
> ABI, or not extend ABI promises to Unicode algorithms.
>
>
>
>>
>> Tom.
>>
>> >
>> >> Thank you for filing the CWG issue. There are clearly nuances and
>> perspectives that warrant additional discussion. I'm going to add this
>> topic as one of the agenda items for this week's SG16 meeting.
>> > Thanks,
>> > Jens
>> >
>>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2024-01-10 19:24:31