Re: Undated reference to Unicode Standard and UAX #29

From: Eddie Nolan <eddiejnolan_at_[hidden]>
Date: Wed, 10 Jan 2024 15:42:08 -0500
My email above was based on some assumptions that were corrected at today's
telecon:

   - I thought that implementers were allowed to add constexpr to standard
   library functions as an extension; they are not
   - I thought that the libstdc++/libc++ implementers had implemented a
   constexpr std::format that depended on the Unicode version for width
   estimation; in fact, the constexpr functions they are referring to are
   user-inaccessible implementation details, and the top-level std::format
   is not constexpr
   - Corentin explained that even a hypothetical constexpr std::format
   implementation could avoid runtime ABI problems using if consteval, and
   that differences in the constexpr std::format width estimation would not
   be considered a significant ABI issue.


On Wed, Jan 10, 2024 at 2:24 PM Eddie Nolan <eddiejnolan_at_[hidden]> wrote:

> It seems problematic to allow constexpr implementations of any Unicode
> functionality for which the Unicode standard hasn’t guaranteed stability.
> While careful implementations can use the technique of deferring to a
> non-inline function to hide those details from ABI, constexpr requires
> that the entire implementation is inline, which means that every Unicode
> update can cause an ABI break.
>
> Also, it seems like a misuse of constexpr to apply it to functionality
> that happens to be constant at the time of compilation but which can change
> over time. If Unicode updates can change the result, then it shouldn’t be
> constexpr for the same reason we wouldn’t want to make the time zone
> database accessible via constexpr.
>
> If we apply the proposed resolution to CWG2843 that we fix the Unicode
> version referenced by the standard at version 15.0.0, and we also continue
> to have Unicode functionality in fully inline constexpr functions in
> standard library implementations, then we might invite the outcome that
> updates to the standard’s Unicode version start being blocked because of
> ABI concerns.
>
> On Wed, Jan 10, 2024 at 3:07 AM Corentin via SG16 <sg16_at_[hidden]>
> wrote:
>
>>
>>
>> On Wed, Jan 10, 2024 at 4:24 AM Tom Honermann <tom_at_[hidden]> wrote:
>>
>>> On 1/7/24 4:55 PM, Jens Maurer wrote:
>>> > On 07/01/2024 21.27, Tom Honermann wrote:
>>> >> The C++ standard includes in its bibliography <
>>> http://eel.is/c++draft/bibliography> an undated reference to the IANA
>>> Time Zone Database <https://www.iana.org/time-zones> with a linked
>>> reference in [time.zone.general]p1 <
>>> http://eel.is/c++draft/time#zone.general-1>. I grant that is a
>>> non-normative reference and the use of it differs somewhat from the
>>> situation we face with referencing the Unicode Standard, but it is an
>>> example of specified behavior that is intended to change at points that are
>>> not aligned with the release of C++ standard revisions.
>>> > I don't think we specify anything at all regarding the details of
>>> timezone values,
>>> > and there's quite a strong differentiation between the timezone data
>>> (which is
>>> > IANA) and the "algorithms" on top of that data (which are
>>> C++-specified).
>>> I agree and acknowledged that there are differences. But there are
>>> similarities as well. std::format may produce different output for the
>>> same chrono time_point value for different implementations when timezone
>>> information is included if the implementations have different versions
>>> of the timezone DB. That isn't so different from different output being
>>> produced for the same code point based on Unicode version.
>>> >
>>> >>> Why are features added to Unicode any different, conceptually?
>>> >> It is desirable that programs written and compiled for a particular
>>> C++ standard revision be able to correctly consume text produced in
>>> accordance with newer Unicode standards subject to limitations imposed by
>>> the interfaces that we specify. Requiring that programmers migrate their
>>> code to newer C++ standards in order to take advantage of corrections in
>>> newer Unicode standards would impose an unnecessary hindrance.
>>> > So, we don't have a "C++23" mode for compilers, we have a
>>> "C++23-with-Unicode-15" mode, then?
>>> I would say we have a "C++23" mode for compilers and that the Unicode
>>> version is implementation-defined.
>>> > And maybe compiler vendors opt not to support older Unicode modes when
>>> moving forward.
>>> Yes.
>>> >
>>> >>> As a user, I foremost want portability: A program working with
>>> compiler X claiming
>>> >>> conformance to C++ZZ should work unchanged on a different compiler Y
>>> also claiming
>>> >>> conformance to C++ZZ. That portability argument is the only reason
>>> we have WG21
>>> >>> to start with. If compiler X gives me newer Unicode than compiler
>>> Y, I may have
>>> >>> used newer named-universal-characters or relied on newer Unicode
>>> algorithm behavior
>>> >>> when developing my program, just to see it break down when moving to
>>> compiler Y
>>> >>> that hasn't gotten around to upgrading to the new Unicode version,
>>> yet.
>>> >> I think these concerns are adequately addressed by specifying a
>>> minimum Unicode version. Note that implementations are always free to
>>> accept additional character names as a conforming extension (a diagnostic
>>> for use of such names can be issued).
>>> > There's no such thing as a "conforming extension" in C++.
>>> Not one that is recognized by the standard, but that is the terminology
>>> commonly used when an implementation gives meaning to code that is
>>> ill-formed according to the standard.
>>> >
>>> > [lex.charset] p5 requires that a program be flagged as ill-formed
>>> > if a named-universal-character is spelled that doesn't exist (yet).
>>> > So, it's not that a "diagnostic can be issued"; it must be issued.
>>> Yes.
>>> >
>>> >>> That's bad, and in my view much worse than having the users of
>>> compiler X wait
>>> >>> three years until they get the new feature. Again, compiler vendors
>>> have options
>>> >>> to offer post-standard features to their audience if they so choose;
>>> everybody
>>> >>> opting in to such options is aware that their code might be
>>> non-portable.
>>> >> I think the attention placed on backward compatibility by the Unicode
>>> Consortium suffices here; I think their efforts are at least on par with
>>> WG21.
>>> > I don't think Unicode will (or can) consider possible ABI breaks in
>>> C++ implementations
>>> > of their algorithms, should we ever get there. Note that the
>>> availability of templates
>>> > in C++ might establish ABI boundaries at surprising locations in the
>>> view of
>>> > implementations in other programming languages (or in less
>>> template-heavy C++).
>>> Agreed. That is why I previously suggested we might want to be careful
>>> to ensure that Unicode features that don't have a strong stability
>>> policy are isolated behind an ABI boundary. For the case Jonathan
>>> reported, it looks like we are in the clear. Someone please correct me
>>> if I'm mistaken.
>>> >
>>> >> I view the change in behavior that spawned this email thread as more
>>> of a bug fix than a new feature.
>>> > We've expressly refrained from fixing bugs in std::regex because of
>>> > ABI break concerns, if I remember correctly. Are we delegating that
>>> > choice to Unicode in some areas?
>>>
>>> I think we should strive not to do so. std::regex is a good example of a
>>> failure to isolate ABI concerns.
>>>
>>
>> Right.
>> We know that Unicode can change, in some bounded ways, which I think
>> Unicode is doing a pretty good job describing on a per-algorithm basis.
>> Given that, implementers can (and should) hide implementation details
>> from ABI, or not extend ABI promises to Unicode algorithms.
>>
>>
>>
>>>
>>> Tom.
>>>
>>> >
>>> >> Thank you for filing the CWG issue. There are clearly nuances and
>>> perspectives that warrant additional discussion. I'm going to add this
>>> topic as one of the agenda items for this week's SG16 meeting.
>>> > Thanks,
>>> > Jens
>>> >
>>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>

Received on 2024-01-10 20:42:22