Re: Undated reference to Unicode Standard and UAX #29

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 10 Jan 2024 16:38:14 -0500
Thank you for summarizing that outcome. I had not followed the
discussion in the Zoom chat during the meeting, so I wasn't sure how it
concluded. I'll try to incorporate this into the meeting summary.

Tom.

On 1/10/24 3:42 PM, Eddie Nolan via SG16 wrote:
> My email above was based on some assumptions that were corrected at
> today's telecon:
>
> * I thought that implementers were allowed to add |constexpr| to
> standard library functions as an extension; they are not
> * I thought that the libstdc++/libc++ implementers had implemented a
> |constexpr| |std::format| that depended on the Unicode version for
> width estimation; in fact, the |constexpr| functions they are
> referring to are user-inaccessible implementation details, and the
> top-level |std::format| is not constexpr
> * Corentin explained that even a hypothetical constexpr
> |std::format| implementation could avoid runtime ABI problems
> using |if consteval|, and that differences in the |constexpr|
> |std::format| width estimation would not be considered a
> significant ABI issue.
>
>
> On Wed, Jan 10, 2024 at 2:24 PM Eddie Nolan <eddiejnolan_at_[hidden]> wrote:
>
> It seems problematic to allow |constexpr| implementations of any
> Unicode functionality for which the Unicode standard hasn’t
> guaranteed stability. While careful implementations can use the
> technique of deferring to a non-inline function to hide those
> details from ABI, |constexpr| requires that the entire
> implementation is inline, which means that every Unicode update
> can cause an ABI break.
>
> Also, it seems like a misuse of |constexpr| to apply it to
> functionality that happens to be constant at the time of
> compilation but which can change over time. If Unicode updates can
> change the result, then it shouldn’t be |constexpr| for the same
> reason we wouldn’t want to make the time zone database accessible
> via |constexpr|.
>
> If we apply the proposed resolution to CWG2843 that we fix the
> Unicode version referenced by the standard at version 15.0.0, and
> we also continue to have Unicode functionality in fully inline
> |constexpr| functions in standard library implementations, then we
> might invite the outcome that updates to the standard’s Unicode
> version start being blocked because of ABI concerns.
>
>
> On Wed, Jan 10, 2024 at 3:07 AM Corentin via SG16
> <sg16_at_[hidden]> wrote:
>
>
>
> On Wed, Jan 10, 2024 at 4:24 AM Tom Honermann
> <tom_at_[hidden]> wrote:
>
> On 1/7/24 4:55 PM, Jens Maurer wrote:
> > On 07/01/2024 21.27, Tom Honermann wrote:
> >> The C++ standard includes in its bibliography
> >> <http://eel.is/c++draft/bibliography> an undated reference to the
> >> IANA Time Zone Database <https://www.iana.org/time-zones> with a
> >> linked reference in [time.zone.general]p1
> >> <http://eel.is/c++draft/time#zone.general-1>. I grant that is a
> >> non-normative reference and the use of it differs somewhat from
> >> the situation we face with referencing the Unicode Standard, but
> >> it is an example of specified behavior that is intended to change
> >> at points that are not aligned with the release of C++ standard
> >> revisions.
> > I don't think we specify anything at all regarding the details of
> > timezone values, and there's quite a strong differentiation
> > between the timezone data (which is IANA) and the "algorithms" on
> > top of that data (which are C++-specified).
> I agree and acknowledged that there are differences. But there are
> similarities as well. std::format may produce different output for
> the same chrono time_point value for different implementations when
> timezone information is included if the implementations have
> different versions of the timezone DB. That isn't so different from
> different output being produced for the same code point based on
> Unicode version.
> >
> >>> Why are features added to Unicode any different, conceptually?
> >> It is desirable that programs written and compiled for a
> >> particular C++ standard revision be able to correctly consume text
> >> produced in accordance with newer Unicode standards subject to
> >> limitations imposed by the interfaces that we specify. Requiring
> >> that programmers migrate their code to newer C++ standards in
> >> order to take advantage of corrections in newer Unicode standards
> >> would impose an unnecessary hindrance.
> > So, we don't have a "C++23" mode for compilers, we have a
> > "C++23-with-Unicode-15" mode, then?
> I would say we have a "C++23" mode for compilers and that the Unicode
> version is implementation-defined.
> > And maybe compiler vendors opt not to support older Unicode modes
> > when moving forward.
> Yes.
> >
> >>> As a user, I foremost want portability: A program working with
> >>> compiler X claiming conformance to C++ZZ should work unchanged on
> >>> a different compiler Y also claiming conformance to C++ZZ. That
> >>> portability argument is the only reason we have WG21 to start
> >>> with. If compiler X gives me newer Unicode than compiler Y, I may
> >>> have used newer named-universal-characters or relied on newer
> >>> Unicode algorithm behavior when developing my program, just to
> >>> see it break down when moving to compiler Y that hasn't gotten
> >>> around to upgrading to the new Unicode version, yet.
> >> I think these concerns are adequately addressed by specifying a
> >> minimum Unicode version. Note that implementations are always free
> >> to accept additional character names as a conforming extension (a
> >> diagnostic for use of such names can be issued).
> > There's no such thing as a "conforming extension" in C++.
> Not one that is recognized by the standard, but that is the
> terminology commonly used when an implementation gives meaning to
> code that is ill-formed according to the standard.
> >
> > [lex.charset] p5 requires that a program be flagged as ill-formed
> > if a named-universal-character is spelled that doesn't exist (yet).
> > So, it's not that a "diagnostic can be issued"; it must be issued.
> Yes.
> >
> >>> That's bad, and in my view much worse than having the users of
> >>> compiler X wait three years until they get the new feature.
> >>> Again, compiler vendors have options to offer post-standard
> >>> features to their audience if they so choose; everybody opting in
> >>> to such options is aware that their code might be non-portable.
> >> I think the attention placed on backward compatibility by the
> >> Unicode Consortium suffices here; I think their efforts are at
> >> least on par with WG21.
> > I don't think Unicode will (or can) consider possible ABI breaks in
> > C++ implementations of their algorithms, should we ever get there.
> > Note that the availability of templates in C++ might establish ABI
> > boundaries at surprising locations in the view of implementations
> > in other programming languages (or in less template-heavy C++).
> Agreed. That is why I previously suggested we might want to be
> careful to ensure that Unicode features that don't have a strong
> stability policy are isolated behind an ABI boundary. For the case
> Jonathan reported, it looks like we are in the clear. Someone please
> correct me if I'm mistaken.
> >
> >> I view the change in behavior that spawned this email thread as
> >> more of a bug fix than a new feature.
> > We've expressly refrained from fixing bugs in std::regex because of
> > ABI break concerns, if I remember correctly. Are we delegating that
> > choice to Unicode in some areas?
>
> I think we should strive not to do so. std::regex is a good example
> of a failure to isolate ABI concerns.
>
>
> Right.
> We know that Unicode can change, in some bounded ways, which I
> think Unicode is doing a pretty good job describing on a
> per-algorithm basis.
> Given that, implementers can (and should) hide implementation
> details from ABI, or not extend ABI promises to Unicode
> algorithms.
>
>
> Tom.
>
> >
> >> Thank you for filing the CWG issue. There are clearly nuances and
> >> perspectives that warrant additional discussion. I'm going to add
> >> this topic as one of the agenda items for this week's SG16
> >> meeting.
> > Thanks,
> > Jens
> >
>
>
>

Received on 2024-01-10 21:38:17