My earlier email (quoted below) was based on some assumptions that were corrected at today's telecon:

- I thought that implementers were allowed to add constexpr to standard library functions as an extension; they are not.
- I thought that the libstdc++/libc++ implementers had implemented a constexpr std::format that depended on the Unicode version for width estimation; in fact, the constexpr functions they are referring to are user-inaccessible implementation details, and the top-level std::format is not constexpr.
- Corentin explained that even a hypothetical constexpr std::format implementation could avoid runtime ABI problems by using if consteval, and that differences in the width estimation of a constexpr std::format would not be considered a significant ABI issue.

On Wed, Jan 10, 2024 at 2:24 PM Eddie Nolan <eddiejnolan@gmail.com> wrote:

It seems problematic to allow constexpr implementations of any Unicode functionality for which the Unicode standard hasn’t guaranteed stability. While careful implementations can use the technique of deferring to a non-inline function to hide those details from ABI, constexpr requires that the entire implementation be inline, which means that every Unicode update can cause an ABI break.

Also, it seems like a misuse of constexpr to apply it to functionality that happens to be constant at the time of compilation but which can change over time. If Unicode updates can change the result, then it shouldn’t be constexpr for the same reason we wouldn’t want to make the time zone database accessible via constexpr.
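
To make the concern concrete, here is a small sketch of how a constexpr result can leak into a type. It is purely illustrative: toy_width, Cell, the UnicodeMajor parameter and the rule for U+E000 (a private-use code point used as a placeholder) are all hypothetical and do not reflect real Unicode data or any real library.

```cpp
#include <array>

// Toy width function parameterized on a pretend Unicode major version.
// Placeholder rule: U+E000 becomes two columns wide starting with "version 16".
template <int UnicodeMajor>
constexpr int toy_width(char32_t cp) {
    if (cp == 0xE000) {
        return UnicodeMajor >= 16 ? 2 : 1;
    }
    return 1;
}

// The constexpr result is used as an array bound, so it becomes part of the
// layout of this type.
template <int UnicodeMajor>
struct Cell {
    std::array<char, toy_width<UnicodeMajor>(0xE000)> columns;
};

// Code built against the two pretend versions sees different layouts.
static_assert(sizeof(Cell<15>) != sizeof(Cell<16>));
```

Two translation units built against different Unicode versions would disagree about the layout of such a type, which is the kind of break a non-inline function would have hidden.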

If we apply the proposed resolution to CWG2843 that we fix the Unicode version referenced by the standard at version 15.0.0, and we also continue to have Unicode functionality in fully inline constexpr functions in standard library implementations, then we might invite the outcome that updates to the standard’s Unicode version start being blocked because of ABI concerns.

On Wed, Jan 10, 2024 at 3:07 AM Corentin via SG16 <sg16@lists.isocpp.org> wrote:

On Wed, Jan 10, 2024 at 4:24 AM Tom Honermann <tom@honermann.net> wrote:

On 1/7/24 4:55 PM, Jens Maurer wrote:
> On 07/01/2024 21.27, Tom Honermann wrote:
>> The C++ standard includes in its bibliography <http://eel.is/c++draft/bibliography> an undated reference to the IANA Time Zone Database <https://www.iana.org/time-zones> with a linked reference in [time.zone.general]p1 <http://eel.is/c++draft/time#zone.general-1>. I grant that is a non-normative reference and the use of it differs somewhat from the situation we face with referencing the Unicode Standard, but it is an example of specified behavior that is intended to change at points that are not aligned with the release of C++ standard revisions.
> I don't think we specify anything at all regarding the details of timezone values,
> and there's quite a strong differentiation between the timezone data (which is
> IANA) and the "algorithms" on top of that data (which are C++-specified).
I agree and acknowledged that there are differences. But there are
similarities as well. std::format may produce different output for the
same chrono time_point value for different implementations when timezone
information is included if the implementations have different versions
of the timezone DB. That isn't so different from different output being
produced for the same code point based on Unicode version.
>
>>> Why are features added to Unicode any different, conceptually?
>> It is desirable that programs written and compiled for a particular C++ standard revision be able to correctly consume text produced in accordance with newer Unicode standards subject to limitations imposed by the interfaces that we specify. Requiring that programmers migrate their code to newer C++ standards in order to take advantage of corrections in newer Unicode standards would impose an unnecessary hindrance.
> So, we don't have a "C++23" mode for compilers, we have a "C++23-with-Unicode-15" mode, then?
I would say we have a "C++23" mode for compilers and that the Unicode
version is implementation-defined.
> And maybe compiler vendors opt not to support older Unicode modes when moving forward.
Yes.
>
>>> As a user, I foremost want portability: A program working with compiler X claiming
>>> conformance to C++ZZ should work unchanged on a different compiler Y also claiming
>>> conformance to C++ZZ. That portability argument is the only reason we have WG21
>>> to start with. If compiler X gives me newer Unicode than compiler Y, I may have
>>> used newer named-universal-characters or relied on newer Unicode algorithm behavior
>>> when developing my program, just to see it break down when moving to compiler Y
>>> that hasn't gotten around to upgrading to the new Unicode version, yet.
>> I think these concerns are adequately addressed by specifying a minimum Unicode version. Note that implementations are always free to accept additional character names as a conforming extension (a diagnostic for use of such names can be issued).
> There's no such thing as a "conforming extension" in C++.
Not one that is recognized by the standard, but that is the terminology
commonly used when an implementation gives meaning to code that is
ill-formed according to the standard.
>
> [lex.charset] p5 requires that a program be flagged as ill-formed
> if a named-universal-character is spelled that doesn't exist (yet).
> So, it's not that a "diagnostic can be issued"; it must be issued.
Yes.
>
>>> That's bad, and in my view much worse than having the users of compiler X wait
>>> three years until they get the new feature. Again, compiler vendors have options
>>> to offer post-standard features to their audience if they so choose; everybody
>>> opting in to such options is aware that their code might be non-portable.
>> I think the attention placed on backward compatibility by the Unicode Consortium suffices here; I think their efforts are at least on par with WG21.
> I don't think Unicode will (or can) consider possible ABI breaks in C++ implementations
> of their algorithms, should we ever get there. Note that the availability of templates
> in C++ might establish ABI boundaries at surprising locations in the view of
> implementations in other programming languages (or in less template-heavy C++).
Agreed. That is why I previously suggested we might want to be careful
to ensure that Unicode features that don't have a strong stability
policy are isolated behind an ABI boundary. For the case Jonathan
reported, it looks like we are in the clear. Someone please correct me
if I'm mistaken.
>
>> I view the change in behavior that spawned this email thread as more of a bug fix than a new feature.
> We've expressly refrained from fixing bugs in std::regex because of
> ABI break concerns, if I remember correctly. Are we delegating that
> choice to Unicode in some areas?

I think we should strive not to do so. std::regex is a good example of a
failure to isolate ABI concerns.

Right.

We know that Unicode can change, in some bounded ways, which I think Unicode is doing a pretty good job describing on a per-algorithm basis.

Given that, implementers can (and should) hide implementation details from ABI, or not extend ABI promises to Unicode algorithms.

Tom.

>
>> Thank you for filing the CWG issue. There are clearly nuances and perspectives that warrant additional discussion. I'm going to add this topic as one of the agenda items for this week's SG16 meeting.
> Thanks,
> Jens
>

--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16