Re: Undated reference to Unicode Standard and UAX #29

From: Tom Honermann <tom_at_[hidden]>
Date: Sat, 6 Jan 2024 14:17:39 -0500
On 1/5/24 6:40 PM, Jonathan Wakely via SG16 wrote:
>
>
> On Fri, 5 Jan 2024 at 20:46, Jens Maurer <jens.maurer_at_[hidden]> wrote:
>
>
>
> On 05/01/2024 18.35, Jonathan Wakely via SG16 wrote:
> >
> >
> > On Fri, 5 Jan 2024, 16:47 Mark de Wever, <koraq_at_[hidden]> wrote:
> >
> > > On Fri, Jan 05, 2024 at 04:26:49PM +0000, Jonathan Wakely via SG16 wrote:
> > > > Since the adoption of P2736 C++23 and the current C++ working draft
> > > > just refer to "the Unicode Standard", with a URL referring to the
> > > > latest version. We removed the bibliography entry for TR29 revision
> > > > 35. P2736 gives the justification for this that the revision of #29
> > > > included in Unicode 15 (revision 41) is just a bug fix, so there's
> > > > no problem referring to that instead.
> > > >
> > > > That might have been true last year, but the current Unicode
> > > > Standard (15.1.0) includes revision 43 of UAX #29, which makes
> > > > significant changes to the extended grapheme cluster breaking rules.
> > > > A new state machine is needed (and new lookup tables of properties)
> > > > to implement rule GB9c. That's not just a bug fix, is it?
> > > >
> > > > Are C++ implementations expected to implement rule GB9c, despite it
> > > > not being part of the standard when C++23 was published?
> >
> > AFAIK this was indeed intended. The Unicode Standard moves at a faster
> > pace than the C++ Standard. This allows C++ to always use the latest
> > Unicode features and backport them to older language versions.
> >
> >
> > Maybe the intent was to allow that, but the way I read it we *require*
> > that. Is there wording that says that an implementation can choose
> > which version to conform to?
> >
> > If not, what stops all existing implementations from becoming
> > non-conforming when a new version of Unicode gets published?
>
> Nothing, if the new version of Unicode changes behavior that C++
> refers to (as seems to be the case here).
>
> My understanding is that this was intentional; ISO wants us to refer
> to undated standards if possible, too.
>
> If we feel we should "freeze" the Unicode version for each C++ standard
> release, we could do that. Implementer feedback is certainly welcome
> for that decision.
>
>
> I think I'd prefer if we just somehow say that implementations can
> define which Unicode standard they conform to. That way if a
> conforming C++23 implementation uses Unicode 15.1.0 (the latest
> version today) then it doesn't become non-conforming overnight when a
> new Unicode standard is published. We can recommend that
> implementations pin themselves to a recent Unicode standard, and even
> recommend that implementations should (if possible) update to use
> newer Unicode standards as they become available. But there's no way
> that a discontinued/EOL compiler version can get updated to a newer
> Unicode standard, which is what we seem to be requiring as a condition
> of being a conforming implementation.

I think the closest we currently get to that is the specification of
the __STDC_ISO_10646__ predefined macro in [cpp.predefined]p2
<http://eel.is/c++draft/cpp.predefined#2>. It doesn't impose much in the
way of normative requirements, though. We could strengthen it. See the
P2736R0 discussion from the 2023-01-25 SG16 meeting
<https://github.com/sg16-unicode/sg16-meetings#january-25th-2023> for
some context.

    The following macro names are conditionally defined by the
    implementation:
    ...
    __STDC_ISO_10646__
    An integer literal of the form yyyymmL (for example, 199712L).
    Whether __STDC_ISO_10646__ is predefined and if so, what its value
    is, are implementation-defined.
    ...
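
As a rough illustration of how little a program can learn from that macro
today, here is a minimal C++ sketch that reads it when it is predefined.
The names Iso10646Date, decode_yyyymm, and advertised_iso10646 are
hypothetical helpers invented for this example. The sketch assumes only
what the wording above states, namely that the value has the form
yyyymmL; it does not identify a Unicode Standard version or a UAX #29
revision.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type: a year/month pair decoded from a yyyymmL value.
struct Iso10646Date {
    int year;
    int month;
};

// Split a value of the form yyyymmL (for example, 199712L) into its
// year and month components.
constexpr Iso10646Date decode_yyyymm(long value)
{
    return Iso10646Date{static_cast<int>(value / 100),
                        static_cast<int>(value % 100)};
}

// Describe what the implementation advertises through __STDC_ISO_10646__.
// Whether the macro is predefined at all is implementation-defined, so a
// fallback string is returned when it is absent.
std::string advertised_iso10646()
{
#ifdef __STDC_ISO_10646__
    const Iso10646Date d = decode_yyyymm(__STDC_ISO_10646__);
    // Sanity check under the assumption that "mm" denotes a calendar month.
    assert(d.month >= 1 && d.month <= 12);
    return std::to_string(d.year) + (d.month < 10 ? "-0" : "-") +
           std::to_string(d.month);
#else
    return "not predefined";
#endif
}
```

With the example value from the wording, decode_yyyymm(199712L) yields
year 1997 and month 12. A program that needs to know which grapheme
cluster rules are in effect would still have to consult the
implementation's documentation, which is why strengthening the wording
might help.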

Tom.

Received on 2024-01-06 19:17:43