sg16

Re: Undated reference to Unicode Standard and UAX #29

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Sat, 6 Jan 2024 18:37:16 +0100
On 06/01/2024 00.40, Jonathan Wakely wrote:
>
>
> On Fri, 5 Jan 2024 at 20:46, Jens Maurer <jens.maurer_at_[hidden]> wrote:
>
>
>
> On 05/01/2024 18.35, Jonathan Wakely via SG16 wrote:
> >
> >
> > On Fri, 5 Jan 2024, 16:47 Mark de Wever, <koraq_at_[hidden]> wrote:
> >
> > On Fri, Jan 05, 2024 at 04:26:49PM +0000, Jonathan Wakely via SG16 wrote:
> > > Since the adoption of P2736, C++23 and the current C++ working draft just
> > > refer to "the Unicode Standard", with a URL referring to the latest
> > > version. We removed the bibliography entry for TR29 revision 35. P2736
> > > gives the justification for this that the revision of #29 included in
> > > Unicode 15 (revision 41) is just a bug fix, so there's no problem referring
> > > to that instead.
> > >
> > > That might have been true last year, but the current Unicode Standard
> > > (15.1.0) includes revision 43 of UAX #29, which makes significant changes
> > > to the extended grapheme cluster breaking rules. A new state machine is
> > > needed (and new lookup tables of properties) to implement rule GB9c. That's
> > > not just a bug fix, is it?
> > >
> > > Are C++ implementations expected to implement rule GB9c, despite it not
> > > being part of the standard when C++23 was published?
> >
> > AFAIK this was indeed intended. The Unicode Standard moves at a faster
> > pace than the C++ Standard. This allows C++ to always use the latest
> > Unicode features and backport them to older language versions.
> >
> >
> > Maybe the intent was to allow that, but the way I read it we *require* that. Is there wording that says that an implementation can choose which version to conform to?
> >
> > If not, what stops all existing implementations becoming non-conforming when a new version of Unicode gets published?
>
> Nothing, if the new version of Unicode changes behavior that C++
> refers to (as seems to be the case here).
>
> My understanding is that this was intentional; ISO wants us to refer
> to undated standards if possible, too.
>
> If we feel we should "freeze" the Unicode version for each C++ standard
> release, we could do that. Implementer feedback is certainly welcome
> for that decision.
>
>
> I think I'd prefer if we just somehow say that implementations can define which Unicode standard they conform to. That way if a conforming C++23 implementation uses Unicode 15.1.0 (the latest version today) then it doesn't become non-conforming overnight when a new Unicode standard is published. We can recommend that implementations pin themselves to a recent Unicode standard, and even recommend that implementations should (if possible) update to use newer Unicode standards as they become available.

Hm... That's not how normative references are supposed to work in an ISO world,
I think ("pick the version you want" -- no), but we could certainly try that.

> But there's no way that a discontinued/EOL compiler version can get updated to a newer Unicode standard, which is what we seem to be requiring as a condition of being a conforming implementation.

I don't think this problem arises in practice. Do we have a conforming implementation
of C++ (which happens to be C++20 at this point in time)? This will stop being conforming
in a few weeks when C++23 is published, at which point C++20 is considered withdrawn /
superseded. And when C++23 is published, it will stay in force for about three years.
Is there a conforming implementation of C++23 already?
Are compiler versions EOL'd in three years? At least for gcc, that doesn't seem to be
the case.

Jens

Received on 2024-01-06 17:37:21