sg16

Re: Undated reference to Unicode Standard and UAX #29

From: Jonathan Wakely <cxx_at_[hidden]>
Date: Fri, 5 Jan 2024 23:40:28 +0000
On Fri, 5 Jan 2024 at 20:46, Jens Maurer <jens.maurer_at_[hidden]> wrote:

>
>
> On 05/01/2024 18.35, Jonathan Wakely via SG16 wrote:
> >
> >
> > On Fri, 5 Jan 2024, 16:47 Mark de Wever, <koraq_at_[hidden]> wrote:
> >
> > > On Fri, Jan 05, 2024 at 04:26:49PM +0000, Jonathan Wakely via SG16 wrote:
> > > > Since the adoption of P2736, C++23 and the current C++ working draft
> > > > just refer to "the Unicode Standard", with a URL referring to the
> > > > latest version. We removed the bibliography entry for TR29 revision 35.
> > > > P2736 justifies this by saying that the revision of UAX #29 included
> > > > in Unicode 15 (revision 41) is just a bug fix, so there's no problem
> > > > referring to that instead.
> > >
> > > > That might have been true last year, but the current Unicode Standard
> > > > (15.1.0) includes revision 43 of UAX #29, which makes significant
> > > > changes to the extended grapheme cluster breaking rules. A new state
> > > > machine is needed (and new lookup tables of properties) to implement
> > > > rule GB9c. That's not just a bug fix, is it?
> > >
> > > > Are C++ implementations expected to implement rule GB9c, despite it
> > > > not being part of the standard when C++23 was published?
> >
> > > AFAIK this was indeed intended. The Unicode Standard moves at a faster
> > > pace than the C++ Standard. This allows C++ to always use the latest
> > > Unicode features and backport them to older language versions.
> >
> >
> > Maybe the intent was to allow that, but the way I read it we *require*
> > that. Is there wording that says that an implementation can choose which
> > version to conform to?
> >
> > If not, what stops all existing implementations from becoming
> > non-conforming when a new version of Unicode gets published?
>
> Nothing, if the new version of Unicode changes behavior that C++
> refers to (as seems to be the case here).
>
> My understanding is that this was intentional; ISO wants us to refer
> to undated standards if possible, too.
>
> If we feel we should "freeze" the Unicode version for each C++ standard
> release, we could do that. Implementer feedback is certainly welcome
> for that decision.
>

I think I'd prefer it if we just said, somehow, that implementations can
define which version of the Unicode Standard they conform to. That way, if a
conforming C++23 implementation uses Unicode 15.1.0 (the latest version
today), it doesn't become non-conforming overnight when a new version of the
Unicode Standard is published. We can recommend that implementations pin
themselves to a recent version, and even recommend that they update (if
possible) to newer versions as they become available. But there's no way
that a discontinued/EOL compiler version can get updated to a newer Unicode
Standard, which is what we currently seem to require as a condition of being
a conforming implementation.
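
To make the concern concrete, here is a minimal sketch of how the same code
point sequence can segment differently depending on which Unicode version an
implementation is pinned to. Everything in it is an assumption made for
illustration: UnicodeVersion, Prop, toy_prop and count_clusters are
hypothetical names, the property table covers only three Devanagari code
points, and the break logic is a drastic simplification of UAX #29 that keeps
only a stand-in for GB9 and the conjunct rule GB9c. It is not how any real
implementation is structured.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical: the Unicode version an implementation documents as its pin.
enum class UnicodeVersion { v15_0, v15_1 };

// Toy property classes; real implementations use generated lookup tables.
enum class Prop { Other, Consonant, Linker };

// Illustration only: classifies just the three code points used below.
constexpr Prop toy_prop(char32_t c) {
    switch (c) {
    case U'\u0915': // DEVANAGARI LETTER KA
    case U'\u0937': // DEVANAGARI LETTER SSA
        return Prop::Consonant;
    case U'\u094D': // DEVANAGARI SIGN VIRAMA (also an Extend character)
        return Prop::Linker;
    default:
        return Prop::Other;
    }
}

// Counts clusters using a simplified rule set:
//  - never break before a Linker (stand-in for GB9, "do not break before Extend")
//  - if pinned to 15.1, never break before a Consonant that follows
//    Consonant Linker+ (simplified GB9c)
//  - otherwise break everywhere
std::size_t count_clusters(const std::vector<char32_t>& cps,
                           UnicodeVersion pinned) {
    if (cps.empty()) return 0;
    std::size_t clusters = 1;
    bool seen_consonant = false; // a Consonant opened a possible conjunct
    bool seen_linker = false;    // at least one Linker followed that Consonant
    for (std::size_t i = 0; i < cps.size(); ++i) {
        const Prop p = toy_prop(cps[i]);
        if (i > 0) {
            const bool gb9 = (p == Prop::Linker);
            const bool gb9c = pinned == UnicodeVersion::v15_1 &&
                              p == Prop::Consonant &&
                              seen_consonant && seen_linker;
            if (!gb9 && !gb9c) ++clusters;
        }
        // Advance the small state machine that GB9c requires.
        if (p == Prop::Consonant) {
            seen_consonant = true;
            seen_linker = false;
        } else if (p == Prop::Linker) {
            seen_linker = seen_consonant;
        } else {
            seen_consonant = false;
            seen_linker = false;
        }
    }
    return clusters;
}
```

With this sketch, the sequence U+0915 U+094D U+0937 counts as two clusters
when pinned to v15_0 and as one cluster when pinned to v15_1, so any result
that depends on grapheme clusters can change depending on which version the
implementation follows.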

Received on 2024-01-05 23:40:43