sg16

Re: Undated reference to Unicode Standard and UAX #29

From: Jonathan Wakely <cxx_at_[hidden]>
Date: Sat, 6 Jan 2024 19:23:58 +0000
On Sat, 6 Jan 2024, 17:37 Jens Maurer, <jens.maurer_at_[hidden]> wrote:

>
>
> On 06/01/2024 00.40, Jonathan Wakely wrote:
> >
> >
> > On Fri, 5 Jan 2024 at 20:46, Jens Maurer <jens.maurer_at_[hidden]> wrote:
> >
> >
> >
> > On 05/01/2024 18.35, Jonathan Wakely via SG16 wrote:
> > >
> > >
> > > On Fri, 5 Jan 2024, 16:47 Mark de Wever, <koraq_at_[hidden]> wrote:
> > >
> > > > On Fri, Jan 05, 2024 at 04:26:49PM +0000, Jonathan Wakely via SG16 wrote:
> > > > > Since the adoption of P2736, C++23 and the current C++ working draft just
> > > > > refer to "the Unicode Standard", with a URL referring to the latest
> > > > > version. We removed the bibliography entry for TR29 revision 35. P2736
> > > > > gives the justification for this that the revision of #29 included in
> > > > > Unicode 15 (revision 41) is just a bug fix, so there's no problem referring
> > > > > to that instead.
> > > >
> > > > > That might have been true last year, but the current Unicode Standard
> > > > > (15.1.0) includes revision 43 of UAX #29, which makes significant changes
> > > > > to the extended grapheme cluster breaking rules. A new state machine is
> > > > > needed (and new lookup tables of properties) to implement rule GB9c. That's
> > > > > not just a bug fix, is it?
> > > >
> > > > > Are C++ implementations expected to implement rule GB9c, despite it not
> > > > > being part of the standard when C++23 was published?
> > >
> > > > AFAIK this was indeed intended. The Unicode Standard moves at a faster
> > > > pace than the C++ Standard. This allows C++ to always use the latest
> > > > Unicode features and backport them to older language versions.
> > >
> > >
> > > > Maybe the intent was to allow that, but the way I read it we *require*
> > > > that. Is there wording that says that an implementation can choose which
> > > > version to conform to?
> > > >
> > > > If not, what stops all existing implementations from becoming
> > > > non-conforming when a new version of Unicode gets published?
> >
> > Nothing, if the new version of Unicode changes behavior that C++
> > refers to (as seems to be the case here).
> >
> > My understanding is that this was intentional; ISO wants us to refer
> > to undated standard if possible, too.
> >
> > > If we feel we should "freeze" the Unicode version for each C++ standard
> > > release, we could do that. Implementer feedback is certainly welcome
> > > for that decision.
> >
> >
> > I think I'd prefer if we just somehow say that implementations can define
> > which Unicode standard they conform to. That way if a conforming C++23
> > implementation uses Unicode 15.1.0 (the latest version today) then it
> > doesn't become non-conforming overnight when a new Unicode standard is
> > published. We can recommend that implementations pin themselves to a recent
> > Unicode standard, and even recommend that implementations should (if
> > possible) update to use newer Unicode standards as they become available.
>
> Hm... That's not how normative references are supposed to work in an ISO
> world, I think ("pick the version you want" -- no), but we could certainly
> try that.
>

I'd be fine with "C++23 refers to Unicode 15.0.0", or "it is implementation
defined which Unicode standard a C++23 implementation conforms to", but I
don't like the idea of C++23 being a moving target that changes meaning
after publication.
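
For concreteness, the GB9c rule mentioned in the quoted message can be
sketched roughly as below. This is only an illustration under stated
assumptions, not taken from any implementation: the InCB enum and the
gb9c_no_break_before function are hypothetical names, a real implementation
would look the Indic_Conjunct_Break property up in tables generated from the
Unicode data files, and the other grapheme cluster rules are ignored here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Indic_Conjunct_Break (InCB) property values
// used by rule GB9c. A real implementation would derive these per code point
// from the Unicode Character Database.
enum class InCB { None, Consonant, Extend, Linker };

// Returns true if rule GB9c forbids a grapheme cluster break immediately
// before props[i], i.e. if props[i] is a Consonant preceded by
//   Consonant [Extend Linker]* Linker [Extend Linker]*
// Only GB9c is modelled; all other break rules are out of scope.
bool gb9c_no_break_before(const std::vector<InCB>& props, std::size_t i)
{
    if (i == 0 || i >= props.size() || props[i] != InCB::Consonant)
        return false;

    bool seen_linker = false;
    std::size_t j = i;
    while (j > 0) {
        --j;
        switch (props[j]) {
        case InCB::Linker:
            seen_linker = true;   // at least one Linker is required
            break;
        case InCB::Extend:
            break;                // Extend may appear anywhere in the run
        case InCB::Consonant:
            return seen_linker;   // run is anchored by a Consonant
        case InCB::None:
            return false;         // anything else ends the match
        }
    }
    return false;                 // ran off the start without a Consonant
}
```

With this sketch, a sequence such as consonant, virama (a Linker), consonant
yields no break before the second consonant, whereas an implementation
following an earlier revision of UAX #29 has no such rule. That is the kind
of observable difference in question.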

How do I even know which code points I can refer to with a
universal-character-name in a portable C++23 program? Doesn't that depend
on the Unicode version?



> > But there's no way that a discontinued/EOL compiler version can get
> > updated to a newer Unicode standard, which is what we seem to be requiring
> > as a condition of being a conforming implementation.
>
> I don't think this problem arises in practice. Do we have a conforming
> implementation of C++ (which happens to be C++20 at this point in time)?
> This will stop being conforming in a few weeks when C++23 is published, at
> which point C++20 is considered withdrawn / superseded. And when C++23 is
> published, it will stay in force for about three years.
>

But compilers still offer support for previous standards. We don't say
"sorry, C++23 is out, you can't use -std=c++17 now".

Should I interpret "C++23 requires you to use the latest Unicode standard"
as only being true until 2026? That makes it tempting to not even try to
conform to C++23 until 2026, when it stops being a moving target ;-)

More seriously, I think what you're saying is that an implementation's
"C++20 mode" is already a non-standard thing that has impl-defined meaning,
because the standard only defines one version of C++ at a time. So an
implementation can choose what its "C++20 mode" means, and pinning it to a
version of Unicode that was current in 2020 is OK.

But I still find it unsettling that the definition of "C++" will change
under our feet between 2023 and 2026. It effectively means that everything
the Unicode Consortium does is immediately adopted as a DR against the
current C++ standard with no involvement from WG21.


> Is there a conforming implementation of C++23 already?
>

Are you suggesting that because an implementation doesn't conform 100% to
the standard yet, that it doesn't matter if remaining conforming is
difficult/impractical?

That feels like "until you conform, you don't get to complain that it's
hard to conform" :-)


> Are compiler versions EOL'd in three years? At least for gcc, that doesn't
> seem to be the case.
>

Yes, it's just over 3 years of upstream support and fixes for each GCC
release. GCC 10.1 was released 2020-05 and then went EOL with 10.5 in
2023-07. GCC 11 was released in 2021 and will be EOL late this year. But a
close-to-EOL release is not going to receive major updates to make it use a
new Unicode standard. In practice, I'm probably not going to make such
changes to a stable release branch at all. Once GCC 14.1 is released in a
few months, it might stick with Unicode 15.1.0 for its three-year lifespan.
So the window for making updates to a shipping release is smaller than 3
years.

Some vendors continue to support EOL releases past the end of upstream
support (e.g. in an enterprise distro like RHEL). But they're unlikely to
make significant code changes, like updating to use a new Unicode standard.

Received on 2024-01-06 19:24:15