On Fri, Jan 05, 2024 at 04:26:49PM +0000, Jonathan Wakely via SG16 wrote:
> Since the adoption of P2736, C++23 and the current C++ working draft just
> refer to "the Unicode Standard", with a URL referring to the latest
> version. We removed the bibliography entry for TR29 revision 35. P2736
> gives the justification for this that the revision of #29 included in
> Unicode 15 (revision 41) is just a bug fix, so there's no problem referring
> to that instead.
>
> That might have been true last year, but the current Unicode Standard
> (15.1.0) includes revision 43 of UAX #29, which makes significant changes
> to the extended grapheme cluster breaking rules. A new state machine is
> needed (and new lookup tables of properties) to implement rule GB9c. That's
> not just a bug fix, is it?
>
> Are C++ implementations expected to implement rule GB9c, despite it not
> being part of the standard when C++23 was published?

AFAIK this was indeed intended. The Unicode Standard moves at a faster
pace than the C++ Standard. This allows C++ to always use the latest
Unicode features and backport them to older language versions.

Recently I wanted to update libc++ to Unicode 15.1.0 and noticed the
same changes you did. I put this on hold since I need to investigate the
required ABI tags. Otherwise I would have implemented these changes for
libc++18.

Cheers,
Mark

Did we actually specify something for C++23 that depends on or provides the breaking algorithms?

My recollection is that we kicked the can down the road. The intent was to have implementations declare which version of the Unicode Standard they support at a particular time and release, with the option of a conforming implementation providing a later one, because otherwise you can't process current text.
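
For anyone following along who hasn't looked at GB9c yet, here is a rough sketch of the extra state it needs on top of the existing grapheme cluster rules. It only covers GB9c itself (no break before a consonant that follows Consonant, then any mix of Extend/Linker containing at least one Linker). The `InCB` enum, `incb_of` and `gb9c_forbids_break_before` are names made up for this mail, and the lookup is a tiny hand-picked Devanagari subset rather than the real Indic_Conjunct_Break data, so treat it as an illustration and not as what any library does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names; the values mirror the Indic_Conjunct_Break property.
enum class InCB { None, Consonant, Extend, Linker };

// Toy lookup (assumption): a hand-picked Devanagari subset only.
// A real implementation would use tables generated from the UCD.
inline InCB incb_of(char32_t c) {
    if (c >= 0x0915 && c <= 0x0939) return InCB::Consonant; // KA..HA
    if (c == 0x094D) return InCB::Linker;                   // virama
    if (c == 0x093C || c == 0x200D) return InCB::Extend;    // nukta, ZWJ
    return InCB::None;
}

// For each code point, report whether GB9c alone forbids a break
// immediately before it. The other GB rules are not modelled here.
std::vector<bool> gb9c_forbids_break_before(const std::u32string& text) {
    enum class State { Idle, SeenConsonant, SeenLinker };
    State state = State::Idle;
    std::vector<bool> result(text.size(), false);
    for (std::size_t i = 0; i < text.size(); ++i) {
        switch (incb_of(text[i])) {
        case InCB::Consonant:
            // Consonant [Extend Linker]* Linker [Extend Linker]* x Consonant
            result[i] = (state == State::SeenLinker);
            state = State::SeenConsonant;
            break;
        case InCB::Linker:
            // A linker only counts if a consonant started the sequence.
            if (state != State::Idle) state = State::SeenLinker;
            break;
        case InCB::Extend:
            break; // keeps whatever state we were in
        case InCB::None:
            state = State::Idle;
            break;
        }
    }
    return result;
}
```

With this, KA + virama + SSA (U+0915 U+094D U+0937) reports "no break" before the SSA, while KA directly followed by SSA does not, which is the behaviour change between revisions 41 and 43.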