Thank you for this, Robin! I found this useful and educational.
My favorite part:
As C++ implementers are now consumers of the UCD and implementers of various Unicode algorithms (grapheme cluster segmentation, if only as a best practice, by [format.string.std] paragraph 13, normalization by [lex.name] paragraph 1), I plan to keep this mailing list informed of relevant Public Review Issues as we go through the 16.0 release cycle.
That will be amazingly helpful!
At the moment, I think the C++ standard is dependent on the following Unicode features, so notification of any significant changes to them would be highly useful.
Anything I missed above?
Tom.
Dear ISO/IEC JTC 1/SC 22/WG 21/SG 16,
As was discussed at length in the 2024-01-10 meeting, in the ideal ISO/IEC world where C++11 vanished from existence ten years ago and defect reports (at least as handled by C++) do not exist, compilers implementing C++23 will become nonconformant every year on the second Tuesday in September until they pick up the new version of the Unicode Standard.
While I do not think anyone in the real world should expect their compiler and standard libraries to update on the day of the Unicode release (indeed, libraries published by the Unicode Consortium itself, such as ICU, are typically a few weeks behind when everything goes well), I would like to shed some light on the Unicode review process, which should allow implementers to iron out kinks ahead of time. (Besides, a better mutual understanding of the standardization processes involved on both sides of this liaison is likely a good thing.)
Importantly, this process allows for feedback from implementers. Because of the Unicode Consortium’s stability policies, some feedback can only ever be addressed if it is received at the appropriate stage. In any case, feedback received after the review periods can generally only be addressed in a subsequent version of Unicode (though note that the Unicode Standard has corrigenda; just as its versions work a bit differently from ISO editions, these corrigenda work a bit differently from ISO/IEC technical corrigenda and corrected versions).
The following timeline is based on L2/23-264, approved by Unicode Technical Committee decision 177-C1.
The next version of Unicode will be Version 16.0, scheduled to be released on 2024-09-10 (the second Tuesday in September, as is tradition since 2021).
There are two review periods:
- alpha review will run from 2024-02-06 to 2024-04-02;
- beta review will run from 2024-05-21 to 2024-07-02.
Decisions are made at UTC meetings, nowadays mostly on the recommendation of various groups, the most relevant here being the Properties and Algorithms Group (PAG). The time between the end of each review period and the following UTC meeting allows these groups to process review feedback. UTC #178 is coming up (January 23–25); UTC #179 (April 23–25) will make decisions based on alpha feedback, and UTC #180 (July 23–25) will finalize the content of Unicode 16.0 based on beta feedback.
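The schedule above follows from simple calendar arithmetic. As an illustration only, here is a minimal C++14 sketch that recomputes it: the second Tuesday in September, the lengths of the two review periods, and the gaps before the UTC meetings that act on them. The helper names are hypothetical and not part of any Unicode Consortium or standard library tooling; `days_from_civil` follows Howard Hinnant's widely used public-domain date algorithm. The `static_assert`s check the dates given above at compile time.

```cpp
// Hypothetical helpers (illustration only): calendar arithmetic for the
// Unicode 16.0 schedule. Requires C++14 or later (relaxed constexpr).

// Days since 1970-01-01 in the proleptic Gregorian calendar,
// after Howard Hinnant's public-domain days_from_civil algorithm.
constexpr long days_from_civil(long y, unsigned m, unsigned d) {
    y -= m <= 2;  // shift the year to start on March 1, so February (and its leap day) comes last
    const long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);             // year of era [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // day of year [0, 365], from March 1
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // day of era [0, 146096]
    return era * 146097 + static_cast<long>(doe) - 719468;
}

// Day of the week for a day count from days_from_civil: 0 = Sunday, ..., 6 = Saturday.
// 1970-01-01 (day 0) was a Thursday, hence the offset of 4.
constexpr unsigned weekday_from_days(long z) {
    return static_cast<unsigned>(z >= -4 ? (z + 4) % 7 : (z + 5) % 7 + 6);
}

// Day of the month (8..14) of the second Tuesday in September of year y.
constexpr unsigned second_tuesday_of_september(long y) {
    constexpr unsigned tuesday = 2;
    const unsigned first = weekday_from_days(days_from_civil(y, 9, 1));
    const unsigned to_first_tuesday = (tuesday + 7 - first) % 7;
    return 1 + to_first_tuesday + 7;  // first Tuesday, plus one week
}

// Number of days from (y1, m1, d1) to (y2, m2, d2).
constexpr long days_between(long y1, unsigned m1, unsigned d1,
                            long y2, unsigned m2, unsigned d2) {
    return days_from_civil(y2, m2, d2) - days_from_civil(y1, m1, d1);
}

// The release rule: the second Tuesday in September.
static_assert(second_tuesday_of_september(2024) == 10, "Unicode 16.0: 2024-09-10");
// Alpha review is 8 weeks long; beta review is 6 weeks long.
static_assert(days_between(2024, 2, 6, 2024, 4, 2) == 8 * 7, "alpha review");
static_assert(days_between(2024, 5, 21, 2024, 7, 2) == 6 * 7, "beta review");
// Three weeks between the end of each review and UTC #179 (April 23) / UTC #180 (July 23).
static_assert(days_between(2024, 4, 2, 2024, 4, 23) == 21, "alpha review to UTC #179");
static_assert(days_between(2024, 7, 2, 2024, 7, 23) == 21, "beta review to UTC #180");
```

With C++20, the release date could instead be expressed with the standard calendar types, as `std::chrono::year{2024} / std::chrono::September / std::chrono::Tuesday[2]` (a `std::chrono::year_month_weekday`); the hand-rolled version above simply avoids depending on library support for those types.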
As described in https://www.unicode.org/versions/beta.html, alpha review focuses on the character repertoire, and beta review focuses on properties and algorithms, based on a mostly* stable repertoire.
However, in recent years, the Unicode Technical Committee has been able to provide a consistent draft of most of the Unicode Character Database as early as alpha review; this could allow implementers to test parts of their implementations earlier, in particular when it comes to normalization, which is tied to encoding model decisions and thus to the repertoire. (I expect that I will have more to say on that later this month.)
Notable issues for reviewers are called out in the alpha review background document and on the beta landing page. Of course, mind the caveats in these documents: “It is inappropriate to cite these files as other than a work in progress. No products or implementations should be released based on the beta UCD data files.”
As C++ implementers are now consumers of the UCD and implementers of various Unicode algorithms (grapheme cluster segmentation, if only as a best practice, by [format.string.std] paragraph 13, normalization by [lex.name] paragraph 1), I plan to keep this mailing list informed of relevant Public Review Issues as we go through the 16.0 release cycle.
Best regards,
Robin Leroy
* Changes to the repertoire after beta are highly unlikely, but they have happened, and very recently at that: 47 CJK ideographs were removed from Extension I and 66 were added to it between 15.1β and 15.1. This unusual possibility was foreseen going into beta review. More generally, Extension I was encoded under extraordinary circumstances; the interested reader can refer to the following paper trail:
- L2/23-011 Section 18) (search for “This is an extraordinarily bad thing.”) with UTC action item 174-A56;
- L2/23-082 Section 03);
- L2/23-106 = ISO/IEC JTC 1/SC 2/WG 2 N5214 with UTC decision 175-C10;
- L2/23-163 Section 01);
- L2/23-114R = ISO/IEC JTC 1/SC 2/WG 2 N5214R2 with UTC decision 176-C1, superseding 175-C10 and amending the repertoire after beta.