Some notes on the Unicode review and release process

From: Robin Leroy <eggrobin_at_[hidden]>
Date: Thu, 11 Jan 2024 17:30:16 +0100

Dear ISO/IEC JTC 1/SC 22/WG 21/SG 16,

As was discussed at length in the 2024-01-10 meeting, in the ideal ISO/IEC
world where C++11 vanished from existence ten years ago and defect reports
(at least as handled by C++) do not exist, compilers implementing C++23
will become nonconformant every year on the second Tuesday in September
until they pick up the new version of the Unicode Standard.

While I do not think anyone in the real world should expect their compiler
and standard libraries to update on the day of the Unicode release (indeed,
libraries published by the Unicode Consortium itself, such as ICU, are
typically a few weeks behind when everything goes well), I would like to
shed some light on the Unicode review process, which should allow
implementers to iron out kinks ahead of time. (Besides, a better mutual
understanding of the standardization processes involved on both sides of
this liaison is likely a good thing.)

Importantly, this process allows for feedback from implementers. Because of
the Unicode Consortium’s stability policies
<https://www.unicode.org/policies/stability_policy.html>, some feedback can
only ever be addressed if it is received at the appropriate stage. In any
case, feedback received after the review periods can generally only be
addressed in a subsequent version of Unicode (though note that the Unicode
Standard has corrigenda <https://www.unicode.org/versions/#Corrigenda>,
which work a bit differently from ISO/IEC technical corrigenda and
corrected versions, just as Unicode versions work a bit differently from
ISO editions).

The following timeline is based on L2/23-264
<https://www.unicode.org/L2/L2023/23264-relmgmt-report.pdf>, approved by
Unicode Technical Committee decision 177-C1
<https://www.unicode.org/L2/L2023/23231.htm#177-C1>.
The next version of Unicode will be Version 16.0, scheduled to be released
on 2024-09-10 (the second Tuesday in September, as has been the tradition
since 2021).
There are two review periods:

   - alpha review will run from 2024-02-06 to 2024-04-02;
   - beta review will run from 2024-05-21 to 2024-07-02.

Decisions are made at UTC meetings, nowadays mostly on the recommendation
of various groups, the most relevant here being the Properties and
Algorithms Group (PAG)
<https://www.unicode.org/consortium/props-algorithms.html>. The time
between the end of each review period and the following UTC meeting allows
these groups to process review feedback. UTC #178 is coming up (January
23–25); UTC #179, April 23–25, will take decisions based on alpha feedback;
and UTC #180, July 23–25, will finalize the content of Unicode 16.0 based
on beta feedback.
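
For implementers who want to put these dates into their own planning
tooling, the arithmetic can be sketched as follows. This is only an
illustration: the function and variable names are hypothetical, the dates
are the ones quoted in this message, and the "second Tuesday in September"
rule is a tradition rather than a guarantee.

```python
from datetime import date, timedelta


def second_tuesday_of_september(year: int) -> date:
    """Return the second Tuesday of September in the given year."""
    first = date(year, 9, 1)
    # date.weekday() counts Monday as 0, so Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_tuesday + 7)


# (start, end) of each review period, as quoted in this message.
REVIEW_PERIODS = {
    "alpha": (date(2024, 2, 6), date(2024, 4, 2)),
    "beta": (date(2024, 5, 21), date(2024, 7, 2)),
}

# First day of the UTC meeting that acts on each review's feedback.
UTC_MEETING_START = {
    "alpha": date(2024, 4, 23),  # UTC #179
    "beta": date(2024, 7, 23),   # UTC #180
}


def days_to_process_feedback(stage: str) -> int:
    """Days between the end of a review period and the UTC meeting."""
    _, review_end = REVIEW_PERIODS[stage]
    return (UTC_MEETING_START[stage] - review_end).days


if __name__ == "__main__":
    print("Expected 16.0 release:", second_tuesday_of_september(2024))
    for stage in REVIEW_PERIODS:
        print(stage, "feedback window:", days_to_process_feedback(stage), "days")
```

With the dates above, both review periods close three weeks before the
corresponding UTC meeting.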

As described in https://www.unicode.org/versions/beta.html, alpha review is
focused on the review of the character repertoire, and beta review is
focused on the review of properties and algorithms based on a mostly*
stable repertoire.

However, in recent years, the Unicode Technical Committee has been able to
provide a consistent draft of most of the Unicode Character Database as
early as alpha review; this could allow implementers to test parts of their
implementations earlier, in particular when it comes to normalization,
which is tied to encoding model decisions and thus to the repertoire. (I
expect that I will have more to say on that later this month.)

Notable issues for reviewers are called out on the alpha review background
document and beta landing page. Of course, mind the caveats in these
documents: “It is inappropriate to cite these files as other than a work in
progress. No products or implementations should be released based on the
beta UCD data files.”

As C++ implementers are now consumers of the UCD and implementers of
various Unicode algorithms (grapheme cluster segmentation, if only as a
best practice, by [format.string.std] paragraph 13
<https://eel.is/c++draft/format.string.std#13>, normalization by [lex.name]
paragraph 1 <https://eel.is/c++draft/lex.name#1>), I plan to keep this
mailing list informed of relevant Public Review Issues
<https://www.unicode.org/review/> as we go through the 16.0 release cycle.
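
As a rough illustration of the kind of check that depends on the UCD
version, here is a minimal sketch of an NFC test using Python's standard
unicodedata module. The helper name is hypothetical, the module uses
whatever UCD version shipped with the interpreter (which may lag behind the
current Unicode release), and the sketch covers only normalization, not the
other identifier requirements.

```python
import unicodedata


def is_nfc_identifier_text(text: str) -> bool:
    """Return True if text is already in Normalization Form C.

    Hypothetical helper: it checks only normalization, not whether the
    characters are otherwise permitted in identifiers.
    """
    return unicodedata.is_normalized("NFC", text)


if __name__ == "__main__":
    # Version of the Unicode Character Database bundled with this Python.
    print("UCD version:", unicodedata.unidata_version)

    precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
    decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
    print(is_nfc_identifier_text(precomposed))
    print(is_nfc_identifier_text(decomposed))
    # Normalizing the decomposed spelling yields the precomposed one.
    print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Running the same check against draft data files would need a library that
can load them, which this sketch does not attempt.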

Best regards,

Robin Leroy


* Changes to the repertoire after beta are highly unlikely, but have
happened, and indeed very recently so, as 47 CJK ideographs were removed
and 66 were added to Extension I between 15.1β and 15.1. This unusual
possibility was foreseen going into beta review. Generally Extension I was
encoded under extraordinary circumstances; the interested reader can refer
to the following paper trail:

   - L2/23-011
   <https://www.unicode.org/L2/L2023/23011-cjk-unihan-group-utc174.pdf> Section
   18) (search for “This is an extraordinarily bad thing.”) with UTC action
   item 174-A56 <https://www.unicode.org/L2/L2023/23005.htm#174-A56>;
   - L2/23-082
   <https://www.unicode.org/L2/L2023/23082-cjk-unihan-group-utc175.pdf> Section
   03);
   - L2/23-106 = ISO/IEC JTC 1/SC 2/WG 2 N5214
   <https://www.unicode.org/L2/L2023/23106-unc-extension-i.pdf> with UTC
   decision 175-C10 <https://www.unicode.org/L2/L2023/23076.htm#175-C10>;
   - L2/23-163
   <https://www.unicode.org/L2/L2023/23163-cjk-unihan-group-utc176.pdf> Section
   01);
   - L2/23-114R = ISO/IEC JTC 1/SC 2/WG 2 N5214R2
   <https://www.unicode.org/L2/L2023/23114r-unc-extension-i.pdf> with UTC
   decision 176-C1 <https://www.unicode.org/L2/L2023/23157.htm#176-C1>,
   superseding 175-C10 and amending the repertoire after beta.

Received on 2024-01-11 16:30:36