Re: Some notes on the Unicode review and release process

From: Tom Honermann <tom_at_[hidden]>
Date: Fri, 12 Jan 2024 13:09:03 -0500
Thank you for this, Robin! I found this useful and educational.

My favorite part:

    As C++ implementers are now consumers of the UCD and implementers of
    various Unicode algorithms (grapheme cluster segmentation, if only
    as a best practice, by [format.string.std] paragraph 13
    <https://eel.is/c++draft/format.string.std#13>, normalization by
    [lex.name] paragraph 1 <https://eel.is/c++draft/lex.name#1>), I plan
    to keep this mailing list informed of relevant Public Review Issues
    <https://www.unicode.org/review/> as we go through the 16.0 release
    cycle.

That will be amazingly helpful!

At the moment, I think the C++ standard is dependent on the following
Unicode features, so notification of any significant changes to them
would be highly useful.

  * The UTF forms and schemes.
  * The XID identifier start and continue properties and UAX #31
    (Unicode Identifiers and Syntax) <https://unicode.org/reports/tr31/>.
  * The whitespace character properties (once P2348 (Whitespaces Wording
    Revamp) <https://wg21.link/p2348> is approved).
  * The EGC (extended grapheme cluster) properties (as already noted for
    use by std::format()).
  * The character names (as used for /named-universal-character/).
  * The General_Category Separator (Z) and Other (C) properties and the
    Grapheme_Extend property (as used for std::format() escape
    formatting to identify non-printable characters).

Anything I missed above?

Tom.

On 1/11/24 11:30 AM, Robin Leroy via SG16 wrote:
> Dear ISO/IEC JTC 1/SC 22/WG 21/SG 16,
>
> As was discussed at length in the 2024-01-10 meeting, in the ideal
> ISO/IEC world where C++11 vanished from existence ten years ago and
> defect reports (at least as handled by C++) do not exist, compilers
> implementing C++23 will become nonconformant every year on the second
> Tuesday in September until they pick up the new version of the Unicode
> Standard.
>
> While I do not think anyone in the real world should expect their
> compiler and standard libraries to update on the day of the Unicode
> release (indeed, libraries published by the Unicode Consortium itself,
> such as ICU, are typically a few weeks behind when everything goes
> well), I would like to shed some light on the Unicode review process,
> which should allow implementers to iron out kinks ahead of time.
> (Besides, a better mutual understanding of the standardization
> processes involved on both sides of this liaison is likely a good thing.)
>
> Importantly, this process allows for feedback from implementers.
> Because of the Unicode Consortium’s stability policies
> <https://www.unicode.org/policies/stability_policy.html>, some
> feedback can only ever be addressed if it is received at the
> appropriate stage. In any case, feedback received after the review
> periods can generally only be addressed in a subsequent version of
> Unicode (though note that the Unicode Standard has corrigenda
> <https://www.unicode.org/versions/#Corrigenda>, which work a bit
> differently from ISO/IEC technical corrigenda and corrected versions,
> just as Unicode versions work a bit differently from ISO editions).
>
> The following timeline is based on L2/23-264
> <https://www.unicode.org/L2/L2023/23264-relmgmt-report.pdf>, approved
> by Unicode Technical Committee decision 177-C1
> <https://www.unicode.org/L2/L2023/23231.htm#177-C1>.
> The next version of Unicode will be Version 16.0, scheduled to be
> released on 2024-09-10 (the second Tuesday in September, as is
> tradition since 2021).
> There are two review periods:
>
> * alpha review will run from 2024-02-06 to 2024-04-02;
> * beta review will run from 2024-05-21 to 2024-07-02.
>
> Decisions are made at UTC meetings, nowadays mostly on the
> recommendation of various groups, most relevant here being the PAG
> <https://www.unicode.org/consortium/props-algorithms.html>. The time
> between the end of review periods and the UTC meeting allows for these
> groups to process review feedback. UTC #178 is coming (January 23–25),
> UTC #179, April 23–25, will take decisions based on alpha feedback,
> and UTC #180, July 23–25, will finalize the content of Unicode 16.0
> based on beta feedback.
>
> As described in https://www.unicode.org/versions/beta.html, alpha
> review is focused on the review of the character repertoire, and beta
> review is focused on the review of properties and algorithms based on
> a mostly* stable repertoire.
>
> However, in recent years, the Unicode Technical Committee has been
> able to provide a consistent draft of most of the Unicode Character
> Database as early as alpha review; this could allow implementers to
> test parts of their implementations earlier, in particular when it
> comes to normalization, which is tied to encoding model decisions and
> thus to the repertoire. (I expect that I will have more to say on that
> later this month.)
>
> Notable issues for reviewers are called out on the alpha review
> background document and beta landing page. Of course, mind the caveats
> in these documents: “It is inappropriate to cite these files as other
> than a work in progress. No products or implementations should be
> released based on the beta UCD data files.”
>
> As C++ implementers are now consumers of the UCD and implementers of
> various Unicode algorithms (grapheme cluster segmentation, if only as
> a best practice, by [format.string.std] paragraph 13
> <https://eel.is/c++draft/format.string.std#13>, normalization by
> [lex.name] paragraph 1 <https://eel.is/c++draft/lex.name#1>), I plan
> to keep this mailing list informed of relevant Public Review Issues
> <https://www.unicode.org/review/> as we go through the 16.0 release cycle.
>
> Best regards,
>
> Robin Leroy
>
> —
> * Changes to the repertoire after beta are highly unlikely, but have
> happened, and indeed very recently so, as 47 CJK ideographs were
> removed and 66 were added to Extension I between 15.1β and 15.1. This
> unusual possibility was foreseen going into beta review. Generally,
> Extension I was encoded under extraordinary circumstances; the
> interested reader can refer to the following paper trail:
>
> * L2/23-011
> <https://www.unicode.org/L2/L2023/23011-cjk-unihan-group-utc174.pdf> Section
> 18) (search for “This is an extraordinarily bad thing.”) with UTC
> action item 174-A56
> <https://www.unicode.org/L2/L2023/23005.htm#174-A56>;
> * L2/23-082
> <https://www.unicode.org/L2/L2023/23082-cjk-unihan-group-utc175.pdf> Section
> 03);
> * L2/23-106 = ISO/IEC JTC 1/SC 2/WG 2 N5214
> <https://www.unicode.org/L2/L2023/23106-unc-extension-i.pdf> with
> UTC decision 175-C10
> <https://www.unicode.org/L2/L2023/23076.htm#175-C10>;
> * L2/23-163
> <https://www.unicode.org/L2/L2023/23163-cjk-unihan-group-utc176.pdf> Section
> 01);
> * L2/23-114R = ISO/IEC JTC 1/SC 2/WG 2 N5214R2
> <https://www.unicode.org/L2/L2023/23114r-unc-extension-i.pdf> with
> UTC decision 176-C1
> <https://www.unicode.org/L2/L2023/23157.htm#176-C1>, superseding
> 175-C10 and amending the repertoire after beta.
>
>

Received on 2024-01-12 18:09:07