Agenda for the 2022-05-25 SG16 telecon

From: Tom Honermann <tom_at_[hidden]>
Date: Fri, 20 May 2022 12:34:23 -0400
SG16 will hold a telecon on Wednesday, May 25th, at 19:30 UTC (timezone
conversion
<https://www.timeanddate.com/worldclock/converter.html?iso=20220525T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).

The agenda is:

  * D2572R0: std::format() fill character allowances
    <https://rawgit.com/tahonermann/std-proposals/master/d2572r0.html>
      o Continue review pending the availability of an updated revision.
  * L2/22-072R: Proposal for amendments to UAX#9 and UAX#31
    <https://www.unicode.org/L2/L2022/22072r-uax9-uax31-amd.pdf>
      o Review for familiarity and relevance to P1949: C++ Identifier
        Syntax using Unicode Standard Annex 31 <https://wg21.link/p1949>.

L2/22-072R <https://www.unicode.org/L2/L2022/22072r-uax9-uax31-amd.pdf>
was produced by the Unicode Source Code Ad-Hoc Group and adopted in
April into the proposed updates for Unicode 15 per the Draft Minutes of
UTC Meeting 171 <https://www.unicode.org/L2/L2022/22061.htm#171-C25>.
Thanks are owed to Robin Leroy (CC'd) for bringing this paper to our
attention. The paper discusses handling of source code that contains
characters that have right-to-left (RTL) directionality. The changes
made to UAX#9 (Unicode Bidirectional Algorithm)
<https://www.unicode.org/reports/tr9/proposed.html#HL4Example2> (in
yellow highlight) are concerned with the presentation of source code and
are therefore more of a concern for SG15 (Tooling), where they would be
applicable to compilers (e.g., in diagnostics), editors, code review
tools, etc. The changes to UAX#31 (Unicode Identifier and Pattern
Syntax)
<https://www.unicode.org/reports/tr31/proposed.html#Pattern_Syntax> (in
yellow highlight) clarify that rule UAX31-R3
<https://unicode.org/reports/tr31/#R3> is applicable to programming
languages and present an example illustrating how use of LEFT-TO-RIGHT
MARK (LRM) and RIGHT-TO-LEFT MARK (RLM) as whitespace characters (but
not in isolation) may be desirable so that source code rendered as plain
text is not displayed in a confusing or surprising
manner. The adopted changes suggest (at least) the following items for
us to consider:

 1. [uaxid.pattern]p2 <http://eel.is/c++draft/uaxid.pattern#2>, as added
    by P1949 <https://wg21.link/p1949>, states that UAX31-R3
    <https://unicode.org/reports/tr31/#R3> is not applicable to C++ but
    in light of the updates above, that is not correct. The entry should
    be updated to state our conformance and possibly declare a profile
    for our use of Pattern_White_Space
    <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3APattern_White_Space%3A%5D&g=&i=>
    and Pattern_Syntax
    <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3APattern_Syntax%3A%5D&g=&i=>
    characters.
 2. Per the example added to UAX31-R3
    <https://unicode.org/reports/tr31/#R3>, consider allowing LRM and
    RLM to appear in whitespace (this would be an additional change to
    consider on top of P2348: Whitespaces Wording Revamp
    <https://wg21.link/p2348> after C++23 pending updated Unicode guidance).
 3. Consider proposing recommended display behaviors to SG15, presumably
    in line with HL4 from UAX#9 section 4.3, "Higher-Level Protocols"
    <https://unicode.org/reports/tr9/#Higher-Level_Protocols>. My
    understanding is that Microsoft Visual Studio implements this
    behavior. Opportunities for diagnostic improvements can be seen at
    https://godbolt.org/z/MM1xE5dM1 (note that the caret position is not
    aligned with the identifier it is intended to highlight; this is
    because the code display and caret location are not in sync with
    regard to how RTL characters affect presentation).

Tom.

Received on 2022-05-20 16:34:24