Date: Fri, 20 May 2022 13:18:49 -0400
On 5/20/22 12:34 PM, Tom Honermann via SG16 wrote:
>
> SG16 will hold a telecon on Wednesday, May 25th, at 19:30 UTC
> (timezone conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20220525T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>
> The agenda is:
>
> * D2572R0: std::format() fill character allowances
> <https://rawgit.com/tahonermann/std-proposals/master/d2572r0.html>
> o Continue review pending the availability of an updated revision.
> * L2/22-072R: Proposal for amendments to UAX#9 and UAX#31
> <https://www.unicode.org/L2/L2022/22072r-uax9-uax31-amd.pdf>
> o Review for familiarity and relevance to P1949: C++ Identifier
> Syntax using Unicode Standard Annex 31 <https://wg21.link/p1949>.
>
> L2/22-072R
> <https://www.unicode.org/L2/L2022/22072r-uax9-uax31-amd.pdf> was
> produced by the Unicode Source Code Ad-Hoc Group and adopted in April
> into the proposed updates for Unicode 15 per the Draft Minutes of UTC
> Meeting 171 <https://www.unicode.org/L2/L2022/22061.htm#171-C25>.
> Thanks are owed to Robin Leroy (CC'd) for bringing this paper to our
> attention. The paper discusses handling of source code that contains
> characters that have right-to-left (RTL) directionality. The changes
> made to UAX#9 (Unicode Bidirectional Algorithm)
> <https://www.unicode.org/reports/tr9/proposed.html#HL4Example2> (in
> yellow highlight) are concerned with presentation of source code and
> are therefore more of a concern for SG15 (Tooling), where they would be
> applicable to compilers (e.g., in diagnostics), editors, code review
> tools, etc. The changes to UAX#31 (Unicode Identifier and Pattern
> Syntax)
> <https://www.unicode.org/reports/tr31/proposed.html#Pattern_Syntax>
> (in yellow highlight) clarify that rule UAX31-R3
> <https://unicode.org/reports/tr31/#R3> is applicable to programming
> languages and present an example illustrating how use of LEFT-TO-RIGHT
> MARK (LRM) and RIGHT-TO-LEFT MARK (RLM) as whitespace characters (but
> not in isolation) may be desirable so that source code rendered as
> plain text does not present the source code in a confusing or
> surprising manner. The adopted changes suggest (at least) the
> following items for us to consider:
>
> 1. [uaxid.pattern]p2 <http://eel.is/c++draft/uaxid.pattern#2>, as
> added by P1949 <https://wg21.link/p1949>, states that UAX31-R3
> <https://unicode.org/reports/tr31/#R3> is not applicable to C++
> but in light of the updates above, that is not correct. The entry
> should be updated to state our conformance and possibly declare a
> profile for our use of Pattern_White_Space
> <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3APattern_White_Space%3A%5D&g=&i=>
> and Pattern_Syntax
> <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3APattern_Syntax%3A%5D&g=&i=>
> characters.
> 2. Per the example added to UAX31-R3
> <https://unicode.org/reports/tr31/#R3>, consider allowing LRM and
> RLM to appear in whitespace (this would be an additional change to
> consider on top of P2348: Whitespaces Wording Revamp
> <https://wg21.link/p2348> after C++23 pending updated Unicode
> guidance).
> 3. Consider proposing recommended display behaviors to SG15;
> presumably in line with HL4 from UAX#9 section 4.3, "Higher-Level
> Protocols"
> <https://unicode.org/reports/tr9/#Higher-Level_Protocols>. My
> understanding is that Microsoft Visual Studio implements this
> behavior. Opportunities for diagnostic improvements can be seen at
> https://godbolt.org/z/MM1xE5dM1 (note that the caret position is
> not aligned with the identifier it intends to highlight; this is
> because the code display and caret location are not in sync with
> regard to how RTL characters affect presentation).
>
With regard to these last two items, https://godbolt.org/z/vzo996Gnr
demonstrates what current compilers do if an LRM is inserted after the
undefined identifier. All three compilers reject the LRM, but its
presence corrects the source code display such that the caret alignment
works as intended.
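
For reference when considering items 1 and 2, here is a minimal sketch of a
membership test for the Pattern_White_Space property, which already includes
LRM (U+200E) and RLM (U+200F). The function name is hypothetical and is not
taken from any paper; the sketch only checks set membership and does not
model the "not in isolation" constraint mentioned above, nor how a C++
lexer would actually integrate such a check.

```cpp
// Sketch only: is_pattern_white_space is an illustrative name, not an
// existing or proposed API. It tests membership in the 11 code points
// that carry the Unicode Pattern_White_Space property.
constexpr bool is_pattern_white_space(char32_t c) {
    return (c >= 0x0009 && c <= 0x000D)  // TAB, LF, VT, FF, CR
        || c == 0x0020                   // SPACE
        || c == 0x0085                   // NEXT LINE
        || c == 0x200E                   // LEFT-TO-RIGHT MARK (LRM)
        || c == 0x200F                   // RIGHT-TO-LEFT MARK (RLM)
        || c == 0x2028                   // LINE SEPARATOR
        || c == 0x2029;                  // PARAGRAPH SEPARATOR
}

// LRM and RLM are members; NO-BREAK SPACE (U+00A0) is not.
static_assert(is_pattern_white_space(0x200E), "LRM is Pattern_White_Space");
static_assert(is_pattern_white_space(0x200F), "RLM is Pattern_White_Space");
static_assert(!is_pattern_white_space(0x00A0), "NBSP is not Pattern_White_Space");
```

A profile for C++ could presumably be stated as a restriction or extension
of this set, but that is a design question for the group rather than
something the sketch decides.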
Tom.
Received on 2022-05-20 17:18:52