Date: Wed, 26 Oct 2022 11:55:20 -0400
Please review the following. If you agree with the proposed change and
have no further information to add, then there is no need to respond. If
you disagree with the proposed change, have corrections or new
information to offer, or have comments on the candidate polls, then
*please reply by Monday, October 31st*.
FR 22.14.6.4 [format.string.escaped]
<http://eel.is/c++draft/format.string.escaped> aggressive escaping
GitHub nbballot issue #408
<https://github.com/cplusplus/nbballot/issues/408>.
Comment:
The intent of "escaped strings" is unclear and unsuited to many spoken
languages. As it stands, this feature escapes:
* Invalid code unit sequences
* Non-printable characters
* All combining marks and other grapheme extenders
* Whitespace, using different techniques
* \ and "
It is not clear that all these behaviours belong in the
same facility. In particular, escaping all combining code points is not
suitable for many languages, especially Korean.
Proposed change:
Please consider either:
* Not escaping combining codepoints which are part of a grapheme cluster.
* Removing this feature and replacing it with a set of facilities, each
performing one of these operations, in a future C++ standard.
SG16 chair notes:
The discussion and polls taken during the 2022-10-19 SG16 telecon
<https://github.com/sg16-unicode/sg16-meetings#october-19th-2022> are
consistent with the first bullet of the proposed change:
* Poll 2.4: [US-38] SG16 agrees that combining code points shall not
be escaped unless there is no leading code point or the previous
character was escaped.
o Attendees: 8
o No objection to unanimous consent.
The poll consensus indicates the following behavior for these examples
(assuming UTF-8 for the ordinary literal encoding):
* std::format("{:?}", "\u0301"); // "\u{301}" (the combining
character is escaped since it does not follow a non-escaped
non-combining character)
* std::format("{:?}", "\\\u0301"); // "\\\u{301}" (the combining
character is escaped since it follows an escaped character)
* std::format("e{:?}", "\u0301"); // e"\u{301}" (the combining character
is escaped since it does not follow a non-escaped non-combining
character produced from the same field)
* std::format("{:?}", "e\u0301\u0323"); // "ẹ́" (U+0065, U+0301,
U+0323; the combining characters are not escaped since they are part of
a grapheme cluster that begins with a non-escaped non-combining character)
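The behavior in these examples can be illustrated with a small C++ sketch of the Poll 2.4 rule. It is not the [format.string.escaped] wording and not any library's implementation: it assumes, purely for illustration, that only U+0300..U+036F count as "combining" (the draft relies on the Grapheme_Extend property), it escapes only \, " and such combining code points, and it works on an already-decoded std::u32string. The names escape_field, is_combining and append_utf8 are hypothetical.

```cpp
#include <cstdio>
#include <string>

// HYPOTHETICAL simplification: only U+0300..U+036F (the Combining Diacritical
// Marks block) count as "combining" here. The draft wording relies on the
// Unicode Grapheme_Extend property, which this sketch does not implement.
bool is_combining(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Append one code point to out, encoded as UTF-8.
void append_utf8(std::string& out, char32_t c) {
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// Hypothetical sketch of the Poll 2.4 (US-38) rule for one "{:?}" field:
// a combining code point is escaped as \u{hex} only if it has no leading
// code point in the field or the previous code point was escaped.
// Other escapes required by the draft (\t, \n, non-printable characters,
// invalid code unit sequences) are deliberately omitted.
std::string escape_field(const std::u32string& in) {
    std::string out = "\"";
    // Start of field: act as if the (nonexistent) previous code point was
    // escaped, so a leading combining mark gets escaped.
    bool prev_escaped = true;
    for (char32_t c : in) {
        if (c == U'\\' || c == U'"') {
            out += '\\';
            out += static_cast<char>(c);
            prev_escaped = true;
        } else if (is_combining(c) && prev_escaped) {
            char buf[16];
            std::snprintf(buf, sizeof buf, "\\u{%x}",
                          static_cast<unsigned>(c));
            out += buf;
            prev_escaped = true;
        } else {
            // Base characters, and combining marks that follow a
            // non-escaped code point, are copied through unchanged.
            append_utf8(out, c);
            prev_escaped = false;
        }
    }
    out += '"';
    return out;
}
```

For the four inputs above, escape_field yields "\u{301}", "\\\u{301}", e"\u{301}" (the leading e being literal text the caller prepends, outside the field), and "ẹ́" with both combining marks left unescaped, because the unescaped base character U+0065 clears prev_escaped.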
Candidate polls:
* [FR-XX]: SG16 recommends accepting the comment in the direction
presented in the first bullet of the proposed change and as
recommended in the polls for US-38.
Tom.
Received on 2022-10-26 15:55:23