FYI, I consider the direction already accepted for US-38 to be a suitable resolution to this issue.

On Wed, Oct 26, 2022 at 5:55 PM Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:
Please review the following. If you agree with the proposed change and have no further information to add, then there is no need to respond. If you disagree with the proposed change, have corrections or new information to offer, or have comments on the candidate polls, then please reply by Monday, October 31st.

FR 22.14.6.4 [format.string.escaped] aggressive escaping

GitHub nbballot issue #408.

Comment:

The intent of "escaped strings" is unclear and unsuited to many spoken languages. As it stands, this feature escapes:

  • Invalid code unit sequences
  • Non-printable characters
  • All combining marks and other grapheme extenders
  • Whitespace, using different techniques
  • The characters \ and "

It is not clear that all these behaviours should be grouped together in the same facility. In particular, escaping all combining codepoints is not suitable for many languages, especially Korean.

Proposed change:

Please consider either:

  • Not escaping combining codepoints which are part of a grapheme cluster.
  • Removing this feature and replacing it with a set of facilities, each performing one of these operations, in a future C++ standard.

SG16 chair notes:

The discussion and polls taken during the 2022-10-19 SG16 telecon are consistent with the first bullet of the proposed change:

  • Poll 2.4: [US-38] SG16 agrees that combining code points shall not be escaped unless there is no leading code point or the previous character was escaped.
    • Attendees: 8
    • No objection to unanimous consent.

The poll consensus indicates the following behavior for these examples (assuming UTF-8 for the ordinary literal encoding):

  • std::format("{:?}", "\u0301");        // "\u{301}" (the combining character is escaped since it does not follow a non-escaped non-combining character)
  • std::format("{:?}", "\\\u0301");      // "\\\u{301}" (the combining character is escaped since it follows an escaped character)
  • std::format("e{:?}", "\u0301");       // e"\u{301}" (the combining character is escaped since it does not follow a non-escaped non-combining character produced from the same field)
  • std::format("{:?}", "e\u0301\u0323"); // "ẹ́" (U+0065, U+0301, U+0323; the combining characters are not escaped since they follow a non-combining character)
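
For illustration only, the rule from Poll 2.4 can be sketched as a small standalone function. This is not the standard library implementation and not proposed wording. The names escape_field, is_combining and append_utf8 are hypothetical, the combining check covers only U+0300..U+036F as a stand-in for the real Unicode property lookup, and escaping of non-printable characters, whitespace and invalid code unit sequences is omitted.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical, simplified stand-in for the real property lookup:
// only the Combining Diacritical Marks block (U+0300..U+036F).
bool is_combining(char32_t c) { return c >= 0x300 && c <= 0x36F; }

// Append the UTF-8 encoding of one code point.
void append_utf8(std::string& out, char32_t c) {
    assert(c <= 0x10FFFF);  // precondition: within the Unicode code space
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// Escape one field: a combining code point is escaped only when there is
// no preceding code point in the field or the preceding one was escaped.
std::string escape_field(std::u32string_view in) {
    std::string out = "\"";
    bool prev_unescaped = false;  // nothing precedes the first code point
    for (char32_t c : in) {
        if (c == U'"' || c == U'\\') {
            out += '\\';
            out += static_cast<char>(c);
            prev_unescaped = false;
        } else if (is_combining(c) && !prev_unescaped) {
            char buf[16];
            std::snprintf(buf, sizeof buf, "\\u{%x}", static_cast<unsigned>(c));
            out += buf;
            prev_unescaped = false;
        } else {
            append_utf8(out, c);
            prev_unescaped = true;
        }
    }
    out += '"';
    return out;
}
```

Under these assumptions, escape_field(U"e\u0301\u0323") leaves both combining characters unescaped, while escape_field(U"\u0301") and escape_field(U"\\\u0301") escape U+0301. Because the state is reset for each call, text emitted outside the field (such as the "e" in the third example above) does not affect the result.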

Candidate polls:

  • [FR-XX]: SG16 recommends accepting the comment in the direction presented in the first bullet of the proposed change and as recommended in the polls for US-38.

Tom.
