NB comment review: FR [format.string.escaped] aggressive escaping

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 26 Oct 2022 11:55:20 -0400
Please review the following. If you agree with the proposed change and
have no further information to add, then there is no need to respond. If
you disagree with the proposed change, have corrections or new
information to offer, or have comments on the candidate polls, then
*please reply by Monday, October 31st*.

  FR [format.string.escaped]
  <http://eel.is/c++draft/format.string.escaped> aggressive escaping

GitHub nbballot issue #408


The intent of "escaped strings" is unclear and unsuited to many spoken
languages. As it stands, this feature escapes:

  * Invalid code unit sequences
  * Non-printable characters
  * All combining marks and other grapheme extenders
  * Whitespace (using different techniques)
  * The \ and " characters

It is not clear that all these behaviours should be grouped in the
same facility. In particular, escaping all combining code points is not
suitable for many languages, especially Korean.

    Proposed change:

Please consider either:

  * Not escaping combining code points which are part of a grapheme
    cluster.
  * Removing this feature and replacing it with a set of facilities,
    each performing one of these operations, in a future C++ standard.

    SG16 chair notes:

The discussion and polls taken during the 2022-10-19 SG16 telecon
<https://github.com/sg16-unicode/sg16-meetings#october-19th-2022> are
consistent with the first bullet of the proposed change:

  * Poll 2.4: [US-38] SG16 agrees that combining code points shall not
    be escaped unless there is no leading code point or the previous
    character was escaped.
      o Attendees: 8
      o No objection to unanimous consent.

The poll consensus indicates the following behavior for these examples
(assuming UTF-8 for the ordinary literal encoding):

  * std::format("{:?}", "\u0301"); // "\u{301}" (the combining
    character is escaped since it does not follow a non-escaped
    non-combining character)
  * std::format("{:?}", "\\\u0301"); // "\\\u{301}" (the combining
    character is escaped since it follows an escaped character)
  * std::format("e{:?}", "\u0301"); // e"\u{301}" (the combining
    character is escaped since it does not follow a non-escaped
    non-combining character produced from the same field)
  * std::format("{:?}", "e\u0301\u0323"); // "ẹ́" (U+0065, U+0301,
    U+0323; the combining characters are not escaped since they follow a
    non-combining character)
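
The rule from Poll 2.4 can be sketched as a small state machine. The
following is only an illustration, not proposed wording or any
implementation's code: escape_debug, is_combining, append_utf8, and
append_escape are hypothetical names, is_combining approximates the
Grapheme_Extend property with just the U+0300..U+036F block, and the
other escapes (invalid code units, non-printable characters,
whitespace) are deliberately left out. Escapes are written in the
\u{...} form used by [format.string.escaped].

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for Grapheme_Extend=Yes: only the Combining
// Diacritical Marks block is recognised in this sketch.
bool is_combining(char32_t c) { return c >= 0x300 && c <= 0x36F; }

// Append a Unicode scalar value to out as UTF-8.
void append_utf8(std::string& out, char32_t c) {
    assert(c <= 0x10FFFF);
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// Append c as \u{h...} using the shortest lower-case hex representation.
void append_escape(std::string& out, char32_t c) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "\\u{%x}", static_cast<unsigned>(c));
    out += buf;
}

// Quote and escape one field, handling only \, ", and combining marks.
std::string escape_debug(const std::u32string& s) {
    std::string out = "\"";
    // True only when the previous code point of this field was emitted
    // unescaped; false at the start of the field.
    bool prev_unescaped = false;
    for (char32_t c : s) {
        if (c == U'\\' || c == U'"') {
            out += '\\';
            out += static_cast<char>(c);
            prev_unescaped = false;
        } else if (is_combining(c) && !prev_unescaped) {
            // No leading code point, or the previous one was escaped.
            append_escape(out, c);
            prev_unescaped = false;
        } else {
            append_utf8(out, c);
            prev_unescaped = true;
        }
    }
    out += '"';
    return out;
}
```

Because prev_unescaped starts out false for every call, text that
precedes the replacement field (the "e" in the third example) never
shields a combining character at the start of the field.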

    Candidate polls:

  * [FR-XX]: SG16 recommends accepting the comment in the direction
    presented in the first bullet of the proposed change and as
    recommended in the polls for US-38.


Received on 2022-10-26 15:55:23