Re: NB comment review: FR 22.14.6.4 [format.string.escaped] aggressive escaping

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 26 Oct 2022 19:48:28 +0200
FYI I consider that the direction already accepted for US-38 is a suitable
resolution to that issue.

On Wed, Oct 26, 2022 at 5:55 PM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:

> Please review the following. If you agree with the proposed change and
> have no further information to add, then there is no need to respond. If
> you disagree with the proposed change, have corrections or new information
> to offer, or have comments on the candidate polls, then *please reply by
> Monday, October 31st*.
> FR 22.14.6.4 [format.string.escaped]
> <http://eel.is/c++draft/format.string.escaped> aggressive escaping
>
> GitHub nbballot issue #408
> <https://github.com/cplusplus/nbballot/issues/408>.
> Comment:
>
> The intent of "escaped strings" is unclear and unsuited to many spoken
> languages. As it stands, this feature escapes
>
> - Invalid code unit sequences
> - Non printable characters
> - All combining marks and other grapheme extenders
> - Whitespaces using different techniques
> - escape \ and "
>
> It is not clear that all these behaviours should be regrouped in the same
> facility. In particular, escaping all combining codepoints is not suitable
> for many languages, especially Korean.
> Proposed change:
>
> Please consider either:
>
> - Not escaping combining codepoints which are part of a grapheme
> cluster.
> - Removing this feature and replacing it with a set of facilities, each
> performing one of these operations, in a future C++ standard
>
> SG16 chair notes:
>
> The discussion and polls taken during the 2022-10-19 SG16 telecon
> <https://github.com/sg16-unicode/sg16-meetings#october-19th-2022> are
> consistent with the first bullet of the proposed change:
>
> - Poll 2.4: [US-38] SG16 agrees that combining code points shall not
> be escaped unless there is no leading code point or the previous character
> was escaped.
> - Attendees: 8
> - No objection to unanimous consent.
>
> The poll consensus indicates the following behavior for these examples
> (assuming UTF-8 for the ordinary literal encoding):
>
> - std::format("{:?}", "\u0301"); // "\u0301" (the combining
> character is escaped since it does not follow a non-escaped non-combining
> character)
> - std::format("{:?}", "\\\u0301"); // "\\\u301" (the combining
> character is escaped since it follows an escaped character)
> - std::format("e{:?}", "\u0301"); // "e\u301" (the combining
> character is escaped since it does not follow a non-escaped
> non-combining character produced from the same field)
> - std::format("{:?}", "e\u0301\u0323"); // "ẹ́" (U+0065, U+0301,
> U+0323; the combining characters are not escaped since they follow a
> non-combining character)
>
> Candidate polls:
>
> - [FR-XX]: SG16 recommends accepting the comment in the direction
> presented in the first bullet of the proposed change and as recommended in
> the polls for US-38.
>
> Tom.
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
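
The rule from Poll 2.4 and the four examples quoted above can be sketched as follows. This is only a minimal illustration, not the wording or any library's implementation: `escape_field`, `is_combining`, `append_utf8` and `append_escape` are hypothetical names, `is_combining` covers only U+0300..U+036F rather than the full set of combining code points, escaping of non-printable characters and invalid code units is left out, and the `\u{301}` output spelling is an assumption based on the C++23 escape form (the message above abbreviates it).

```cpp
#include <cstdio>
#include <string>

// Hypothetical and deliberately incomplete: treats only the Combining
// Diacritical Marks block (U+0300..U+036F) as combining.
bool is_combining(char32_t c) { return c >= 0x0300 && c <= 0x036F; }

// Append one code point to out as UTF-8 (no validation of surrogates).
void append_utf8(std::string& out, char32_t c) {
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// Append c as \u{hex}; the spelling is an assumption, see the lead-in.
void append_escape(std::string& out, char32_t c) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "\\u{%x}", static_cast<unsigned>(c));
    out += buf;
}

// Escape a single field. The state is local to the field, so text that
// precedes the field in the format string never counts as a leading
// code point.
std::string escape_field(const std::u32string& field) {
    std::string out = "\"";
    bool prev_escaped = true;  // also true when there is no leading code point
    for (char32_t c : field) {
        if (c == U'\\' || c == U'"') {
            out += '\\';
            out += static_cast<char>(c);
            prev_escaped = true;
        } else if (is_combining(c) && prev_escaped) {
            append_escape(out, c);   // nothing unescaped to attach to
            prev_escaped = true;
        } else {
            append_utf8(out, c);     // base character or attached combining mark
            prev_escaped = false;
        }
    }
    out += '"';
    return out;
}
```

Under these assumptions, `escape_field(U"\u0301")` escapes the lone combining mark, `escape_field(U"\\\u0301")` escapes it because it follows an escaped backslash, and `escape_field(U"e\u0301\u0323")` leaves both marks attached to the `e`. The third example above corresponds to `"e" + escape_field(U"\u0301")`, where the `e` lies outside the field and so does not prevent the escape.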

Received on 2022-10-26 17:48:42