Suggested wording change for non-Unicode cases in P2286R7: Formatting Ranges

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 26 Apr 2022 16:31:48 -0400

The proposed wording for [format.string.escaped]p4 in P2286R7:
Formatting Ranges
<https://wiki.edg.com/pub/Wg21telecons2022/LibraryWorkingGroup/p2286r7.html>
currently states:

> The escaped character and escaped string representations of a
> character or string in a non-Unicode encoding is unspecified.

I would like this to be specified more precisely so that implementations
behave consistently.

The wording below is suggested as a replacement for
[format.string.escaped]p2-p4 (link to p2
<https://wiki.edg.com/pub/Wg21telecons2022/LibraryWorkingGroup/p2286r7.html#pnum_12>)
and is intended to cover both the Unicode and non-Unicode cases.

The escaped string representation /E/ of a string /S/ is constructed by
encoding a sequence of characters in the associated character encoding
/CE/ for charT ([lex.string.literal]) as follows:

  * /E/ is initialized with U+0022 QUOTATION MARK (").
  * For each code unit sequence /X/ in /S/ that either encodes a single
    character or that is a sequence of ill-formed code units:
      o If /X/ encodes a single character /C/, then:
          + If /C/ is in the table below, then its corresponding
            two-character escape sequence is appended to /E/.
            <insert table here>
          + Otherwise, if /C/ is not U+0020 SPACE and either
              # /CE/ is a Unicode encoding and /C/ corresponds to a UCS
                scalar value whose Unicode property General_Category has
                a value in the groups Separator (Z) or Other (C), as
                described by table 12 of UAX#44, or
              # /CE/ is not a Unicode encoding and /C/ is one of an
                implementation-defined set of separator or non-printable
                characters,
            then the sequence \u{/simple-hexadecimal-digit-sequence/} is
            appended to /E/, where /simple-hexadecimal-digit-sequence/ is
            the code point value of /C/ formatted as if by a standard
            format specifier ([format.string.std]) of "{:x}".
          + Otherwise, /C/ is appended to /E/.
      o Otherwise, /X/ is a sequence of ill-formed code units. For each
        code unit /U/ in /X/, the sequence
        \x{/simple-hexadecimal-digit-sequence/} is appended to /E/, where
        /simple-hexadecimal-digit-sequence/ is the code point value of
        /U/ formatted as if by a standard format specifier
        ([format.string.std]) of "{:x}".
  * U+0022 QUOTATION MARK (") is appended to /E/.
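
To make the intended behavior concrete, here is a minimal sketch of the
construction above for one non-Unicode case. Everything specific in it is
an assumption for illustration and not part of the proposed wording: /CE/
is taken to be US-ASCII, so any code unit of 0x80 or above is treated as
ill-formed; the escape table (not reproduced above) is assumed to contain
tab, line feed, carriage return, quotation mark and backslash; the
implementation-defined set of non-printable characters is assumed to be
0x00-0x1F and 0x7F; and the names append_hex_escape and escape_ascii are
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

// Appends \<kind>{<hex>} to out, e.g. \u{7f} or \x{c3}.
// The value is written in lowercase hexadecimal without padding.
inline void append_hex_escape(std::string& out, char kind, unsigned value) {
    char buf[16];
    int n = std::snprintf(buf, sizeof buf, "\\%c{%x}", kind, value);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    out.append(buf, static_cast<std::size_t>(n));
}

// Sketch of the escaped string construction, assuming CE is US-ASCII.
inline std::string escape_ascii(std::string_view s) {
    std::string e = "\"";                      // E starts with U+0022
    for (unsigned char u : s) {
        if (u >= 0x80) {                       // ill-formed in ASCII (assumption)
            append_hex_escape(e, 'x', u);
            continue;
        }
        switch (u) {                           // assumed contents of the table
        case '\t': e += "\\t";  break;
        case '\n': e += "\\n";  break;
        case '\r': e += "\\r";  break;
        case '"':  e += "\\\""; break;
        case '\\': e += "\\\\"; break;
        default:
            // Assumed implementation-defined non-printable set.
            // U+0020 SPACE is outside it and is copied unchanged.
            if (u < 0x20 || u == 0x7F)
                append_hex_escape(e, 'u', u);
            else
                e += static_cast<char>(u);
        }
    }
    e += '"';                                  // closing U+0022
    return e;
}
```

Under these assumptions, a string holding a, TAB, b is rendered as
"a\tb", a DEL character as "\u{7f}", and a lone 0xC3 code unit as
"\x{c3}".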

Please offer your thoughts; I would like to discuss this in tomorrow's
SG16 meeting.

Tom.

Received on 2022-04-26 20:31:50