Date: Tue, 26 Apr 2022 16:31:48 -0400
The proposed wording for [format.string.escaped]p4 in P2286R7:
Formatting Ranges
<https://wiki.edg.com/pub/Wg21telecons2022/LibraryWorkingGroup/p2286r7.html>
currently states:
> The escaped character and escaped string representations of a
> character or string in a non-Unicode encoding is unspecified.
I would like this to be better specified to ensure implementations
behave consistently.
The wording below is suggested as a replacement for
[format.string.escaped]p2-p4 (link to p2
<https://wiki.edg.com/pub/Wg21telecons2022/LibraryWorkingGroup/p2286r7.html#pnum_12>)
and is intended to cover both the Unicode and non-Unicode cases.
The escaped string /E/ representation of a string /S/ is constructed by
encoding a sequence of characters in the associated character encoding
/CE/ for charT ([lex.string.literal]) as follows:
  * /E/ is initialized with U+0022 QUOTATION MARK (").
  * For each code unit sequence /X/ in /S/ that either encodes a single
    character or that is a sequence of ill-formed code units:
      o If /X/ encodes a single character /C/, then:
          + If /C/ is in the table below, then its corresponding
            two-character escape sequence is appended to /E/.
            <insert table here>
          + Otherwise, if /C/ is not U+0020 SPACE and
              # /CE/ is a Unicode encoding and /C/ corresponds to a UCS
                scalar value whose Unicode property General_Category has
                a value in the groups Separator (Z) or Other (C), as
                described by table 12 of UAX#44, or
              # /CE/ is not a Unicode encoding and /C/ is one of an
                implementation-defined set of separator or non-printable
                characters
            then the sequence \u{/simple-hexadecimal-digit-sequence/} is
            appended to /E/ where /simple-hexadecimal-digit-sequence/ is
            the code point value of /C/ formatted as if by a standard
            format specifier ([format.string.std]) of "{:x}".
          + Otherwise, /C/ is appended to /E/.
      o Otherwise /X/ is a sequence of ill-formed code units. For each
        code unit /U/, the sequence
        \x{/simple-hexadecimal-digit-sequence/} is appended to /E/ where
        /simple-hexadecimal-digit-sequence/ is the code point value of
        /U/ formatted as if by a standard format specifier
        ([format.string.std]) of "{:x}".
  * U+0022 QUOTATION MARK (") is appended to /E/.
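To make the steps above easier to discuss, here is a rough C++ sketch of the Unicode case only. It is not proposed wording and not any implementation's actual code. It assumes charT is char and /CE/ is UTF-8. All function names (escape_string, decode_utf8, needs_u_escape, append_hex) are hypothetical. The escape table is left as a placeholder above, so the five entries used here (tab, line feed, carriage return, quotation mark, backslash) are an assumption. The General_Category test is a small hand-picked stand-in, not a real UAX#44 lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical stand-in for "General_Category is in Z or C".
// Covers only C0/C1 controls, DEL, NO-BREAK SPACE, U+2028 and U+2029;
// a real implementation would consult the Unicode Character Database.
inline bool needs_u_escape(char32_t c) {
    if (c == U' ') return false;  // U+0020 SPACE is never escaped
    return c < 0x20 || (c >= 0x7F && c <= 0x9F) || c == 0xA0 ||
           c == 0x2028 || c == 0x2029;
}

// Decodes one UTF-8 sequence starting at s[i].
// Returns its length in code units, or 0 if the sequence is ill-formed.
inline std::size_t decode_utf8(std::string_view s, std::size_t i, char32_t& out) {
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    unsigned char b0 = byte(i);
    if (b0 < 0x80) { out = b0; return 1; }
    std::size_t len = (b0 & 0xE0) == 0xC0 ? 2
                    : (b0 & 0xF0) == 0xE0 ? 3
                    : (b0 & 0xF8) == 0xF0 ? 4 : 0;
    if (len == 0 || i + len > s.size()) return 0;
    char32_t c = b0 & (0xFF >> (len + 1));  // payload bits of the lead byte
    for (std::size_t k = 1; k < len; ++k) {
        if ((byte(i + k) & 0xC0) != 0x80) return 0;  // not a continuation byte
        c = (c << 6) | (byte(i + k) & 0x3F);
    }
    // Reject overlong forms, surrogates, and values above U+10FFFF.
    static constexpr char32_t min_value[] = {0, 0, 0x80, 0x800, 0x10000};
    if (c < min_value[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    out = c;
    return len;
}

// Appends \u{...} or \x{...} with lower-case hex digits and no padding.
inline void append_hex(std::string& e, char kind, std::uint32_t v) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "\\%c{%x}", kind, static_cast<unsigned>(v));
    e += buf;
}

inline std::string escape_string(std::string_view s) {
    std::string e = "\"";  // E is initialized with QUOTATION MARK
    for (std::size_t i = 0; i < s.size();) {
        char32_t c = 0;
        std::size_t len = decode_utf8(s, i, c);
        if (len == 0) {  // ill-formed: escape this code unit on its own
            append_hex(e, 'x', static_cast<unsigned char>(s[i]));
            ++i;
            continue;
        }
        switch (c) {  // assumed contents of the escape table
            case U'\t': e += "\\t"; break;
            case U'\n': e += "\\n"; break;
            case U'\r': e += "\\r"; break;
            case U'"':  e += "\\\""; break;
            case U'\\': e += "\\\\"; break;
            default:
                if (needs_u_escape(c)) append_hex(e, 'u', c);
                else e.append(s.data() + i, len);  // C is appended unchanged
        }
        i += len;
    }
    e += '"';  // closing QUOTATION MARK
    return e;
}
```

Under these assumptions, a lone 0xFF code unit comes out as \x{ff}, U+2028 comes out as \u{2028}, and a well-formed printable character such as U+00E9 is copied through unchanged. The non-Unicode case would differ only in how /C/ is decoded and in the implementation-defined set consulted in place of needs_u_escape.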
Please offer your thoughts; I would like to discuss this in tomorrow's
SG16 meeting.
Tom.
Received on 2022-04-26 20:31:50