Subject: Re: New draft revision: D2029R2 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals)
From: Tom Honermann (tom_at_[hidden])
Date: 2020-06-29 22:52:04
On 6/28/20 2:03 AM, Corentin Jabot wrote:
> On Sun, 28 Jun 2020 at 07:37, Corentin Jabot <corentinjabot_at_[hidden]
> <mailto:corentinjabot_at_[hidden]>> wrote:
> On Sun, Jun 28, 2020, 06:50 Tom Honermann via SG16
> <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
> A new draft revision of P2029 (Proposed resolution for core
> issues 411, 1656, and 2333; numeric and universal character
> escapes in character and string literals) is now available at
> This addresses the CWG feedback provided during the March
> 23rd, 2020 core issues processing teleconference
> Wording review feedback prior to the next Core issues
> processing teleconference would be much appreciated!
> I really like the overall direction, a few comments:
> - Can we not make conditionally supported escape sequences part of
> the grammar?
This was requested by Core in the 2020-01-16 issues processing telecon
> What I would do:
>     any member of the basic source character set other than u, U, x,
> and the members of octal-digit
> And in 5.13, keep
> Escape sequences not listed in Table 9 are conditionally supported,
> with implementation-defined semantics
What problem would that solve?
> I would also keep
> An escape sequence specifies a single code unit.
The ability for a conditional escape sequence to specify a code unit
sequence was discussed during the 2020-03-23 issues processing telecon
Since such sequences are implementation-defined anyway, I don't know of
any reason to prohibit them from expanding to multiple code units. For
sequences that specify a character, whether a single code unit or
multiple code units are encoded should be determined by the character encoding.
If we want to enforce such a restriction, I think it belongs in
[lex.charset]p3 <http://eel.is/c++draft/lex.charset#3> (I thought we
already had normative wording that requires members of the basic source
character set be encoded as a single code unit, but I don't see it now).
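A minimal C++ sketch of the point that the character encoding, not the escape sequence, determines the code unit count. This is illustrative only and not part of the proposed wording; the helper name code_units is hypothetical, and the example uses UTF-8, UTF-16, and UTF-32 literals because their encodings are fixed by the standard.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating null character.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// UTF-8: one universal-character-name may contribute 1 to 4 code units.
static_assert(code_units(u8"A") == 1, "U+0041 is one UTF-8 code unit");
static_assert(code_units(u8"\u00E9") == 2, "U+00E9 is two UTF-8 code units");
static_assert(code_units(u8"\u20AC") == 3, "U+20AC is three UTF-8 code units");
static_assert(code_units(u8"\U0001F600") == 4, "U+1F600 is four UTF-8 code units");

// The same character needs a surrogate pair in UTF-16 and a single
// code unit in UTF-32.
static_assert(code_units(u"\U0001F600") == 2, "surrogate pair in UTF-16");
static_assert(code_units(U"\U0001F600") == 1, "single UTF-32 code unit");
```

For ordinary and wide literals the counts depend on the implementation's execution encodings, so no fixed values are asserted for them here.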
> - Can we not add notes for stateful encodings? It doesn't add
Stateful encodings were discussed in the 2020-03-23 issues processing
> - Wide multi character literals were not a thing, let's not make
> them one now. Same for conditional character literals and
> conditional wide character literals.
> Instead, please add text in (Z) to describe them?
> - Ordinary and wide character literals consisting of a single
> basic-c-char, simple-escape-sequence, or universal-character-name
> that specifies a character that either lacks representation in the
> associated character encoding or that cannot be encoded as a
> single code unit
> are conditionally supported and have an implementation-defined value
> - A wide character literal consisting of multiple c-chars is
> conditionally-supported and has an implementation-defined value.
Giving these odd literals a name was suggested by Core. I agree with
their suggested direction; giving them a name makes it easier to discuss
and define them.
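For readers less familiar with these odd literals, a short C++ sketch of the existing behavior for the ordinary case. It assumes an implementation that accepts multicharacter literals (they are conditionally-supported, and compilers commonly warn about them); the value is implementation-defined, so only the type is checked.

```cpp
#include <type_traits>

// An ordinary character literal with a single c-char has type char.
static_assert(std::is_same<decltype('a'), char>::value,
              "single c-char literal has type char");

// An ordinary literal with more than one c-char (a multicharacter
// literal) has type int; its value is implementation-defined.
static_assert(std::is_same<decltype('ab'), int>::value,
              "multicharacter literal has type int");

// A wide literal with more than one c-char, such as L'ab', is also
// conditionally-supported with an implementation-defined value; it is
// not asserted here because an implementation may reject it.
```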
> Please change
> The sequence of characters denoted by each contiguous sequence of
> basic-s-chars, r-chars, simple-escape-sequences ([lex.ccon]), and
> universal-character-names ([lex.charset]) is encoded to a code
> unit sequence
> to
> Each basic-s-chars, r-chars, simple-escape-sequences ([lex.ccon]),
> and universal-character-names ([lex.charset]) is encoded to a code
> unit sequence
The intent is to make it clear that these sequences are encoded as a
group. This is necessary for stateful encodings with SI/SO characters
since such characters don't necessarily contribute a code unit sequence
on their own. This was also requested during the 2020-03-23 issues
> - please replace "applicable character encoding" with "character encoding"
That doesn't seem correct to me; the wording needs to indicate which
character encoding. Note that there are three occurrences of
"applicable associated character encoding"; I'm not sure which use you
were referring to.
> - not sure replacing `\0` by null character is an improvement
It avoids a correction to state something like, "a '\0', L'\0', u8'\0',
u'\0', or U'\0' is appended ...". [lex.charset]p3
<http://eel.is/c++draft/lex.charset#3> defines /null character/ (though
the definition there isn't perfect either, I think it is an improvement).
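A small C++ sketch of why a single term is convenient here: every kind of string literal gets a terminating element whose value is zero in that literal's own character type. The helper name ends_with_null is hypothetical and exists only for this illustration.

```cpp
#include <cstddef>

// Hypothetical helper: true if the last array element of a string
// literal is the value-initialized (zero) character of its type.
template <typename CharT, std::size_t N>
constexpr bool ends_with_null(const CharT (&s)[N]) {
    return s[N - 1] == CharT();
}

// Three characters plus the appended null character.
static_assert(sizeof("abc") == 4, "ordinary literal has 4 elements");

// The appended element is '\0', L'\0', u8'\0', u'\0', or U'\0'
// depending on the literal's prefix.
static_assert(ends_with_null("abc"), "ordinary string literal");
static_assert(ends_with_null(L"abc"), "wide string literal");
static_assert(ends_with_null(u8"abc"), "UTF-8 string literal");
static_assert(ends_with_null(u"abc"), "UTF-16 string literal");
static_assert(ends_with_null(U"abc"), "UTF-32 string literal");
```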