This was requested by Core in the 2020-01-16 issues processing telecon.
On Sun, 28 Jun 2020 at 07:37, Corentin Jabot <corentinjabot@gmail.com> wrote:
On Sun, Jun 28, 2020, 06:50 Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:
A new draft revision of P2029 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals) is now available at https://rawgit.com/sg16-unicode/sg16/master/papers/d2029r2.html. This addresses the CWG feedback provided during the March 23rd, 2020 core issues processing teleconference.
Wording review feedback prior to the next Core issues processing teleconference would be much appreciated!
I really like the overall direction, a few comments:
- Can we not make conditionally supported escape sequences part of the grammar?
What problem would that solve?
What I would do:
simple-escape-sequence:
any member of the basic source character set other than u, U, x, and the members of octal-digit
And in 5.13, keep
Escape sequences not listed in Table 9 are conditionally supported, with implementation-defined semantics
The ability for a conditional escape sequence to specify a code unit sequence was discussed during the 2020-03-23 issues processing telecon. Since such sequences are implementation-defined anyway, I don't know of any reason to prohibit them from expanding to multiple code units. For sequences that specify a character, whether a single code unit or multiple code units are encoded should be determined by the character encoding. If we want to enforce such a restriction, I think it belongs in [lex.charset]p3 (I thought we already had normative wording that requires members of the basic source character set to be encoded as a single code unit, but I don't see it now).
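To illustrate the distinction being discussed (a sketch only, not part of the proposed wording): a numeric escape sequence names one code unit, while a universal-character-name names a character, and the character encoding decides how many code units that character needs. The helper `code_units` is a made-up name for this example, and the UTF-16 checks assume that char16_t literals are UTF-16 encoded, as they are on current implementations.

```cpp
#include <cstddef>

// Hypothetical helper for this example: number of code units in a
// string literal, not counting the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// A numeric escape sequence specifies exactly one code unit.
static_assert(code_units(u"\x41") == 1, "one code unit from \\x41");

// A universal-character-name specifies a character; the encoding
// determines the number of code units.
static_assert(code_units(u8"\u00E9") == 2, "U+00E9 is two UTF-8 code units");
static_assert(code_units(u"\u00E9") == 1, "U+00E9 is one UTF-16 code unit");
static_assert(code_units(u"\U0001F600") == 2, "U+1F600 is a surrogate pair");
static_assert(code_units(U"\U0001F600") == 1, "U+1F600 is one UTF-32 code unit");
```

The same universal-character-name yields one or two code units depending only on the literal's encoding, which is the behavior the paragraph above attributes to the character encoding rather than to the escape sequence.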
I would also keep
An escape sequence specifies a single code unit.
Stateful encodings were discussed in the 2020-03-23 issues processing telecon.
- Can we not add notes for stateful encodings? It doesn't add anything.
Giving these odd literals a name was suggested by Core. I agree with their suggested direction; giving them a name makes it easier to discuss and define them.
- Wide multi-character literals were not a thing, let's not make them one now. Same for conditional character literals and conditional wide character literals.
Instead, please add text in (Z) to describe them?
i.e.:
- Ordinary and wide character literals consisting of a single basic-c-char, simple-escape-sequence, or universal-character-name that specifies a character that either lacks representation in the associated character encoding or that cannot be encoded as a single code unit are conditionally supported and have an implementation-defined value.
- A wide character literal consisting of multiple c-chars is conditionally-supported and has an implementation-defined value.
The intent is to make it clear that these sequences are encoded as a group. This is necessary for stateful encodings with SI/SO characters since such characters don't necessarily contribute a code unit sequence on their own. This was also requested during the 2020-03-23 issues processing telecon.
Please change
The sequence of characters denoted by each contiguous sequence of basic-s-chars, r-chars, simple-escape-sequences ([lex.ccon]), and universal-character-names ([lex.charset]) is encoded to a code unit sequence
To
Each basic-s-char, r-char, simple-escape-sequence ([lex.ccon]), and universal-character-name ([lex.charset]) is encoded to a code unit sequence
That doesn't seem correct to me; the wording needs to indicate which character encoding. Note that there are three occurrences of "applicable associated character encoding"; I'm not sure which use you were referring to.
- Please replace "applicable character encoding" with "character encoding".
- Not sure replacing `\0` by null character is an improvement.
It avoids a correction to state something like, "a '\0', L'\0',
u8'\0', u'\0', or U'\0' is appended ...". [lex.charset]p3
defines null character (though the definition there isn't
perfect either, I think it is an improvement).
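As a concrete illustration of why one "null character" phrasing covers all five cases (a sketch; the helper name `ends_with_null` is made up for this example): every kind of string literal gets a terminating element equal to the zero value of its own character type.

```cpp
#include <cstddef>

// Hypothetical helper for this example: true if the last element of the
// literal's array is the zero value of its character type.
template <typename CharT, std::size_t N>
constexpr bool ends_with_null(const CharT (&s)[N]) { return s[N - 1] == CharT(); }

// The array holds the three characters plus the appended null character.
static_assert(sizeof("abc") == 4, "three chars plus terminating null");

static_assert(ends_with_null("abc"), "ordinary string literal");
static_assert(ends_with_null(L"abc"), "wide string literal");
static_assert(ends_with_null(u8"abc"), "UTF-8 string literal");
static_assert(ends_with_null(u"abc"), "UTF-16 string literal");
static_assert(ends_with_null(U"abc"), "UTF-32 string literal");
```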
Tom.
Corentin