On 6/28/20 2:03 AM, Corentin Jabot wrote:


On Sun, 28 Jun 2020 at 07:37, Corentin Jabot <corentinjabot@gmail.com> wrote:


On Sun, Jun 28, 2020, 06:50 Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:

A new draft revision of P2029 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals) is now available at https://rawgit.com/sg16-unicode/sg16/master/papers/d2029r2.html.  This addresses the CWG feedback provided during the March 23rd, 2020 core issues processing teleconference.

Wording review feedback prior to the next Core issues processing teleconference would be much appreciated!

I really like the overall direction, a few comments:
- Can we not make conditionally supported escape sequences part of the grammar?
This was requested by Core in the 2020-01-16 issues processing telecon.

What I would do:
simple-escape-sequence:
    any member of the basic source character set other than u, U, x, and the members of octal-digit

And in 5.13, keep 
Escape sequences not listed in Table 9 are conditionally supported, with implementation-defined semantics
What problem would that solve?

I would also keep 
An escape sequence specifies a single code unit.
The ability for a conditional escape sequence to specify a code unit sequence was discussed during the 2020-03-23 issues processing telecon.  Since such sequences are implementation-defined anyway, I don't know of any reason to prohibit them from expanding to multiple code units.  For sequences that specify a character, whether one code unit or several are encoded should be determined by the character encoding.  If we want to enforce such a restriction, I think it belongs in [lex.charset]p3 (I thought we already had normative wording that requires members of the basic source character set be encoded as a single code unit, but I don't see it now).
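As a rough illustration of the "determined by the character encoding" point (a sketch, not taken from the paper): with the UTF literal prefixes, the same universal-character-name contributes a different number of code units depending on the encoding, while a numeric escape always contributes exactly one.  The helper name code_units is made up for this example.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+00E9 takes two UTF-8 code units (0xC3 0xA9) but one in UTF-16 and UTF-32.
static_assert(code_units(u8"\u00E9") == 2, "UTF-8: two code units");
static_assert(code_units(u"\u00E9") == 1, "UTF-16: one code unit");
static_assert(code_units(U"\u00E9") == 1, "UTF-32: one code unit");

// U+1F600 takes four UTF-8 code units and a UTF-16 surrogate pair.
static_assert(code_units(u8"\U0001F600") == 4, "UTF-8: four code units");
static_assert(code_units(u"\U0001F600") == 2, "UTF-16: surrogate pair");

// Each numeric escape specifies exactly one code unit.
static_assert(code_units(u8"\xC3\xA9") == 2, "two escapes, two code units");
```

The ordinary and wide encodings are implementation-defined, so the sketch sticks to the u8, u, and U prefixes where the counts are fixed.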



 
- Can we not add notes for stateful encodings? It doesn't add anything.
Stateful encodings were discussed in the 2020-03-23 issues processing telecon
- Wide multicharacter literals were not a thing; let's not make them one now. The same goes for conditional character literals and conditional wide character literals.

Instead, please add text in (Z) to describe them?
ie:

- Ordinary and wide character literals consisting of a single basic-c-char, simple-escape-sequence, or universal-character-name that specifies a character that either lacks representation in the associated character encoding or cannot be encoded as a single code unit are conditionally supported and have an implementation-defined value.
- A wide character literal consisting of multiple c-chars is conditionally-supported and has an implementation-defined value.
Giving these odd literals a name was suggested by Core.  I agree with their suggested direction; giving them a name makes it easier to discuss and define them.


Please change 
The sequence of characters denoted by each contiguous sequence of basic-s-chars, r-chars, simple-escape-sequences ([lex.ccon]), and universal-character-names ([lex.charset]) is encoded to a code unit sequence
To
Each basic-s-char, r-char, simple-escape-sequence ([lex.ccon]), and universal-character-name ([lex.charset]) is encoded to a code unit sequence
The intent is to make it clear that these sequences are encoded as a group.  This is necessary for stateful encodings with SI/SO characters since such characters don't necessarily contribute a code unit sequence on their own.  This was also requested during the 2020-03-23 issues processing telecon.
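One way to picture why grouping matters is a toy stateful encoder.  Everything below is hypothetical: it is not a real encoding, the uppercase letters merely stand in for characters that are only representable in a shifted state, and the function names are invented for the sketch.  Encoding a run as a group emits the shift bytes once, whereas encoding each element on its own forces a return to the initial state after every character.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy shift bytes (the values conventionally used for SO and SI).
constexpr unsigned char SO = 0x0E;  // switch to the shifted state
constexpr unsigned char SI = 0x0F;  // switch back to the initial state

// Assumption for this sketch: 'A'..'Z' need the shifted state.
inline bool needs_shift(char c) { return c >= 'A' && c <= 'Z'; }

// Encode a whole run as a group, starting and ending in the initial state.
inline std::vector<unsigned char> encode_run(const std::string& chars) {
    std::vector<unsigned char> out;
    bool shifted = false;
    for (char c : chars) {
        if (needs_shift(c) != shifted) {
            out.push_back(shifted ? SI : SO);
            shifted = !shifted;
        }
        out.push_back(static_cast<unsigned char>(c));
    }
    if (shifted) out.push_back(SI);  // return to the initial state
    return out;
}

// Encode every character independently and concatenate the results.
inline std::vector<unsigned char> encode_each(const std::string& chars) {
    std::vector<unsigned char> out;
    for (char c : chars) {
        const std::vector<unsigned char> one = encode_run(std::string(1, c));
        out.insert(out.end(), one.begin(), one.end());
    }
    return out;
}
```

With this toy, encode_run("AB") yields SO 'A' 'B' SI (four bytes) while encode_each("AB") yields SO 'A' SI SO 'B' SI (six bytes); for characters that never leave the initial state the two agree.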



- please replace applicable character encoding by character encoding
That doesn't seem correct to me; the wording needs to indicate which character encoding.  Note that there are three occurrences of "applicable associated character encoding"; I'm not sure which use you were referring to.
- not sure replacing `\0` by null character is an improvement

It avoids a correction to state something like, "a '\0', L'\0', u8'\0', u'\0', or U'\0' is appended ...".  [lex.charset]p3 defines null character (though the definition there isn't perfect either, I think it is an improvement).
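For illustration only (a sketch, not wording): whichever prefix is used, the appended element is a code unit with value zero of the literal's own element type, which is what "null character" is meant to cover in one phrase.  The helper ends_with_null is a made-up name.

```cpp
#include <cstddef>

// Hypothetical helper: true if the last array element of a string literal
// is a code unit with value zero.
template <typename CharT, std::size_t N>
constexpr bool ends_with_null(const CharT (&s)[N]) {
    return s[N - 1] == CharT();
}

static_assert(ends_with_null("ab"), "ordinary string literal");
static_assert(ends_with_null(L"ab"), "wide string literal");
static_assert(ends_with_null(u8"ab"), "UTF-8 string literal");
static_assert(ends_with_null(u"ab"), "UTF-16 string literal");
static_assert(ends_with_null(U"ab"), "UTF-32 string literal");

// The terminator has the element type's size, so the array holds
// two code units plus one null in each case.
static_assert(sizeof(u"ab") == 3 * sizeof(char16_t), "char16_t elements");
static_assert(sizeof(U"ab") == 3 * sizeof(char32_t), "char32_t elements");
```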

Tom.