On 6/28/20 2:03 AM, Corentin Jabot wrote:

On Sun, 28 Jun 2020 at 07:37, Corentin Jabot <corentinjabot@gmail.com> wrote:

On Sun, Jun 28, 2020, 06:50 Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:

A new draft revision of P2029 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals) is now available at https://rawgit.com/sg16-unicode/sg16/master/papers/d2029r2.html. This addresses the CWG feedback provided during the March 23rd, 2020 core issues processing teleconference.

Wording review feedback prior to the next Core issues processing teleconference would be much appreciated!

I really like the overall direction, a few comments:

- Can we not make conditionally supported escape sequences part of the grammar?

This was requested by Core in the 2020-01-16 issues processing telecon.

What I would do:

simple-escape-sequence:

any member of the basic source character set other than u, U, x, and the members of octal-digit

And in 5.13, keep

Escape sequences not listed in Table 9 are conditionally supported, with implementation-defined semantics

What problem would that solve?

I would also keep

An escape sequence specifies a single code unit.

The ability for a conditional escape sequence to specify a code unit sequence was discussed during the 2020-03-23 issues processing telecon. Since such sequences are implementation-defined anyway, I don't know of any reason to prohibit them from expanding to multiple code units. For sequences that specify a character, whether a single code unit or multiple code units are encoded should be determined by the character encoding. If we want to enforce such a restriction, I think it belongs in [lex.charset]p3 (I thought we already had normative wording that requires members of the basic source character set to be encoded as a single code unit, but I don't see it now).
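As a small illustration of the point that the character encoding, not the escape sequence, determines how many code units result, here is a minimal sketch using universal-character-names in UTF-8, UTF-16, and UTF-32 string literals. The helper name `code_unit_count` is made up for this example and does not come from P2029; the checks assume a C++17 or later compiler.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from P2029): the number of code units in a
// string literal, excluding the terminating null character.
template <typename CharT, std::size_t N>
constexpr std::size_t code_unit_count(const CharT (&)[N]) {
    return N - 1;
}

// U+00E9 takes two code units in UTF-8, but one in UTF-16 and UTF-32.
static_assert(code_unit_count(u8"\u00E9") == 2);
static_assert(code_unit_count(u"\u00E9") == 1);
static_assert(code_unit_count(U"\u00E9") == 1);

// U+1F600 lies outside the BMP: four UTF-8 code units, a UTF-16
// surrogate pair, and a single UTF-32 code unit.
static_assert(code_unit_count(u8"\U0001F600") == 4);
static_assert(code_unit_count(u"\U0001F600") == 2);
static_assert(code_unit_count(U"\U0001F600") == 1);
```

The same universal-character-name yields one, two, or four code units depending only on the encoding associated with the literal.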

- Can we not add notes for stateful encodings? It doesn't add anything.

Stateful encodings were discussed in the 2020-03-23 issues processing telecon.

- Wide multicharacter literals were not a thing; let's not make them one now. Same for conditional character literals and conditional wide character literals.

Instead, please add text in (Z) to describe them?

i.e.:

- Ordinary and wide character literals consisting of a single basic-c-char, simple-escape-sequence, or universal-character-name that specifies a character that either lacks representation in the associated character encoding or that cannot be encoded as a single code unit are conditionally supported and have an implementation-defined value.

- A wide character literal consisting of multiple c-chars is conditionally-supported and has an implementation-defined value.

Giving these odd literals a name was suggested by Core. I agree with their suggested direction; giving them a name makes it easier to discuss and define them.

Please change
The sequence of characters denoted by each contiguous sequence of basic-s-chars, r-chars, simple-escape-sequences ([lex.ccon]), and universal-character-names ([lex.charset]) is encoded to a code unit sequence
To
Each basic-s-char, r-char, simple-escape-sequence ([lex.ccon]), and universal-character-name ([lex.charset]) is encoded to a code unit sequence

The intent is to make it clear that these sequences are encoded as a group. This is necessary for stateful encodings with SI/SO characters since such characters don't necessarily contribute a code unit sequence on their own. This was also requested during the 2020-03-23 issues processing telecon.
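To make the difference between encoding as a group and encoding each element separately concrete, here is a rough sketch with a toy stateful encoding. The encoding, the `SO`/`SI` values, and the functions `encode_group` and `encode_each` are all invented for illustration; they do not model any real encoding or any compiler's behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stateful encoding, invented for illustration only: characters below
// 0x80 are written as-is in the initial state; any other character is
// written as (c & 0x7F) between a shift-out and a shift-in code unit.
// The mapping is lossy and only serves to count code units.
constexpr unsigned char SO = 0x0E;  // enter the shifted state
constexpr unsigned char SI = 0x0F;  // return to the initial state

// Encode a contiguous sequence of characters as one group, emitting a
// shift code unit only when the shift state actually changes.
std::vector<unsigned char> encode_group(const std::u32string& chars) {
    std::vector<unsigned char> out;
    bool shifted = false;
    for (char32_t c : chars) {
        const bool needs_shift = c >= 0x80;
        if (needs_shift != shifted) {
            out.push_back(needs_shift ? SO : SI);
            shifted = needs_shift;
        }
        out.push_back(static_cast<unsigned char>(c & 0x7F));
    }
    if (shifted) out.push_back(SI);  // end in the initial shift state
    return out;
}

// Encode each character on its own and concatenate the results.
std::vector<unsigned char> encode_each(const std::u32string& chars) {
    std::vector<unsigned char> out;
    for (char32_t c : chars) {
        const auto piece = encode_group(std::u32string(1, c));
        out.insert(out.end(), piece.begin(), piece.end());
    }
    return out;
}
```

In this toy model, two adjacent shifted characters encode to four code units as a group (SO, two code units, SI) but to six when each is encoded separately, which is why the shift code units cannot be attributed to any one character.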

- please replace applicable character encoding by character encoding

That doesn't seem correct to me; the wording needs to indicate which character encoding. Note that there are three occurrences of "applicable associated character encoding"; I'm not sure which use you were referring to.

- not sure replacing `\0` by null character is an improvement

It avoids a correction to state something like, "a '\0', L'\0', u8'\0', u'\0', or U'\0' is appended ...". [lex.charset]p3 defines null character (though the definition there isn't perfect either, I think it is an improvement).
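For reference, a minimal sketch of the behavior being described: every kind of string literal ends with a null character of its own code unit type. The helper names `literal_extent` and `last_code_unit_is_null` are hypothetical and exist only for this example; the checks assume C++17 or later.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers (not from P2029) for inspecting a string literal.
template <typename CharT, std::size_t N>
constexpr std::size_t literal_extent(const CharT (&)[N]) {
    return N;  // array extent, including the appended null character
}

template <typename CharT, std::size_t N>
constexpr bool last_code_unit_is_null(const CharT (&s)[N]) {
    return s[N - 1] == CharT();  // value-initialized CharT is the null character
}

// Three characters plus the appended null character in every case.
static_assert(literal_extent("abc") == 4);
static_assert(literal_extent(L"abc") == 4);
static_assert(literal_extent(u8"abc") == 4);
static_assert(literal_extent(u"abc") == 4);
static_assert(literal_extent(U"abc") == 4);

static_assert(last_code_unit_is_null("abc"));
static_assert(last_code_unit_is_null(L"abc"));
static_assert(last_code_unit_is_null(u8"abc"));
static_assert(last_code_unit_is_null(u"abc"));
static_assert(last_code_unit_is_null(U"abc"));
```

Saying "a null character is appended" covers all five cases without listing a literal of each type.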

Tom.

Corentin

Tom.

--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16