On Sat, Nov 6, 2021 at 3:05 AM Hubert Tong <hubert.reinterpretcast@gmail.com> wrote:

The current R2 draft has this:
A multicharacter literal shall not have an encoding prefix. Each character represented by a basic-c-char or a universal-character-name in a multicharacter literal shall be encodable as a single code unit in the narrow literal encoding.

The above does not provide a restriction on conditional-escape-sequences and numeric-escape-sequences in multicharacter literals. We presumably only want to allow ones that are valid as the sole c-char in a character-literal with no encoding prefix. Indeed, that general description may be sufficient for all forms of c-char.

Why should it?

My only goal is to forbid multicharacter literals that are visually indistinguishable from single-character literals, in cases where multiple code points render as a single glyph.
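
To make that scenario concrete, here is a minimal sketch (C++17 or later) of two spellings that typically render as the same glyph but differ in code point count. The helper name code_point_count and the choice of U+00E9 versus "e" followed by U+0301 are purely illustrative and are not taken from the paper. It uses UTF-32 string literals rather than character literals so that it compiles cleanly everywhere.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: in UTF-32 every code point is exactly one code unit,
// so the length of the view is the number of code points.
constexpr std::size_t code_point_count(std::u32string_view s) {
    return s.size();
}

// Precomposed form: LATIN SMALL LETTER E WITH ACUTE, a single code point.
constexpr std::u32string_view precomposed = U"\u00E9";

// Decomposed form: 'e' followed by COMBINING ACUTE ACCENT, two code points
// that a renderer usually displays as one glyph.
constexpr std::u32string_view decomposed = U"e\u0301";

static_assert(code_point_count(precomposed) == 1);
static_assert(code_point_count(decomposed) == 2);
```

Written between single quotes with no prefix, the decomposed form is a multicharacter literal that looks like a single-character one. Assuming a UTF-8 narrow literal encoding, U+0301 needs two code units, so the draft wording quoted above would reject it.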

Given the implementation-defined nature of multicharacter literals, I do not think adding further restrictions on numeric-escape-sequences has any value here. What would be gained, or what pitfall avoided, by a further restriction?

Also, the title of the paper is not particularly helpful in terms of indicating what it proposes. I think something like "Support only straightforward multicharacter literals and encodable string literals" would be better.

-- HT