I really like the overall direction. A few comments:
- Can we not make conditionally supported escape sequences part of the grammar?
- Can we not add notes for stateful encodings? It doesn't add anything.
- Wide multicharacter literals were not a thing; let's not make them one now. The same goes for conditional character literals and conditional wide character literals.
Instead, please add text in (Z) to describe them?
i.e.:
- An ordinary or wide character literal consisting of a single basic-c-char, simple-escape-sequence, or universal-character-name that specifies a character that either lacks representation in the associated character encoding or cannot be encoded as a single code unit is conditionally-supported and has an implementation-defined value.
- A wide character literal consisting of multiple c-chars is conditionally-supported and has an implementation-defined value.
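
For reference, here is a minimal sketch of the literal forms these two bullets are about. It assumes a compiler that accepts ordinary multicharacter literals (GCC and Clang do, with a warning). The names `kSingleIsChar`, `kWideIsWchar`, and `kMultiIsInt` are made up for illustration. The wide and non-encodable cases are left as comments because implementations differ on whether they accept them.

```cpp
#include <type_traits>

// Silence the multicharacter-literal warning where the pragma is understood.
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Wmultichar"
#endif

// An ordinary character literal with a single c-char has type char.
constexpr bool kSingleIsChar = std::is_same<decltype('a'), char>::value;

// A wide character literal with a single c-char has type wchar_t.
constexpr bool kWideIsWchar = std::is_same<decltype(L'a'), wchar_t>::value;

// An ordinary multicharacter literal is conditionally-supported, has type int,
// and has an implementation-defined value (so only the type is checked here).
constexpr bool kMultiIsInt = std::is_same<decltype('ab'), int>::value;

static_assert(kSingleIsChar, "single c-char ordinary literal should be char");
static_assert(kWideIsWchar, "single c-char wide literal should be wchar_t");
static_assert(kMultiIsInt, "ordinary multicharacter literal should be int");

// Cases the suggested text would cover, commented out because some
// implementations reject them:
//   L'ab'      // wide character literal with multiple c-chars
//   '\u00e9'   // may not fit in a single code unit of the ordinary encoding
```

Only the types are asserted. The values are implementation-defined, which is the point of the suggested wording.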
Please change
The sequence of characters denoted by each contiguous sequence of basic-s-chars, r-chars, simple-escape-sequences ([lex.ccon]), and universal-character-names ([lex.charset]) is encoded to a code unit sequence
To
Each basic-s-char, r-char, simple-escape-sequence ([lex.ccon]), and universal-character-name ([lex.charset]) is encoded to a code unit sequence