On Sat, Nov 6, 2021 at 4:07 PM Jens Maurer <Jens.Maurer@gmx.net> wrote:
On 06/11/2021 16.22, Hubert Tong via SG16 wrote:
> Anyhow, if the intent really is to help only with the visual ambiguity problem, then it would be more consistent to allow /universal-character-name/s that encode to more than one code unit in multicharacter literals (because it's in a multicharacter literal already).

If we use a UCN, we have no source code visual ambiguity
(because a UCN is expressed in basic characters).
Is that a correct understanding of the situation / motivation?

Yes.
 
I can't connect your parenthetical remark to that.

The UCN does not itself contribute to any visual ambiguity over whether the character literal contains a single c-char.

> With a focus on the visual ambiguity problem (thanks for reminding), the previous wording to limit /basic-c-char/s to the basic character set is more capable because lots of Unicode display shenanigans will get through the current formulation if the ordinary literal encoding is UCS-2 or UTF-16 (which is possible if CHAR_BIT is large enough).
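
To make the code-unit point concrete, here is a minimal sketch of how many code units a single Unicode scalar value needs in UTF-8 versus UTF-16. The function names are hypothetical and come from neither the wording under discussion nor any implementation. The input is assumed to be a valid scalar value (not a surrogate, at most U+10FFFF).

```cpp
#include <cstddef>

// Hypothetical helper: number of UTF-8 code units for a Unicode scalar value.
// Assumes c is a valid scalar value (not a surrogate, <= U+10FFFF).
constexpr std::size_t utf8_code_units(char32_t c) {
    if (c < 0x80) return 1;     // ASCII range
    if (c < 0x800) return 2;    // e.g. U+0301 COMBINING ACUTE ACCENT
    if (c < 0x10000) return 3;  // rest of the Basic Multilingual Plane
    return 4;                   // supplementary planes
}

// Hypothetical helper: number of UTF-16 code units for a Unicode scalar value.
// Everything in the BMP fits in one code unit; the rest needs a surrogate pair.
constexpr std::size_t utf16_code_units(char32_t c) {
    return c < 0x10000 ? 1 : 2;
}

// A combining mark is rejected by a "single code unit" rule under UTF-8,
// but passes the same rule under UTF-16.
static_assert(utf8_code_units(0x0301) == 2, "two UTF-8 code units");
static_assert(utf16_code_units(0x0301) == 1, "one UTF-16 code unit");
```

Under this sketch, a rule of the form "each c-char must encode to a single code unit" filters out every non-ASCII character when the ordinary literal encoding is UTF-8, but lets the whole BMP through when it is UTF-16.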

Do we have sufficient implementation experience / understanding of
existing practice to estimate how much code will break if we
restrict multi-character literals to the basic character set?
(Note that neither @ nor $ is in the basic character set.)
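
For reference, a rough sketch of a membership test for the 96-character basic source character set as listed in C++20 [lex.charset]. The function name is hypothetical, and the table is transcribed by hand, so treat it as illustrative rather than normative.

```cpp
#include <cstddef>

// Hypothetical helper: is c one of the 96 characters in the C++20 basic
// source character set? The table is hand-transcribed from [lex.charset].
constexpr bool is_basic_source_char(char32_t c) {
    constexpr char32_t basic[] =
        U" \t\v\f\n"                        // space and the four control characters
        U"abcdefghijklmnopqrstuvwxyz"
        U"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        U"0123456789"
        U"_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'"; // 29 punctuation characters
    // The "+ 1" in the condition skips the terminating null, which is not
    // a member of the set.
    for (std::size_t i = 0; i + 1 < sizeof(basic) / sizeof(basic[0]); ++i) {
        if (basic[i] == c) return true;
    }
    return false;
}

static_assert(is_basic_source_char(U'a'), "letters are in the set");
static_assert(!is_basic_source_char(U'@'), "@ is not in the set");
static_assert(!is_basic_source_char(U'$'), "$ is not in the set");
```

With a check like this, a multicharacter literal such as 'ab' would pass a basic-character-set restriction, while one containing @ or $ would not, which is the breakage risk raised above.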

(I'm all for restricting multi-character literals as much as possible,
but we should probably avoid stepping on people's toes for non-portable
features that don't really hurt anyone.)

We could just restrict "problematic" Unicode characters?

Jens