sg16: Re: [SG16] Feedback on P1854: Conversion to literal encoding should not lead to loss of meaning

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Sat, 6 Nov 2021 18:21:16 -0400

On Sat, Nov 6, 2021 at 4:07 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 06/11/2021 16.22, Hubert Tong via SG16 wrote:
> > Anyhow, if the intent really is to help only with the visual ambiguity
> > problem, then it would be more consistent to allow
> > /universal-character-name/s that encode to more than one code unit in
> > multicharacter literals (because it's in a multicharacter literal already).
>
> If we use a UCN, we have no source code visual ambiguity
> (because a UCN is expressed in basic characters).
> Is that a correct understanding of the situation / motivation?
>

Yes.

> I can't connect your parenthetical remark to that.
>

A UCN does not itself add to the visual ambiguity over whether the character
literal contains a single *c-char*.

>
> > With a focus on the visual ambiguity problem (thanks for reminding), the
> > previous wording to limit /basic-c-char/s to the basic character set is
> > more capable because lots of Unicode display shenanigans will get through
> > the current formulation if the ordinary literal encoding is UCS-2 or UTF-16
> > (which is possible if CHAR_BIT is large enough).
>
> Do we have sufficient implementation experience / understanding of
> existing practice to estimate how much code will break if we
> restrict multi-character literals to the basic character set?
> (Note that neither @ nor $ is in the basic character set.)
>
> (I'm all for restricting multi-character literals as much as possible,
> but we should probably avoid stepping on people's toes for non-portable
> features that don't really hurt anyone.)
>

We could just restrict "problematic" Unicode characters?
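
To make the code-unit point concrete, here is a minimal sketch (not wording
from P1854) that counts how many code units a character needs in UTF-8
versus UTF-16. The helper names `utf8_code_units` and `utf16_code_units` are
hypothetical, and the input is assumed to be a valid Unicode scalar value
(no surrogates, nothing above U+10FFFF).

```cpp
// Hypothetical helpers for illustration only.
// Assumption: cp is a valid Unicode scalar value.

// Number of 8-bit code units UTF-8 needs for cp.
constexpr int utf8_code_units(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Number of 16-bit code units UTF-16 needs for cp.
constexpr int utf16_code_units(char32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}

// U+0041 LATIN CAPITAL LETTER A: one code unit in both encodings.
static_assert(utf8_code_units(0x0041) == 1);
static_assert(utf16_code_units(0x0041) == 1);

// U+0301 COMBINING ACUTE ACCENT: two code units in UTF-8, one in UTF-16.
static_assert(utf8_code_units(0x0301) == 2);
static_assert(utf16_code_units(0x0301) == 1);

// U+200D ZERO WIDTH JOINER: three code units in UTF-8, one in UTF-16.
static_assert(utf8_code_units(0x200D) == 3);
static_assert(utf16_code_units(0x200D) == 1);

// U+1F600 (outside the BMP): four code units in UTF-8, two in UTF-16.
static_assert(utf8_code_units(0x1F600) == 4);
static_assert(utf16_code_units(0x1F600) == 2);
```

With a UTF-8 ordinary literal encoding, a rule based on "encodes to a single
code unit" admits only ASCII. With UCS-2 or UTF-16 it would also admit
combining marks and zero-width characters such as the two above, which is
the kind of character a targeted restriction would have to name.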

>
> Jens
>

Received on 2021-11-06 17:21:47