sg16: Re: [SG16] Feedback on P1854: Conversion to literal encoding should not lead to loss of meaning

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Sun, 7 Nov 2021 10:24:53 -0500

On Sun, Nov 7, 2021 at 8:55 AM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 06/11/2021 23.21, Hubert Tong wrote:
> > On Sat, Nov 6, 2021 at 4:07 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> >
> > On 06/11/2021 16.22, Hubert Tong via SG16 wrote:
> > > Anyhow, if the intent really is to help only with the visual
> > > ambiguity problem, then it would be more consistent to allow
> > > /universal-character-name/s that encode to more than one code unit in
> > > multicharacter literals (because it's in a multicharacter literal already).
> >
> > If we use a UCN, we have no source code visual ambiguity
> > (because a UCN is expressed in basic characters).
> > Is that a correct understanding of the situation / motivation?
> >
> >
> > Yes.
> >
> >
> > I can't connect your parenthetical remark to that.
> >
> >
> > The UCN does not itself contribute to the visual ambiguity of the
> > character literal as being a single /c-char/.
> >
> >
> >
> > > With a focus on the visual ambiguity problem (thanks for
> > > reminding), the previous wording to limit /basic-c-char/s to the basic
> > > character set is more capable because lots of Unicode display shenanigans
> > > will get through the current formulation if the ordinary literal encoding
> > > is UCS-2 or UTF-16 (which is possible if CHAR_BIT is large enough).
> >
> > Do we have sufficient implementation experience / understanding of
> > existing practice to estimate how much code will break if we
> > restrict multi-character literals to the basic character set?
> > (Note that neither @ nor $ is in the basic character set.)
> >
> > (I'm all for restricting multi-character literals as much as possible,
> > but we should probably avoid stepping on people's toes for non-portable
> > features that don't really hurt anyone.)
> >
> >
> > We could just restrict "problematic" Unicode characters?
>
> Those are ones that take more than one code unit, I presume?
>

I meant the ones that don't display. After all, the code units may be
those of the UTF-16 or UTF-32 encoding form, where many of those
characters fit in a single code unit.
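
To illustrate, here is a rough sketch (the helper names are mine and
hypothetical, not from P1854 or the standard) that counts the code units
a Unicode scalar value needs in each encoding form. It assumes the input
is a valid scalar value and does no checking. U+200B ZERO WIDTH SPACE,
which does not display, needs three code units in UTF-8 but only one in
UTF-16 or UTF-32, so a "one code unit" rule would not exclude it there.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only. Each returns the number of
// code units needed to encode a valid Unicode scalar value `cp`.
// No validation of `cp` is performed (assumption: caller passes a scalar value).
constexpr int utf8_code_units(std::uint32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

constexpr int utf16_code_units(std::uint32_t cp) {
    // Values at or above U+10000 need a surrogate pair.
    return cp < 0x10000 ? 1 : 2;
}

constexpr int utf32_code_units(std::uint32_t /*cp*/) {
    return 1;
}

// U+200B ZERO WIDTH SPACE: invisible, yet a single code unit in
// UTF-16 and UTF-32.
static_assert(utf8_code_units(0x200B) == 3, "three UTF-8 code units");
static_assert(utf16_code_units(0x200B) == 1, "one UTF-16 code unit");
static_assert(utf32_code_units(0x200B) == 1, "one UTF-32 code unit");

// U+0040 '@': visible, and a single code unit in every encoding form.
static_assert(utf8_code_units(0x40) == 1, "one UTF-8 code unit");
```

The checks are static_asserts, so compiling the translation unit is
enough to verify them.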

>
> Jens
>
>

Received on 2021-11-07 09:25:25