sg16: Re: [SG16] Feedback on P1854: Conversion to literal encoding should not lead to loss of meaning

From: Corentin <corentin.jabot_at_[hidden]>
Date: Fri, 29 Oct 2021 11:45:42 +0200

On Fri, Oct 29, 2021 at 11:25 AM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 29/10/2021 10.13, Corentin wrote:
> >
> >
> > On Fri, Oct 29, 2021 at 9:52 AM Jens Maurer <Jens.Maurer_at_[hidden]
> <mailto:Jens.Maurer_at_[hidden]>> wrote:
> >
> > On 29/10/2021 04.53, Hubert Tong via SG16 wrote:
> > > Thanks Corentin for the paper. I hope this feedback helps the
> discussion.
> > >
> > > With respect to the contents of multicharacter literals, the paper
> does not give much motivation for disallowing numeric escape sequences
> which fit within a single unsigned char. Also, the wording says "shall be a
> member of the basic literal character set": this property of "being" is
> rather ambiguous in terms of authorial intent regarding the treatment of
> UCNs, etc. that designate members of the basic literal character set (a
> name for something is usually not the same as the thing it names).
> >
> > I agree that the restriction
> >
> > "Each c-char in a multicharacter literal shall be a member of the
> basic literal character set."
> >
> >
> > The goal is to avoid 'é' - which might be 2 code units - hence a
> multicharacter literal.
>
> If that isn't in the prose text, it should appear there.
>
> > - Change
> > > If a character lacks representation in the associated character
> encoding, then the string-literal is ill-formed.
> > To
> > > If a character lacks representation in the associated character
> encoding <ins>or is not representable as a single code unit</ins>, then the
> string-literal is ill-formed.
>
> That change sounds good to me.
>

Actually never mind, this is for string literals, so this makes no sense.
I haven't had my coffee yet.

I think we want something along the lines of:

A multicharacter literal shall not have an encoding prefix. Each character
represented by a basic-c-char or a universal-character-name in a
multicharacter literal shall be encodable as a single code unit in the
narrow literal encoding.

https://isocpp.org/files/papers/D1854R2.pdf
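
To illustrate the intent of that rule, here is a rough sketch of the check a
compiler might perform. It is not proposed wording and not how any particular
implementation works. The names NarrowEncoding, code_units and
allowed_in_multichar are made up for this example, and only two candidate
narrow literal encodings (UTF-8 and ISO-8859-1) are modelled.

```cpp
// Hypothetical helpers for illustration only; none of these names come
// from P1854 or from any real compiler.
enum class NarrowEncoding { Utf8, Latin1 };

// Number of code units needed to encode the Unicode scalar value c in the
// given narrow encoding, or 0 if c has no representation in that encoding.
constexpr int code_units(char32_t c, NarrowEncoding enc) {
    // Surrogates and values above U+10FFFF are not scalar values.
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    switch (enc) {
    case NarrowEncoding::Utf8:
        return c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
    case NarrowEncoding::Latin1:
        // ISO-8859-1 maps U+0000..U+00FF to one byte each.
        return c <= 0xFF ? 1 : 0;
    }
    return 0;
}

// The suggested rule: a character may appear in a multicharacter literal
// only if it is encodable as exactly one code unit.
constexpr bool allowed_in_multichar(char32_t c, NarrowEncoding enc) {
    return code_units(c, enc) == 1;
}

// 'a' is a single code unit in UTF-8.
static_assert(allowed_in_multichar(U'a', NarrowEncoding::Utf8), "");
// U+00E9 (e with acute) needs two code units in UTF-8 ...
static_assert(!allowed_in_multichar(U'\u00E9', NarrowEncoding::Utf8), "");
// ... but is a single code unit in ISO-8859-1.
static_assert(allowed_in_multichar(U'\u00E9', NarrowEncoding::Latin1), "");
// U+0301 (combining acute accent) has no ISO-8859-1 representation.
static_assert(!allowed_in_multichar(U'\u0301', NarrowEncoding::Latin1), "");
```

Under these assumptions the same source character is accepted or rejected
depending on the narrow literal encoding, which is the behaviour the wording
above is trying to describe.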

>
> > (This works because all combining diacritics are represented as more
> than one code unit in all encodings I'm aware of.)
>
> That's not really relevant. Also, ISO-8859-1 does represent 'é' as a
> single code unit,
> and that should be a totally fine single-character literal in this
> encoding.
>
> Jens
>
>

Received on 2021-10-29 04:45:55