Re: [SG16] Feedback on P1854: Conversion to literal encoding should not lead to loss of meaning

From: Corentin <corentin.jabot_at_[hidden]>
Date: Fri, 29 Oct 2021 15:30:18 +0200
On Fri, Oct 29, 2021 at 3:15 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 29/10/2021 11.45, Corentin wrote:
> >
> >
> > On Fri, Oct 29, 2021 at 11:25 AM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> >
> > On 29/10/2021 10.13, Corentin wrote:
> > >
> > >
> > > On Fri, Oct 29, 2021 at 9:52 AM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> > >
> > > On 29/10/2021 04.53, Hubert Tong via SG16 wrote:
> > > > Thanks Corentin for the paper. I hope this feedback helps
> the discussion.
> > > >
> > > > With respect to the contents of multicharacter literals, the
> paper does not give much motivation for disallowing numeric escape
> sequences which fit within a single unsigned char. Also, the wording says
> "shall be a member of the basic literal character set": this property of
> "being" is rather ambiguous in terms of authorial intent regarding the
> treatment of UCNs, etc. that designate members of the basic literal
> character set (a name for something is usually not the same as the thing it
> names).
> > >
> > > I agree that the restriction
> > >
> > > "Each c-char in a multicharacter literal shall be a member of
> the basic literal character set."
> > >
> > >
> > > The goal is to avoid 'é' - which might be 2 code units - hence a
> multicharacter literal.
> >
> > If that isn't in the prose text, it should appear there.
> >
> > > - Change
> > > > If a character lacks representation in the associated
> character encoding, then the string-literal is ill-formed.
> > > To
> > > > If a character lacks representation in the associated
> character encoding <ins>or is not representable as a single code
> unit</ins>, then the string-literal is ill-formed.
> >
> > That change sounds good to me.
> >
> >
> > Actually never mind, this is for string literals, so this makes no sense.
> > I haven't had my coffee yet.
> >
> > I think we want something along the lines of:
> >
> > A multicharacter literal shall not have an encoding prefix. Each
> character represented by a basic-c-char or a universal-character-name in a
> multicharacter literal
> > shall be encodable as a single code unit in the narrow literal encoding.
> >
> >
> > https://isocpp.org/files/papers/D1854R2.pdf
>
> The prose text still says
>
> "Instead, we propose that multicharacters literals can only contain
> characters from
> the basic-literal character sets."
>
> Also, in the normative text, "narrow literal encoding" -> "ordinary
> literal encoding".
>
> Also,
>
> "Each
> character represented by a basic-c-char or a universal-character-name in a
> multicharacter literal
> shall be encodable as a single code unit in the narrow literal encoding."
>
> excludes simple-escape-sequences, which seems unexpected.
>
> I think we just want to say "If a character literal contains a c-char
> other than a numeric-escape-sequence
> that cannot be encoded as a single code unit in the ordinary literal
> encoding, the program is ill-formed."
>

Agreed
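
To make the 'é' case concrete, here is a rough sketch of the check the wording above describes. It assumes the ordinary literal encoding is UTF-8 (in reality it is implementation-defined), and the helper names utf8_code_units and fits_single_code_unit are made up for illustration; they are not from the paper or from any implementation.

```cpp
#include <cassert>
#include <cstddef>

// Number of UTF-8 code units needed to encode one Unicode scalar value.
// ASSUMPTION: the ordinary literal encoding is UTF-8; other encodings
// (e.g. Latin-1, EBCDIC code pages) would give different answers.
constexpr std::size_t utf8_code_units(char32_t cp) {
    if (cp < 0x80)    return 1;  // ASCII range
    if (cp < 0x800)   return 2;  // e.g. U+00E9 (e with acute)
    if (cp < 0x10000) return 3;  // rest of the Basic Multilingual Plane
    return 4;                    // supplementary planes
}

// Hypothetical helper mirroring the suggested rule: a c-char other than a
// numeric-escape-sequence must be encodable as a single code unit.
// Numeric escapes are exempt from the rule and are not modelled here.
constexpr bool fits_single_code_unit(char32_t cp) {
    return utf8_code_units(cp) == 1;
}

static_assert(fits_single_code_unit(U'e'), "'e' is one code unit in UTF-8");
static_assert(!fits_single_code_unit(U'\u00E9'),
              "U+00E9 needs two code units in UTF-8");
```

Under that assumption, 'e' passes the check while '\u00E9' does not, which is the case the rule is meant to reject instead of silently treating it as a multicharacter literal.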

>
> Jens
>

Received on 2021-10-29 08:30:32