Re: [SG16] Feedback on P1854: Conversion to literal encoding should not lead to loss of meaning

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Fri, 29 Oct 2021 15:15:03 +0200
On 29/10/2021 11.45, Corentin wrote:
>
>
> On Fri, Oct 29, 2021 at 11:25 AM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
> On 29/10/2021 10.13, Corentin wrote:
> >
> >
> > On Fri, Oct 29, 2021 at 9:52 AM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> >
> > On 29/10/2021 04.53, Hubert Tong via SG16 wrote:
> > > Thanks Corentin for the paper. I hope this feedback helps the discussion.
> > >
> > > With respect to the contents of multicharacter literals, the paper does not give much motivation for disallowing numeric escape sequences which fit within a single unsigned char. Also, the wording says "shall be a member of the basic literal character set": this property of "being" is rather ambiguous in terms of authorial intent regarding the treatment of UCNs, etc. that designate members of the basic literal character set (a name for something is usually not the same as the thing it names).
> >
> > I agree that the restriction
> >
> > "Each c-char in a multicharacter literal shall be a member of the basic literal character set."
> >
> >
> > The goal is to avoid 'é' - which might be 2 code units - hence a multicharacter literal.
>
> If that isn't in the prose text, it should appear there.
>
> > - Change
> > > If a character lacks representation in the associated character encoding, then the string-literal is ill-formed.
> > To
> > > If a character lacks representation in the associated character encoding <ins>or is not representable as a single code unit</ins>, then the string-literal is ill-formed.
>
> That change sounds good to me.
>
>
> Actually never mind, this is for string literals, so this makes no sense.
> I haven't had my coffee yet.
>
> I think we want something along the line of:
>
> A multicharacter literal shall not have an encoding prefix. Each character represented by a basic-c-char or a universal-character-name in a multicharacter literal
> shall be encodable as a single code unit in the narrow literal encoding.
>
>
> https://isocpp.org/files/papers/D1854R2.pdf

The prose text still says

"Instead, we propose that multicharacters literals can only contain characters from
the basic-literal character sets."
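The concern behind restricting multicharacter literals to the basic literal character set is that a character outside it may need more than one code unit, so a literal such as 'é' silently becomes a multicharacter literal. A minimal sketch, assuming UTF-8 is the encoding of interest (as it commonly is for the ordinary literal encoding with GCC and Clang). A u8 literal is used so the count does not depend on the implementation's actual ordinary literal encoding, and the constant name is made up for illustration:

```cpp
#include <cstddef>

// A u8 string literal is always UTF-8 encoded, whatever the implementation's
// ordinary literal encoding is. The UCN \u00E9 (é) keeps the example
// independent of the source file's encoding.
// sizeof counts the terminating NUL, so subtract one.
constexpr std::size_t e_acute_utf8_code_units = sizeof(u8"\u00E9") - 1;

static_assert(e_acute_utf8_code_units == 2,
              "U+00E9 needs two UTF-8 code units (0xC3 0xA9)");
```

If the ordinary literal encoding is UTF-8, 'é' therefore contains two code units, which is exactly the case the restriction is meant to reject.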

Also, in the normative text, "narrow literal encoding" -> "ordinary literal encoding".

Also,

"Each character represented by a basic-c-char or a universal-character-name
in a multicharacter literal shall be encodable as a single code unit in the
narrow literal encoding."

excludes simple-escape-sequences, which seems unexpected.
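To illustrate which c-chars a rule phrased only in terms of basic-c-char and universal-character-name leaves out, here is a sketch listing multicharacter literals built from each kind of c-char. A literal such as '\t\n' consists solely of simple-escape-sequences, so that wording says nothing about it. Multicharacter literals are conditionally-supported, with type int and an implementation-defined value, so only the type is checked here (compilers typically warn about them, e.g. GCC's -Wmultichar):

```cpp
#include <type_traits>

// Each literal below is a multicharacter literal; only its type is portable.
static_assert(std::is_same_v<decltype('ab'), int>);       // basic-c-chars
static_assert(std::is_same_v<decltype('\t\n'), int>);     // simple-escape-sequences only
static_assert(std::is_same_v<decltype('\x41\x42'), int>); // numeric-escape-sequences
static_assert(std::is_same_v<decltype('\u0041b'), int>);  // universal-character-name + basic-c-char
```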

I think we just want to say "If a character literal contains a c-char other than a numeric-escape-sequence
that cannot be encoded as a single code unit in the ordinary literal encoding, the program is ill-formed."
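The suggested rule can be modeled as a small predicate over the c-chars of a character literal. This is a hypothetical sketch, not wording or code from P1854: the names CCharKind, CChar, utf8_code_units and is_ill_formed are invented here, UTF-8 is assumed as the ordinary literal encoding, and the separate requirement that a numeric-escape-sequence's value fit in the code unit type is not modeled.

```cpp
#include <cstddef>
#include <initializer_list>

// Hypothetical model: each c-char is reduced to its grammatical kind and a value.
enum class CCharKind { basic_c_char, simple_escape, numeric_escape, universal_character_name };

struct CChar {
    CCharKind kind;
    char32_t value; // code point for non-numeric kinds, raw code unit value for numeric escapes
};

// Assumption: the ordinary literal encoding is UTF-8.
// Number of UTF-8 code units for a valid Unicode scalar value cp.
constexpr std::size_t utf8_code_units(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Ill-formed if any c-char other than a numeric-escape-sequence cannot be
// encoded as a single code unit in the (assumed UTF-8) ordinary literal encoding.
constexpr bool is_ill_formed(std::initializer_list<CChar> c_chars) {
    for (const CChar& c : c_chars) {
        if (c.kind == CCharKind::numeric_escape) continue; // exempt: already names a code unit
        if (utf8_code_units(c.value) != 1) return true;
    }
    return false;
}

// 'ab' is fine, 'é' (U+00E9) is rejected, '\xE9' is exempt.
static_assert(!is_ill_formed({{CCharKind::basic_c_char, U'a'}, {CCharKind::basic_c_char, U'b'}}));
static_assert(is_ill_formed({{CCharKind::basic_c_char, U'\u00E9'}}));
static_assert(!is_ill_formed({{CCharKind::numeric_escape, 0xE9}}));
```

Under these assumptions, simple-escape-sequences such as '\n' are covered by the same check as basic-c-chars and universal-character-names, and only numeric-escape-sequences are exempted.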

Jens

Received on 2021-10-29 08:15:12