sg16: Re: [SG16] Feedback on P1854: Conversion to literal encoding should not lead to loss of meaning

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Fri, 29 Oct 2021 11:25:25 +0200

On 29/10/2021 10.13, Corentin wrote:
>
>
> On Fri, Oct 29, 2021 at 9:52 AM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
> On 29/10/2021 04.53, Hubert Tong via SG16 wrote:
> > Thanks Corentin for the paper. I hope this feedback helps the discussion.
> >
> > With respect to the contents of multicharacter literals, the paper does not give much motivation for disallowing numeric escape sequences which fit within a single unsigned char. Also, the wording says "shall be a member of the basic literal character set": this property of "being" is rather ambiguous in terms of authorial intent regarding the treatment of UCNs, etc. that designate members of the basic literal character set (a name for something is usually not the same as the thing it names).
>
> I agree that the restriction
>
> "Each c-char in a multicharacter literal shall be a member of the basic literal character set."
>
>
> The goal is to avoid 'é' - which might be 2 code units - hence a multicharacter literal.

If that isn't in the prose text, it should appear there.

> - Change
> > If a character lacks representation in the associated character encoding, then the string-literal is ill-formed.
> To
> > If a character lacks representation in the associated character encoding <ins>or is not representable as a single code unit</ins>, then the string-literal is ill-formed.

That change sounds good to me.

> (This works because all combining diacritics are represented by more than one code unit in all encodings I'm aware of.)

That's not really relevant. Also, ISO-8859-1 does represent 'é' as a single code unit,
and that should be a totally fine single-character literal in this encoding.
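
To make the encoding dependence concrete, here is a minimal sketch, not proposed wording. It assumes only two candidate literal encodings (UTF-8 and ISO-8859-1); the helper names utf8_code_units, latin1_code_units, and fits_in_one_code_unit are hypothetical and exist only for this illustration. It checks whether U+00E9 ('é' as a single precomposed code point) is representable as one code unit in each encoding.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical helper: number of UTF-8 code units (bytes) needed for a
// code point, or nullopt if the value is not a Unicode scalar value.
constexpr std::optional<std::size_t> utf8_code_units(char32_t cp) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt; // surrogates
    if (cp > 0x10FFFF) return std::nullopt;                // out of range
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Hypothetical helper: ISO-8859-1 maps U+0000..U+00FF to one byte each;
// every other code point lacks representation.
constexpr std::optional<std::size_t> latin1_code_units(char32_t cp) {
    if (cp <= 0xFF) return 1;
    return std::nullopt;
}

// The condition under discussion: the character has a representation
// and that representation is exactly one code unit.
constexpr bool fits_in_one_code_unit(std::optional<std::size_t> n) {
    return n.has_value() && *n == 1;
}

// U+00E9 (é): two code units in UTF-8, one in ISO-8859-1.
static_assert(!fits_in_one_code_unit(utf8_code_units(0x00E9)));
static_assert(fits_in_one_code_unit(latin1_code_units(0x00E9)));

// U+0041 (A) is a single code unit in both encodings.
static_assert(fits_in_one_code_unit(utf8_code_units(0x0041)));
static_assert(fits_in_one_code_unit(latin1_code_units(0x0041)));

// U+20AC (euro sign) has no representation in ISO-8859-1 at all.
static_assert(!fits_in_one_code_unit(latin1_code_units(0x20AC)));
```

Under these assumptions the same source character passes the single-code-unit test in one encoding and fails it in the other, which is why the check has to be phrased in terms of the literal encoding rather than the basic literal character set.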

Jens

Received on 2021-10-29 04:25:30