sg16: Re: [SG16] P2194R0 The character set of C++ source code is Unicode

From: Steve Downey <sdowney_at_[hidden]>
Date: Mon, 24 Aug 2020 13:13:25 -0400

A nit, but the encoding of literals is unspecified rather than
implementation-defined: the same compiler might do different things.
MSVC even used to have a #pragma to change it, although I think it
affected the whole translation unit.

On Mon, Aug 24, 2020 at 12:32 PM Peter Brett via SG16
<sg16_at_[hidden]> wrote:
>
> Hi Alisdair,
>
> Thank you for the feedback. That's a very good suggestion; it ties into the suggested change to the processing of UCNs that we've discussed a few times.
>
> When you have a u8"" literal, the associated literal encoding is UTF-8. When you have a 'plain' "" string literal, the associated literal encoding is implementation-defined.
>
> Best regards,
>
> Peter
>
> > -----Original Message-----
> > From: Alisdair Meredith <alisdairm_at_[hidden]>
> > Sent: 24 August 2020 17:29
> > To: SG16 <sg16_at_[hidden]>
> > Cc: Peter Brett <pbrett_at_[hidden]>; Corentin <corentin.jabot_at_[hidden]>
> > Subject: Re: [SG16] P2194R0 The character set of C++ source code is Unicode
> >
> > Minor suggestion on the wording,
> >
> > You strike the mapping of non-basic source code characters to
> > universal-character-name, including the cross-reference to such
> > mappings reverting in raw string literals (5.4). I suggest making
> > a matching edit to strike the reference in (5.4)p3 as well, so that
> > the only thing reverted is line splicing in phase 2.
> >
> > That said, with these changes, I am curious what the difference
> > is between a u8 string literal and a plain ‘char’ string literal, as
> > the contents of that literal are now going to be Unicode source
> > text (rather than requesting a mapping from source to Unicode
> > of the literal’s contents)?
> >
> > AlisdairM
> >
> > > On Aug 24, 2020, at 08:31, Peter Brett via SG16 <sg16_at_[hidden]> wrote:
> > >
> > > Hi all,
> > >
> > > In this week's meeting, we are going to discuss the remaining
> > > proposals from P2178R1 "Misc lexing and string handling improvements".
> > > In particular, we will discuss proposal 9:
> > >
> > > Proposal 9: Reaffirming Unicode as the character set of the
> > > internal representation
> > >
> > > In anticipation of a lively discussion, Corentin and I have written a
> > > short new paper which will be appearing in the September mailing.
> > >
> > > P2194R0 The character set of C++ source code is Unicode
> > >
> > > https://isocpp.org/files/papers/P2194R0.pdf
> > >
> > > We hope that the study group finds this contribution helpful and
> > > informative.
> > >
> > > Best regards,
> > >
> > > Peter
> > >
> > > --
> > > SG16 mailing list
> > > SG16_at_[hidden]
> > >
> > > https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16

Received on 2020-08-24 12:17:07