Re: [SG16] Concatenating unicode string literals

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Thu, 9 Jul 2020 22:33:04 +0200
On Thu, 9 Jul 2020 at 22:16, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 09/07/2020 22.12, Corentin Jabot wrote:
> >
> >
> > On Thu, Jul 9, 2020, 21:44 Tom Honermann via SG16 <sg16_at_[hidden]> wrote:
> >
> > On 7/9/20 3:16 PM, Jens Maurer wrote:
> > > On 09/07/2020 18.28, Tom Honermann wrote:
> > >> On 7/8/20 3:15 PM, Jens Maurer wrote:
> > >>> Since all four well-known C++ implementations appear to
> > >>> produce an error for the test cases at
> > >>> https://compiler-explorer.com/z/4NDo-4
> > >>> I'm fine with specifying these as ill-formed.
> > >> I'm fine with that as well.
> > >>
> > >> Jens, would you consider such a change as evolutionary given that
> > >> we don't know of any implementations (so far) that actually support
> > >> these concatenations?
> > > I'm not the one to make the call here.
> > I know, I was just looking for an opinion from a CWG regular. Thank you.
> > > Strictly speaking, it changes the standard for some feature from
> > > "conditionally-supported" to "ill-formed", which does sound a bit
> > > evolutionary, in particular since we depart a little further from
> > > C here.
> > >
> > > However, personally, I'm ok with this going to Core right away.
> > >
> > > JF should make the call here.
> >
> > Agreed.
> >
> > We don't have a paper for this yet. If we have a volunteer to write a
> > paper to make concatenations involving mixed L"", u8"", u"", and U""
> > literals ill-formed, I'll be happy to discuss with JF with
> > encouragement to take it straight to Core.
> >
> >
> > There is one
>
> If we want to maintain the option of going straight to Core,
> we can't mix this isolated issue with anything else that
> might be more controversial.
>

True.
There is no urgency, though: there is no point even trying until we can
write wording against P2029.


>
> > and as Jens said we can't do the wording for that right now.
> > The wording paper should also make sure that the order of operations is
> > correct.
> >
> > ( Replacement of escape sequences, concatenation, encoding)
>
> Unfortunately, it's not that easy, because numeric-escape-sequences
> produce individual code units, not to-be-encoded characters.
>

Indeed, very good point.
Let me try that again:

Each string literal should actually be encoded separately, but the encoding
to use is determined beforehand.
The encoded literals are then concatenated, and then a null terminator is
added.
(That still has the issue of potentially generating unnecessary shift
states, but I think that doesn't need to be dealt with normatively.)
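
To make that concrete, here is a minimal sketch of what I would expect to
observe. The helper names (code_units, second_literal_starts_fresh) are
made up for illustration and do not come from any paper, and the mixed-prefix
line is left commented out because it is exactly the case we are discussing
making ill-formed:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the null terminator.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// Each numeric escape sequence yields exactly one code unit; it is not
// treated as a character that still has to be encoded.
static_assert(code_units("\xC3" "\xA9") == 2, "one code unit per escape");

// An escape is resolved within its own literal, before concatenation:
// "\x41" "1" is the two code units {0x41, '1'}, not the single escape \x411.
static_assert(code_units("\x41" "1") == 2, "escape does not span literals");

// Same for char16_t: two surrogate code units, kept exactly as written.
static_assert(code_units(u"\xD83D" u"\xDE00") == 2, "code units, not characters");

// Only one null terminator is added, after concatenation.
static_assert(sizeof("ab" "cd") == 5, "4 code units + 1 terminator");

// Mixed encoding prefixes (the case proposed to become ill-formed):
// const auto* bad = u"a" U"b";

// Run-time check of the same ordering, element by element.
inline bool second_literal_starts_fresh() {
    const char s[] = "\x41" "1";
    return s[0] == '\x41' && s[1] == '1' && s[2] == '\0';
}
```

The static_asserts only cover same-prefix concatenation, which is the
uncontroversial part; they say nothing about shift states.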




>
> Jens
>
>

Received on 2020-07-09 15:36:30