Subject: Re: Concatenating unicode string literals
From: Corentin Jabot (corentinjabot_at_[hidden])
Date: 2020-07-09 15:33:04
On Thu, 9 Jul 2020 at 22:16, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> On 09/07/2020 22.12, Corentin Jabot wrote:
> > On Thu, Jul 9, 2020, 21:44 Tom Honermann via SG16 <sg16_at_[hidden]> wrote:
> > On 7/9/20 3:16 PM, Jens Maurer wrote:
> > > On 09/07/2020 18.28, Tom Honermann wrote:
> > >> On 7/8/20 3:15 PM, Jens Maurer wrote:
> > >>> Since all four well-known C++ implementations appear to
> > >>> produce an error for the test cases at
> > >>> https://compiler-explorer.com/z/4NDo-4
> > >>> I'm fine with specifying these as ill-formed.
> > >> I'm fine with that as well.
> > >>
> > >> Jens, would you consider such a change as evolutionary given that
> > >> we don't know of any implementations (so far) that actually support these
> > > I'm not the one to make the call here.
> > I know, I was just looking for an opinion from a CWG regular. Thank
> > > Strictly speaking, it changes the standard for some feature from
> > > "conditionally-supported" to "ill-formed", which does sound a bit
> > > evolutionary, in particular since we depart a little further from
> > > C here.
> > >
> > > However, personally, I'm ok with this going to Core right away.
> > >
> > > JF should make the call here.
> > Agreed.
> > We don't have a paper for this yet. If we have a volunteer to write a
> > paper to make concatenations involving mixed L"", u8"", u"", and U""
> > literals ill-formed, I'll be happy to discuss with JF with
> > encouragement to take it straight to Core.
> > There is one
> If we want to maintain the option of going straight to Core,
> we can't mix this isolated issue with anything else that
> might be more controversial.
There is no urgency though, no point even trying until we can word against
> > and as Jens said we can't do the wording for that right now.
> > The wording paper should also make sure that the order of operations is
> > (replacement of escape sequences, concatenation, encoding).
> Unfortunately, it's not that easy, because numeric-escape-sequences
> produce individual code units, not to-be-encoded characters.
Indeed. Very good point.
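
To make the distinction concrete, here is a minimal sketch, assuming a C++11 or later compiler with UTF-16 for u"" literals. The helper name code_units and the array names are mine, purely for illustration:

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// A universal-character-name names a character, which is then encoded:
// U+1F600 becomes a surrogate pair (two code units) in UTF-16.
constexpr char16_t by_name[] = u"\U0001F600";
static_assert(code_units(by_name) == 2, "one character, two code units");

// A numeric escape sequence is a single code unit, stored as written.
// It is not a character that still has to be encoded.
constexpr char16_t by_unit[] = u"\xD83D";
static_assert(code_units(by_unit) == 1, "one numeric escape, one code unit");
static_assert(by_unit[0] == 0xD83D, "value is taken as-is");

// Spelling both surrogates as numeric escapes yields the same code units
// as the encoded character above.
constexpr char16_t by_units[] = u"\xD83D\xDE00";
static_assert(by_units[0] == by_name[0] && by_units[1] == by_name[1],
              "same code units either way");
```

So a rule phrased purely as "replace escapes, concatenate, then encode" has nothing sensible to do with \xD83D on its own.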
Let me try that again:
They should actually be encoded separately, but the encoding is determined
And then concatenated, and then a null-terminator added.
(That still has the issue of potentially generating unnecessary shift states,
but I think that doesn't need to be dealt with normatively.)
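
A rough sketch of what that order looks like from the outside, using only behavior that existing compilers already agree on (assuming C++11 or later and UTF-16 for u"" literals; the variable names are illustrative):

```cpp
#include <cstddef>

// Escape sequences are replaced per literal, before concatenation:
// "\x1" "2" is the code unit 0x01 followed by '2', not the single
// code unit 0x12.
constexpr char joined[] = "\x1" "2";
static_assert(sizeof(joined) == 3, "two code units plus one null");
static_assert(joined[0] == 0x01 && joined[1] == '2' && joined[2] == '\0',
              "escape ends at the literal boundary");

// Within a single literal the hex escape takes both digits.
constexpr char single[] = "\x12";
static_assert(sizeof(single) == 2, "one code unit plus one null");
static_assert(single[0] == 0x12, "single code unit 0x12");

// The encoding is determined by the sequence as a whole: the unprefixed
// "!" is treated as if it had the u prefix of its neighbor.
constexpr char16_t mixed[] = u"\U0001F600" "!";
static_assert(sizeof(mixed) / sizeof(mixed[0]) == 4,
              "surrogate pair, '!', and a single null terminator");
static_assert(mixed[2] == u'!' && mixed[3] == 0, "one terminator at the end");
```

Only one null terminator appears, after the last piece, which matches adding it after concatenation rather than per literal.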