Subject: Re: Handling literals throughout the translation phases
From: Steve Downey (sdowney_at_[hidden])
Date: 2021-01-04 16:42:16
Also, if the change in behavior happens anywhere, it's likely to be in code
that is nearly impossible to fix, because someone is doing something legal
but terrible with the preprocessor.
On Mon, Jan 4, 2021 at 5:28 PM Tom Honermann via SG16 <sg16_at_[hidden]> wrote:
> On 1/4/21 9:54 AM, Peter Brett via SG16 wrote:
> > Please could someone remind me of the *downsides* of allowing escape
> > sequences to be synthesized into string literals through pre-processor
> The only way to end a hexadecimal escape sequence is with the end of the
> string literal or with a character other than a hex digit (a-f, A-F, or
> 0-9). If concatenation were performed before escape sequences are
> recognized, then any hex digit that follows a hexadecimal escape sequence
> would itself have to be written as an escape sequence. Similar concerns
> exist for octal escape sequences, but they could be avoided by always
> using a maximal-length (three-digit) octal escape sequence.
> > Many thanks,
> > Peter
> >> -----Original Message-----
> >> From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Jens Maurer via SG16
> >> Sent: 19 December 2020 22:45
> >> To: Corentin Jabot <corentinjabot_at_[hidden]>; SG16 <
> >> Cc: Jens Maurer <Jens.Maurer_at_[hidden]>
> >> Subject: Re: [SG16] Handling literals throughout the translation phases
> >> On 18/12/2020 10.33, Corentin Jabot wrote:
> >>> On Thu, Dec 17, 2020, 22:33 Jens Maurer via SG16 <
> >>> <mailto:sg16_at_[hidden]>> wrote:
> >>> I'm working on a paper that switches C++ to a modified "model B"
> >>> approach for universal-character-names as described in the C99
> >>> Rationale v5.10, section 5.2.1.
> >>> I thought SG16 agreed not to replace UCNs until phase 5 a few meetings
> >>> ago; did I completely misunderstand what SG16 agreed?
> >> The difference is that we do not produce UCNs in phase 1.
> >> Instead, phase 1 simply produces Unicode scalar values.
> >> Any UCNs that appeared in the original source are replaced later.
> >>> My current idea is to focus on the creation of the string literal
> >>> object; that's when transcoding to execution (literal) encoding
> >>> happens. All other uses of string-literals don't produce objects,
> >>> so aren't transcoded.
> >>> In order to be able to interpret escape-sequences in phase 5/6,
> >>> we need a "tunnel" for numeric-escape-sequences. One idea would
> >>> be to add "code unit characters" to the translation character set,
> >>> where each such character represents a code unit coming from a
> >>> numeric-escape-sequence. The sole purpose is to keep the
> >>> code units safe until we produce the initializer for the
> >>> string literal object.
> >>> The alternative would be to delay all interpretation of escape-
> >>> sequences to when we produce the initializer for the string
> >>> literal object, but that also means we need to delay string
> >>> literal concatenation until that time (see first item above).
> >>> Would that cause any issue? This would otherwise be my preferred
> >> We currently support operator "" "" "" in [over.literal], for
> >> We'd need to make string-literal concatenation a first-class citizen
> >> in phase 7 (e.g. by making it a constant expression or similar), which
> >> is a large hammer.
> >> Jens
> >> --
> >> SG16 mailing list
> >> SG16_at_[hidden]
SG16 list run by email@example.com