sg16: Re: [SG16] Handling literals throughout the translation phases

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 4 Jan 2021 17:28:41 -0500

On 1/4/21 9:54 AM, Peter Brett via SG16 wrote:
> Please could someone remind me of the *downsides* of allowing escape sequences to be synthesized into string literals through pre-processor concatenation?

The only way to end a hexadecimal escape sequence is with the end of the
string literal, or a character other than a hex digit (a-f, A-F, or
0-9). If concatenation were performed before recognition of escape
sequences, then placing a hex digit character immediately after a
hexadecimal escape sequence would require writing that digit as an
escape sequence too. Similar concerns exist for octal escape sequences,
but they could be avoided by always using a maximal length (three digit)
octal escape sequence.
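
A minimal sketch of the difference, assuming a C++11 or later compiler;
the array names are chosen purely for illustration:

```cpp
// Current behavior: escape sequences are recognized in each literal
// before adjacent literals are concatenated, so the literal boundary
// ends the hexadecimal escape sequence.
constexpr char split_hex[]  = "\x1" "2";  // 0x01 followed by the character '2'
constexpr char joined_hex[] = "\x12";     // the single code unit 0x12

static_assert(sizeof(split_hex) == 3, "two characters plus the terminating NUL");
static_assert(split_hex[0] == '\x01' && split_hex[1] == '2',
              "the literal boundary ended the escape sequence");
static_assert(sizeof(joined_hex) == 2, "one character plus the terminating NUL");

// If concatenation happened first, "\x1" "2" would be read as "\x12".
// Getting 0x01 followed by '2' would then require escaping the digit
// itself, e.g. as \x32 (this value assumes an ASCII-compatible encoding).
constexpr char escaped_digit[] = "\x01\x32";
static_assert(sizeof(escaped_digit) == 3, "two characters plus the terminating NUL");

// An octal escape sequence consumes at most three digits, so a maximal
// length escape is always safe to follow with a digit character.
constexpr char octal[] = "\0012";         // \001 followed by the character '2'
static_assert(sizeof(octal) == 3 && octal[0] == '\001' && octal[1] == '2',
              "three-digit octal escape followed by '2'");
```

The static_asserts hold with today's ordering of the translation phases;
the comment about "\x1" "2" becoming "\x12" describes the hypothetical
ordering only.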

Tom.

>
> Many thanks,
>
> Peter
>
>> -----Original Message-----
>> From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Jens Maurer via SG16
>> Sent: 19 December 2020 22:45
>> To: Corentin Jabot <corentinjabot_at_[hidden]>; SG16 <sg16_at_[hidden]>
>> Cc: Jens Maurer <Jens.Maurer_at_[hidden]>
>> Subject: Re: [SG16] Handling literals throughout the translation phases
>>
>>
>> On 18/12/2020 10.33, Corentin Jabot wrote:
>>> On Thu, Dec 17, 2020, 22:33 Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
>>>
>>> I'm working on a paper that switches C++ to a modified "model B"
>>> approach for universal-character-names as described in the
>>> C99 Rationale v5.10, section 5.2.1.
>>>
>>> I thought SG16 agreed to not replace UCNs until phase 5 a few meetings ago.
>>> Did I completely misunderstand what SG16 agreed?
>>
>> The difference is that we do not produce UCNs in phase 1.
>> Instead, phase 1 simply produces Unicode scalar values.
>> Any UCNs that appeared in the original source are replaced later.
>>
>>> My current idea is to focus on the creation of the string literal
>>> object; that's when transcoding to execution (literal) encoding
>>> happens. All other uses of string-literals don't produce objects,
>>> so aren't transcoded.
>>>
>>> In order to be able to interpret escape-sequences in phase 5/6,
>>> we need a "tunnel" for numeric-escape-sequences. One idea would
>>> be to add "code unit characters" to the translation character set,
>>> where each such character represents a code unit coming from a
>>> numeric-escape-sequence. The sole purpose is to keep the
>>> code units safe until we produce the initializer for the
>>> string literal object.
>>>
>>> The alternative would be to delay all interpretation of escape-
>>> sequences to when we produce the initializer for the string
>>> literal object, but that also means we need to delay string
>>> literal concatenation until that time (see first item above).
>>>
>>>
>>> Would that cause any issue? This would otherwise be my preferred solution!
>> We currently support operator "" "" "" in [over.literal], for example.
>> We'd need to make string-literal concatenation a first-class citizen
>> in phase 7 (e.g. making it a constant expression or so), which is a fairly
>> large hammer.
>>
>> Jens
>>
>>

Received on 2021-01-04 16:28:44