sg16: Re: [SG16] Handling literals throughout the translation phases

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Tue, 19 Jan 2021 16:49:25 +0100

Damn, I finally see the issue.
Terribly sorry it took this long.
This leads me to think that the current order of operations is a better
place to be in, unless we find a better mechanism.

I think that the status quo, in terms of observable behavior pertaining to
escape sequences, is correct.
I don't feel so good about the idea of introducing weird wording hacks, such
as more abstract characters, to achieve that behavior while swapping
operations.
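
For illustration, here is a minimal sketch of the observable behavior in
question, based on the hexadecimal escape concern quoted below. The names
`split` and `joined` are made up for this example, and the checks assume an
ordinary literal encoding where '1' is not the code unit 0x11.

```cpp
// Escape sequences are recognized in each literal before adjacent
// literals are concatenated, so "\x1" is already a complete escape
// sequence when the "1" is joined onto it.
constexpr char split[] = "\x1" "1";   // code unit 0x01, then '1', then '\0'

// Written as a single literal, the hex escape consumes both digits.
constexpr char joined[] = "\x11";     // code unit 0x11, then '\0'

static_assert(sizeof(split) == 3, "two code units plus the terminator");
static_assert(split[0] == '\x01' && split[1] == '1', "escape ends at the literal boundary");
static_assert(sizeof(joined) == 2, "one code unit plus the terminator");
static_assert(joined[0] == '\x11', "both hex digits belong to the escape");
```

If concatenation came first, `split` would presumably have to mean the same
thing as `joined`, and the digit after the escape could only be written as
another escape sequence.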

I think we already decided that in phase 5, each character is encoded
individually, and in any case there cannot be partial code unit sequences
anywhere in any string.
Therefore, maybe the current order of operations makes sense, as it cannot be
observed.

One of the issues we had was stateful encodings; I am of the mind that
this can be left in the realm of implementation discretion and that there
seems to be limited value in the standard specifying a behavior there.
It seems to be working fine currently.

TL;DR: I feel like I have been wrong.

On Mon, Jan 4, 2021 at 11:42 PM Steve Downey via SG16 <sg16_at_[hidden]>
wrote:

> Also, if the change in behavior happens anywhere, it's likely to be in
> code that is nearly impossible to fix, because someone is doing something
> legal but terrible with the preprocessor.
>
> On Mon, Jan 4, 2021 at 5:28 PM Tom Honermann via SG16 <
> sg16_at_[hidden]> wrote:
>
>> On 1/4/21 9:54 AM, Peter Brett via SG16 wrote:
>> > Please could someone remind me of the *downsides* of allowing escape
>> > sequences to be synthesized into string literals through pre-processor
>> > concatenation?
>>
>> The only way to end a hexadecimal escape sequence is with the end of the
>> string literal, or a character other than a hex digit (a-f, A-F, or
>> 0-9). If concatenation were performed before recognition of escape
>> sequences, then encoding any of the hex digits following a hexadecimal
>> escape sequence would require specifying them using an escape sequence.
>> Similar concerns exist for octal escape sequences, but could be avoided
>> by always using a maximal length octal escape sequence.
>>
>> Tom.
>>
>> >
>> > Many thanks,
>> >
>> > Peter
>> >
>> >> -----Original Message-----
>> >> From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Jens Maurer via SG16
>> >> Sent: 19 December 2020 22:45
>> >> To: Corentin Jabot <corentinjabot_at_[hidden]>; SG16 <sg16_at_[hidden]>
>> >> Cc: Jens Maurer <Jens.Maurer_at_[hidden]>
>> >> Subject: Re: [SG16] Handling literals throughout the translation phases
>> >>
>> >>
>> >> On 18/12/2020 10.33, Corentin Jabot wrote:
>> >>> On Thu, Dec 17, 2020, 22:33 Jens Maurer via SG16 <sg16_at_[hidden]
>> >>> <mailto:sg16_at_[hidden]>> wrote:
>> >>>
>> >>> I'm working on a paper that switches C++ to a modified "model B"
>> >>> approach for universal-character-names as described in the C99
>> >>> Rationale v5.10, section 5.2.1.
>> >>>
>> >>> I thought SG16 agreed not to replace UCNs until phase 5 a few meetings
>> >>> ago; did I completely misunderstand what SG16 agreed?
>> >>
>> >> The difference is that we do not produce UCNs in phase 1.
>> >> Instead, phase 1 simply produces Unicode scalar values.
>> >> Any UCNs that appeared in the original source are replaced later.
>> >>
>> >>> My current idea is to focus on the creation of the string literal
>> >>> object; that's when transcoding to execution (literal) encoding
>> >>> happens. All other uses of string-literals don't produce objects,
>> >>> so aren't transcoded.
>> >>>
>> >>> In order to be able to interpret escape-sequences in phase 5/6,
>> >>> we need a "tunnel" for numeric-escape-sequences. One idea would
>> >>> be to add "code unit characters" to the translation character set,
>> >>> where each such character represents a code unit coming from a
>> >>> numeric-escape-sequence. The sole purpose is to keep the
>> >>> code units safe until we produce the initializer for the
>> >>> string literal object.
>> >>>
>> >>> The alternative would be to delay all interpretation of escape-
>> >>> sequences to when we produce the initializer for the string
>> >>> literal object, but that also means we need to delay string
>> >>> literal concatenation until that time (see first item above).
>> >>>
>> >>>
>> >>> Would that cause any issue? This would otherwise be my preferred
>> >>> solution!
>> >>
>> >> We currently support operator "" "" "" in [over.literal], for example.
>> >> We'd need to make string-literal concatenation a first-class citizen
>> >> in phase 7 (e.g. making it a constant expression or so), which is a
>> >> fairly large hammer.
>> >>
>> >> Jens
>> >>
>> >>
>> >> --
>> >> SG16 mailing list
>> >> SG16_at_[hidden]
>> >>
>> >> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>>

Received on 2021-01-19 09:49:38