sg16: Re: [SG16] P2348: Feedback on r1 draft

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Thu, 9 Sep 2021 18:16:48 -0400

On Thu, Sep 9, 2021 at 5:32 PM Corentin <corentin.jabot_at_[hidden]> wrote:

>
>
> On Thu, Sep 9, 2021 at 10:55 PM Hubert Tong <
> hubert.reinterpretcast_at_[hidden]> wrote:
>
>> On Thu, Sep 9, 2021 at 3:52 AM Corentin <corentin.jabot_at_[hidden]> wrote:
>>
>>>
>>>
>>> On Thu, Sep 9, 2021 at 1:30 AM Hubert Tong <
>>> hubert.reinterpretcast_at_[hidden]> wrote:
>>>
>>>> My thanks to the author for addressing the feedback given on r0. I now
>>>> have feedback (below) on r1.
>>>>
>>>> -- HT
>>>>
>>>> I believe the author is already aware that the comment grammar should
>>>> be corrected to indicate a sequence of characters within the comment.
>>>> I believe the same holds for the objections to using the grammar
>>>> term in prose.
>>>>
>>>> As noted in the SG 16 meeting today, the paper removes the freedom for
>>>> an implementation to consider a preprocessing directive containing vertical
>>>> tab or form feed as being an error. This is a design decision that the
>>>> paper does not acknowledge as such. I believe that noting this change
>>>> clearly in the paper's design section is sufficient to remedy this.
>>>>
>>>> I have the following additional concerns about the wording:
>>>>
>>>> (1)
>>>> In [lex.phases],
>>>> "introducing new-line characters for end-of-line indicators"
>>>> is removed.
>>>>
>>>> I do not believe that this is helpful. The presumed intent remains that
>>>> implementations may introduce line-breaks of one form or another for
>>>> end-of-line indicators that do not correspond particularly to any "physical
>>>> source file character".
>>>>
>>>
>>> I do not believe the existing wording is helpful either.
>>> "implementation-defined mapping" lets implementers add any codepoint for
>>> any reason, dataset being one of them. My issue with that wording is that
>>> I find it confusing, since "end-of-line indicator" is not defined anywhere.
>>> If we think this use case is sufficiently important to be called out
>>> explicitly, would you be amenable to, for example, a footnote?
>>> [Footnote: If the physical source file is a set of records, an
>>> implementation may introduce line breaks to denote the end of records] - or
>>> something along those lines
>>>
>>
>> How about:
>> Physical source file characters are mapped, in an implementation-defined
>> manner, to the translation character set (introducing new-line characters
>> for end-of-line indicators).
>> =>
>> The physical source file is mapped, in an implementation-defined manner,
>> to a sequence of translation character set elements.
>>
>
> That would be fine by me!
>
>
>>
>>
>>>
>>>> Note that [cpp.line] contains a reference ("or introduced") to this
>>>> text.
>>>>
>>>
>>>
>>>
>>>>
>>>> (2)
>>>> In the new [lex.whitespaces] subclause, the following is added:
>>>> whitespaces are ignored except as they serve to separate tokens
>>>>
>>>> This seems to have come from the text being removed out of [lex.token]
>>>> (where it was excusable). Whitespace separation is significant in
>>>> [cpp.replace.general], etc. This sentence should at best be a note in
>>>> relation to phase 7 of translation.
>>>>
>>>
>>> I am happy to remove it entirely.
>>> It's certainly not needed for phase 7. And I think phase 3 wording says
>>> something similar.
>>>
>>> In [cpp.pre], Jens objected to my removal of
>>>
>>> > The only whitespace characters that shall appear between preprocessing
>>> tokens within a preprocessing directive (from just after the
>>> directive-introducing token through just before the terminating new-line
>>> character) are space and horizontal-tab (including spaces that have
>>> replaced comments or possibly other whitespace characters in translation
>>> phase 3).
>>>
>>> I am not able to convince myself that the grammar described at the start
>>> of [cpp.pre] allows line-breaks to appear between preprocessing tokens
>>> within a preprocessing directive,
>>> but I'm happy to replace the struck paragraph with
>>>
>>> Only *horizontal-whitespace*s shall appear between preprocessing tokens
>>> within a preprocessing directive.
>>>
>>
>> This is neither necessary nor harmless. The term "preprocessing
>> directive" is defined right after the grammar in [cpp.pre]. Its definition
>> precludes the presence of line-breaks outside of /**/ between preprocessing
>> tokens within the directive. We do want to allow comments in preprocessing
>> directives.
>>
>
>> This seems to point to another problem with the wording:
>> (3)
>> The removal of the comment replacement in phase 3 means that the
>> definition of preprocessing directives requires an update to qualify that
>> it wants to talk about line-breaks (but not those that are inside /**/
>> comments).
>>
>
> In phase 3 we say
>
> The source file is decomposed into preprocessing tokens and /whitespace/s.
>
>
> So even if they are not pp-tokens, whitespaces and comments are tokenized
> and do not exist as sequences of individual tokens that could be decomposed
> further by the preprocessor.
> At least that was the intent.
>

Is "whitespace" the grammar term intended to be a sequence of whitespaces?
I am not completely sure which grammar formulation issue fixes are in
flight.

The issue with "containing" is that it has a problem similar to CWG's
struggles re: "top level". The concept of there being "atomic" whitespace
"tokens" is useful, but I think the exposition is lacking. Once there is
enough to support it, something along the lines of "where one of the
constituent whitespace tokens is a line-break" would help to avoid
"containing".
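
To make the case in (3) concrete, here is a minimal sketch (not taken from
the paper; the names ANSWER and answer are made up for illustration). The
#define below has a line-break inside a /**/ comment. Under the current
wording, where phase 3 replaces each comment by one space character, it is
still a single preprocessing directive, and I assume the revised wording
wants to keep it that way:

```cpp
// Hypothetical example: ANSWER and answer() are illustrative names only.
// The block comment in the directive spans two physical lines, so the
// directive "contains" a line-break only inside the comment.
#define ANSWER /* this comment spans
                  two physical lines */ 42

// If the comment's line-break terminated the directive, ANSWER would be
// defined with an empty replacement list and this would not compile.
constexpr int answer() { return ANSWER; }

static_assert(answer() == 42, "the directive extends past the comment");
```

The line-break that ends the directive is the one after 42, which is not
part of any comment; that is the distinction the definition of
"preprocessing directive" would need to express.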

>
>
>>
>>
>>>
>>> If people think that avoids confusion.
>>>
>>>
>>>
>>

Received on 2021-09-09 17:17:17