On Thu, Sep 9, 2021 at 5:32 PM Corentin <corentin.jabot@gmail.com> wrote:

On Thu, Sep 9, 2021 at 10:55 PM Hubert Tong <hubert.reinterpretcast@gmail.com> wrote:
On Thu, Sep 9, 2021 at 3:52 AM Corentin <corentin.jabot@gmail.com> wrote:

On Thu, Sep 9, 2021 at 1:30 AM Hubert Tong <hubert.reinterpretcast@gmail.com> wrote:
My thanks to the author for addressing the feedback given on r0. I now have feedback (below) on r1.

-- HT

I believe the author is already aware that the comment grammar should be corrected to indicate a sequence of characters within the comment.
I believe the same is true with regard to the objections to using the grammar term in prose.

As noted in the SG 16 meeting today, the paper removes the freedom for an implementation to consider a preprocessing directive containing vertical tab or form feed as being an error. This is a design decision that the paper does not acknowledge as such. I believe that noting this change clearly in the paper's design section is sufficient to remedy this.

I have the following additional concerns about the wording:

(1)
In [lex.phases],
"introducing new-line characters for end-of-line indicators"
is removed.

I do not believe that this is helpful. The presumed intent remains that implementations may introduce line-breaks of one form or another for end-of-line indicators that do not correspond particularly to any "physical source file character".

I do not believe the existing wording is helpful either. "implementation-defined mapping" lets implementers add any codepoint for any reason, datasets being one of them. My issue with that wording is that I find it confusing: "end-of-line indicator" is not defined anywhere.
If we think this use case is sufficiently important to be called out explicitly, would you be amenable to, for example, a footnote?
[Footnote: If the physical source file is a set of records, an implementation may introduce line-breaks to denote the end of each record], or something along those lines.

How about:
Physical source file characters are mapped, in an implementation-defined manner, to the translation character set (introducing new-line characters for end-of-line indicators).
=>
The physical source file is mapped, in an implementation-defined manner, to a sequence of translation character set elements.

That would be fine by me!

Note that [cpp.line] contains a reference ("or introduced") to this text.

(2)
In the new [lex.whitespaces] subclause, the following is added:
whitespaces are ignored except as they serve to separate tokens

This seems to have come from the text being removed from [lex.token] (where it was excusable). Whitespace separation is significant in [cpp.replace.general], etc. This sentence should at best be a note in relation to phase 7 of translation.
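For illustration, a minimal sketch of two places where whitespace does more than separate tokens: stringizing a macro argument, and the rule that a macro is function-like only when the ( immediately follows the macro name. The macro names STR, XSTR, FN and OBJ (and the variable names) are hypothetical, chosen only for this example.

```cpp
// Stringizing: each run of whitespace between the argument's
// preprocessing tokens becomes a single space character.
#define STR(x) #x
#define XSTR(x) STR(x)  // macro-expand the argument first, then stringize

// Same three tokens (a, +, b); only the intervening whitespace differs.
constexpr const char tight[] = STR(a+b);         // "a+b"
constexpr const char spaced[] = STR(a   +   b);  // "a + b"

// These two definitions differ only in the whitespace after the macro name.
#define FN(x) [x]    // function-like: '(' immediately follows FN
#define OBJ (x) [x]  // object-like: the replacement list is (x) [x]

constexpr const char fn_use[] = XSTR(FN(1));  // "[1]"
constexpr const char obj_use[] = XSTR(OBJ);   // begins with the '(' from OBJ's replacement list

static_assert(sizeof tight == 4 && sizeof spaced == 6, "whitespace changed the string");
static_assert(fn_use[1] == '1' && obj_use[0] == '(', "function-like vs object-like");
```

In both cases the preprocessing tokens are identical; only the presence or amount of whitespace changes the result.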

I am happy to remove it entirely.
It's certainly not needed for phase 7, and I think the phase 3 wording already says something similar.

In [cpp.pre], Jens objected to my removal of

> The only whitespace characters that shall appear between preprocessing tokens within a preprocessing directive (from just after the directive-introducing token through just before the terminating new-line character) are space and horizontal-tab (including spaces that have replaced comments or possibly other whitespace characters in translation phase 3).

I am not able to convince myself that the grammar at the start of [cpp.pre] allows line-breaks to appear between preprocessing tokens within a preprocessing directive,
but I'm happy to replace the struck paragraph with

Only horizontal-whitespaces shall appear between preprocessing tokens within a preprocessing directive.

This is neither necessary nor harmless. The term "preprocessing directive" is defined right after the grammar in [cpp.pre]. Its definition precludes the presence of line-breaks outside of /**/ between preprocessing tokens within the directive. We do want to allow comments in preprocessing directives.

This seems to point to another problem with the wording:
(3)
The removal of the comment replacement in phase 3 means that the definition of preprocessing directive needs to be updated to make clear that it refers to line-breaks other than those inside /**/ comments.
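A minimal sketch of the case the updated definition must keep working: a /* */ comment containing a line-break inside a #define. The macro name ANSWER and the variable answer are hypothetical. Under the current wording, phase 3 replaces the comment with one space before directives are processed, so the directive ends only at the line-break after 42; with that replacement removed, the definition itself has to say that the line-break inside the comment does not count.

```cpp
// Under the current wording, phase 3 replaces each comment with one space
// character, so the line-break inside this comment does not end the directive.
#define ANSWER /* a comment that
                  spans two lines */ 42

constexpr int answer = ANSWER;  // 42
static_assert(answer == 42, "the comment did not terminate the directive");
```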

In phase 3 we say

The source file is decomposed into preprocessing tokens and /whitespace/s.

So even though they are not preprocessing tokens, whitespaces and comments are tokenized; they do not remain as sequences of individual elements that the preprocessor could decompose further.
At least that was the intent.

Is the grammar term "whitespace" intended to denote a sequence of whitespaces? I am not completely sure which fixes to the grammar formulation are in flight.

The issue with "containing" is that it has a problem similar to CWG's struggles with "top level". The concept of there being "atomic" whitespace "tokens" is useful, but I think the exposition is lacking. Once there is enough to support it, something along the lines of "where one of the constituent whitespace tokens is a line-break" would help to avoid "containing".

Happy to do that if people think it avoids confusion.