Hey Hubert.
Thanks a lot for the feedback. It was really helpful.
I think I addressed all your comments. Just in time for the mailing deadline.
I am hoping any further issue can be dealt with between SG-16 approval and core review.

Regards, 
Corentin


On Wed, Sep 15, 2021 at 2:13 AM Hubert Tong <hubert.reinterpretcast@gmail.com> wrote:
Corentin, thank you. With the further clarity re: what whitespace is meant to represent, I was able to do a more complete review.

Comments:
The removal of the first sentence in [lex.token] seems like a drive-by change that I am not quite understanding the motivation for.

Added a note to explain why I'm proposing to remove that.
If CWG wants me to revert it, that is easily done.
 

The comment grammar presentation is still, under the existing conventions of reading, describing only comments with a "payload" of a single character. Please refer to how r-char-sequence is defined for inspiration.

Ok, I see it now.
I've fixed that. I'm not sure having a grammar for comments constitutes a simplification now... but I still like the idea of having it. 

 
The addition of "whitespaces are ignored except as they serve to separate tokens" in [lex.whitespaces] that was to be removed is still there (and the sentence still begins with the lowercase grammar term).

Removed
 
The "whitespace containing" issue in [cpp.pre] still occurs in another instance occurring after "followed by" (as opposed to the fixed instance that occurred after "follows a"):
followed by whitespace containing a line-break
=>
followed by whitespaces containing a line-break

Fixed
 

The "[only] horizontal-whitespaces" sentence in [cpp.pre] should be removed or made a note (replacing "shall" with "can"). Also, as noted in the SG 16 meeting, the removal of the existing diagnosable rule does require changes to GCC's -pedantic-errors implementation: The "Do not perform whitespace replacement" part of the design section of the paper makes a statement that appears to be contrary to this fact.

Made it a note
I also added that point to the motivation.

 
For [cpp.replace.general]:
same number, ordering, spelling, and whitespace separation, where all whitespace separations are considered identical
=>
same  number, ordering, spelling, and separation with whitespaces, where all sequences of one or more whitespaces are considered identical

Fixed
 

There shall be whitespace
=>
There shall be one or more whitespaces

Fixed 

Any whitespace preceding
=>
Any whitespaces preceding

Fixed 

Important (!), in [cpp.stringize]:
Each occurence of whitespace
=>
Each sequence of one or more whitespaces

Indeed! Fixed
 

In [cpp.line]:
number of line-break read or introduced in translation phase 1
=>
number of line-breaks resulting from translation phase 1

I had no strong opinion there, changed! 

 

On Mon, Sep 13, 2021 at 6:02 AM Corentin <corentin.jabot@gmail.com> wrote:
Here is a draft of changes as requested by SG16, Jens, and Hubert

I found it better for /whitespace/ to refer to a single whitespace instead of describing a sequence. I have adjusted the pluralisation of everything accordingly.

I've added some notes to clarify the intent in important places
I've used Hubert's excellent suggestion for phase 1 of translation
I've put back some prose to describe multi-line comments
I made sure whitespace does not appear at the start of a sentence
I introduced the grammar term line-break-character to describe single-codepoint line-breaks (\n, \r) independently of line-breaks sequences (like \r\n)

Hopefully we can take this of the hands of SG16!

Thanks again for the feedback,

Corentin


On Fri, Sep 10, 2021 at 9:36 AM Jens Maurer <Jens.Maurer@gmx.net> wrote:
On 09/09/2021 22.54, Hubert Tong wrote:

>         (2)
>         In the new [lex.whitespaces] subclause, the following is added:
>         whitespaces are ignored except as they serve to separate tokens
>
>         This seems to have come from the text being removed out of [lex.token] (where it was excusable). Whitespace separation is significant in [cpp.replace.general], etc. This sentence should at best be a note in relation to phase 7 of translation.
>
>
>     I am happy to remove it entirely.
>     It's certainly not needed for phase 7. And I think phase 3 wording says something similar.
>
>     In [cpp.pre], Jens objected to my removal of
>
>     > The only whitespace characters that shall appear between preprocessing tokens within a preprocessing directive (from just after the directive-introducing token through just before the terminating new-line character) are space and horizontal-tab (including spaces that have replaced comments or possibly other whitespace characters in translation phase 3).
>
>     I am not able to convince myself than the grammar described at the start at [cpp.pre] allows line-breaks to appear between preprocessing tokens within a preprocessing directive,
>     but I'm happy to replaced the striked paragraph by
>
>     Only /horizontal-whitespace/s shall appear between preprocessing tokens within a preprocessing directive.
>
>
> This is neither necessary nor harmless. The term "preprocessing directive" is defined right after the grammar in [cpp.pre]. Its definition precludes the presence of line-breaks outside of /**/ between preprocessing tokens within the directive. We do want to allow comments in preprocessing directives.

Good point; I was missing [cpp.pre] p1, which seems to say everything we want to say.
So, I'm good with the removal now.

> This seems to point to another problem with the wording:
> (3)
> The removal of the comment replacement in phase 3 means that the definition of preprocessing directives requires an update to qualify that it wants to talk about line-breaks (but not those that are inside /**/ comments).

Indeed, the status quo seems to allow new-lines within /* comments */ inside a preprocessing-directive.

Maybe it would be clearer/easier to retain the replacement of comments with a single space
character in phase 3?  While we do need to differentiate different kinds of whitespace
(horizontal whitespace vs. new-lines) in phase 4, there's no point in talking about comments
separately beyond phase 3.

Jens