Corentin, thank you. With the further clarity re: what whitespace is meant to represent, I was able to do a more complete review.Comments:The removal of the first sentence in [lex.token] seems like a drive-by change that I am not quite understanding the motivation for.
The comment grammar presentation is still, under the existing conventions of reading, describing only comments with a "payload" of a single character. Please refer to how r-char-sequence is defined for inspiration.
The addition of "whitespaces are ignored except as they serve to separate tokens" in [lex.whitespaces] that was to be removed is still there (and the sentence still begins with the lowercase grammar term).
The "whitespace containing" issue in [cpp.pre] still occurs in another instance occurring after "followed by" (as opposed to the fixed instance that occurred after "follows a"):followed by whitespace containing a line-break=>followed by whitespaces containing a line-break
The "[only] horizontal-whitespaces" sentence in [cpp.pre] should be removed or made a note (replacing "shall" with "can"). Also, as noted in the SG 16 meeting, the removal of the existing diagnosable rule does require changes to GCC's -pedantic-errors implementation: The "Do not perform whitespace replacement" part of the design section of the paper makes a statement that appears to be contrary to this fact.
For [cpp.replace.general]:same number, ordering, spelling, and whitespace separation, where all whitespace separations are considered identical=>same number, ordering, spelling, and separation with whitespaces, where all sequences of one or more whitespaces are considered identical
There shall be whitespace=>There shall be one or more whitespaces
Any whitespace preceding=>Any whitespaces preceding
Important (!), in [cpp.stringize]:Each occurence of whitespace=>Each sequence of one or more whitespaces
In [cpp.line]:number of line-break read or introduced in translation phase 1=>number of line-breaks resulting from translation phase 1
On Mon, Sep 13, 2021 at 6:02 AM Corentin <corentin.jabot@gmail.com> wrote:Here is a draft of changes as requested by SG16, Jens, and HubertI found it better for /whitespace/ to refer to a single whitespace instead of describing a sequence. I have adjusted the pluralisation of everything accordingly.I've added some notes to clarify the intent in important placesI've used Hubert's excellent suggestion for phase 1 of translationI've put back some prose to describe multi-line commentsI made sure whitespace does not appear at the start of a sentenceI introduced the grammar term line-break-character to describe single-codepoint line-breaks (\n, \r) independently of line-breaks sequences (like \r\n)Hopefully we can take this of the hands of SG16!Thanks again for the feedback,CorentinOn Fri, Sep 10, 2021 at 9:36 AM Jens Maurer <Jens.Maurer@gmx.net> wrote:On 09/09/2021 22.54, Hubert Tong wrote:
> (2)
> In the new [lex.whitespaces] subclause, the following is added:
> whitespaces are ignored except as they serve to separate tokens
>
> This seems to have come from the text being removed out of [lex.token] (where it was excusable). Whitespace separation is significant in [cpp.replace.general], etc. This sentence should at best be a note in relation to phase 7 of translation.
>
>
> I am happy to remove it entirely.
> It's certainly not needed for phase 7. And I think phase 3 wording says something similar.
>
> In [cpp.pre], Jens objected to my removal of
>
> > The only whitespace characters that shall appear between preprocessing tokens within a preprocessing directive (from just after the directive-introducing token through just before the terminating new-line character) are space and horizontal-tab (including spaces that have replaced comments or possibly other whitespace characters in translation phase 3).
>
> I am not able to convince myself than the grammar described at the start at [cpp.pre] allows line-breaks to appear between preprocessing tokens within a preprocessing directive,
> but I'm happy to replaced the striked paragraph by
>
> Only /horizontal-whitespace/s shall appear between preprocessing tokens within a preprocessing directive.
>
>
> This is neither necessary nor harmless. The term "preprocessing directive" is defined right after the grammar in [cpp.pre]. Its definition precludes the presence of line-breaks outside of /**/ between preprocessing tokens within the directive. We do want to allow comments in preprocessing directives.
Good point; I was missing [cpp.pre] p1, which seems to say everything we want to say.
So, I'm good with the removal now.
> This seems to point to another problem with the wording:
> (3)
> The removal of the comment replacement in phase 3 means that the definition of preprocessing directives requires an update to qualify that it wants to talk about line-breaks (but not those that are inside /**/ comments).
Indeed, the status quo seems to allow new-lines within /* comments */ inside a preprocessing-directive.
Maybe it would be clearer/easier to retain the replacement of comments with a single space
character in phase 3? While we do need to differentiate different kinds of whitespace
(horizontal whitespace vs. new-lines) in phase 4, there's no point in talking about comments
separately beyond phase 3.
Jens