Date: Fri, 26 Mar 2021 13:57:36 +0100
On 26/03/2021 12.00, Corentin via SG16 wrote:
> I believe there are 2 options in terms of wording - both mechanisms being indistinguishable from each other.
>
> 1/ Specify that a new-line is one of a specific set of character sequences (LF, CRLF, CR, NEL) and make it a grammar element which is then used in [lex] and [cpp] where /new-line/ and new-line are currently mentioned
> 2/ Specify that in phase 1 line terminators are replaced by LF and replace all mention of new-line pertaining to lexing by LINE FEED (but not evaluated raw string literals).
I think (2) is what the status quo wording does.
While we believe "new-line" is slightly hazy,
[lex.ccon] p4 table 10 clearly associates
"new-line" with the single character "NL(LF)"
(whatever that means), not with a sequence of
characters.
For example, that also means you need to use
"\r\n" on DOS to get a DOS-style line ending,
not just "\n".
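To illustrate the point about "\n" being a single character: a minimal sketch, in which literal_length and ends_with_crlf are hypothetical helpers introduced only for this example. It looks at the contents of the literal itself; any translation a text-mode stream may apply on output is a separate matter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: length of a string literal, excluding the
// terminating null character.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N]) {
    return N - 1;
}

// "\n" is one character in the literal; a DOS-style ending must be
// spelled out as two characters.
static_assert(literal_length("\n") == 1, "\\n is a single character");
static_assert(literal_length("\r\n") == 2, "CR LF is two characters");

// Hypothetical helper: true if the first n characters of s end in CR LF.
constexpr bool ends_with_crlf(const char* s, std::size_t n) {
    return n >= 2 && s[n - 2] == '\r' && s[n - 1] == '\n';
}
```

With these, ends_with_crlf("a\n", 2) is false and ends_with_crlf("a\r\n", 3) is true.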
We can certainly reconsider this state of affairs
(in particular, we can make "new-line" a lexing
element that is some character sequence), which
would allow/require retaining the exact shape
of the character sequence for raw string literals,
but that's not what compilers currently do, I think.
(But maybe that's a bug.)
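A rough sketch of what option (2) amounts to, assuming UTF-8 source text: every line terminator is replaced by a single LF before lexing. The function normalize_line_endings is a hypothetical name for illustration, not something taken from the standard or from any compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical phase-1 style normalization: replaces CR LF, lone CR,
// and NEL (U+0085, assumed to be UTF-8 encoded as 0xC2 0x85) with LF.
std::string normalize_line_endings(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        if (c == '\r') {
            // CR LF collapses to one LF; a lone CR also becomes LF.
            if (i + 1 < src.size() && src[i + 1] == '\n') {
                ++i;
            }
            out.push_back('\n');
        } else if (c == '\xC2' && i + 1 < src.size() && src[i + 1] == '\x85') {
            // NEL as a two-byte UTF-8 sequence.
            ++i;
            out.push_back('\n');
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

Once this has run, the body of a raw string literal contains only LF, so the original shape of the line terminator can no longer be recovered; retaining it would need something like option (1).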
> In any case I think we want to specify what a _whitespace_ is as a grammar element and replace all mentions of whitespace, whitespaces, whitespace characters by /whitespace/.
Sounds reasonable.
> For simplicity, it's probably useful to define /horizontal-whitespace/ and /whitespace/, maybe in [lex.token]
>
> /horizontal-whitespace/:
>     SPACE
>     HORIZONTAL TAB
>
> /whitespace/:
>     /horizontal-whitespace/
>     LINE FEED
>
> If we want to keep exact line terminators in phase 1, we can do the same for new-line (note, there is currently a grammar production for new-line in [cpp]: /new-line/: the new-line character)
>
> We could simplify further by adding comments to whitespaces, but there is no grammar for that :(
We could add some grammar.
Jens
Received on 2021-03-26 07:57:43