Re: [SG16] On whitespaces and new-line

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Fri, 26 Mar 2021 13:57:36 +0100
On 26/03/2021 12.00, Corentin via SG16 wrote:
> I believe there are 2 options in terms of wording - both mechanisms being indistinguishable from each other.
> 1/ Specify that a new-line is a specific set of character sequences (lf, crlf, cr, nel) and make it a grammar element which is then used in [lex] and [cpp] where /new-line/ and new-line are currently mentioned
> 2/ Specify that in phase 1 line terminators are replaced by LF and replace all mentions of new-line pertaining to lexing by LINE FEED (but not evaluated raw string literals).
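
To make option 2 concrete, here is a minimal sketch of what a phase 1 replacement of line terminators by LF could look like. It assumes the source text is UTF-8, so NEL (U+0085) appears as the bytes C2 85. The function name `normalize_line_terminators` is hypothetical and not taken from the standard or from any compiler.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (illustration only): replaces every line terminator
// in UTF-8 source text with a single LINE FEED.
// Recognized terminators: CR LF, lone CR, LF, and NEL (U+0085, bytes C2 85).
std::string normalize_line_terminators(std::string_view src) {
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (c == '\r') {
            // CR LF counts as one terminator; a lone CR is also one.
            if (i + 1 < src.size() && src[i + 1] == '\n') ++i;
            out.push_back('\n');
        } else if (c == 0xC2 && i + 1 < src.size() &&
                   static_cast<unsigned char>(src[i + 1]) == 0x85) {
            ++i;  // consume the second byte of NEL
            out.push_back('\n');
        } else {
            out.push_back(src[i]);  // LF and all other bytes pass through
        }
    }
    return out;
}
```

After such a pass, the later phases only ever see LF, which is why the two options are indistinguishable everywhere except where the original character sequence might be retained.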

I think (2) is what the status quo wording does.
While we believe "new-line" is slightly hazy,
[lex.ccon] p4 table 10 clearly associates
"new-line" with the single character "NL(LF)"
(whatever that means), not with a sequence of

For example, that also means you need to use
"\r\n" on DOS to get a DOS-style line ending,
not just "\n".
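
As a small illustration of that point, the following checks look only at the contents of the literals in memory. The variable names are made up for this example.

```cpp
#include <cstddef>

// "\n" holds exactly one character before the terminating NUL.
constexpr char unix_ending[] = "\n";
// A DOS-style ending has to be spelled out as two characters.
constexpr char dos_ending[] = "\r\n";

constexpr std::size_t unix_ending_length = sizeof(unix_ending) - 1;
constexpr std::size_t dos_ending_length = sizeof(dos_ending) - 1;

static_assert(unix_ending_length == 1, "\\n is a single character");
static_assert(dos_ending_length == 2, "\\r\\n is two characters");
static_assert(dos_ending[0] == '\r' && dos_ending[1] == '\n',
              "CR followed by LF");
```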

We can certainly reconsider this state of affairs
(in particular, we can make "new-line" a lexing
element that is some character sequence), which
would allow/require retaining the exact shape
of the character sequence for raw string literals,
but that's not what compilers currently do, I think.
(But maybe that's a bug.)
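
A sketch of how this can be observed, assuming an implementation that behaves as described above: the raw string literal below contains one physical line break, and the checks pass if that break arrives in the literal as a single new-line character, whatever line terminator the source file was saved with. The names are hypothetical.

```cpp
#include <cstddef>

// A raw string literal spanning two physical source lines.
constexpr char raw_two_lines[] = R"(a
b)";

// 'a', one new-line character, 'b' (the terminating NUL is not counted).
constexpr std::size_t raw_two_lines_length = sizeof(raw_two_lines) - 1;

static_assert(raw_two_lines_length == 3,
              "the line break is a single character");
static_assert(raw_two_lines[1] == '\n',
              "the line break is the new-line character");
```

If "new-line" became a lexing element that retains the exact character sequence, a file saved with CR LF endings would presumably yield a length of 4 here instead.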

> In any case I think we want to specify what a _whitespace_ is as a grammar element and replace all mentions of whitespace, whitespaces, whitespace characters by /whitespace/.

Sounds reasonable.

> For simplicity, it's probably useful to define /horizontal-whitespace/ and /whitespace/, maybe in [lex.token]
> /horizontal-whitespace/
> /horizontal-whitespace/
> /whitespace/
> / horizontal-whitespace/
> If we want to keep exact line terminators in phase 1, we can do the same for new-line (note, there is currently a grammar production for new-line in [cpp]: /new-line/: the new-line character)
> We could simplify further by adding comments to whitespaces, but there is no grammar for that :(

We could add some grammar.


Received on 2021-03-26 07:57:43