SG16

Subject: Re: On whitespaces and new-line
From: Corentin (corentin.jabot_at_[hidden])
Date: 2021-03-27 10:59:52


On Fri, Mar 26, 2021 at 1:57 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 26/03/2021 12.00, Corentin via SG16 wrote:
> > I believe there are 2 options in terms of wording - both
> mechanisms being indistinguishable from each other.
> >
> > 1/ Specify that a new-line is a specific set of character sequences (LF,
> CRLF, CR, NEL) and make it a grammar element which is then used in [lex]
> and [cpp] where /new-line/ and new-line are currently mentioned
> > 2/ Specify that in phase 1 line terminators are replaced by LF and
> replace all mention of new-line pertaining to lexing by LINE FEED (but not
> evaluated raw string literals).
>
> I think (2) is what the status quo wording does.
> While we believe "new-line" is slightly hazy,
> [lex.ccon] p4 table 10 clearly associates
> "new-line" with the single character "NL(LF)"
> (whatever that means), not with a sequence of
> characters.
>
> For example, that also means you need to use
> "\r\n" on DOS to get a DOS-style line ending,
> not just "\n".
>

Which I think is the desired behavior.
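
To make that concrete, here is a minimal sketch of what option 2 amounts to, together with the string-literal consequence Jens describes. The helper name normalize_line_terminators is hypothetical (it is not from the standard or from any paper), and the sketch assumes the source bytes are UTF-8, so that NEL (U+0085) appears as the byte pair 0xC2 0x85:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// "\n" is one character plus the terminating null; a DOS-style ending has to
// be spelled out as "\r\n" by the programmer.
static_assert(sizeof("\n") == 2, "new-line is a single character");
static_assert(sizeof("\r\n") == 3, "CR LF is two characters");

// Hypothetical helper modelling option 2: every line terminator
// (LF, CR LF, lone CR, NEL) becomes a single LINE FEED.
// Assumption: the input is UTF-8, so NEL is the byte pair 0xC2 0x85.
std::string normalize_line_terminators(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c == '\r') {
            // CR LF is one terminator; a lone CR is one terminator too.
            if (i + 1 < in.size() && in[i + 1] == '\n') ++i;
            out.push_back('\n');
        } else if (c == 0xC2 && i + 1 < in.size() &&
                   static_cast<unsigned char>(in[i + 1]) == 0x85) {
            ++i;  // consume the second byte of NEL
            out.push_back('\n');
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}

// Small self-check: a DOS-style and a Unix-style file look the same afterwards.
inline void check_normalization() {
    assert(normalize_line_terminators("a\r\nb") == normalize_line_terminators("a\nb"));
}
```

After such a pass, later phases only ever see LINE FEED, which is why the exact terminator used in the file cannot be observed, raw string literals included.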

> We can certainly reconsider this state of affairs
> (in particular, we can make "new-line" a lexing
> element that is some character sequence), which
> would allow/require retaining the exact shape
> of the character sequence for raw string literals,
> but that's not what compilers currently do, I think.
> (But maybe that's a bug.)
>
> > In any case I think we want to specify what a _whitespace_ is as a
> grammar element and replace all mentions of whitespace, whitespaces,
> whitespace characters by /whitespace/.
>
> Sounds reasonable.
>
> > For simplicity, it's probably useful to define /horizontal-whitespace/
> and /whitespace/, maybe in [lex.token]
> >
> > /horizontal-whitespace/
> > /horizontal-whitespace/
> > SPACE
> > HORIZONTAL TAB
> >
> > /whitespace/
> > /horizontal-whitespace/
> > LINE FEED
> >
> > If we want to keep exact line terminators in phase 1, we can do the same
> for new-line (note, there is currently a grammar production for new-line in
> [cpp]: /new-line/: the new-line character)
> >
> > We could simplify further by adding comments to whitespaces, but there
> is no grammar for that :(
>
> We could add some grammar.
>

https://isocpp.org/files/papers/D2348R0.pdf

I spent quite a bit of time on that.
After some reflection I decided to keep line-break as a grammar element
instead of referring to LINE FEED directly.
I decided to use the term line-break so that it doesn't collide with
new-line in string literals.
While string literals use LINE FEED for new-line, I think it's valid for
that to be mapped to, for example, NEXT LINE in phase 5, so we probably want
to keep the term new-line,
as it is later referred to in the library part (to mean whatever line feed
maps to, rather than specifically line feed).
(Of course, it's an early draft, but I am hoping both SG16 and Core will
like the direction.)
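
As a rough illustration of the /horizontal-whitespace/ and /whitespace/ productions quoted above, they could be read as the following classifiers. This is only a sketch: the function names are hypothetical, and it assumes line terminators have already been replaced by LINE FEED in phase 1, so LINE FEED is the only line-breaking character left to consider.

```cpp
// Hypothetical classifier for the proposed /horizontal-whitespace/ production:
// SPACE or HORIZONTAL TAB.
constexpr bool is_horizontal_whitespace(char32_t c) {
    return c == U' ' || c == U'\t';
}

// Hypothetical classifier for the proposed /whitespace/ production:
// /horizontal-whitespace/ or LINE FEED.
// Assumption: CR, CR LF and NEL were already mapped to LINE FEED in phase 1,
// so they are deliberately not matched here.
constexpr bool is_whitespace(char32_t c) {
    return is_horizontal_whitespace(c) || c == U'\n';
}

static_assert(is_horizontal_whitespace(U' '), "SPACE is horizontal whitespace");
static_assert(is_horizontal_whitespace(U'\t'), "HORIZONTAL TAB is horizontal whitespace");
static_assert(!is_horizontal_whitespace(U'\n'), "LINE FEED is not horizontal");
static_assert(is_whitespace(U'\n'), "LINE FEED is whitespace");
```

If line terminators were instead kept as-is in phase 1, the second classifier would need to match a sequence of characters rather than a single one, which is the case for having a separate grammar element.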

>
> Jens
>



SG16 list run by sg16-owner@lists.isocpp.org