Re: [SG16] On whitespaces and new-line

From: Corentin <corentin.jabot_at_[hidden]>
Date: Sat, 27 Mar 2021 16:59:52 +0100
On Fri, Mar 26, 2021 at 1:57 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 26/03/2021 12.00, Corentin via SG16 wrote:
> > I believe there are 2 options in terms of wording - both
> mechanisms being indistinguishable from each other.
> >
> > 1/ Specify that a new-line is a specific set of character sequences (LF,
> CRLF, CR, NEL) and make it a grammar element which is then used in [lex]
> and [cpp] where /new-line/ and new-line are currently mentioned
> > 2/ Specify that in phase 1 line terminators are replaced by LF and
> replace all mention of new-line pertaining to lexing by LINE FEED (but not
> evaluated raw string literals).
> I think (2) is what the status quo wording does.
> While we believe "new-line" is slightly hazy,
> [lex.ccon] p4 table 10 clearly associates
> "new-line" with the single character "NL(LF)"
> (whatever that means), not with a sequence of
> characters.
> For example, that also means you need to use
> "\r\n" on DOS to get a DOS-style line ending,
> not just "\n".

Which I think is the desired behavior.
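
To make option (2) concrete, here is a rough sketch of what a phase 1 replacement of line terminators by LF could look like. This is only an illustration, not proposed wording and not any compiler's actual code: the function name normalize_line_terminators is hypothetical, and the sketch assumes the source is UTF-8, where NEL (U+0085) is the byte pair 0xC2 0x85.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (an assumption for illustration only): replaces each
// line terminator (CRLF, CR, LF, NEL) with a single LINE FEED.
// Assumes UTF-8 input, where NEL (U+0085) is encoded as 0xC2 0x85.
std::string normalize_line_terminators(std::string_view src) {
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (c == '\r') {
            // CRLF counts as one terminator; a lone CR is one as well.
            if (i + 1 < src.size() && src[i + 1] == '\n') ++i;
            out.push_back('\n');
        } else if (c == 0xC2 && i + 1 < src.size() &&
                   static_cast<unsigned char>(src[i + 1]) == 0x85) {
            ++i;  // skip the second byte of NEL
            out.push_back('\n');
        } else {
            // LF and every other byte are copied unchanged.
            out.push_back(src[i]);
        }
    }
    return out;
}
```

With something like this applied first, later phases only ever see LF, which matches the point above: "\n" in a string literal stays a single character, and a DOS-style ending has to be spelled "\r\n" explicitly.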

> We can certainly reconsider this state of affairs
> (in particular, we can make "new-line" a lexing
> element that is some character sequence), which
> would allow/require retaining the exact shape
> of the character sequence for raw string literals,
> but that's not what compilers currently do, I think.
> (But maybe that's a bug.)
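
A small check of the raw string literal case, as a sketch only. It assumes the status quo reading discussed above, where a line terminator in the source becomes a single new-line in phase 1 and is not restored inside a raw string literal. The names kRaw and kRawLength are made up for the example.

```cpp
#include <cstddef>

// A raw string literal that spans two source lines.
constexpr char kRaw[] = R"(a
b)";

// Number of characters in the literal, excluding the terminating null.
constexpr std::size_t kRawLength = sizeof(kRaw) - 1;
```

Under that reading, the line break inside the literal is expected to be one '\n' whatever terminator the source file uses. If the exact character sequence were retained instead, a CRLF source file would yield a longer literal.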
> > In any case I think we want to specify what a _whitespace_ is as a
> grammar element and replace all mention of whitespace, whitespaces,
> whitespace characters by /whitespace/.
> Sounds reasonable.
> > For simplicity, it's probably useful to define /horizontal-whitespace/
> and /whitespace/, maybe in [lex.token]
> >
> > /horizontal-whitespace/
> > /horizontal-whitespace/
> >
> > /whitespace/
> > / horizontal-whitespace/
> >
> > If we want to keep exact line terminators in phase 1, we can do the same
> for new-line (note, there is currently a grammar production for new-line in
> [cpp]: /new-line/: the new-line character)
> >
> > We could simplify further by adding comments to whitespaces, but there
> is no grammar for that :(
> We could add some grammar.


I spent quite a bit of time on that.
After some reflection, I decided to keep line-break as a grammar element
instead of referring to LINE FEED directly.
I decided to use the term line-break so that it doesn't collide with
new-line in string literals.
While string literals use LINE FEED for new-line, I think it's valid for
that to be mapped to, for example, NEXT LINE in phase 5, so we probably want
to keep the term new-line,
as it is later referred to in the library part (to mean whatever line feed
maps to, rather than specifically line feed).
(Of course, it's an early draft, but I am hoping both SG16 and Core would
like the direction.)

> Jens

Received on 2021-03-27 11:00:10