Subject: Re: On whitespaces and new-line
From: Tom Honermann (tom_at_[hidden])
Date: 2021-03-27 18:33:14
> On Mar 27, 2021, at 11:59 AM, Corentin <corentin.jabot_at_[hidden]> wrote:
>> On Fri, Mar 26, 2021 at 1:57 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>> On 26/03/2021 12.00, Corentin via SG16 wrote:
>> > I believe there are 2 options in terms of wording - both mechanisms being indistinguishable from each other.
>> > 1/ Specify that a new-line is a specific set of character sequences (LF, CRLF, CR, NEL) and make it a grammar element which is then used in [lex] and [cpp] where /new-line/ and new-line are currently mentioned
>> > 2/ Specify that in phase 1 line terminators are replaced by LF and replace all mention of new-line pertaining to lexing by LINE FEED (but not evaluated raw string literals).
>> I think (2) is what the status quo wording does.
>> While we believe "new-line" is slightly hazy,
>> [lex.ccon] p4 table 10 clearly associates
>> "new-line" with the single character "NL(LF)"
>> (whatever that means), not with a sequence of characters.
>> For example, that also means you need to use
>> "\r\n" on DOS to get a DOS-style line ending,
>> not just "\n".
> Which I think is the desired behavior.
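
The point about "\n" versus "\r\n" can be checked directly. The following is a minimal sketch, not taken from the thread; the helper name dos_line is hypothetical and only serves to illustrate that a DOS-style ending has to be spelled out as two characters.

```cpp
#include <cassert>
#include <string>

// "\n" denotes exactly one character (plus the terminating null).
static_assert(sizeof("\n") == 2, "\\n is a single character");
// A DOS-style line ending must be written explicitly as two characters.
static_assert(sizeof("\r\n") == 3, "\\r\\n is two characters");

// Hypothetical helper: appends a DOS-style line ending to `text`.
std::string dos_line(const std::string& text) {
    return text + "\r\n";
}
```

Calling dos_line("a") yields a three-character string: 'a', carriage return, line feed.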
>> We can certainly reconsider this state of affairs
>> (in particular, we can make "new-line" a lexing
>> element that is some character sequence), which
>> would allow/require retaining the exact shape
>> of the character sequence for raw string literals,
>> but that's not what compilers currently do, I think.
>> (But maybe that's a bug.)
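
Option (2) above, replacing line terminators by LF in phase 1, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name normalize_line_terminators is hypothetical, the input is assumed to be already decoded to UTF-32, C++17 is assumed, and no claim is made that any compiler implements phase 1 this way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical phase-1 helper: replaces each line terminator
// (CR LF, lone CR, LF, NEL) with a single LINE FEED.
std::u32string normalize_line_terminators(std::u32string_view src) {
    constexpr char32_t LF = 0x000A;   // LINE FEED
    constexpr char32_t CR = 0x000D;   // CARRIAGE RETURN
    constexpr char32_t NEL = 0x0085;  // NEXT LINE

    std::u32string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char32_t c = src[i];
        if (c == CR) {
            // Treat CR LF as one terminator by skipping the LF.
            if (i + 1 < src.size() && src[i + 1] == LF) {
                ++i;
            }
            out.push_back(LF);
        } else if (c == NEL) {
            out.push_back(LF);
        } else {
            // LF and all ordinary characters pass through unchanged.
            out.push_back(c);
        }
    }
    return out;
}
```

After such a pass the original shape of each terminator is gone, which is why retaining it inside raw string literals would need either option (1) or some way to undo the replacement.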
>> > In any case I think we want to specify what a _whitespace_ is as a grammar element and replace all mentions of whitespace, whitespaces, and whitespace characters by /whitespace/.
>> Sounds reasonable.
>> > For simplicity, it's probably useful to define /horizontal-whitespace/ and /whitespace/, maybe in [lex.token]
>> > /horizontal-whitespace/:
>> >     SPACE
>> >     HORIZONTAL TAB
>> > /whitespace/:
>> >     /horizontal-whitespace/
>> >     LINE FEED
>> > If we want to keep exact line terminators in phase 1, we can do the same for new-line (note, there is currently a grammar production for new-line in [cpp]: /new-line/: the new-line character)
>> > We could simplify further by adding comments to whitespaces, but there is no grammar for that :(
>> We could add some grammar.
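
The /horizontal-whitespace/ and /whitespace/ productions proposed above translate into a pair of simple predicates. The sketch below assumes C++17 and uses hypothetical function names; it mirrors only the characters listed in the quoted proposal, whereas the current wording also counts vertical tab and form feed as white space.

```cpp
#include <cassert>

// Mirrors the proposed horizontal-whitespace production:
// SPACE (U+0020) or HORIZONTAL TAB (U+0009). Hypothetical name.
constexpr bool is_horizontal_whitespace(char32_t c) {
    return c == 0x0020 || c == 0x0009;
}

// Mirrors the proposed whitespace production:
// horizontal-whitespace or LINE FEED (U+000A). Hypothetical name.
constexpr bool is_whitespace(char32_t c) {
    return is_horizontal_whitespace(c) || c == 0x000A;
}

static_assert(is_horizontal_whitespace(U' '));
static_assert(is_whitespace(U'\n'));
static_assert(!is_horizontal_whitespace(U'\n'));
```

Keeping the two predicates separate matches the split in the proposed grammar: places that must not cross a line boundary would use the horizontal one only.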
> I spent quite a bit of time on that.
> After some reflection I decided to keep line-break as a grammar element instead of referring to LINE FEED directly.
> I decided to use the term line-break so that it doesn't collide with new-line in string literals.
> While string-literals use LINE FEED for new-line, I think it's valid for that to be mapped to, for example, NEXT LINE in phase 5, so we probably want to keep the term new-line,
> as it is later referred to in the library part (to mean whatever line feed maps to, rather than specifically line feed).
> (of course, it's an early draft, but I am hoping both SG16 and core would like the direction)