Date: Mon, 1 Jun 2020 14:53:48 +0200
The standard doesn't specify what the new-line character is.
According to Unicode, the following code point sequences should be
considered line terminators:
LF: Line Feed, U+000A
VT: Vertical Tab, U+000B
FF: Form Feed, U+000C
CR: Carriage Return, U+000D
CR+LF: CR (U+000D) followed by LF (U+000A)
NEL: Next Line, U+0085
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029
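
To make the list above concrete, here is a minimal sketch of a matcher
for these sequences over UTF-32 input. The function name and the
constant names are hypothetical, chosen only for illustration; the one
point it tries to show is that CR+LF has to be matched as a single
two-code-point terminator before CR alone is considered.

```cpp
#include <cstddef>

// Code points taken from the list above (constant names are illustrative).
constexpr char32_t LF = 0x000A, VT = 0x000B, FF = 0x000C, CR = 0x000D,
                   NEL = 0x0085, LS = 0x2028, PS = 0x2029;

// Returns the length, in code points, of the line terminator that starts
// at text[pos], or 0 if no terminator starts there.
constexpr std::size_t line_terminator_length(const char32_t* text,
                                             std::size_t size,
                                             std::size_t pos) {
    if (pos >= size) return 0;
    switch (text[pos]) {
    case CR:
        // CR followed by LF counts as one terminator, not two.
        return (pos + 1 < size && text[pos + 1] == LF) ? 2 : 1;
    case LF: case VT: case FF: case NEL: case LS: case PS:
        return 1;
    default:
        return 0;
    }
}

static_assert(line_terminator_length(U"a\r\nb", 4, 1) == 2, "CR+LF");
static_assert(line_terminator_length(U"a\rb", 3, 1) == 1, "lone CR");
static_assert(line_terminator_length(U"a\u2028b", 3, 1) == 1, "LS");
static_assert(line_terminator_length(U"ab", 2, 1) == 0, "no terminator");
```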
Similarly, the standard defines "white spaces" loosely as "blanks,
horizontal and vertical tabs"; however, there are more white-space
characters in Unicode: https://en.wikipedia.org/wiki/Whitespace_character
What I would like to do:
* Define new-line and white-space as grammar terms, each with an explicit
list of code point sequences.
* In phase 2, replace every code point sequence that represents a line
termination with Line Feed (which is reverted later for raw string
literals). This would notably fix https://wg21.link/cwg1655
* It would also help to mandate that trailing whitespace is removed in
phase 2.
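
As a rough illustration of the phase 2 steps above, the sketch below
rewrites every line terminator to Line Feed and drops whitespace that
immediately precedes it. The function names are hypothetical, and the
whitespace set is an assumption limited to space and horizontal tab (an
explicit list would be longer). The later reversion for raw string
literals is not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of the line terminator starting at s[i], or 0 if there is none.
// CR+LF is treated as a single terminator.
inline std::size_t terminator_length(const std::u32string& s, std::size_t i) {
    assert(i < s.size());
    switch (s[i]) {
    case 0x000D:  // CR, possibly followed by LF
        return (i + 1 < s.size() && s[i + 1] == 0x000A) ? 2 : 1;
    case 0x000A: case 0x000B: case 0x000C:   // LF, VT, FF
    case 0x0085: case 0x2028: case 0x2029:   // NEL, LS, PS
        return 1;
    default:
        return 0;
    }
}

// Assumption: only space and horizontal tab count as strippable whitespace.
inline bool is_horizontal_space(char32_t c) {
    return c == U' ' || c == U'\t';
}

// Replaces each line terminator with LF and removes the whitespace
// directly in front of it. Whitespace at the very end of the input is
// kept unless a terminator follows it.
inline std::u32string normalize_phase2(const std::u32string& in) {
    std::u32string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const std::size_t n = terminator_length(in, i);
        if (n == 0) {
            out.push_back(in[i]);
            ++i;
            continue;
        }
        while (!out.empty() && is_horizontal_space(out.back())) {
            out.pop_back();
        }
        out.push_back(U'\n');
        i += n;
    }
    return out;
}
```

For example, under these assumptions the input "a", space, CR, LF, "b"
becomes "a", LF, "b".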
Does that make sense to anyone?
Received on 2020-06-01 07:57:06