On Tue, Nov 8, 2022 at 12:41 AM Tom Honermann via Core <core@lists.isocpp.org> wrote:

Thanks, Corentin.

I agree that, if ~all existing implementations already treat a lone CR as a new-line, then we might as well standardize it. However, if some don't, then we'll be adding a (probably small) implementation burden for something that I suspect is rare. LF and CR+LF are common occurrences. Do you have data that shows that lone CR 1) is recognized by ~all existing implementations, and 2) is used sufficiently often that it is worth standardizing? Do we want to encourage use of lone CR as a portable new-line? As mentioned, implementations can still support it regardless. Unicode also recognizes U+0085 (NEXT LINE), U+2028 (LINE SEPARATOR), and U+2029 (PARAGRAPH SEPARATOR) as line-break characters.

I think it would be worth adding such analysis to a future revision of P2348.

In the interest of time, is anyone opposed to the CWG direction of requiring both LF and CR+LF in portable UTF-8 source files for C++23 with support for other new-line sequences left to a future standard?


Actually, CWG changed direction in the late afternoon session and decided to accept CR as a line-termination character. I'm about to upload drafting implementing that direction for discussion today.

I don't know about the ubiquity of that support, but the EDG front end has it as a build-time configuration option that customers can enable or not, as they choose. Here's the description of the flag (note that it cites gcc's processing as its basis):

/*
Flag that is TRUE to indicate that carriage return or carriage return
followed by newline can be used as a line terminator in GNU-compatible
modes.  This feature is provided to allow files with old MacOS line
terminators to be accepted.  The implementation is compatible with the way
in which the GNU compiler handles such line terminators.  It is disabled by
default because it is not required by most users.
*/
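
For concreteness, here is a rough sketch of what accepting all three sequences could look like in an early translation phase. This is illustration only, not EDG's or GCC's code and not the proposed wording; the name normalize_new_lines is hypothetical, and it assumes the source has already been read into memory as UTF-8 bytes.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, for illustration only (not taken from EDG, GCC,
// or the CWG drafting): rewrites LF, CR+LF, and lone CR each as a
// single '\n'. Assumes `source` holds the raw bytes of the file.
std::string normalize_new_lines(std::string_view source) {
    std::string out;
    out.reserve(source.size());
    for (std::size_t i = 0; i < source.size(); ++i) {
        const char c = source[i];
        if (c == '\r') {
            // CR+LF counts as one terminator: skip the LF after the CR.
            if (i + 1 < source.size() && source[i + 1] == '\n') {
                ++i;
            }
            // A lone CR (old MacOS style) also ends the line.
            out.push_back('\n');
        } else {
            // LF and every other byte pass through unchanged.
            out.push_back(c);
        }
    }
    return out;
}
```

With this sketch, "a\rb", "a\r\nb", and "a\nb" all normalize to "a\nb", while LF followed by CR is treated as two terminators. U+0085, U+2028, and U+2029 are deliberately left alone here.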