Date: Tue, 8 Nov 2022 10:32:05 -0500
On Tue, Nov 8, 2022 at 12:41 AM Tom Honermann via Core <
core_at_[hidden]> wrote:
> Thanks, Corentin.
>
> I agree that, if ~all existing implementations already treat a lone CR as
> a new-line, then we might as well standardize it. However, if some don't,
> then we'll be adding a (probably small) implementation burden for something
> that I suspect is rare. LF and CR+LF are common occurrences. Do you have
> data that shows that lone CR is 1) recognized by ~all existing
> implementations, and 2) is used sufficiently often that it is worth
> standardizing? Do we want to encourage use of lone CR as a portable
> new-line? As mentioned, implementations can still support it regardless.
> Unicode also recognizes U+0085 (NEXT LINE), U+2028 (LINE SEPARATOR), and
> U+2029 (PARAGRAPH SEPARATOR) as line-break characters.
>
> I think it would be worth adding such analysis to a future revision of
> P2348.
>
> In the interest of time, is anyone opposed to the CWG direction of
> requiring both LF and CR+LF in portable UTF-8 source files for C++23 with
> support for other new-line sequences left to a future standard?
>
Actually, CWG changed direction in the late afternoon session and decided
to accept CR as a line-termination character. I'm about to upload drafting
implementing that direction for discussion today.
I don't know about the ubiquity of that support, but the EDG front end has
it as a build-time configuration option that customers can enable or not,
as they choose. Here's the description of the flag (note that it cites
gcc's processing as its basis):
> /*
> Flag that is TRUE to indicate that carriage return or carriage return
> followed by newline can be used as a line terminator in GNU-compatible
> modes. This feature is provided to allow files with old MacOS line
> terminators to be accepted. The implementation is compatible with the way
> in which the GNU compiler handles such line terminators. It is disabled by
> default because it is not required by most users.
> */
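
For illustration only, the behavior the flag description refers to (LF, CR+LF,
and a lone CR each ending a line) could be sketched roughly as below. This is
not EDG's or GCC's code; the function name split_lines is hypothetical, and
the handling of a final unterminated line is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: split text into lines, treating LF, CR+LF, and a
// lone CR each as a single line terminator.
std::vector<std::string> split_lines(std::string_view text) {
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r') {
            // CR followed by LF counts as one terminator, so skip the LF.
            if (i + 1 < text.size() && text[i + 1] == '\n') {
                ++i;
            }
            lines.push_back(current);
            current.clear();
        } else if (c == '\n') {
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    // Assumption: a trailing fragment without a terminator is still a line.
    if (!current.empty()) {
        lines.push_back(current);
    }
    return lines;
}
```

With this sketch, a buffer that mixes all three conventions yields the same
lines as if every terminator had been LF.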
Received on 2022-11-08 15:32:16