
Re: [isocpp-core] US 3-030: New-line character sequences in UTF-8 source files

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 8 Nov 2022 14:09:24 -0500
On 11/8/22 10:32 AM, William M. (Mike) Miller wrote:
> On Tue, Nov 8, 2022 at 12:41 AM Tom Honermann via Core
> <core_at_[hidden]> wrote:
> > Thanks, Corentin.
> >
> > I agree that, if ~all existing implementations already treat a
> > lone CR as a new-line, then we might as well standardize it.
> > However, if some don't, then we'll be adding a (probably small)
> > implementation burden for something that I suspect is rare. LF and
> > CR+LF are common occurrences. Do you have data that shows that
> > lone CR 1) is recognized by ~all existing implementations, and 2)
> > is used sufficiently often that it is worth standardizing? Do we
> > want to encourage use of lone CR as a portable new-line? As
> > mentioned, implementations can still support it regardless.
> >
> > Unicode also recognizes U+0085 (NEXT LINE), U+2028 (LINE
> > SEPARATOR), and U+2029 (PARAGRAPH SEPARATOR) as line-break characters.
> > I think it would be worth adding such analysis to a future
> > revision of P2348.
> >
> > In the interest of time, is anyone opposed to the CWG direction of
> > requiring both LF and CR+LF in portable UTF-8 source files for
> > C++23, with support for other new-line sequences left to a future
> > standard?
>
> Actually, CWG changed direction in the late afternoon session and
> decided to accept CR as a line-termination character. I'm about to
> upload drafting implementing that direction for discussion today.
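For illustration only, here is a minimal sketch of what the new direction means in practice: LF, CR+LF, and a lone CR each count as exactly one new-line. The helper name `normalize_newlines` is hypothetical; this is not the CWG drafting, nor how EDG, GCC, or Clang actually implement it.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (illustration only): rewrites every new-line
// sequence (LF, CR+LF, or a lone CR) as a single '\n'.
std::string normalize_newlines(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char c = in[i];
        if (c == '\r') {
            // CR+LF is one new-line: consume the LF that follows the CR.
            if (i + 1 < in.size() && in[i + 1] == '\n') {
                ++i;
            }
            // A lone CR is also treated as one new-line.
            out.push_back('\n');
        } else {
            // LF and all other characters are copied unchanged.
            out.push_back(c);
        }
    }
    return out;
}
```

With this sketch, `normalize_newlines("a\r\nb\rc\nd")` yields `"a\nb\nc\nd"`: three lines are separated by three different new-line sequences, and each collapses to one `'\n'`.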

Ah, thank you, I'm sorry I missed that discussion.

That change resolves the inconsistency with P2348, given Corentin's
explicit statement of that paper's intent.

I'm personally happy with this new direction so long as implementors
have no concerns (and it seems we already have confirmation that EDG and
Clang have no concerns).

Given that we already had consensus for P2348 in SG16 and EWG, and
assuming no new objections are raised, ship it.


> I don't know about the ubiquity of that support, but the EDG front end
> has it as a build-time configuration option that customers can enable
> or not, as they choose. Here's the description of the flag (note that
> it cites gcc's processing as its basis):
> /*
> Flag that is TRUE to indicate that carriage return or carriage return
> followed by newline can be used as a line terminator in GNU-compatible
> modes. This feature is provided to allow files with old MacOS line
> terminators to be accepted. The implementation is compatible with the
> way in which the GNU compiler handles such line terminators. It is
> disabled by default because it is not required by most users.
> */

Received on 2022-11-08 19:09:26