Re: [isocpp-core] US 3-030: New-line character sequences in UTF-8 source files

From: Richard Smith <richardsmith_at_[hidden]>
Date: Tue, 8 Nov 2022 12:51:36 -0800
On Tue, 8 Nov 2022 at 11:09, Tom Honermann via Core <core_at_[hidden]> wrote:

> On 11/8/22 10:32 AM, William M. (Mike) Miller wrote:
> On Tue, Nov 8, 2022 at 12:41 AM Tom Honermann via Core <
> core_at_[hidden]> wrote:
>> Thanks, Corentin.
>> I agree that, if ~all existing implementations already treat a lone CR as
>> a new-line, then we might as well standardize it. However, if some don't,
>> then we'll be adding a (probably small) implementation burden for something
>> that I suspect is rare. LF and CR+LF are common occurrences. Do you have
>> data that shows that lone CR is 1) recognized by ~all existing
>> implementations, and 2) is used sufficiently often that it is worth
>> standardizing? Do we want to encourage use of lone CR as a portable
>> new-line? As mentioned, implementations can still support it regardless.
>> Unicode also recognizes U+0085 (NEXT LINE), U+2028 (LINE SEPARATOR), and
>> U+2029 (PARAGRAPH SEPARATOR) as line-break characters.
>> I think it would be worth adding such analysis to a future revision of
>> P2348.
>> In the interest of time, is anyone opposed to the CWG direction of
>> requiring both LF and CR+LF in portable UTF-8 source files for C++23 with
>> support for other new-line sequences left to a future standard?
> Actually, CWG changed direction in the late afternoon session and decided
> to accept CR as a line-termination character. I'm about to upload drafting
> implementing that direction for discussion today.
> Ah, thank you, I'm sorry I missed that discussion.
> That change resolves the inconsistency with P2348 given Corentin's
> explicit claim of the intent in that paper.
> I'm personally happy with this new direction so long as implementors have
> no concerns (and it seems we already have confirmation that EDG and Clang
> have no concerns).
> Given that we already had consensus for P2348 in SG16 and EWG, assuming no
> new objections are raised, ship it.
Not an objection, mostly just clarifying intent: given a source file that
contains "#define a \<LF><CR> b", is a conforming implementation required
to treat the "a" macro as being empty and the "b" as being on a separate
line (as GCC does), or is it still permitted to treat the "b" as being on
the same line as the "a" because the <LF><CR> is treated as an escaped
new-line sequence (as Clang does)?
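
To make the two readings concrete, here is a minimal sketch of line splicing under each one. It is a toy model written for this question, not the actual lexer of GCC or Clang; the names `NewlinePolicy`, `newline_length`, and `logical_lines` are invented for illustration, and the mapping of each policy to a compiler rests only on the behavior described above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical policy switch for this sketch only; not a real compiler option.
enum class NewlinePolicy {
    LfCrIsTwoBreaks,  // LF, CR, and CR LF each end a line (the GCC-like reading)
    LfCrIsOneBreak    // additionally, LF CR is one new-line sequence (the Clang-like reading)
};

// Length of the new-line sequence starting at src[i], or 0 if there is none.
std::size_t newline_length(const std::string& src, std::size_t i,
                           NewlinePolicy policy) {
    if (i >= src.size()) return 0;
    const bool has_next = i + 1 < src.size();
    if (src[i] == '\r') {
        return (has_next && src[i + 1] == '\n') ? 2 : 1;  // CR LF or lone CR
    }
    if (src[i] == '\n') {
        if (policy == NewlinePolicy::LfCrIsOneBreak && has_next &&
            src[i + 1] == '\r') {
            return 2;  // LF CR consumed as a single sequence
        }
        return 1;  // lone LF
    }
    return 0;
}

// Splits src into logical lines, deleting every backslash that is
// immediately followed by a new-line sequence (line splicing).
std::vector<std::string> logical_lines(const std::string& src,
                                       NewlinePolicy policy) {
    std::vector<std::string> lines;
    std::string current;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] == '\\') {
            const std::size_t n = newline_length(src, i + 1, policy);
            if (n != 0) {  // escaped new-line: drop both, stay on this line
                i += 1 + n;
                continue;
            }
        }
        const std::size_t n = newline_length(src, i, policy);
        if (n != 0) {  // unescaped new-line: finish the current line
            lines.push_back(current);
            current.clear();
            i += n;
            continue;
        }
        current += src[i++];
    }
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

Calling `logical_lines("#define a \\\n\r b", NewlinePolicy::LfCrIsTwoBreaks)` yields two lines, `#define a ` and ` b`, so the macro is empty. With `NewlinePolicy::LfCrIsOneBreak` the same input yields the single line `#define a  b`, so `b` is the macro's replacement.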

I don't think LF CR is at all common these days -- I think it was only
really used on the BBC Micro and on Acorn RISC PCs, but those both still
exist, and Wikipedia says the Acorn C/C++ compiler suite had a release
earlier this year. Hopefully we're not going to break line continuations in
all of their macros :)

> Tom.
> I don't know about the ubiquity of that support, but the EDG front end has
> it as a build-time configuration option that customers can enable or not,
> as they choose. Here's the description of the flag (note that it cites
> gcc's processing as its basis):
> /*
>> Flag that is TRUE to indicate that carriage return or carriage return
>> followed by newline can be used as a line terminator in GNU-compatible
>> modes. This feature is provided to allow files with old MacOS line
>> terminators to be accepted. The implementation is compatible with the way
>> in which the GNU compiler handles such line terminators. It is disabled by
>> default because it is not required by most users.
>> */
> Link to this post: http://lists.isocpp.org/core/2022/11/13459.php

Received on 2022-11-08 20:51:48