On Tue, 8 Nov 2022 at 15:43, Tom Honermann <tom@honermann.net> wrote:

On 11/8/22 4:16 PM, Jens Maurer wrote:
>
> On 08/11/2022 21.51, Richard Smith via SG16 wrote:
>> On Tue, 8 Nov 2022 at 11:09, Tom Honermann via Core <core@lists.isocpp.org> wrote:
>>
>> On 11/8/22 10:32 AM, William M. (Mike) Miller wrote:
>>> On Tue, Nov 8, 2022 at 12:41 AM Tom Honermann via Core <core@lists.isocpp.org> wrote:
>>>
>>> Thanks, Corentin.
>>>
>>> I agree that, if ~all existing implementations already treat a lone CR as a new-line, then we might as well standardize it. However, if some don't, then we'll be adding a (probably small) implementation burden for something that I suspect is rare. LF and CR+LF are common occurrences. Do you have data that shows that lone CR is 1) recognized by ~all existing implementations, and 2) used sufficiently often that it is worth standardizing? Do we want to encourage use of lone CR as a portable new-line? As mentioned, implementations can still support it regardless. Unicode also recognizes U+0085 (NEXT LINE), U+2028 (LINE SEPARATOR), and U+2029 (PARAGRAPH SEPARATOR) as line-break characters.
>>>
>>> I think it would be worth adding such analysis to a future revision of P2348.
>>>
>>> In the interest of time, is anyone opposed to the CWG direction of requiring both LF and CR+LF in portable UTF-8 source files for C++23 with support for other new-line sequences left to a future standard?
>>>
>>>
>>> Actually, CWG changed direction in the late afternoon session and decided to accept CR as a line-termination character. I'm about to upload drafting implementing that direction for discussion today.
>> Ah, thank you, I'm sorry I missed that discussion.
>>
>> That change resolves the inconsistency with P2348 given Corentin's explicit claim of the intent in that paper.
>>
>> I'm personally happy with this new direction so long as implementors have no concerns (and it seems we already have confirmation that EDG and Clang have no concerns).
>>
>> Given that we already had consensus for P2348 in SG16 and EWG, assuming no new objections are raised, ship it.
>>
>> Not an objection, mostly just clarifying intent: given a source file that contains "#define a \<LF><CR> b", is a conforming implementation required to treat the "a" macro as being empty and the "b" as being on a separate line (as GCC does), or is it still permitted to treat the "b" as being on the same line as the "a" because the <LF><CR> is treated as an escaped new-line sequence (as Clang does)?
> No, not if the source file is a "UTF-8 file" per phase 1.

Richard's question wasn't a yes/no question, but Jens' response appears
to favor the GCC behavior, in which <LF> and <CR> each contribute a
new-line. I agree.

Since Unicode does not recognize LF+CR as a single new-line (as it does
for CR+LF), I think the GCC behavior is preferred for portable UTF-8 files.

I assume you're referring to UTR#13 here? Yeah, seems reasonable to follow that in portable Unicode UTF-8 mode.
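
To make the reading discussed above concrete, here is a minimal sketch of it. It is not taken from GCC, Clang, or any proposed wording; the helper names split_physical_lines and splice_lines are hypothetical. It assumes CR+LF, lone CR, and lone LF each count as exactly one new-line, so LF+CR counts as two, and it then applies a simplified backslash/new-line splice.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: split text into physical lines, treating CR+LF,
// lone CR, and lone LF each as a single new-line. LF+CR is therefore two
// new-lines with an empty line in between.
std::vector<std::string> split_physical_lines(std::string_view text) {
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r') {
            // CR followed by LF is consumed as one new-line.
            if (i + 1 < text.size() && text[i + 1] == '\n') ++i;
            lines.push_back(current);
            current.clear();
        } else if (c == '\n') {
            lines.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) lines.push_back(current);
    return lines;
}

// Hypothetical helper: simplified line splicing. A physical line ending in
// a backslash is joined with the physical line that follows it.
std::vector<std::string> splice_lines(const std::vector<std::string>& physical) {
    std::vector<std::string> logical;
    std::string pending;
    for (const auto& line : physical) {
        if (!line.empty() && line.back() == '\\') {
            pending.append(line, 0, line.size() - 1);  // drop the backslash
        } else {
            logical.push_back(pending + line);
            pending.clear();
        }
    }
    if (!pending.empty()) logical.push_back(pending);
    return logical;
}
```

Under these assumptions, "#define a \<LF><CR> b" splices only across the <LF>: the result is a logical line "#define a " followed by a separate line " b", i.e. an empty "a" macro. With <CR><LF> in place of <LF><CR>, the same input yields the single logical line "#define a  b".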