On 11/8/22 9:09 PM, Richard Smith via SG16 wrote:
On Tue, 8 Nov 2022 at 15:43, Tom Honermann <tom@honermann.net> wrote:
On 11/8/22 4:16 PM, Jens Maurer wrote:
>
> On 08/11/2022 21.51, Richard Smith via SG16 wrote:
>> On Tue, 8 Nov 2022 at 11:09, Tom Honermann via Core <core@lists.isocpp.org> wrote:
>>
>>      On 11/8/22 10:32 AM, William M. (Mike) Miller wrote:
>>>      On Tue, Nov 8, 2022 at 12:41 AM Tom Honermann via Core <core@lists.isocpp.org> wrote:
>>>
>>>          Thanks, Corentin.
>>>
>>>          I agree that, if ~all existing implementations already treat a lone CR as a new-line, then we might as well standardize it. However, if some don't, then we'll be adding a (probably small) implementation burden for something that I suspect is rare. LF and CR+LF are common occurrences. Do you have data that shows that lone CR 1) is recognized by ~all existing implementations, and 2) is used sufficiently often that it is worth standardizing? Do we want to encourage use of lone CR as a portable new-line? As mentioned, implementations can still support it regardless. Unicode also recognizes U+0085 (NEXT LINE), U+2028 (LINE SEPARATOR), and U+2029 (PARAGRAPH SEPARATOR) as line-break characters.
>>>
>>>          I think it would be worth adding such analysis to a future revision of P2348.
>>>
>>>          In the interest of time, is anyone opposed to the CWG direction of requiring both LF and CR+LF in portable UTF-8 source files for C++23 with support for other new-line sequences left to a future standard?
>>>
>>>
>>>      Actually, CWG changed direction in the late afternoon session and decided to accept CR as a line-termination character. I'm about to upload drafting implementing that direction for discussion today.
>>      Ah, thank you, I'm sorry I missed that discussion.
>>
>>      That change resolves the inconsistency with P2348 given Corentin's explicit claim of the intent in that paper.
>>
>>      I'm personally happy with this new direction so long as implementors have no concerns (and it seems we already have confirmation that EDG and Clang have no concerns).
>>
>>      Given that we already had consensus for P2348 in SG16 and EWG, assuming no new objections are raised, ship it.
>>
>> Not an objection, mostly just clarifying intent: given a source file that contains "#define a \<LF><CR> b", is a conforming implementation required to treat the "a" macro as being empty and the "b" as being on a separate line (as GCC does), or is it still permitted to treat the "b" as being on the same line as the "a" because the <LF><CR> is treated as an escaped new-line sequence (as Clang does)?
> No, not if the source file is a "UTF-8 file" per phase 1.

Richard's question wasn't a yes/no question, but Jens' response appears
to favor the gcc behavior in which <LF> and <CR> each contribute a
new-line. I agree.

Since Unicode does not recognize LF+CR as a single new-line (as it does
for CR+LF), I think the gcc behavior is preferred for portable UTF-8 files.
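
To make the difference concrete, here is a minimal C++ sketch of the gcc-style reading, assuming CR LF, a lone CR, and a lone LF each count as exactly one new-line, so that LF CR counts as two. The function names physical_lines and logical_lines are hypothetical, and this is not how gcc or Clang implement it; it only models new-line recognition followed by backslash splicing.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Split source text into physical lines. CR LF, a lone CR, and a lone LF
// each end exactly one line; LF CR therefore ends two lines, the second
// of which is empty. (Assumed model, not any compiler's actual code.)
std::vector<std::string> physical_lines(std::string_view src) {
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        if (c == '\r') {
            // Consume a directly following LF so that CR LF counts once.
            if (i + 1 < src.size() && src[i + 1] == '\n') ++i;
            lines.push_back(current);
            current.clear();
        } else if (c == '\n') {
            lines.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) lines.push_back(current);  // unterminated last line
    return lines;
}

// Splice lines: a physical line ending in a backslash is joined with the
// physical line that follows it, and the backslash is dropped.
std::vector<std::string> logical_lines(std::string_view src) {
    std::vector<std::string> out;
    std::string pending;
    bool splicing = false;
    for (const std::string& line : physical_lines(src)) {
        if (!line.empty() && line.back() == '\\') {
            pending.append(line, 0, line.size() - 1);
            splicing = true;
        } else {
            out.push_back(pending + line);
            pending.clear();
            splicing = false;
        }
    }
    if (splicing) out.push_back(pending);  // backslash at end of input
    return out;
}
```

Under this model, "#define a \<LF><CR> b" produces two logical lines, "#define a " and " b", so the macro is empty and "b" sits on its own line. With "#define a \<CR><LF> b" the result is the single logical line "#define a  b".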

I assume you're referring to UTR#13 here? Yeah, seems reasonable to follow that in portable Unicode UTF-8 mode.

UAX #14 (Unicode Line Breaking Algorithm), actually; UAX #13 (Unicode Newline Guidelines) was incorporated into the core specification. I had consulted the "Non-tailorable Line Breaking Classes" section of Table 1.

Tom.

 
> (Note that clang is internally inconsistent here; see the __LINE__ example on the core wiki,
> which shows that LF CR is considered two lines in other contexts.)
>
>> I don't think LF CR is at all common these days -- I think it was only really used on the BBC Micro and on Acorn RISC PCs, but those both still exist, and Wikipedia says the Acorn C/C++ compiler suite had a release earlier this year. Hopefully we're not going to break line continuations in all of their macros :)
> Again, such files can be supported in the "non-UTF-8 mode" of phase 1.

Agreed.

Sure.
 
Tom.

>
> Jens
>
>
>>      Tom.
>>
>>>      I don't know about the ubiquity of that support, but the EDG front end has it as a build-time configuration option that customers can enable or not, as they choose. Here's the description of the flag (note that it cites gcc's processing as its basis):
>>>
>>>          /*
>>>          Flag that is TRUE to indicate that carriage return or carriage return
>>>          followed by newline can be used as a line terminator in GNU-compatible
>>>          modes.  This feature is provided to allow files with old MacOS line
>>>          terminators to be accepted.  The implementation is compatible with the way
>>>          in which the GNU compiler handles such line terminators.  It is disabled by
>>>          default because it is not required by most users.
>>>          */
>>>
>>>       
>>      _______________________________________________
>>      Core mailing list
>>      Core@lists.isocpp.org
>>      Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
>>      Link to this post: http://lists.isocpp.org/core/2022/11/13459.php
>>
>>