Re: [isocpp-sg16] u+000d carriage return in source files

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Mon, 22 Dec 2025 16:12:34 +0100
On Wed, Dec 17, 2025, 22:35 Alisdair Meredith via SG16 <
sg16_at_[hidden]> wrote:

> I suspect this is all deliberately designed and specified for legacy
> reasons, but want to confirm that we are happy with the status quo.
>
> When we map a UTF-8 file in phase one of translation, we lose all u+000d
> carriage return code units in favor of new-line characters.
>
> However, if we instead map any other implementation-defined encoding, we
> can retain u+000d carriage return characters alongside the new-line
> characters. That is because we apply the DOS-line-ending transformation to
> only the UTF-8 part of phase 1. Would it make sense to move that rewrite
> to after either kind of encoding has been mapped?


> Looking into phase 2, that kind of transform is very similar to ignoring
> any leading BOM. Would it make sense to move that whole transform into
> phase 2?
>

Sure, we could do that. But "new-line" is not really specified anyway; we
might want to clarify that we mean a Unicode line break (P2348 tried to do
that), or do something along those lines, including fixing the fact that
"new line" has a different meaning in the library wording.
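To make the suggestion concrete, here is a minimal sketch of one transform applied to already-decoded Unicode scalar values, so that it behaves the same whatever the source encoding was. It is a hypothetical model, not standard wording or any compiler's implementation. It assumes the rules as we read the current UTF-8 wording (CR LF and a lone CR each become a single new-line) plus dropping a leading U+FEFF as phase 2 does. The name `normalize_line_endings` is made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical model: line-ending normalization on decoded scalar values,
// independent of the encoding the source file was originally in.
std::u32string normalize_line_endings(std::u32string_view in) {
    std::u32string out;
    out.reserve(in.size());
    std::size_t i = 0;
    // Drop a single leading U+FEFF (byte order mark), as phase 2 does.
    if (!in.empty() && in[0] == U'\uFEFF') i = 1;
    for (; i < in.size(); ++i) {
        if (in[i] == U'\r') {
            // CR LF and a lone CR both become one new-line (LF).
            out.push_back(U'\n');
            if (i + 1 < in.size() && in[i + 1] == U'\n') ++i;
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

For example, `U"a\r\nb\rc"` comes out as `U"a\nb\nc"`, so no u+000d survives into any later phase.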

>
> That would have the nice property of restoring u+000d carriage return code
> units in raw string literals, but would also be a very observable change of
> behavior.
>

I think we should actively not try to do that.
In general, tooling (IDEs, source-control software, etc.) makes it
impossible to guarantee these things are preserved, so adding any such
guarantee would be a bit pointless.

Additionally, that wouldn't help when the execution encoding is some flavor
of EBCDIC, or when targeting some environment that is unfamiliar with \r\n.

To some extent the line ending you might want is more a property of the
target platform than of the source code. As you said, both Clang and MSVC
normalize to \n regardless of source encoding, and I don't think we would
be willing to change that.
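A minimal sketch of that idea, assuming the translated source (including any raw string literal) only ever carries '\n': a program that needs CR LF for a particular target adds the CR when it produces output, rather than relying on the source file's line endings. `to_crlf` is a hypothetical helper; on some platforms, such as text-mode C streams on Windows, the runtime already performs this translation.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: expand each '\n' to CR LF for a target that expects it.
// Assumes the input uses '\n' only; an existing "\r\n" would gain a second CR.
std::string to_crlf(std::string_view text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        if (c == '\n') {
            out += "\r\n";
        } else {
            out += c;
        }
    }
    return out;
}
```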



The wording certainly could be a bit more precise.


Cheers!


Corentin

P.S.: Some teletypes, such as the Model 28, can be configured to return the
carriage automatically on line feed.



> Is this a topic worth raising (post C++26)?
> Something that is more hassle than any value it might bring?
> Or something long since beaten to death and I am simply not looking in the
> right place for the design notes?
>
> AlisdairM
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
> Link to this post: http://lists.isocpp.org/sg16/2025/12/4653.php
>

Received on 2025-12-22 15:12:47