Date: Wed, 17 Dec 2025 16:34:53 -0500
I suspect this is all deliberately designed and specified for legacy reasons, but I want to confirm that we are happy with the status quo.
When we map a UTF-8 file in phase 1 of translation, we lose all U+000D CARRIAGE RETURN code units in favor of new-line characters.
However, if we instead map any other implementation-defined encoding, we can retain U+000D CARRIAGE RETURN characters alongside the new-line characters. That is because we apply the DOS-line-ending transformation on only the UTF-8 part of phase 1. Would it make sense to move that rewrite to after either kind of encoding has been mapped?
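For concreteness, here is a minimal sketch of the rewrite in question, assuming the input has already been decoded to a sequence of Unicode scalar values. The function name `normalize_line_endings` and the choice of `std::u32string` are illustrative assumptions, not wording from the standard or code from any implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of the line-ending rewrite applied to UTF-8 files in
// phase 1: each CR LF pair, and each CR not followed by LF, becomes a single
// new-line character. Everything else passes through unchanged.
std::u32string normalize_line_endings(const std::u32string& in) {
    std::u32string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == U'\r') {
            // Skip the LF of a CR LF pair so the pair yields one new-line.
            if (i + 1 < in.size() && in[i + 1] == U'\n') {
                ++i;
            }
            out.push_back(U'\n');
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}

// Small self-check: DOS, classic Mac, and Unix endings all end up as '\n'.
void normalize_demo() {
    assert(normalize_line_endings(U"a\r\nb\rc\n") == U"a\nb\nc\n");
}
```

Applying a step like this after either kind of input file has been mapped, rather than only on the UTF-8 path, is the change being asked about.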
Looking into phase 2, that kind of transform is very similar to ignoring any leading BOM. Would it make sense to move that whole transform into phase 2?
That would have the nice property of restoring U+000D CARRIAGE RETURN code units in raw string literals, but would also be a very observable change of behavior.
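To illustrate why the change would be observable, here is a small sketch, assuming a UTF-8 source file and the current phase 1 wording. The name `two_lines` is hypothetical, and no claim is made here about how closely any particular compiler follows the wording.

```cpp
#include <string_view>

// The raw string literal spans two source lines. With the current rules for
// UTF-8 files, the line break inside it is a single new-line character even
// if the file on disk uses CR LF line endings.
constexpr std::string_view two_lines = R"(first
second)";

// Holds under the current wording. If the rewrite moved to phase 2 and raw
// string literals reverted it, a CR LF source file would make this fail.
static_assert(two_lines == "first\nsecond");
```

Any program that measures or hashes the contents of a multi-line raw string literal could see a different value depending on how the source file was checked out.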
Is this a topic worth raising (post C++26)?
Something that is more hassle than any value it might bring?
Or something long since beaten to death and I am simply not looking in the right place for the design notes?
AlisdairM
Received on 2025-12-17 21:35:11
