On Fri, Jun 10, 2022 at 5:08 AM Jens Maurer via Core <core@lists.isocpp.org> wrote:

> On 10/06/2022 10.02, Corentin via SG16 wrote:

>> It's also very repetitive, but maybe we can massage that a bit.

> I'm not seeing serious repetition if you take phrases such
> as "UTF-8 code units" as words of power.

+1.

>> Lastly, I really don't like the "There are no end-of-line indicators apart from the content of the UTF-8 code unit sequence" note, which is more confusing than enlightening.

> I'm fine with removing the note, but I would like to see
> the parenthetical
>
> "(introducing new-line characters for end-of-line indicators)"
>
> restored for the "any other kind" case.
> (Omitting the parenthetical feels like a regression.)

+1.

>> It's also unfortunate that the UTF-8-ness is tied to a medium rather than the content,

> I don't follow. We can't rely on "content" alone, because we want to diagnose
> ill-formed UTF-8 code units. If we relied on "content" alone, an ill-formed
> UTF-8 code unit would, by definition, make the source file "not UTF-8", and we'd
> lose the diagnostic.

+1. I considered defining "UTF-8 source file" as "a well-formed sequence of UTF-8 code units", but I rejected that because I want to be able to talk about an "ill-formed UTF-8 file".