Re: [isocpp-core] P2295 Support for UTF-8 as a portable source file encoding

From: William M. (Mike) Miller
Date: Fri, 10 Jun 2022 09:09:53 -0400
On Fri, Jun 10, 2022 at 5:08 AM Jens Maurer via Core <core_at_[hidden]> wrote:

> On 10/06/2022 10.02, Corentin via SG16 wrote:
> > It's also very repetitive but maybe we can massage that a bit.
> I'm not seeing serious repetition if you take phrases such
> as "UTF-8 code units" as words of power.


> > Lastly, I really don't like the "There are no end-of-line indicators
> > apart from the content of the UTF-8 code unit sequence" which is more
> > confusing than enlightening.
> I'm fine with removing the note, but I would like to see
> the parenthetical
> "(introducing new-line characters for end-of-line indicators)"
> restored for the "any other kind" case.
> (Omitting the parenthetical feels like a regression.)


> > It's also unfortunate that the utf-8-ness is tied to a medium rather
> > than the content,
> I don't follow. We can't rely on "content" alone, because we want to
> diagnose ill-formed UTF-8 code units. If we relied on "content" alone, an
> ill-formed UTF-8 code unit would, by definition, make the source file
> "not UTF-8", and we'd lose the diagnostic.

+1. I considered the possibility of defining "UTF-8 source file" as "a
well-formed sequence of UTF-8 code units", and I rejected that because I
want to be able to talk about an "ill-formed UTF-8 file".
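
To illustrate the distinction, here is a minimal sketch. It is not taken from
P2295 or from any implementation, and the names first_ill_formed_utf8,
check_source, Verdict, and the designated_utf8 flag are all hypothetical. It
assumes the "UTF-8 source file" designation comes from outside the content
(modeled here as a plain bool), so that an ill-formed code unit sequence in a
designated file yields a diagnostic instead of silently making the file
"not UTF-8". The byte ranges follow the Unicode Standard's table of
well-formed UTF-8 byte sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: returns the offset of the first ill-formed UTF-8
// code unit sequence, or std::nullopt if the whole input is well-formed.
std::optional<std::size_t> first_ill_formed_utf8(std::string_view bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 <= 0x7F) { ++i; continue; }       // single code unit (ASCII)

        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;      // allowed range of 2nd code unit
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;           // reject overlong forms
            if (b0 == 0xED) hi = 0x9F;           // reject surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;           // reject overlong forms
            if (b0 == 0xF4) hi = 0x8F;           // reject values above U+10FFFF
        } else {
            return i;                            // 0x80..0xC1 and 0xF5..0xFF never lead
        }
        assert(len >= 2 && len <= 4);

        if (n - i < len) return i;               // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            const unsigned char min_b = (k == 1) ? lo : 0x80;
            const unsigned char max_b = (k == 1) ? hi : 0xBF;
            if (b < min_b || b > max_b) return i;
        }
        i += len;
    }
    return std::nullopt;
}

enum class Verdict { Accepted, IllFormedUtf8, NotCheckedAsUtf8 };

// The designation is an input, not something inferred from the bytes.
Verdict check_source(bool designated_utf8, std::string_view bytes) {
    if (!designated_utf8) return Verdict::NotCheckedAsUtf8;
    return first_ill_formed_utf8(bytes) ? Verdict::IllFormedUtf8
                                        : Verdict::Accepted;
}
```

With a content-only definition, the IllFormedUtf8 case could not arise for a
"UTF-8 source file": the same bytes would simply fall under
NotCheckedAsUtf8, and there would be nothing to diagnose.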

Received on 2022-06-10 13:10:04