sg16

Re: [isocpp-core] P2295 Support for UTF-8 as a portable source file encoding

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Fri, 10 Jun 2022 10:42:43 -0400
On Fri, Jun 10, 2022 at 4:02 AM Corentin <corentin.jabot_at_[hidden]> wrote:

> I'm concerned that this approach will be hard to understand by people who
> have not followed the discussions, on top of preexisting obfuscations (the
> translation set indirection).
> It's also very repetitive but maybe we can massage that a bit.
> Lastly, I really don't like the "There are no end-of-line indicators
> apart from the content of the UTF-8 code unit sequence" which is more
> confusing than enlightening.
>

This is extremely relevant if you consider that "text" being a sequence of
characters without structure is not the only way you can look at text.


> It's also unfortunate that the utf-8-ness is tied to a medium rather than
> the content, and that we can't agree that source code is text, or that any
> textual data consumed by an implementation has an associated encoding.
>

What we do not seem to agree on is whether or not "text" can be taken as
structured by lines and the like.

I truly am trying to carry the intent of the paper through to environments
whose native notion of a text file does not match certain assumptions about
the nature of text files. If the wording does not include hooks to point out that certain
paradigms are not meant to extend into the world of portable, UTF-8 source
code, then we'll likely end up with "UTF-8 source code" that isn't
portable. It would not be caused by "hostility" from any party, merely a
failure of the wording to clarify the intent.

Received on 2022-06-10 14:43:12