On Fri, Jun 10, 2022 at 4:02 AM Corentin <corentin.jabot@gmail.com> wrote:

I'm concerned that this approach will be hard for people who have not followed the discussions to understand, on top of the preexisting obfuscations (the translation set indirection).
It's also very repetitive, but maybe we can massage that a bit.
Lastly, I really don't like the "There are no end-of-line indicators apart from the content of the UTF-8 code unit sequence" wording, which is more confusing than enlightening.

This is extremely relevant once you consider that treating "text" as a sequence of characters without structure is not the only way to look at text.

It's also unfortunate that the UTF-8-ness is tied to a medium rather than to the content, and that we can't agree that source code is text, or that any textual data consumed by an implementation has an associated encoding.

What we do not seem to agree on is whether or not "text" can be taken as structured by lines and the like.

I truly am trying to convey the intent of the paper through to places where certain assumptions about the nature of text files do not match the native ones. If the wording does not include hooks to point out that certain paradigms are not meant to extend into the world of portable, UTF-8 source code, then we'll likely end up with "UTF-8 source code" that isn't portable. It would not be caused by "hostility" from any party, merely a failure of the wording to clarify the intent.