On Wed, Jun 22, 2022 at 8:12 AM William M. (Mike) Miller via Core <core@lists.isocpp.org> wrote:

On Wed, Jun 22, 2022 at 4:12 AM Corentin <corentin.jabot@gmail.com> wrote:
Hey folks.

Updated paper here: https://isocpp.org/files/papers/D2295R6.pdf
I applied the changes requested and would like to take a vote at the next meeting on this wording.

I will be honest: I considered dropping the paper, as I find it unfortunate to conflate the encoding of text with the way the bytes are stored physically, and I really do not think the standard should be explicit about the specifics of any one storage method.

That being said, /physical source/input/ is a nice improvement. Win some, lose some.
And ultimately, the important thing is that the intent of the paper be standardized even if we can't completely agree on wording.

I'm reasonably happy with the new wording. I'd still prefer changing "introducing" to "representing", or adding "if necessary" in the sentence about new-lines; if the input file already delimits lines with a new-line character, there's no "introducing" taking place. The existing wording appears to require that the "implementation-defined manner" must include deleting new-lines that appear in the input and "introducing" new-lines to replace them in the resulting logical source.

With regard to the editor's note in the paper, I'm sympathetic to the concerns about record-oriented UTF-8 files. However, it seems to me that normatively supporting that category would require some changes in the specification of the UTF-8 case, since the current wording for UTF-8 files implies that there is an exact one-to-one correspondence between the code units of the input and the sequence of elements of the translation character set, which would likely not be the case for the record-oriented UTF-8 (because there would presumably not be new-line code units but only record boundaries in the input). Maybe it would be sufficient to change "are a sequence" to "contain a sequence" in the first paragraph and to change the second paragraph to "...sequence of UCS scalar values (introducing new-line characters, if necessary, for end-of-line indicators) that constitutes the sequence..."?
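To make the distinction in the paragraph above concrete, here is a minimal sketch, not taken from the paper. It assumes a hypothetical in-memory model in which a stream-oriented file is a flat sequence of UTF-8 code units, while a record-oriented file is a list of records with no new-line code units in them. The names `Record`, `logical_source_from_stream`, and `logical_source_from_records` are illustrative only, and decoding code units to UCS scalar values is left out for brevity.

```cpp
#include <string>
#include <vector>

// Hypothetical model: one record holds the UTF-8 code units of one line.
// The record boundary itself is the end-of-line indicator; no new-line
// code unit is stored in the record.
using Record = std::string;

// Stream-oriented case: the input code units already include new-lines,
// so the mapping is one-to-one and nothing is "introduced".
std::string logical_source_from_stream(const std::string& code_units) {
    return code_units;
}

// Record-oriented case: a new-line (U+000A, a single code unit in UTF-8)
// is introduced for each record boundary, so the output is longer than
// the concatenated input and the correspondence is no longer one-to-one.
std::string logical_source_from_records(const std::vector<Record>& records) {
    std::string out;
    for (const Record& r : records) {
        out += r;     // code units of the record, unchanged
        out += '\n';  // new-line introduced for the end-of-line indicator
    }
    return out;
}
```

Under these assumptions, only the second function performs any "introducing", which is the case the "if necessary" qualifier is meant to cover.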

Implementations are free to support record-oriented source files containing UTF-8 as being non-"UTF-8 source file"s. Changing the definition of UTF-8 source file will harm portability of source files by possibly encouraging non-trivial transfer mechanisms.

Link to this post: http://lists.isocpp.org/core/2022/06/12839.php