sg16

Re: [isocpp-core] P2295 Support for UTF-8 as a portable source file encoding

From: Peter Brett <pbrett_at_[hidden]>
Date: Wed, 22 Jun 2022 08:17:28 +0000

Hi Corentin,

On balance, I think this is fine.

I thought that an editor’s note in a paper was to guide the project editors in integrating the wording into the working draft, but the editor’s note you’ve written doesn’t seem to relate to that. Please correct me if I’m wrong 😊

The only tiny change I would make, if it was up to me, would be to strike “In other words,” from the start of the note.

Best wishes,

               Peter

From: Corentin <corentin.jabot_at_[hidden]>
Sent: 22 June 2022 09:12
To: Jens Maurer <Jens.Maurer_at_[hidden]>
Cc: William M. (Mike) Miller <william.m.miller_at_[hidden]>; C++ Core Language Working Group <core_at_[hidden]>; Peter Brett <pbrett_at_[hidden]>; SG16 <sg16_at_[hidden]>; Alisdair Meredith <alisdairm_at_[hidden]>
Subject: Re: [isocpp-core] P2295 Support for UTF-8 as a portable source file encoding

Hey folks.

Updated paper here: https://isocpp.org/files/papers/D2295R6.pdf
I applied the changes requested and would like to take a vote at the next meeting on this wording.

I will be honest: I hesitated over whether to drop the paper, as I find it unfortunate to conflate the encoding of text with the way the bytes are physically stored, and I really do not think the standard should be explicit about the specifics of any one storage method.

That being said, /physical source/input/ is a nice improvement. Win some, lose some.
And ultimately, the important thing is that the intent of the paper be standardized even if we can't completely agree on wording.


Thanks,
Corentin


On Sun, Jun 12, 2022 at 4:05 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
On 11/06/2022 18.14, William M. (Mike) Miller wrote:
> On Sat, Jun 11, 2022 at 11:55 AM Corentin <corentin.jabot_at_[hidden]> wrote:
>
> We had previous discussions in SG16 and conversation with Hubert (who is the main stakeholder) where we all agreed that the final wording we want is
> "characters are mapped, in an implementation-defined manner, to a sequence of translation character set elements."
> Whether we do this in this paper or the other, I fundamentally do not care as long as we land on that wording.
>
> "end-of-line indicators" is an ill-defined term that was meant to cover the record case specifically, but the reformulation does that as well.
>
>
> "End-of-line indicator" is no less well-defined than "file" - it's a reference to a common concept that is not otherwise defined by this standard. If the objection is to the word "indicator," I'd be happy with a formulation like "ends of lines are represented..."

In particular in translation phase 1, I think we have to live with a few
hand-wavy terms. I think "end-of-line indicator" is good as-is,
and is pre-existing text ostensibly not in scope for the paper,
so (procedurally) fewer changes imply fewer things impeding consensus.

> The P2348 phrasing sweeps too much under the carpet. In fact, it gives the impression that only characters present in the input file can be represented in the result, which is not the case with the record-oriented file representation.

Agreed. I'm fine with using the P2348 normative formulation,
but the parenthetical (or a slight alteration thereof) should
stay regardless.

> I'd be okay with making this a note, since it's attempting to constrain implementation-defined behavior, which is sort of suspect, but I think the expectation that you get new-lines for end-of-records needs to be explicit and not just assumed.

I'm fine with a note.
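
To make the record-oriented case concrete, here is a minimal sketch; it is not taken from either paper. It assumes a hypothetical in-memory model (`Records`) in which each record holds one source line and stores no end-of-line character, and a hypothetical function `map_records` standing in for the implementation-defined phase 1 mapping. The only point is that the result contains new-line characters that never appear in the input.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a record-oriented source file: one element per
// record (one source line), with no end-of-line character stored in it.
using Records = std::vector<std::string>;

// Hypothetical stand-in for the implementation-defined phase 1 mapping:
// copy each record's characters, then emit a new-line for the end of record.
std::string map_records(const Records& records) {
    std::string out;
    for (const std::string& record : records) {
        out += record;
        out += '\n';  // synthesized; not present anywhere in the input
    }
    return out;
}

// Checks that no input record contains a new-line, yet the mapped result does.
void check_new_lines_are_synthesized() {
    const Records input{"int x;", "int y;"};
    for (const std::string& record : input) {
        assert(record.find('\n') == std::string::npos);
    }
    assert(map_records(input) == "int x;\nint y;\n");
}
```

A real implementation may map records differently; the sketch only shows why the expectation of a new-line per end of record would need to be stated rather than assumed.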

off-topic: I disagree with P2348's approach of keeping end-of-line variations
throughout phases 1-4; there is no need for that. In particular, raw string
literals give you LF (only).
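
The raw string point can be checked with a short sketch, assuming a C++14 or later compiler. The names `kTwoLines` and `contains_cr` are made up for the illustration. It relies on the rule that a source-file new-line inside a raw string literal becomes a single new-line in the resulting string, however the line ending is stored on disk.

```cpp
#include <cassert>
#include <cstddef>

// A raw string literal whose body spans two physical source lines.
constexpr char kTwoLines[] = R"(a
b)";

// Returns true if any of the first n characters is a carriage return.
constexpr bool contains_cr(const char* s, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (s[i] == '\r') return true;
    }
    return false;
}

// 'a', new-line, 'b', and the terminating null: four chars in total.
static_assert(sizeof(kTwoLines) == 4, "line break should map to one character");
static_assert(kTwoLines[1] == '\n', "line break should be a new-line");
static_assert(!contains_cr(kTwoLines, sizeof(kTwoLines)), "no CR expected");

// Runtime version of the same checks, for use from a test or from main.
void check_raw_string_line_break() {
    assert(kTwoLines[0] == 'a');
    assert(kTwoLines[1] == '\n');
    assert(kTwoLines[2] == 'b');
}
```

If the file is saved with CRLF line endings and the assertions still hold, the compiler has already normalized the line ending before the raw string literal was formed.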

Jens

Received on 2022-06-22 08:17:41