sg16

Re: [isocpp-core] P2295 Support for UTF-8 as a portable source file encoding

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Sun, 12 Jun 2022 16:05:00 +0200
On 11/06/2022 18.14, William M. (Mike) Miller wrote:
> On Sat, Jun 11, 2022 at 11:55 AM Corentin <corentin.jabot_at_[hidden]> wrote:
>
> We had previous discussions in SG16 and conversation with Hubert (who is the main stakeholder) where we all agreed that the final wording we want is
> "characters are mapped, in an implementation-defined manner, to a sequence of translation character set elements."
> Whether we do this in this paper or the other, I fundamentally do not care as long as we land on that wording.
>
> "end-of-line indicators" is an ill-defined term that was meant to cover the record case specifically, but the reformulation does that as well.
>
>
> "End-of-line indicator" is no less well-defined than "file" - it's a reference to a common concept that is not otherwise defined by this standard. If the objection is to the word "indicator," I'd be happy with a formulation like "ends of lines are represented..."

In particular in translation phase 1, I think we have to live with a few
hand-wavy terms. I think "end-of-line indicator" is good as-is,
and is pre-existing text ostensibly not in scope for the paper,
so (procedurally) fewer changes imply fewer things impeding consensus.

> The P2348 phrasing sweeps too much under the carpet. In fact, it gives the impression that only characters present in the input file can be represented in the result, which is not the case with the record-oriented file representation.

Agreed. I'm fine with using the P2348 normative formulation,
but the parenthetical (or a slight alteration thereof) should
stay regardless.

> I'd be okay with making this a note, since it's attempting to constrain implementation-defined behavior, which is sort of suspect, but I think the expectation that you get new-lines for end-of-records needs to be explicit and not just assumed.

I'm fine with a note.
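
To make the record-oriented case concrete, here is a toy model of what such a
phase 1 mapping might look like. It is only a sketch: the fixed-length record
layout and the name records_to_lines are hypothetical, and the real mapping
is implementation-defined. The point is that every new-line in the result is
introduced by the mapping; none of them is present in the input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Toy model only (hypothetical layout): treats `data` as a sequence of
// fixed-length records of `record_len` characters and produces one line
// per record, each terminated by a new-line that is absent from the input.
inline std::string records_to_lines(std::string_view data, std::size_t record_len) {
    assert(record_len > 0 && "a record must hold at least one character");
    std::string out;
    for (std::size_t pos = 0; pos < data.size(); pos += record_len) {
        out.append(data.substr(pos, record_len));  // copy one record
        out.push_back('\n');                       // end of record becomes new-line
    }
    return out;
}
```

Under this model, the two 6-character records "int x;" and "int y;" stored
back to back come out as two lines, each ending in a new-line.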

off-topic: I disagree with P2348's approach of keeping end-of-line variations
throughout phases 1-4; there is no need for that. In particular, raw string
literals give you LF (only).
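
The raw string literal point can be checked at compile time. This is a minimal
sketch assuming a conforming C++17 compiler; count_char and kRaw are
illustrative names, not anything from the papers.

```cpp
#include <cstddef>
#include <string_view>

// Illustrative helper: counts occurrences of `c` in `s`.
constexpr std::size_t count_char(std::string_view s, char c) {
    std::size_t n = 0;
    for (char ch : s) {
        if (ch == c) ++n;
    }
    return n;
}

// A raw string literal containing one physical line break in the source.
constexpr std::string_view kRaw = R"(first
second)";

// Whether this file is stored with LF or CRLF line endings, the line break
// inside the raw string is expected to be a single new-line character.
static_assert(kRaw.size() == 12, "\"first\" + new-line + \"second\"");
static_assert(kRaw[5] == '\n', "the line break is a new-line");
static_assert(count_char(kRaw, '\r') == 0, "no carriage return survives");
```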

Jens

Received on 2022-06-12 14:05:14