sg16

Re: [isocpp-core] P2295 Support for UTF-8 as a portable source file encoding

From: Corentin <corentin.jabot_at_[hidden]>
Date: Wed, 22 Jun 2022 10:11:58 +0200
Hey folks.

Updated paper here: https://isocpp.org/files/papers/D2295R6.pdf
I applied the changes requested and would like to take a vote at the next
meeting on this wording.

I will be honest: I did consider dropping the paper, as I find it unfortunate
to conflate the encoding of text with the way the bytes are physically
stored, and I really do not think the standard should be explicit about the
specifics of any one storage method.

That being said, /physical source/input/ is a nice improvement. Win some,
lose some.
And ultimately, the important thing is that the intent of the paper be
standardized even if we can't completely agree on wording.


Thanks,
Corentin


On Sun, Jun 12, 2022 at 4:05 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 11/06/2022 18.14, William M. (Mike) Miller wrote:
> > On Sat, Jun 11, 2022 at 11:55 AM Corentin <corentin.jabot_at_[hidden]> wrote:
> >
> > We had previous discussions in SG16 and conversation with Hubert
> > (who is the main stakeholder) where we all agreed that the final
> > wording we want is
> > "characters are mapped, in an implementation-defined manner, to a
> > sequence of translation character set elements."
> > Whether we do this in this paper or the other, I fundamentally do
> > not care as long as we land on that wording.
> >
> > "end-of-line indicators" is an ill-defined term that was meant to
> > cover the record case specifically, but the reformulation does that
> > as well.
> >
> >
> > "End-of-line indicator" is no less well-defined than "file" - it's a
> > reference to a common concept that is not otherwise defined by this
> > standard. If the objection is to the word "indicator," I'd be happy
> > with a formulation like "ends of lines are represented..."
>
> In particular in translation phase 1, I think we have to live with a few
> hand-wavy terms. I think "end-of-line indicator" is good as-is,
> and is pre-existing text ostensibly not in scope for the paper,
> so (procedurally) fewer changes imply fewer things impeding consensus.
>
> > The P2348 phrasing sweeps too much under the carpet. In fact, it gives
> > the impression that only characters present in the input file can be
> > represented in the result, which is not the case with the
> > record-oriented file representation.
>
> Agreed. I'm fine with using the P2348 normative formulation,
> but the parenthetical (or a slight alteration thereof) should
> stay regardless.
>
> > I'd be okay with making this a note, since it's attempting to constrain
> > implementation-defined behavior, which is sort of suspect, but I think
> > the expectation that you get new-lines for end-of-records needs to be
> > explicit and not just assumed.
>
> I'm fine with a note.
>
> off-topic: I disagree with P2348's approach of keeping end-of-line
> variations throughout phases 1-4; there is no need for that. In
> particular, raw string literals give you LF (only).
>
> Jens
>

Received on 2022-06-22 08:12:10