ISOCPP sg16 List: Re: [isocpp-core] P2295 Support for UTF-8 as a portable source file encoding

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Wed, 22 Jun 2022 22:36:16 -0400

On Wed, Jun 22, 2022 at 4:12 AM Corentin via Core <core_at_[hidden]>
wrote:

> Hey folks.
>
> Updated paper here: https://isocpp.org/files/papers/D2295R6.pdf
> I applied the changes requested and would like to take a vote at the next
> meeting on this wording.
>

Looks good to me.

>
> I will be honest, I considered dropping the paper, as I find it unfortunate
> to conflate the encoding of text with the way the bytes are stored
> physically, and I really do not think the standard should be explicit
> about the specifics of any one storage method.
>
> That being said, /physical source/input/ is a nice improvement. Win some,
> lose some.
> And ultimately, the important thing is that the intent of the paper be
> standardized even if we can't completely agree on wording.
>
>
> Thanks,
> Corentin
>
>
> On Sun, Jun 12, 2022 at 4:05 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
>> On 11/06/2022 18.14, William M. (Mike) Miller wrote:
>> > On Sat, Jun 11, 2022 at 11:55 AM Corentin <corentin.jabot_at_[hidden]>
>> > wrote:
>> >
>> > We had previous discussions in SG16 and conversations with Hubert
>> > (who is the main stakeholder) where we all agreed that the final
>> > wording we want is
>> > "characters are mapped, in an implementation-defined manner, to a
>> > sequence of translation character set elements."
>> > Whether we do this in this paper or the other, I fundamentally do
>> > not care as long as we land on that wording.
>> >
>> > "end-of-line indicators" is an ill-defined term that was meant to
>> > cover the record case specifically, but the reformulation does that
>> > as well.
>> >
>> >
>> > "End-of-line indicator" is no less well-defined than "file" - it's a
>> > reference to a common concept that is not otherwise defined by this
>> > standard. If the objection is to the word "indicator," I'd be happy
>> > with a formulation like "ends of lines are represented..."
>>
>> In particular in translation phase 1, I think we have to live with a few
>> hand-wavy terms. I think "end-of-line indicator" is good as-is,
>> and is pre-existing text ostensibly not in scope for the paper,
>> so (procedurally) fewer changes imply fewer things impeding consensus.
>>
>> > The P2348 phrasing sweeps too much under the carpet. In fact, it gives
>> > the impression that only characters present in the input file can be
>> > represented in the result, which is not the case with the
>> > record-oriented file representation.
>>
>> Agreed. I'm fine with using the P2348 normative formulation,
>> but the parenthetical (or a slight alteration thereof) should
>> stay regardless.
>>
>> > I'd be okay with making this a note, since it's attempting to constrain
>> > implementation-defined behavior, which is sort of suspect, but I think
>> > the expectation that you get new-lines for end-of-records needs to be
>> > explicit and not just assumed.
>>
>> I'm fine with a note.
>>
>> off-topic: I disagree with P2348's approach of keeping end-of-line
>> variations throughout phases 1-4; there is no need for that. In
>> particular, raw string literals give you LF (only).
>>
>> Jens
>>
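
For anyone following the off-topic remark quoted above, here is a minimal
sketch of what "raw string literals give you LF (only)" looks like in
practice, assuming a C++17 compiler. The names two_lines, break_pos and
raw_break_is_single_lf are made up for this illustration; they come from
neither P2295 nor P2348. As I understand the current wording, the line break
inside the literal should appear as exactly one '\n' whether the file on disk
uses LF or CR LF, but treat that as an expectation to check against your own
implementation rather than a guarantee from this note.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical example literal: a raw string spanning two source lines.
constexpr std::string_view two_lines = R"(first
second)";

// Index of the first new-line character inside the literal.
constexpr std::size_t break_pos = two_lines.find('\n');

// "first" is 5 characters, so the line break sits at index 5.
static_assert(break_pos == 5);

// 5 ("first") + 1 (new-line) + 6 ("second"): the break is one character.
static_assert(two_lines.size() == 12);

// No carriage return survives into the literal.
static_assert(two_lines.find('\r') == std::string_view::npos);

// The same check as an ordinary function, callable at run time.
inline bool raw_break_is_single_lf() {
    return two_lines.size() == 12 && two_lines[break_pos] == '\n';
}
```

The static_asserts make the translation unit fail to compile if the line
break ever came through as anything other than a single new-line, so saving
the file with different line endings and rebuilding is a cheap way to probe a
given compiler.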
> _______________________________________________
> Core mailing list
> Core_at_[hidden]
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
> Link to this post: http://lists.isocpp.org/core/2022/06/12838.php
>

Received on 2022-06-23 02:36:47