Re: [isocpp-core] P2295 Support for UTF-8 as a portable source file encoding

From: Corentin <corentin.jabot_at_[hidden]>
Date: Sat, 11 Jun 2022 17:55:19 +0200
We had previous discussions in SG16 and conversation with Hubert (who is
the main stakeholder) where we all agreed that the final wording we want is
"characters are mapped, in an implementation-defined manner, to a sequence
of translation character set elements."
Whether we do this in this paper or the other, I fundamentally do not care
as long as we land on that wording.

"end-of-line indicators" is an ill-defined term that was meant to cover the
record case specifically, but the reformulation does that as well.
The CRLF issue is preexisting, and it is solved not by some fuzzy phase 1
introduction of different characters out of thin air, but by considering
the CRLF sequence as a whole as a line break in later phases.
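
As an illustration only (this is not wording from P2295 or P2348), here is a
minimal sketch of what "considering the CRLF sequence as a whole as a line
break" can look like in a later phase. The function name split_lines is
hypothetical, and treating a lone CR as a line break is an assumption made
for the example, not something either paper specifies.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not taken from P2295 or P2348: splits already-decoded
// source text into lines, treating CR LF as ONE line break rather than two.
// Assumption for this sketch: a lone CR or a lone LF is also one line break.
std::vector<std::string> split_lines(std::string_view text) {
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r' || c == '\n') {
            // Consume the LF of a CR LF pair so the pair counts once.
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
                ++i;
            }
            lines.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    // Keep a final line that has no terminating line break.
    if (!current.empty()) {
        lines.push_back(current);
    }
    return lines;
}
```

Nothing is inserted or rewritten in phase 1 here: the CR and LF characters
reach the splitter unchanged, and the pair is simply recognized as a single
line break at the point where lines are formed.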

On Sat, Jun 11, 2022 at 5:24 PM William M. (Mike) Miller <
william.m.miller_at_[hidden]> wrote:

> On Sat, Jun 11, 2022 at 10:59 AM Corentin <corentin.jabot_at_[hidden]>
> wrote:
>> On Sat, Jun 11, 2022 at 4:52 PM William M. (Mike) Miller <
>> william.m.miller_at_[hidden]> wrote:
>>> On Sat, Jun 11, 2022 at 4:01 AM Corentin <corentin.jabot_at_[hidden]>
>>> wrote:
>>> My second comment regards new-line characters and end-of-line indicators.
>>> As I understand it, there are two real-world scenarios the existing wording
>>> is intended to cover: cases where different characters or sequences (CR,
>>> CRLF) are used instead of new-lines, and record-oriented files where there
>>> is no character at the end of a line. The word "introducing" is appropriate
>>> for the latter case, but it seems incongruous for the former. Could we
>>> replace that phrase with "representing end-of-line indicators as new-line
>>> characters"?
>> This is preexisting and better addressed when we process P2348,
>> Whitespaces Wording Revamp, which addresses that point.
> I think it is arguably more germane to this paper than that one. This
> paper deals directly with the Phase 1 mapping of input source to the
> logical source representation, and putting new-lines into the stream is
> part of that process. P2348 deals principally with the post-Phase-1
> treatment of white space. I'd prefer to get Phase 1 clearly specified in
> this paper and make only small tweaks (like renaming new-line to
> line-break) to Phase 1 in P2348. (I don't want to get into a discussion of
> that paper in this thread, but I strongly prefer the rewording I suggested
> above to the treatment of the point in P2348, which is another reason I'd
> like to make the change here.)
> --
> William M. (Mike) Miller | Edison Design Group
> william.m.miller_at_[hidden]

Received on 2022-06-11 15:55:31