sg16: Re: [SG16] Agenda for the 2021-07-14 SG16 telecon

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Mon, 12 Jul 2021 10:13:06 +0200

On Sun, Jul 11, 2021 at 9:09 PM Hubert Tong <
hubert.reinterpretcast_at_[hidden]> wrote:

> On Sun, Jul 11, 2021 at 12:56 PM Corentin Jabot <corentinjabot_at_[hidden]>
> wrote:
>
>>
>>>
>>>
>>> In the third paragraph of phase 1:
>>> [ ... ], then the physical source file shall be a well-formed UTF-8
>>> sequence.
>>> Each UCS scalar value encoded in the UTF-8 sequence is mapped to the
>>> corresponding element of the translation character set.
>>>
>>
> Just to clarify: I am suggesting the above for the wording (it was not
> merely a quote providing context for the later comment). This version
> separates the diagnostic requirement from the description of the processing.
>

I purposefully avoided the term "mapping" here: because the set of source
characters and the set of translation character set elements are the same,
there is no need to specify a mapping.
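
To illustrate the point, here is a minimal sketch in Python. It is not
proposed wording and not a description of how any compiler implements
phase 1. The function name phase1_utf8 is hypothetical, and the sketch
assumes the physical source file is already known to be UTF-8. Decoding
either fails on an ill-formed sequence or yields UCS scalar values, and
those values are used directly as translation character set elements,
with no separate lookup step:

```python
def phase1_utf8(source_bytes: bytes) -> list[int]:
    """Hypothetical helper: decode a source file assumed to be UTF-8."""
    try:
        # Strict decoding rejects ill-formed input such as overlong forms,
        # encoded surrogates, and truncated sequences.
        text = source_bytes.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        # Stand-in for the "shall be well-formed" diagnostic requirement.
        raise ValueError(f"ill-formed UTF-8 at byte offset {err.start}") from err
    # Each decoded scalar value is taken as-is; nothing is remapped here.
    return [ord(ch) for ch in text]
```

Under those assumptions, whether the wording says "is mapped to the
corresponding element" or says nothing about mapping, the observable
result of this step is the same sequence of scalar values.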

>
>>
>>> I'm not sure what to make of the situation around end-of-line indicators
>>> yet. P2348, "Whitespaces Wording Revamp", is also floating in the mix.
>>>
>>
>> Indeed. P2348 is motivated by P2295.
>> I believe it's not dramatic to leave things partially hanging (there can
>> be line feeds in UTF-8 files, and we do say that line feed is new-line in
>> P2314), but I hope we will talk about P2348 at some point.
>>
>
> For the UTF-8 case, I think a note to the effect that "there are no
> end-of-line indicators apart from the content of the UTF-8 sequence" could
> help (at least to further the discussion). For P2348, I suggest that
> "out-of-band" end-of-line indicators should remain accepted.
>
> Suggestion for P2348:
> The physical source file is mapped, in an implementation-defined manner,
> to a sequence of basic source character set elements.
>
>
>> Thanks,
>> Corentin
>>
>

Received on 2021-07-12 03:13:20