sg16: Re: [SG16] Agenda for the 2021-07-14 SG16 telecon

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Mon, 12 Jul 2021 12:05:09 +0200

On 12/07/2021 10.13, Corentin Jabot via SG16 wrote:
>
>
> On Sun, Jul 11, 2021 at 9:09 PM Hubert Tong <hubert.reinterpretcast_at_[hidden]> wrote:
>
> On Sun, Jul 11, 2021 at 12:56 PM Corentin Jabot <corentinjabot_at_[hidden]> wrote:
>
> In the third paragraph of phase 1:
> [ ... ], then the physical source file shall be a well-formed UTF-8 sequence.
> Each UCS scalar value encoded in the UTF-8 sequence is mapped to the corresponding element of the translation character set.
>
>
> Just to clarify: I am suggesting the above for the wording (it was not merely a quote providing context for the later comment). This version separates the diagnostic requirement from the description of the processing.
>
>
> I purposefully avoided the term mapping here. Because the set of source characters and the set of translation set characters are the same, there is no need to specify a mapping.

The current text appears to equate sequences of UTF-8 code units with
elements of a character set. That's not correct; we first need
to parse UTF-8 to form a code point (eh, scalar value),
which is then something we can relate to elements of the translation
character set.
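
To illustrate the two steps, here is a minimal sketch and not proposed
wording or any implementation's actual code. The helper name decode_utf8 is
hypothetical. It first checks well-formedness and parses the code units into
scalar values; only those scalar values are then something that can be
related to elements of the translation character set. The sketch assumes each
element is identified by its scalar value.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper: parses a buffer of UTF-8 code units into UCS scalar
// values. Returns std::nullopt if the buffer is not a well-formed UTF-8
// sequence (the diagnosable case).
std::optional<std::vector<char32_t>> decode_utf8(std::string_view bytes) {
    std::vector<char32_t> scalars;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const auto b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;   // number of code units in this sequence
        char32_t value = 0;    // scalar value being assembled
        char32_t min = 0;      // smallest value allowed for this length
        if (b0 < 0x80)                { len = 1; value = b0;        min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; value = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; value = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; value = b0 & 0x07; min = 0x10000; }
        else return std::nullopt;  // stray continuation unit or invalid lead

        if (bytes.size() - i < len) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation
            value = (value << 6) | (b & 0x3F);
        }
        if (value < min) return std::nullopt;        // overlong encoding
        if (value >= 0xD800 && value <= 0xDFFF)
            return std::nullopt;                     // surrogate, not a scalar value
        if (value > 0x10FFFF) return std::nullopt;   // beyond the UCS range

        // Step 2: the scalar value designates the corresponding element of
        // the translation character set (assumed here to be a 1:1 mapping).
        scalars.push_back(value);
        i += len;
    }
    return scalars;
}
```

For example, decode_utf8("a\xC3\xA9") yields the two scalar values U+0061 and
U+00E9, while decode_utf8("\xC0\x80") yields std::nullopt because the
sequence is overlong and therefore ill-formed.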

I think "is mapped" is fine (a 1:1 mapping is still a mapping), in particular
since we also "map" (although in an implementation-defined manner) in the
non-UTF-8 cases.

Jens

Received on 2021-07-12 05:05:18