SG16

Subject: Re: Correct UTF-8 handling during phase 1 of translation
From: Corentin (corentin.jabot_at_[hidden])
Date: 2021-03-25 09:11:39


On Thu, Mar 18, 2021 at 12:05 AM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 17/03/2021 18.29, Corentin via SG16 wrote:
> > Hello,
> >
> > New revision of the paper which says "there should be some mechanism for
> the compiler to get fed UTF-8 data, for portability's sake".
> >
> > I've removed the whitespace discussion from this paper; I think we can
> ignore it until there is a paper that deals only with that, if ever :)
> >
> > I'd love feedback before a future meeting.
>
> The term "Unicode scalar value" doesn't exist in ISO 10646.
> Please replace by "UCS scalar value".
>
> The paper defines UTF-8 source file and then says
> "How the character set of a source file is determined is
> implementation-defined."
>
> I'm unsure whether UTF-8 is intended to be a "character set" in that sense.
>
> Also, we have
> "An implementation accepts UTF-8 source files; The set of additional source
> file character sets accepted is implementation-defined."
>
> UTF-8 is not a character set, so "additional" feels like a category
> error here.
>

You are right. The final R1 version is at https://isocpp.org/files/papers/P2295R1.pdf;
unfortunately, I missed the deadline.

>
> From a presentation standpoint, I think it would be best if
> UTF-8 source files would be handled first and completely in
> the phase 1 description, and then all the implementation-defined
> "other" stuff would be described. We could just say
> "For any other source file, ..." and leave the existing text as-is.
>
> Jens
>



SG16 list run by sg16-owner@lists.isocpp.org