
Re: [SG16] Correct UTF-8 handling during phase 1 of translation

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Thu, 18 Mar 2021 00:05:42 +0100
On 17/03/2021 18.29, Corentin via SG16 wrote:
> Hello,
>
> New revision of the "there should be some mechanism for the compiler to get fed utf-8 data for portability sake" paper.
>
> I've removed the whitespace discussion from this paper; I think we can ignore it until there is a paper that deals only with that, if ever :)
>
> I'd love feedback before a future meeting

The term "Unicode scalar value" doesn't exist in ISO 10646.
Please replace it with "UCS scalar value".

The paper defines "UTF-8 source file" and then says
"How the character set of a source file is determined is implementation-defined."

I'm unsure whether UTF-8 is intended to be a "character set" in that sense.

Also, we have
"An implementation accepts UTF-8 source files; The set of additional source
file character sets accepted is implementation-defined."

UTF-8 is not a character set, so "additional" feels like a category
error here.

From a presentation standpoint, I think it would be best if
UTF-8 source files were handled first and completely in
the phase 1 description, and all the implementation-defined
"other" stuff were described afterwards. We could just say
"For any other source file, ..." and leave the existing text as-is.

Jens

Received on 2021-03-17 18:05:47