Subject: Re: Correct UTF-8 handling during phase 1 of translation
From: Jens Maurer (Jens.Maurer_at_[hidden])
Date: 2021-03-17 18:05:42

On 17/03/2021 18.29, Corentin via SG16 wrote:
> Hello,
> Here is a new revision of the "there should be some mechanism for the compiler to get fed utf-8 data for portability sake" paper.
> I've removed the whitespace discussion from this paper; I think we can ignore it until there is a paper that only deals with that, if ever :)
> I'd love feedback before a future meeting.

The term "Unicode scalar value" doesn't exist in ISO 10646.
Please replace it with "UCS scalar value".
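
For illustration only (this is not wording from the paper): under either name, a scalar value is a code point in the range 0 to 10FFFF that is not a surrogate code point (D800 to DFFF). A minimal sketch, where the function name is_scalar_value is a hypothetical choice:

```cpp
#include <cstdint>

// Hypothetical helper, not from the paper: true if cp is a scalar value,
// i.e. a code point in [0, 0x10FFFF] outside the surrogate range
// [0xD800, 0xDFFF].
constexpr bool is_scalar_value(std::uint32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

static_assert(is_scalar_value(0x41));       // LATIN CAPITAL LETTER A
static_assert(!is_scalar_value(0xD800));    // first surrogate code point
static_assert(!is_scalar_value(0x110000));  // beyond the code space
```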

The paper defines UTF-8 source file and then says
"How the character set of a source file is determined is implementation-defined."

I'm unsure whether UTF-8 is intended to be a "character set" in that sense.

Also, we have
"An implementation accepts UTF-8 source files; The set of additional source
file character sets accepted is implementation-defined."

UTF-8 is not a character set, so "additional" feels like a category
error here.

From a presentation standpoint, I think it would be best if
UTF-8 source files were handled first and completely in
the phase 1 description, and all the implementation-defined
"other" stuff were described only after that. We could just say
"For any other source file, ..." and leave the existing text as-is.
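
As a rough illustration of that ordering (a sketch only, not proposed wording and not any compiler's actual implementation), phase 1 could be modeled as a function that fully handles the UTF-8 case first and only then falls through to the implementation-defined case. The names SourceKind, decode_utf8 and phase1 are hypothetical, and the "other" branch is deliberately left unmodeled:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical classification of a source file; how it is determined
// is outside the scope of this sketch.
enum class SourceKind { Utf8, Other };

// Decode a UTF-8 byte sequence into scalar values. Returns std::nullopt
// for ill-formed input (bad lead or continuation byte, truncated or
// overlong sequence, surrogate, or value above 0x10FFFF).
std::optional<std::vector<std::uint32_t>> decode_utf8(std::string_view bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const auto b0 = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;   // value being assembled
        std::size_t len = 0;    // total bytes in this sequence
        std::uint32_t min = 0;  // smallest value allowed for this length
        if (b0 < 0x80)                { cp = b0;        len = 1; min = 0x0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }
        else return std::nullopt;  // stray continuation or invalid lead byte
        if (bytes.size() - i < len) return std::nullopt;  // truncated
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;
            cp = (cp << 6) | (b & 0x3Fu);
        }
        const bool surrogate = cp >= 0xD800 && cp <= 0xDFFF;
        if (cp < min || cp > 0x10FFFF || surrogate) return std::nullopt;
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Phase 1 modeled in the suggested order: UTF-8 first and completely,
// then everything else.
std::optional<std::vector<std::uint32_t>> phase1(std::string_view bytes,
                                                 SourceKind kind) {
    if (kind == SourceKind::Utf8) {
        return decode_utf8(bytes);
    }
    // "For any other source file, ...": the mapping is
    // implementation-defined, so this sketch does not model it.
    return std::nullopt;
}
```

In this sketch an ill-formed UTF-8 file and an unmodeled "other" file both yield std::nullopt; a real implementation would distinguish the two.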

