Re: [SG16] P2295R3 Support for UTF-8 as a portable source file encoding

From: Charlie Barto <Charles.Barto_at_[hidden]>
Date: Fri, 30 Apr 2021 04:26:20 +0000
Yeah, that's what I meant.

My concern was with the "the scalar value of each source character shall be preserved" in the below

"A UTF-8 file is a source file encoded with the UTF-8 encoding scheme defined in ISO/IEC 10646. An implementation shall support UTF-8 files. If the source file is determined to be a UTF-8 file, it shall represent a well-formed sequence of UTF-8 code units and the scalar value of each source character shall be preserved."

My concern is that if you write a string literal with Unicode characters in it and the compiler converts them to GB18030, that's not "preserving the scalar value". I don't understand translation phases very well, so feel free to tell me that's somehow handled later on.
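
To make the distinction concrete, here is a minimal sketch of the decoding step only, assuming a hypothetical helper `decode_utf8` that is not taken from P2295. It turns well-formed UTF-8 code units into Unicode scalar values and rejects ill-formed input. The character U+4E2D, for example, is the bytes E4 B8 AD in UTF-8 and D6 D0 in GB18030, but its scalar value is the same in both. Re-encoding a literal into the literal encoding is a separate, later step (translation phase 5 in C++20) and is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (an illustration, not wording or code from P2295):
// decode a UTF-8 byte sequence into Unicode scalar values, throwing
// std::invalid_argument if the sequence is not well-formed.
std::vector<char32_t> decode_utf8(const std::string& bytes) {
    // Smallest scalar value that legitimately needs a sequence of each length.
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};

    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid lead byte");

        if (i + len > bytes.size())
            throw std::invalid_argument("truncated sequence");

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }

        // Reject overlong forms, surrogates, and values above U+10FFFF.
        if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("not a Unicode scalar value");

        out.push_back(cp);
        i += len;
    }
    assert(i == bytes.size());  // every input byte was consumed exactly once
    return out;
}
```

Calling `decode_utf8("\xE4\xB8\xAD")` yields a single element, U+4E2D. Whatever encoding an implementation later picks for ordinary string literals, this is the value the decoding step hands on.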

From: Peter Brett <pbrett_at_[hidden]>
Sent: Thursday, April 29, 2021 3:34 AM
To: Charlie Barto <Charles.Barto_at_[hidden]>
Cc: Corentin <corentin.jabot_at_[hidden]>; sg16_at_[hidden]
Subject: RE: [SG16] P2295R3 Support for UTF-8 as a portable source file encoding

Hi Charlie,

I'm going to assume that:


  * by 'source character set' you mean the encoding scheme of the source file
  * by 'execution character set' you mean the encoding scheme used for ordinary string literals in the compiled executable

In that case, no - as I understand it this wording does not affect the conformance of an implementation where the literal encoding is GB18030. Please could you clarify what it was about the phase 1 changes that caused concern?

Thanks!

              Peter

From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Charlie Barto via SG16
Sent: 29 April 2021 09:54
To: sg16_at_[hidden]
Cc: Charlie Barto <Charles.Barto_at_[hidden]>; Corentin <corentin.jabot_at_[hidden]>
Subject: Re: [SG16] P2295R3 Support for UTF-8 as a portable source file encoding

Does that first change to lex.phases make the case where the source character set is UTF-8 and the execution character set is some oddball encoding (like GB18030) ill-formed or non-conforming?

________________________________
From: SG16 <sg16-bounces_at_[hidden]> on behalf of Corentin via SG16 <sg16_at_[hidden]>
Sent: Thursday, April 29, 2021 12:34:35 AM
To: SG16 <sg16_at_[hidden]>
Cc: Corentin <corentin.jabot_at_[hidden]>
Subject: [SG16] P2295R3 Support for UTF-8 as a portable source file encoding

Per request in yesterday's meeting,
here is P2295R3 Support for UTF-8 as a portable source file encoding.

I am looking forward to your feedback.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2295r3.pdf

Received on 2021-04-29 23:26:24