SG16

Subject: Re: SG16 approval of P2295R5 and P2362R1
From: Corentin (corentin.jabot_at_[hidden])
Date: 2021-07-28 01:04:45


On Wed, Jul 28, 2021, 06:15 Tom Honermann <tom_at_[hidden]> wrote:

> On 7/27/21 6:34 PM, Hubert Tong wrote:
>
> On Mon, Jul 26, 2021 at 11:44 AM Tom Honermann via SG16 <
> sg16_at_[hidden]> wrote:
>
>> SG16 approved forwarding a draft of P2295R5 <https://wg21.link/p2295r5>
>> (Support for UTF-8 as a portable source file encoding) and P2362R0
>> <https://wg21.link/p2362r0> (Make obfuscating wide character literals
>> ill-formed) with minor modifications to EWG during its July 14th telecon
>> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README.md#july-14th-2021>.
>> All requested SG16 changes are present in the published versions of
>> P2295R5 <https://wg21.link/p2295r5> and P2362R1
>> <https://wg21.link/p2362r1> that appear in the most recent mailing
>> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/#mailing2021-07>
>> (note that P2362R1 <https://wg21.link/p2362r1> sports a new title).
>>
>> These papers are now ready for review by EWG and the Github issue tracker
>> <https://github.com/cplusplus/papers/issues> has been updated
>> accordingly. Both papers have wording that has been reviewed by a core
>> expert and each reflects existing implementation practice.
>>
> I will note that P2295's treatment of end-of-line indicators for UTF-8
> source files has not yet been implemented (to my knowledge) on platforms
> where text files traditionally have "out-of-band" line length information.
> I am not aware of technical limitations that prevent having a convention
> that works in the manner P2295 indicates, so this comment is for
> information only.
>
> Thank you for that correction, Hubert.
>
> Is there a de-facto standard convention for how text files that originate
> on other platforms are translated to such an environment? For example, are
> new-line sequences in the original file removed in favor of such
> out-of-band information? Or are they typically preserved? If preserved, I
> imagine they may not correlate with the out-of-band line information. Are
> there multiple new-line sequence forms in practice?
>
> I'm asking because I would like to better understand the impact to
> programmers. Given a UTF-8 encoded file on another platform, in practice,
> are there multiple ways in which such a file might be translated for this
> environment? If so, is there a dominant representation?
>

Do we have a list of platforms currently in use that store C++ source files
in that manner (as opposed to program data, for example)?

Regardless, for such platforms, we can imagine a phase 0 that presents a
unified view of the physical source, i.e. the data set as a file.

The intent of the paper is that source files can be compiled portably. If
the platform can't read files, some process would be necessary to transform
the file into a data set long before phase 1, and that process can replace
line breaks anyway.

The only requirement is that what is ultimately fed to the compiler is valid
UTF-8 - a stream of bytes produced in some fashion.
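
To make that concrete, here is a rough sketch of what such a "phase 0" could
look like. It is only an illustration under assumptions of mine: the names
records_to_stream and is_valid_utf8 are hypothetical, the data set is modeled
as a list of records whose lengths are known out of band, and LF is picked
arbitrarily as the line terminator. Nothing here is taken from P2295 or from
any existing implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical "phase 0": flatten a record-oriented data set (line lengths
// stored out of band) into one byte stream, ending each record with LF.
std::string records_to_stream(const std::vector<std::string>& records) {
    std::string out;
    for (const std::string& record : records) {
        out += record;
        out += '\n';  // assumption: LF chosen as the in-band line terminator
    }
    return out;
}

// Returns true if `bytes` is well-formed UTF-8. Overlong encodings,
// surrogate code points, and values above U+10FFFF are rejected.
bool is_valid_utf8(std::string_view bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 <= 0x7F) { ++i; continue; }  // ASCII

        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of second byte
        if (b0 >= 0xC2 && b0 <= 0xDF)      { len = 2; }
        else if (b0 == 0xE0)               { len = 3; lo = 0xA0; }  // no overlongs
        else if (b0 >= 0xE1 && b0 <= 0xEC) { len = 3; }
        else if (b0 == 0xED)               { len = 3; hi = 0x9F; }  // no surrogates
        else if (b0 == 0xEE || b0 == 0xEF) { len = 3; }
        else if (b0 == 0xF0)               { len = 4; lo = 0x90; }  // no overlongs
        else if (b0 >= 0xF1 && b0 <= 0xF3) { len = 4; }
        else if (b0 == 0xF4)               { len = 4; hi = 0x8F; }  // <= U+10FFFF
        else return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence

        if (n - i < len) return false;  // truncated sequence
        const unsigned char b1 = static_cast<unsigned char>(bytes[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if (b < 0x80 || b > 0xBF) return false;
        }
        i += len;
    }
    return true;
}
```

The point of the sketch is that the line terminators are synthesized by the
conversion step, so whatever the data set stored is irrelevant by the time
phase 1 sees the bytes; only the validity check applies to the result.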

By the same token, I find it rather unfortunate that we now have two notes
for these platforms using data sets when their use case is already covered
by normative wording ("implementation defined mapping" covers this use
case).

> Tom.
>
>> P2295 <https://wg21.link/p2295r5> has also been reviewed by SG22 (C/C++
>> Liaison) and has not been tagged for review by any other SGs. P2362
>> <https://wg21.link/p2362> still awaits SG22 review, so I encourage the
>> EWG and SG22 chairs to coordinate to determine if EWG review should await
>> SG22's review.
>>
>> Thank you to both authors for the time and patience they exhibited
>> throughout the reviews of these papers; particularly with regard to finding
>> wording for P2295 <https://wg21.link/p2295r5>.
>>
>> Tom.
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>
>