SG16

Subject: Re: SG16 approval of P2295R5 and P2362R1
From: Tom Honermann (tom_at_[hidden])
Date: 2021-07-27 23:15:19


On 7/27/21 6:34 PM, Hubert Tong wrote:
> On Mon, Jul 26, 2021 at 11:44 AM Tom Honermann via SG16
> <sg16_at_[hidden]> wrote:
>
> SG16 approved forwarding a draft of P2295R5
> <https://wg21.link/p2295r5> (Support for UTF-8 as a portable
> source file encoding) and P2362R0 <https://wg21.link/p2362r0>
> (Make obfuscating wide character literals ill-formed) with minor
> modifications to EWG during its July 14th telecon
> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README.md#july-14th-2021>. 
> All requested SG16 changes are present in the published versions
> of P2295R5 <https://wg21.link/p2295r5> and P2362R1
> <https://wg21.link/p2362r1> that appear in the most recent mailing
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/#mailing2021-07>
> (note that P2362R1 <https://wg21.link/p2362r1> sports a new title).
>
> These papers are now ready for review by EWG and the GitHub issue
> tracker <https://github.com/cplusplus/papers/issues> has been
> updated accordingly.  Both papers have wording that has been
> reviewed by a core expert and each reflects existing
> implementation practice.
>
> I will note that P2295's treatment of end-of-line indicators for UTF-8
> source files has not yet been implemented (to my knowledge) on
> platforms where text files traditionally have "out-of-band" line
> length information. I am not aware of technical limitations that
> prevent having a convention that works in the manner P2295 indicates,
> so this comment is for information only.

Thank you for that correction, Hubert.

Is there a de facto standard convention for how text files that
originate on other platforms are translated to such an environment?  For
example, are new-line sequences in the original file removed in favor of
the out-of-band information, or are they typically preserved?  If they
are preserved, I imagine they may not correlate with the out-of-band line
information.  Are there multiple new-line sequence forms in practice?

I'm asking because I would like to better understand the impact on
programmers.  Given a UTF-8 encoded file on another platform, are there,
in practice, multiple ways in which such a file might be translated for
this environment?  If so, is there a dominant representation?
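
To make the two possibilities concrete, here is a minimal sketch of a
hypothetical translation step.  The function name `to_records` and its
`keep_terminators` flag are illustrative assumptions; they do not come
from P2295 or from any existing platform tool.  It splits a UTF-8 byte
stream into records at LF or CRLF and either drops the new-line sequence
(leaving only out-of-band record boundaries) or preserves it inside the
record.  Bare CR and Unicode line separators such as U+0085 and U+2028
are deliberately not handled.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (an assumption for illustration, not part of P2295):
// split a UTF-8 byte stream into records, one per line. Only LF and CRLF
// are recognized as new-line sequences. Scanning bytes is safe for UTF-8
// because 0x0A and 0x0D never occur inside a multi-byte sequence.
std::vector<std::string> to_records(std::string_view text,
                                    bool keep_terminators) {
    std::vector<std::string> records;
    std::size_t start = 0;
    while (start < text.size()) {
        const std::size_t lf = text.find('\n', start);
        if (lf == std::string_view::npos) {
            // Final line without a terminating new-line sequence.
            records.emplace_back(text.substr(start));
            break;
        }
        // 'end' is the end of the line content, excluding LF or CRLF.
        std::size_t end = lf;
        if (end > start && text[end - 1] == '\r') {
            --end;
        }
        // Either keep the terminator in the record or strip it.
        const std::size_t stop = keep_terminators ? lf + 1 : end;
        records.emplace_back(text.substr(start, stop - start));
        start = lf + 1;
    }
    return records;
}
```

With `keep_terminators` set to false, the input "a\r\nb\n" yields the
records "a" and "b"; with it set to true, the records retain their
original terminators, so the in-band and out-of-band line information
coexist and could, in principle, disagree after later edits.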

Tom.

> P2295 <https://wg21.link/p2295r5> has also been reviewed by SG22
> (C/C++ Liaison) and has not been tagged for review by any other
> SGs. P2362 <https://wg21.link/p2362> still awaits SG22 review, so
> I encourage the EWG and SG22 chairs to coordinate to determine if
> EWG review should await SG22's review.
>
> Thank you to both authors for the time and patience they exhibited
> throughout the reviews of these papers; particularly with regard
> to finding wording for P2295 <https://wg21.link/p2295r5>.
>
> Tom.
>



SG16 list run by sg16-owner@lists.isocpp.org