sg16: Re: [SG16] Agenda for the 2021-05-12 SG16 telecon

From: Peter Brett <pbrett_at_[hidden]>
Date: Tue, 4 May 2021 08:12:07 +0000

Hi all,

I am unlikely to be able to attend the 12th May call. However:

  * Victor and Corentin: thank you for drafting such an excellent paper on short notice
  * I am strongly in favour of adopting it as the resolution of LWG3547

Best wishes,

                Peter

From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Tom Honermann via SG16
Sent: 04 May 2021 05:06
To: SG16 <sg16_at_[hidden]>
Cc: Tom Honermann <tom_at_honermann.net>
Subject: [SG16] Agenda for the 2021-05-12 SG16 telecon

SG16 will hold a telecon on Wednesday, May 12th at 19:30 UTC (timezone conversion<https://urldefense.com/v3/__https:/www.timeanddate.com/worldclock/converter.html?iso=20210512T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHtanPvp6g$>).

The agenda is:

  * D2372R1: Fixing locale handling in chrono formatters<https://urldefense.com/v3/__https:/isocpp.org/files/papers/D2372R1.html__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHuiFrxIrQ$>

     * Affirm or rebut LEWG's position.

  * P2093R5: Formatted output<https://urldefense.com/v3/__https:/wg21.link/p2093r5__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHuY-HuX9A$>

     * Discuss locale-dependent character encoding concerns.

  * P2295R2: Support for UTF-8 as a portable source file encoding<https://urldefense.com/v3/__https:/wg21.link/p2295r3__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHv36onUWw$>

     * Review updates intended to address prior SG16 feedback.

  * P2348R0: Whitespaces Wording Revamp<https://urldefense.com/v3/__https:/wg21.link/p2348r0__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHtOObm4JQ$>

Our last telecon was consumed by discussion<https://urldefense.com/v3/__https:/github.com/sg16-unicode/sg16-meetings/blob/master/README.md*april-28th-2021__;Iw!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHvykkfL9Q$> of LWG3547<https://urldefense.com/v3/__https:/cplusplus.github.io/LWG/issue3547__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHuODrVnXA$> and possible remedies. Though we did not reach consensus on a direction forward during that telecon, Victor and Corentin, at the LEWG chair's request, drafted D2372R0, presented it at the LEWG telecon held 2021-05-03<https://urldefense.com/v3/__https:/wiki.edg.com/bin/view/Wg21telecons2021/P2372*2021-05-03__;Iw!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHswBbbIWQ$>, and LEWG reached strong consensus for it. The D2372R0 revision will be submitted for the May mailing as P2372R0; and a D2372R1<https://urldefense.com/v3/__https:/isocpp.org/files/papers/D2372R1.html__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHuiFrxIrQ$> revision addressing LEWG feedback will be submitted as P2372R1. Both revisions substantially match the proposed resolution that SG16 discussed. Since SG16 did not reach consensus on that direction, the LEWG chair has asked that we revisit it to either affirm or rebut the LEWG consensus. We will therefore (briefly) discuss and then poll that direction. Note that the poll taken in SG16 differs from the poll taken in LEWG. In SG16, we polled applying the proposed resolution to C++23 while LEWG polled applying the proposed resolution (with amendments to not change behavior for iostream manipulators) to C++23 *and* retroactively to C++20.

Once we've dispatched D2372R1, we'll return to the original agenda for our last telecon; discussion of P2093R5<https://urldefense.com/v3/__https:/wg21.link/p2093r5__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHuY-HuX9A$> (Formatted output) and P2295R2<https://urldefense.com/v3/__https:/wg21.link/p2295r3__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHv36onUWw$> (Support for UTF-8 as a portable source file encoding). I've retained P2348R0<https://urldefense.com/v3/__https:/wg21.link/p2348r0__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHtOObm4JQ$> on the agenda, though I don't expect that we'll get to it.

With regard to P2093R5<https://urldefense.com/v3/__https:/wg21.link/p2093r5__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHuY-HuX9A$>, the current status is that LEWG has referred the paper back to SG16 for further discussion; please see the LEWG meeting minutes here<https://urldefense.com/v3/__https:/wiki.edg.com/bin/view/Wg21telecons2021/P2093*Library-Evolution-2021-04-06__;Iw!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHsxygNWsw$>. Specifically, LEWG would benefit from additional analysis of previously deferred questions<https://urldefense.com/v3/__http:/lists.isocpp.org/lib-ext/2021/03/18189.php__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHv45n8ffg$> regarding character encoding concerns, transcoding requirements (or the lack thereof), and the ensuing consequences (or lack thereof):

  1. How errors in transcoding should be handled. E.g., when transcoding from UTF-8 to a UTF-16 based console interface and the UTF-8 input is not well-formed.
  2. The choice to base behavior on the compile-time choice of literal encoding. An implication of the current proposal is that a program that contains only ASCII characters in string literals will change behavior depending on whether the literal encoding is UTF-8 or ASCII (or some other ASCII-derived encoding).
  3. Whether transcoding to the console interface encoding should be performed when the literal encoding is not UTF-8.
  4. What the implications are for future support of std::print("{} {} {} {}", L"Wide text", u8"UTF-8 text", u"UTF-16 text", U"UTF-32 text").
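
To make question 1 concrete, here is a minimal sketch of one possible error policy: substituting U+FFFD for ill-formed input while transcoding UTF-8 to UTF-16. The function name utf8_to_utf16_lossy is hypothetical, and nothing here is taken from P2093R5; throwing or stopping at the first error are equally plausible policies.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from P2093R5): transcodes UTF-8 to UTF-16.
// Policy assumed for this sketch: each byte that cannot begin or complete a
// well-formed sequence yields one U+FFFD, and decoding resumes at the next byte.
std::u16string utf8_to_utf16_lossy(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;   // expected sequence length, 0 if b0 is not a lead byte
        char32_t cp = 0;       // code point being assembled
        char32_t min = 0;      // smallest code point allowed for this length
        if (b0 < 0x80)                     { len = 1; cp = b0;        min = 0; }
        else if (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; cp = b0 & 0x1F; min = 0x80; }
        else if (b0 >= 0xE0 && b0 <= 0xEF) { len = 3; cp = b0 & 0x0F; min = 0x800; }
        else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; min = 0x10000; }

        // Reject unknown lead bytes and truncated sequences.
        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;        // not a continuation byte
            else cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogates, and values above U+10FFFF.
        if (ok && (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;

        if (!ok) {
            out.push_back(u'\uFFFD');
            ++i;
            continue;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;     // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

With this policy the call never fails, so a console write cannot be aborted by bad input; the cost is that the original bytes are not recoverable from the output.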

I think these concerns will be easier to resolve if we first reach consensus regarding scenarios in which localized text may be provided in an unexpected encoding. The following is a slightly modified example of code Hubert previously provided. The example has been modified to explicitly opt into localized chrono formatting.

std::print("{:L%p}\n", std::chrono::system_clock::now().time_since_epoch());

At issue is the encoding used by locale-sensitive chrono formatters. The example above contains the %p specifier and is locale-sensitive because AM/PM designations may be localized. In a Chinese locale the desired translation of "PM" is "下午", but the locale will provide the translation in the locale encoding. As specified in P2093R5, if the literal encoding is UTF-8, then std::print() will expect the translation to be provided in UTF-8; if the locale is not UTF-8-based (e.g., Big5; perhaps Shift-JIS for the Japanese 午後 translation), the result is mojibake.
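
A small sketch of why the bytes cannot simply be passed through. The Big5 byte values below are my assumption for illustration and are not taken from any of the papers; the helper names are hypothetical.

```cpp
#include <string>

// "下午" in two encodings. The UTF-8 bytes follow directly from U+4E0B U+5348;
// the Big5 bytes are an assumption of this sketch.
const std::string pm_utf8 = "\xE4\xB8\x8B\xE5\x8D\x88";
const std::string pm_big5 = "\xA4\x55\xA4\xC8";

// Returns true if `b` may begin a UTF-8 encoded character.
// Bytes 0x80-0xC1 and 0xF5-0xFF never start a well-formed sequence.
bool is_utf8_lead_byte(unsigned char b) {
    return b < 0x80 || (b >= 0xC2 && b <= 0xF4);
}

// Hypothetical quick check: does the text at least start like UTF-8?
// This is a necessary condition only, not a full well-formedness test.
bool starts_like_utf8(const std::string& s) {
    return !s.empty() && is_utf8_lead_byte(static_cast<unsigned char>(s[0]));
}
```

Under these assumptions starts_like_utf8(pm_big5) is false, because 0xA4 is a continuation byte in UTF-8. A consumer that assumes UTF-8 therefore has nothing sensible to decode from the very first byte of the locale-provided text.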

I had previously suggested the following possible directions we can investigate to resolve the encoding concerns.

  * Specialize std::locale facets<https://urldefense.com/v3/__https:/en.cppreference.com/w/cpp/locale/locale__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHvHXqlVgg$> and related I/O manipulators like std::put_time()<https://urldefense.com/v3/__https:/en.cppreference.com/w/cpp/io/manip/put_time__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHseoHEmTQ$> for char8_t. This would allow std::print(), when the literal encoding is UTF-8, to opt in to use of the UTF-8/char8_t facets and I/O manipulators.
  * When the literal encoding is UTF-8, stipulate that running the program in a non-UTF-8-based locale is non-conforming. This would effectively require MSVC programmers who build code with the /utf-8 option to also force selection of a UTF-8 code page via a manifest<https://urldefense.com/v3/__https:/docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHucN4W4ig$> and to require use of Windows 10 version 1903 or later.
  * When the literal encoding is UTF-8, specify that non-UTF-8-based, locale-dependent translations be implicitly transcoded (such transcoding should never result in errors except perhaps for memory allocation failures).
  * Drop the special-case handling for the literal encoding being UTF-8 and specify that, when bypassing a stream to write directly to the console, the output be implicitly transcoded from the current locale-dependent encoding (whatever it is) to the console encoding (UTF-8).
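
For reference on the first direction, this is roughly what the existing char-based path looks like today. It is a sketch only; am_pm_designation is a hypothetical helper name. The standard requires the std::time_put facet for char and wchar_t only, so the text comes back in whatever encoding the locale uses, with no char8_t variant to request instead.

```cpp
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: returns the AM/PM designation for `hour` (0-23) as
// produced by the std::time_put<char> facet of `loc`, via std::put_time().
// The returned bytes are in the encoding of `loc`, whatever that is.
std::string am_pm_designation(int hour, const std::locale& loc) {
    std::tm t{};              // zero-initialized; %p only consults tm_hour
    t.tm_hour = hour;
    std::ostringstream os;
    os.imbue(loc);            // std::put_time uses the stream's locale
    os << std::put_time(&t, "%p");
    return os.str();
}
```

In the classic "C" locale, am_pm_designation(15, std::locale::classic()) yields "PM". With a named Chinese or Japanese locale (if one is installed) the same call would return the localized designation in that locale's encoding, which is exactly the text std::print() would have to interpret.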

If we get through all of that, we'll review Corentin's updates in P2295R2<https://urldefense.com/v3/__https:/wg21.link/p2295r3__;!!EHscmS1ygiU1lA!RwaqdDpe1PcVanpNJIu0WO5Rgpj79_Z48fvrulbD0BdJHPSmWmxVDHv36onUWw$> to address prior SG16 feedback. Thank you to everyone who has already provided additional feedback on the mailing list!

Tom.

Received on 2021-05-04 03:12:17