Re: [SG16] Agenda for the 2021-05-12 SG16 telecon

From: Victor Zverovich <victor.zverovich_at_[hidden]>
Date: Tue, 11 May 2021 17:40:29 -0700
Dear Unicoders,

Here is a link to a new revision of P2093:
https://isocpp.org/files/papers/D2093R6.html. It's essentially the same as
R5 but addresses the latest LEWG feedback and adds a few clarifications.
The only change to the wording is replacing <io> with <print>.

Cheers,
Victor

On Tue, May 11, 2021 at 11:02 AM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:

> Reminder that this meeting is taking place tomorrow.
>
> Per suggestion by Peter, the agenda order is being changed to review the
> updates in P2295R2 before D2372R1 and P2093R5 in the hopes that we can
> forward P2295R2 to EWG. We'll try to limit that discussion to 30 minutes.
> The updated agenda is below. Again, we are unlikely to get to P2348R0 at
> all.
>
> - P2295R2: Support for UTF-8 as a portable source file encoding
>   <https://wg21.link/p2295r3>
>    - Review updates intended to address prior SG16 feedback.
> - D2372R1: Fixing locale handling in chrono formatters
>   <https://isocpp.org/files/papers/D2372R1.html>
>    - Affirm or rebut LEWG's position.
> - P2093R5: Formatted output <https://wg21.link/p2093r5>
>    - Discuss locale dependent character encoding concerns.
> - P2348R0: Whitespaces Wording Revamp <https://wg21.link/p2348r0>
>
> Tom.
>
> On 5/4/21 12:06 AM, Tom Honermann via SG16 wrote:
>
> SG16 will hold a telecon on Wednesday, May 12th at 19:30 UTC (timezone
> conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20210512T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>
> ).
>
> The agenda is:
>
> - D2372R1: Fixing locale handling in chrono formatters
>   <https://isocpp.org/files/papers/D2372R1.html>
>    - Affirm or rebut LEWG's position.
> - P2093R5: Formatted output <https://wg21.link/p2093r5>
>    - Discuss locale dependent character encoding concerns.
> - P2295R2: Support for UTF-8 as a portable source file encoding
>   <https://wg21.link/p2295r3>
>    - Review updates intended to address prior SG16 feedback.
> - P2348R0: Whitespaces Wording Revamp <https://wg21.link/p2348r0>
>
> Our last telecon was consumed by discussion
> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README.md#april-28th-2021>
> of LWG3547 <https://cplusplus.github.io/LWG/issue3547> and possible
> remedies. Though we did not reach consensus on a direction forward during
> that telecon, Victor and Corentin, at the LEWG chair's request, drafted
> D2372R0, presented it at the LEWG telecon held 2021-05-03
> <https://wiki.edg.com/bin/view/Wg21telecons2021/P2372#2021-05-03>, and
> LEWG reached strong consensus for it. The D2372R0 revision will be
> submitted for the May mailing as P2372R0; and a D2372R1
> <https://isocpp.org/files/papers/D2372R1.html> revision addressing LEWG
> feedback will be submitted as P2372R1. Both revisions substantially match
> the proposed resolution that SG16 discussed. Since SG16 did not reach
> consensus on that direction, the LEWG chair has asked that we revisit it to
> either affirm or rebut the LEWG consensus. We will therefore (briefly)
> discuss and then poll that direction. Note that the poll taken in SG16
> differs from the poll taken in LEWG. In SG16, we polled applying the
> proposed resolution to C++23 while LEWG polled applying the proposed
> resolution (with amendments to not change behavior for iostream
> manipulators) to C++23 *and* retroactively to C++20.
>
> Once we've dispatched D2372R1, we'll return to the original agenda for our
> last telecon: discussion of P2093R5 <https://wg21.link/p2093r5>
> (Formatted output) and P2295R2 <https://wg21.link/p2295r3> (Support for
> UTF-8 as a portable source file encoding). I've retained P2348R0
> <https://wg21.link/p2348r0> on the agenda, though I don't expect that
> we'll get to it.
>
> With regard to P2093R5 <https://wg21.link/p2093r5>, the current status is
> that LEWG has referred the paper back to SG16 for further discussion;
> please see the LEWG meeting minutes here
> <https://wiki.edg.com/bin/view/Wg21telecons2021/P2093#Library-Evolution-2021-04-06>.
> Specifically, LEWG would benefit from additional analysis of previously
> deferred questions <http://lists.isocpp.org/lib-ext/2021/03/18189.php>
> regarding character encoding concerns, transcoding requirements (or the
> lack thereof) and the ensuing consequences (or lack thereof).
>
> 1. How errors in transcoding should be handled. E.g., when
>    transcoding from UTF-8 to a UTF-16 based console interface and the UTF-8
>    input is not well-formed.
> 2. The choice to base behavior on the compile-time choice of literal
>    encoding. An implication of the current proposal is that a program that
>    contains only ASCII characters in string literals will change behavior
>    depending on whether the literal encoding is UTF-8 vs ASCII (or some other
>    ASCII derived encoding).
> 3. Whether transcoding to the console interface encoding should be
>    performed when the literal encoding is not UTF-8.
> 4. What the implications are for future support of
>    std::print("{} {} {} {}", L"Wide text", u8"UTF-8 text", u"UTF-16 text", U"UTF-32 text").
>
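
For readers who want to see what "the compile-time choice of literal encoding" in question 2 can look like in code, here is a minimal sketch. It is not taken from P2093R5. The helper names is_utf8_micro_sign and literal_encoding_is_utf8 are hypothetical, and the approach (checking how the compiler encoded a non-ASCII character in an ordinary string literal) is only one possible detection technique.

```cpp
#include <string_view>

// Hypothetical helper: true if `s` is exactly the two-byte UTF-8 encoding
// of U+00B5 MICRO SIGN (0xC2 0xB5).
constexpr bool is_utf8_micro_sign(std::string_view s) {
    return s.size() == 2 &&
           static_cast<unsigned char>(s[0]) == 0xC2 &&
           static_cast<unsigned char>(s[1]) == 0xB5;
}

// Hypothetical probe: asks how the compiler encoded "\u00B5" in an ordinary
// (char) string literal. The answer is fixed at compile time and depends on
// compiler options, not on the locale the program later runs in.
constexpr bool literal_encoding_is_utf8() {
    return is_utf8_micro_sign("\u00B5");
}
```

Because the probe depends only on how the translation unit was compiled, a program whose literals are all ASCII would still take a different path depending on the result, which is the implication raised in question 2.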
> I think these concerns will be easier to resolve if we first reach
> consensus regarding scenarios in which localized text may be provided in an
> unexpected encoding. The following is a slightly modified example of code
> Hubert previously provided. The example has been modified to explicitly
> opt into localized chrono formatting.
>
>     std::print("{:L%p}\n",
>                std::chrono::system_clock::now().time_since_epoch());
>
> At issue is the encoding used by locale sensitive chrono formatters. The
> example above contains the %p specifier and is locale sensitive because
> AM/PM designations may be localized. In a Chinese locale the desired
> translation of "PM" is "下午", but the locale will provide the translation in
> the locale encoding. As specified in P2093R5, if the literal encoding is
> UTF-8, then std::print() will expect the translation to be provided in
> UTF-8, but if the locale is not UTF-8-based (e.g., Big5; perhaps Shift-JIS
> for the Japanese 午後 translation), then the result is mojibake.
>
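
To make the mismatch concrete, the sketch below checks whether a byte sequence is well-formed UTF-8, following the byte ranges in Table 3-7 of the Unicode Standard. The function name is_well_formed_utf8 is hypothetical and nothing here is proposed wording. The Big5 bytes used in the usage note (0xA4 0x55 0xA4 0xC8 for "下午") are an assumption about that encoding's mapping.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if `s` is well-formed UTF-8 according to the
// byte ranges of Unicode Standard Table 3-7.
constexpr bool is_well_formed_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 <= 0x7F) { ++i; continue; }   // ASCII
        std::size_t len = 0;                 // continuation bytes expected
        unsigned char lo = 0x80, hi = 0xBF;  // range of the first continuation byte
        if (b0 >= 0xC2 && b0 <= 0xDF) len = 1;
        else if (b0 == 0xE0) { len = 2; lo = 0xA0; }  // reject overlong forms
        else if (b0 >= 0xE1 && b0 <= 0xEC) len = 2;
        else if (b0 == 0xED) { len = 2; hi = 0x9F; }  // reject surrogates
        else if (b0 == 0xEE || b0 == 0xEF) len = 2;
        else if (b0 == 0xF0) { len = 3; lo = 0x90; }  // reject overlong forms
        else if (b0 >= 0xF1 && b0 <= 0xF3) len = 3;
        else if (b0 == 0xF4) { len = 3; hi = 0x8F; }  // reject > U+10FFFF
        else return false;  // 0x80..0xC1 and 0xF5..0xFF cannot start a sequence
        if (s.size() - i <= len) return false;        // truncated sequence
        for (std::size_t k = 1; k <= len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if (b < lo || b > hi) return false;
            lo = 0x80; hi = 0xBF;  // later continuation bytes use the full range
        }
        i += len + 1;
    }
    return true;
}
```

With this check, the UTF-8 bytes of "下午" ("\xE4\xB8\x8B\xE5\x8D\x88") are accepted, while the assumed Big5 bytes ("\xA4\x55\xA4\xC8") are rejected at the first byte. A std::print() implementation that expects UTF-8 would therefore have to apply some error-handling policy to such locale-provided text, which is the subject of question 1.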
> I had previously suggested the following possible directions we can
> investigate to resolve the encoding concerns.
>
> - Specialize std::locale facets
>   <https://en.cppreference.com/w/cpp/locale/locale> and related I/O
>   manipulators like std::put_time()
>   <https://en.cppreference.com/w/cpp/io/manip/put_time> for char8_t.
>   This would allow std::print() to, when the literal encoding is UTF-8,
>   opt in to use of the UTF-8/char8_t facets and I/O manipulators.
> - When the literal encoding is UTF-8, stipulate that running the
>   program in a non-UTF-8 based locale is non-conforming. This would
>   effectively require MSVC programmers, when building code with the
>   /utf-8 option, to also force selection of a UTF-8 code page via a
>   manifest
>   <https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page>
>   and require use of Windows 10 build 1903 or later.
> - When the literal encoding is UTF-8, specify that non-UTF-8 based
>   locale dependent translations be implicitly transcoded (such transcoding
>   should never result in errors except perhaps for memory allocation
>   failures).
> - Drop the special case handling for the literal encoding being UTF-8
>   and specify that, when bypassing a stream to write directly to the console,
>   the output be implicitly transcoded from the current locale dependent
>   encoding (whatever it is) to the console encoding (UTF-8).
>
> If we get through all of that, we'll review Corentin's updates in P2295R2
> <https://wg21.link/p2295r3> to address prior SG16 feedback. Thank you to
> everyone who already provided additional feedback on the mailing list!
>
> Tom.
>
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2021-05-11 19:40:45