sg16: Re: [SG16] Agenda for the 2021-05-26 SG16 telecon

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 25 May 2021 09:07:58 -0400

Reminder that this meeting is taking place tomorrow. The agenda remains
the same.

Tom.

On 5/16/21 5:23 PM, Tom Honermann via SG16 wrote:
>
> SG16 will hold a telecon on Wednesday, May 26th at 19:30 UTC (timezone
> conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20210526T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>
> The agenda is:
>
> * P2295R3: Support for UTF-8 as a portable source file encoding
> <https://wg21.link/p2295r3>
> o Review updates intended to address prior SG16 feedback.
> * P2093R6: Formatted output <https://wg21.link/p2093r6>
> o Discuss locale dependent character encoding concerns.
>
> Since we did not get to discuss P2295R3 at our last telecon, it will
> again retain the top spot on the agenda followed by P2093R6. Thus,
> the agenda looks much the same as for the last telecon (I dropped
> P2348R0 <https://wg21.link/p2348r0> for now; we won't realistically
> get to it).
>
> With regard to P2093R6 <https://wg21.link/p2093r6>, the current status
> is unchanged; LEWG has referred the paper back to SG16 for further
> discussion; please see the LEWG meeting minutes here
> <https://wiki.edg.com/bin/view/Wg21telecons2021/P2093#Library-Evolution-2021-04-06>.
> Specifically, LEWG would benefit from additional analysis of
> previously deferred questions
> <http://lists.isocpp.org/lib-ext/2021/03/18189.php> regarding
> character encoding concerns, transcoding requirements (or the lack
> thereof) and the ensuing consequences (or lack thereof).
>
> 1. How errors in transcoding should be handled. E.g., when
> transcoding from UTF-8 to a UTF-16 based console interface and the
> UTF-8 input is not well-formed.
> 2. The choice to base behavior on the compile-time choice of literal
> encoding. An implication of the current proposal is that a
> program that contains only ASCII characters in string literals
> will change behavior depending on whether the literal encoding is
> UTF-8 vs ASCII (or some other ASCII derived encoding).
> 3. Whether transcoding to the console interface encoding should be
> performed when the literal encoding is not UTF-8.
> 4. What the implications are for future support of
> std::print("{} {} {}{}", L"Wide text", u8"UTF-8 text",
> u"UTF-16 text", U"UTF-32 text").
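
To make question 1 concrete, here is a minimal sketch of one possible error policy: transcode UTF-8 to UTF-16 and substitute U+FFFD for each byte that cannot start a well-formed sequence. The function name utf8_to_utf16_lossy is hypothetical and is not taken from P2093R6; the resynchronization rule (skip one byte, try again) is a simplification rather than the Unicode "maximal subpart" practice, and other policies (throwing, dropping the bytes) are equally possible.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not part of P2093R6 or the standard library.
// Transcodes UTF-8 to UTF-16. Any byte that does not begin a well-formed
// UTF-8 sequence is replaced by U+FFFD and decoding resumes at the next byte.
std::u16string utf8_to_utf16_lossy(std::string_view in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;      // code point being assembled
        char32_t min = 0;     // smallest code point allowed for this length
        std::size_t len = 0;  // 0 means "not a valid lead byte"
        if (b0 < 0x80)                { cp = b0;        len = 1; min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }

        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;       // not a continuation byte
            else cp = (cp << 6) | (b & 0x3Fu);
        }
        // Reject overlong forms, surrogates, and values above U+10FFFF.
        if (ok && (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;

        if (!ok) {
            out.push_back(u'\uFFFD');
            ++i;
            continue;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

With this policy the call never fails, but information is lost: a stray 0xFF byte between "a" and "b" comes out as "a", U+FFFD, "b". Whether that trade-off is acceptable is exactly what question 1 asks.
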
>
> At our last telecon, we focused on how to handle ill-formed inputs,
> but spent little time on how such inputs arise. Now that LWG3547
> <https://cplusplus.github.io/LWG/issue3547> has been effectively
> (though not officially) resolved by P2372R1
> <https://wg21.link/p2372r1>, we have a concrete example of how the
> std::print() facility itself can produce ill-formed input (assuming
> that std::print() transcodes all inputs using the same encoding). I
> would like to start with this example as I think it is fundamental to
> how we choose to answer the above questions.
>
>     std::print("{:L%p}\n",
>         std::chrono::system_clock::now().time_since_epoch());
>
> At issue is the encoding used by chrono formatters specified with the
> L option to request a locale-specific form. The example above
> contains the %p specifier with the L option. In a Chinese locale the
> desired translation of "PM" is "下午", but the locale will provide the
> translation in the locale encoding. As specified in P2093R6, if the
> literal encoding is UTF-8, then std::print() will expect the
> translation to be provided in UTF-8, but if the locale is not
> UTF-8-based (e.g., Big5; perhaps Shift-JIS for the Japanese 午後
> translation), then the result is mojibake.
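
The mismatch described above can be illustrated with a small well-formedness check. This is a sketch under stated assumptions: is_well_formed_utf8 is a hypothetical helper, and the Big5 bytes shown for "下午" (A4 55 A4 C8) are what I believe a Big5 locale would supply; they are not taken from the paper. The point does not depend on the exact bytes, only on the fact that a Big5 lead byte such as 0xA4 cannot start a UTF-8 sequence.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if s is well-formed UTF-8.
bool is_well_formed_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        char32_t min = 0;
        std::size_t len = 0;
        if (b0 < 0x80)                { cp = b0;        len = 1; min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }
        else return false;                  // byte cannot start a sequence

        if (i + len > s.size()) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3Fu);
        }
        // Reject overlong forms, surrogates, and values above U+10FFFF.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// "下午" (U+4E0B U+5348) encoded as UTF-8.
inline constexpr std::string_view pm_utf8 = "\xE4\xB8\x8B\xE5\x8D\x88";
// Assumed Big5 encoding of the same text; the bytes are an assumption.
inline constexpr std::string_view pm_big5 = "\xA4\x55\xA4\xC8";
```

Here is_well_formed_utf8(pm_utf8) holds while is_well_formed_utf8(pm_big5) does not, so a std::print() that treats the whole formatted result as UTF-8 would be handed ill-formed input by the locale.
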
>
> These are possible directions we can investigate to resolve the
> encoding concerns.
>
> * Specialize std::locale facets
> <https://en.cppreference.com/w/cpp/locale/locale> and related I/O
> manipulators like std::put_time()
> <https://en.cppreference.com/w/cpp/io/manip/put_time> for
> char8_t. This would allow std::print() to, when the literal
> encoding is UTF-8, opt-in to use of the UTF-8/char8_t facets and
> I/O manipulators.
> * When the literal encoding is UTF-8, stipulate that running the
> program in a non-UTF-8 based locale is non-conforming. This would
> effectively require MSVC programmers who build code with the
> /utf-8 option to also force selection of a UTF-8 code page
> via a manifest
> <https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page>
> and to require use of Windows 10 build 1903 or later.
> * When the literal encoding is UTF-8, specify that non-UTF-8 based
> locale dependent translations be implicitly transcoded (such
> transcoding should never result in errors except perhaps for
> memory allocation failures).
> * Drop the special case handling for the literal encoding being
> UTF-8 and specify that, when bypassing a stream to write directly
> to the console, the output be implicitly transcoded from the
> current locale dependent encoding (whatever it is) to the console
> encoding (UTF-8).
>
> Tom.
>
>

Received on 2021-05-25 08:08:02