sg16: Re: [SG16] Agenda for the 2021-04-28 SG16 telecon

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 26 Apr 2021 12:18:24 -0400

On 4/19/21 10:58 AM, Tom Honermann via SG16 wrote:
>
> SG16 will hold a telecon on Wednesday, April 28th at 19:30 UTC
> (timezone conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20210428T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>
> The agenda is:
>
> * P2093R5: Formatted output <https://wg21.link/p2093r5>
> * P2348R0: Whitespaces Wording Revamp
> <https://isocpp.org/files/papers/P2348R0.pdf>
>
> LEWG discussed P2093R5 at their 2021-04-06 telecon and decided to
> refer the paper back to SG16 for further discussion. LEWG meeting
> minutes are available here
> <https://wiki.edg.com/bin/view/Wg21telecons2021/P2093#Library-Evolution-2021-04-06>;
> please review them prior to the telecon. LEWG reviewed the list of
> prior SG16 deferred questions posted to them here
> <http://lists.isocpp.org/lib-ext/2021/03/18189.php>. Of those, they
> established consensus on an answer for #2 (they agreed not to block
> std::print() on a proposal for underlying terminal facilities), but
> referred the rest back to us. My interpretation of their actions is
> that LEWG would like a revision of the paper to address these concerns
> based on SG16 input (e.g., discuss design options and SG16 consensus
> or lack thereof). We'll therefore focus on these questions at this
> telecon.
>
> Hubert provided the following very interesting example usage.
>
> std::print("{:%r}\n",
> std::chrono::system_clock::now().time_since_epoch());
>
> At issue is the encoding used by locale sensitive chrono formatters.
> Search [time.format] <http://eel.is/c++draft/time.format> for "locale"
> to find example chrono format specifiers that are locale dependent.
> The example above contains the %r specifier and is locale sensitive
> because AM/PM designations may be localized. In a Chinese locale the
> desired translation of "PM" is "下午", but the locale will provide the
> translation in the locale encoding. As specified in P2093R5, if the
> execution (literal) encoding is UTF-8, then std::print() will expect
> the translation to be provided in UTF-8, but if the locale is not
> UTF-8-based (e.g., Big5; perhaps Shift-JIS for the Japanese 午後
> translation), then the result is mojibake. This is a good example of
> how locale conflates translation and character encoding.
>
> Addressing the above will be our first order of business. Please
> reserve some time to independently think about this problem (ignore
> responses to this message for a few days if you need to). I am
> explicitly not listing possible approaches to address this concern in
> this message so as to avoid adding (further) bias in any specific
> direction. I suspect the previously deferred SG16 questions will be
> easier to answer once this concern is resolved.
>
Now that we've all had some time to think about this issue, here are
some possible directions we can pursue to resolve it. These are
presented in no particular order.

  * Specialize std::locale facets
    <https://en.cppreference.com/w/cpp/locale/locale> and related I/O
    manipulators like std::put_time()
    <https://en.cppreference.com/w/cpp/io/manip/put_time> for char8_t.
    When the literal encoding is UTF-8, std::print() could then opt in
    to the UTF-8/char8_t facets and I/O manipulators.
  * When the literal encoding is UTF-8, stipulate that running the
    program in a non-UTF-8 based locale is non-conforming. This would
    effectively require MSVC programmers who build code with the /utf-8
    option to also force selection of a UTF-8 code page via a manifest
    <https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page>
    and to require Windows 10 version 1903 or later.
  * When the literal encoding is UTF-8, specify that non-UTF-8 based
    locale dependent translations be implicitly transcoded (such
    transcoding should never result in errors except perhaps for memory
    allocation failures).
  * Drop the special case handling for the literal encoding being UTF-8
    and specify that, when bypassing a stream to write directly to the
    console, the output be implicitly transcoded from the current
    locale dependent encoding (whatever it is) to the console encoding
    (UTF-8).

Please feel free to comment on these, or additional, approaches before
our meeting on Wednesday.
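
To make the failure mode and the implicit transcoding options a little
more concrete, here is a minimal C++17 sketch. It is not taken from
P2093R5 and is not a proposed implementation. The names is_valid_utf8,
append_utf8, toy_big5_to_utf8, and kToyTable are hypothetical, and the
two-entry table stands in for a real conversion facility (iconv, ICU,
or a platform API). The Big5 byte values for "下午" are an assumption
on my part and are used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Returns true if `s` is well-formed UTF-8 (rejects overlong forms,
// surrogates, and values above U+10FFFF).
bool is_valid_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len;
        std::uint32_t cp;
        std::uint32_t lowest;  // smallest code point allowed for this length
        if (b0 < 0x80)                { len = 1; cp = b0;        lowest = 0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; lowest = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; lowest = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; lowest = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte
        if (s.size() - i < len) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// Appends the UTF-8 encoding of code point `cp` to `out`.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical two-entry table; the byte values are assumed, not verified
// against a Big5 reference.
struct ToyBig5Entry { unsigned char lead, trail; char32_t cp; };
constexpr ToyBig5Entry kToyTable[] = {
    {0xA4, 0x55, 0x4E0B},  // assumed Big5 bytes for U+4E0B
    {0xA4, 0xC8, 0x5348},  // assumed Big5 bytes for U+5348
};

// Toy transcoder: ASCII passes through, known two-byte pairs are mapped,
// and anything else becomes U+FFFD.
std::string toy_big5_to_utf8(std::string_view in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        if (b0 < 0x80) { out += static_cast<char>(b0); ++i; continue; }
        char32_t cp = 0xFFFD;
        if (i + 1 < in.size()) {
            const unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            for (const auto& e : kToyTable)
                if (e.lead == b0 && e.trail == b1) cp = e.cp;
            i += 2;
        } else {
            ++i;  // dangling lead byte at end of input
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With these definitions, is_valid_utf8("\xA4\x55\xA4\xC8") is false,
which is the situation in which std::print() would emit mojibake if it
assumed UTF-8, while toy_big5_to_utf8() applied to the same bytes
yields "\xE4\xB8\x8B\xE5\x8D\x88", the UTF-8 encoding of U+4E0B U+5348.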

I think it would benefit LEWG if a revision of the paper presented each
of these possibilities, the consequences, and the rationale (and
hopefully SG16 consensus) for the proposed direction.

Tom.

> I do not intend to time limit discussion of P2093R5 as I believe this
> is an important matter to resolve. If we are able to complete
> discussion of P2093R5, then we'll discuss P2348R0.
>
> Tom.
>
>

Received on 2021-04-26 11:18:50