Date: Tue, 25 May 2021 17:21:35 +0200
On Tue, May 25, 2021 at 5:19 PM Tom Honermann <tom_at_[hidden]> wrote:
> On 5/25/21 10:36 AM, Corentin Jabot via SG16 wrote:
>
>
>
> On Tue, May 25, 2021 at 3:08 PM Tom Honermann via SG16 <
> sg16_at_[hidden]> wrote:
>
>> Reminder that this meeting is taking place tomorrow. The agenda remains
>> the same.
>>
>> Tom.
>>
>> On 5/16/21 5:23 PM, Tom Honermann via SG16 wrote:
>>
>> SG16 will hold a telecon on Wednesday, May 26th at 19:30 UTC (timezone
>> conversion
>> <https://www.timeanddate.com/worldclock/converter.html?iso=20210526T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>
>> ).
>>
>> The agenda is:
>>
>> - P2295R3: Support for UTF-8 as a portable source file encoding
>> <https://wg21.link/p2295r3>
>> - Review updates intended to address prior SG16 feedback.
>> - P2093R6: Formatted output <https://wg21.link/p2093r6>
>> - Discuss locale dependent character encoding concerns.
>>
>> Since we did not get to discuss P2295R3 at our last telecon, it will
>> again retain the top spot on the agenda followed by P2093R6. Thus, the
>> agenda looks much the same as for the last telecon (I dropped P2348R0
>> <https://wg21.link/p2348r0> for now; we won't realistically get to it).
>>
>>
> I will try to be there, no promise though.
>
> Thanks for letting me know. If you are unable to attend, and if you don't
> object, we'll still review P2295R3 and carefully record any requested
> changes so that we can keep making progress on this paper.
>
Given my disagreement with some recent suggestions, that might be
counterproductive!
> Tom.
>
> Btw I would love feedback on P2348. There is little but wording in this
> paper so mail might be as good or better avenue for such feedback :)
>
> With regard to P2093R6 <https://wg21.link/p2093r6>, the current status is
>> unchanged; LEWG has referred the paper back to SG16 for further discussion;
>> please see the LEWG meeting minutes here
>> <https://wiki.edg.com/bin/view/Wg21telecons2021/P2093#Library-Evolution-2021-04-06>.
>> Specifically, LEWG would benefit from additional analysis of previously
>> deferred questions <http://lists.isocpp.org/lib-ext/2021/03/18189.php>
>> regarding character encoding concerns, transcoding requirements (or the
>> lack thereof) and the ensuing consequences (or lack thereof).
>>
>> 1. How errors in transcoding should be handled. E.g., when
>> transcoding from UTF-8 to a UTF-16 based console interface and the UTF-8
>> input is not well-formed.
>> 2. The choice to base behavior on the compile-time choice of literal
>> encoding. An implication of the current proposal is that a program that
>> contains only ASCII characters in string literals will change behavior
>> depending on whether the literal encoding is UTF-8 vs ASCII (or some other
>> ASCII derived encoding).
>> 3. Whether transcoding to the console interface encoding should be
>> performed when the literal encoding is not UTF-8.
>> 4. What the implications are for future support of std::print("{} {}
>> {} {}", L"Wide text", u8"UTF-8 text", u"UTF-16 text", U"UTF-32 text").
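
One possible answer to question 1 above can be sketched in code. This is a minimal illustration, not what P2093R6 specifies: the function name utf8_to_utf16_lossy is hypothetical, and the policy of emitting one U+FFFD per offending byte is a simplifying assumption (other substitution policies exist, and throwing or writing the bytes through untouched are alternatives).

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: transcodes UTF-8 to UTF-16, replacing each byte that
// cannot start a well-formed sequence with U+FFFD (assumed policy).
std::u16string utf8_to_utf16_lossy(std::string_view in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;       // decoded code point
        char32_t min_cp = 0;   // smallest value allowed for this length
        std::size_t len = 0;   // sequence length, 0 means invalid lead byte
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min_cp = 0x10000; }

        bool ok = len != 0 && in.size() - i >= len;
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;   // not a continuation byte
            else cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogates, and values above U+10FFFF.
        if (ok && (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;

        if (!ok) {
            out.push_back(u'\uFFFD');   // substitute and resynchronize
            ++i;
            continue;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;              // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

With this policy, a stray byte such as 0xA4 between two ASCII characters becomes a single U+FFFD in the output and the surrounding text is preserved.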
>>
>> At our last telecon, we focused on how to handle ill-formed inputs, but
>> did not much discuss how such inputs arise. Now that LWG3547
>> <https://cplusplus.github.io/LWG/issue3547> has been effectively (though
>> not officially) resolved by P2372R1 <https://wg21.link/p2372r1>, we have
>> a concrete example of how the std::print() facility itself can produce
>> ill-formed input (assuming that std::print() transcodes all inputs using
>> the same encoding). I would like to start with this example as I think it
>> is fundamental to how we choose to answer the above questions.
>>
>> std::print("{:L%p}\n",
>> std::chrono::system_clock::now().time_since_epoch());
>>
>> At issue is the encoding used by chrono formatters specified with the L
>> option to request a locale specific form. The example above contains the
>> %p specifier with the L option. In a Chinese locale the desired
>> translation of "PM" is "下午", but the locale will provide the translation in
>> the locale encoding. As specified in P2093R6, if the literal encoding is
>> UTF-8, then std::print() will expect the translation to be provided in
>> UTF-8, but if the locale is not UTF-8-based (e.g., Big5; perhaps Shift-JIS
>> for the Japanese 午後 translation), then the result is mojibake.
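
To make the failure mode concrete, here is a small sketch of a UTF-8 well-formedness check. The function name is_well_formed_utf8 is hypothetical, and the Big5 byte values used in the usage note (A4 55 A4 C8 for "下午") are an assumption on my part rather than something taken from the paper. The point is only that locale-encoded bytes handed to a consumer expecting UTF-8 are, in general, not well-formed UTF-8.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// surrogate code points, and values above U+10FFFF.
bool is_well_formed_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 < 0x80) { ++i; continue; }           // ASCII

        std::size_t len = 0;
        unsigned long cp = 0;
        unsigned long min_cp = 0;
        if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else return false;                          // invalid lead byte

        if (s.size() - i < len) return false;       // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}
```

Under the stated assumption about the Big5 bytes, is_well_formed_utf8("\xA4\x55\xA4\xC8") is false because 0xA4 cannot begin a UTF-8 sequence, whereas the UTF-8 encoding of the same text, "\xE4\xB8\x8B\xE5\x8D\x88", passes the check.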
>>
>> These are possible directions we can investigate to resolve the encoding
>> concerns.
>>
>> - Specialize std::locale facets
>> <https://en.cppreference.com/w/cpp/locale/locale> and related I/O
>> manipulators like std::put_time()
>> <https://en.cppreference.com/w/cpp/io/manip/put_time> for char8_t.
>> This would allow std::print() to, when the literal encoding is UTF-8,
>> opt-in to use of the UTF-8/char8_t facets and I/O manipulators.
>> - When the literal encoding is UTF-8, stipulate that running the
>> program in a non-UTF-8 based locale is non-conforming. This would
>> effectively require MSVC programmers, when building code with the
>> /utf-8 option, to also force selection of a UTF-8 code page via a
>> manifest
>> <https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page>
>> and require use of Windows 10 build 1903 or later.
>> - When the literal encoding is UTF-8, specify that non-UTF-8 based
>> locale dependent translations be implicitly transcoded (such transcoding
>> should never result in errors except perhaps for memory allocation
>> failures).
>> - Drop the special case handling for the literal encoding being UTF-8
>> and specify that, when bypassing a stream to write directly to the console,
>> the output be implicitly transcoded from the current locale dependent
>> encoding (whatever it is) to the console encoding (UTF-8).
>>
>> Tom.
>>
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>
>
>
Received on 2021-05-25 10:21:49