Date: Tue, 25 May 2021 11:19:00 -0400
On 5/25/21 10:36 AM, Corentin Jabot via SG16 wrote:
>
>
> On Tue, May 25, 2021 at 3:08 PM Tom Honermann via SG16
> <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
>
> Reminder that this meeting is taking place tomorrow. The agenda
> remains the same.
>
> Tom.
>
> On 5/16/21 5:23 PM, Tom Honermann via SG16 wrote:
>>
>> SG16 will hold a telecon on Wednesday, May 26th at 19:30 UTC
>> (timezone conversion
>> <https://www.timeanddate.com/worldclock/converter.html?iso=20210526T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>>
>> The agenda is:
>>
>> * P2295R3: Support for UTF-8 as a portable source file encoding
>> <https://wg21.link/p2295r3>
>> o Review updates intended to address prior SG16 feedback.
>> * P2093R6: Formatted output <https://wg21.link/p2093r6>
>> o Discuss locale dependent character encoding concerns.
>>
>> Since we did not get to discuss P2295R3 at our last telecon, it
>> will again retain the top spot on the agenda followed by
>> P2093R6. Thus, the agenda looks much the same as for the last
>> telecon (I dropped P2348R0 <https://wg21.link/p2348r0> for now;
>> we won't realistically get to it).
>>
>
> I will try to be there, no promise though.
Thanks for letting me know. If you are unable to attend, and if you
don't object, we'll still review P2295R3 and carefully record any
requested changes so that we can keep making progress on this paper.
Tom.
> Btw I would love feedback on P2348. There is little but wording in
> this paper, so mail might be as good an avenue for such feedback, or better :)
>
>> With regard to P2093R6 <https://wg21.link/p2093r6>, the current
>> status is unchanged: LEWG has referred the paper back to SG16 for
>> further discussion (please see the LEWG meeting minutes here
>> <https://wiki.edg.com/bin/view/Wg21telecons2021/P2093#Library-Evolution-2021-04-06>).
>> Specifically, LEWG would benefit from additional analysis of
>> previously deferred questions
>> <http://lists.isocpp.org/lib-ext/2021/03/18189.php> regarding
>> character encoding concerns, transcoding requirements (or the
>> lack thereof) and the ensuing consequences (or lack thereof).
>>
>> 1. How errors in transcoding should be handled. E.g., when
>> transcoding from UTF-8 to a UTF-16 based console interface
>> and the UTF-8 input is not well-formed.
>> 2. The choice to base behavior on the compile-time choice of
>> literal encoding. An implication of the current proposal is
>> that a program that contains only ASCII characters in string
>> literals will change behavior depending on whether the
>> literal encoding is UTF-8 vs. ASCII (or some other
>> ASCII-derived encoding).
>> 3. Whether transcoding to the console interface encoding should
>> be performed when the literal encoding is not UTF-8.
>> 4. What the implications are for future support of
>> std::print("{} {} {} {}", L"Wide text", u8"UTF-8 text",
>> u"UTF-16 text", U"UTF-32 text").
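Question 1 can be made concrete with a sketch of one possible error-handling policy, assuming a UTF-16 based console interface: every ill-formed UTF-8 byte is replaced with U+FFFD REPLACEMENT CHARACTER. The function name utf8_to_utf16 is hypothetical, and the one-replacement-per-offending-byte policy is simpler than the "maximal subpart" practice recommended by the Unicode Standard; it illustrates the kind of choice under discussion, not wording from P2093R6.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: transcode UTF-8 to UTF-16, emitting U+FFFD for each
// byte that does not start a well-formed UTF-8 sequence.
std::u16string utf8_to_utf16(std::string_view in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;     // 0 means "not a valid lead byte"
        char32_t min_cp = 0;     // smallest value allowed for this length
        if (b0 < 0x80)                    { cp = b0;        len = 1; }
        else if (b0 >= 0xC2 && b0 <= 0xDF) { cp = b0 & 0x1F; len = 2; min_cp = 0x80; }
        else if (b0 >= 0xE0 && b0 <= 0xEF) { cp = b0 & 0x0F; len = 3; min_cp = 0x800; }
        else if (b0 >= 0xF0 && b0 <= 0xF4) { cp = b0 & 0x07; len = 4; min_cp = 0x10000; }

        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;    // not a continuation byte
            else cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogate code points, and values above U+10FFFF.
        if (ok && (cp < min_cp || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF))
            ok = false;

        if (!ok) {
            out.push_back(u'\uFFFD');  // replace and resynchronize at the next byte
            ++i;
            continue;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {                       // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

For example, the UTF-8 bytes of "下午" transcode cleanly, whereas a stray 0xA4 byte (which could never begin a UTF-8 sequence) becomes a single U+FFFD.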
>>
>> At our last telecon, we focused on how to handle ill-formed
>> inputs, but did not discuss at length how such inputs arise. Now that
>> LWG3547 <https://cplusplus.github.io/LWG/issue3547> has been
>> effectively (though not officially) resolved by P2372R1
>> <https://wg21.link/p2372r1>, we have a concrete example of how
>> the std::print() facility itself can produce ill-formed input
>> (assuming that std::print() transcodes all inputs using the same
>> encoding). I would like to start with this example as I think it
>> is fundamental to how we choose to answer the above questions.
>>
>> std::print("{:L%p}\n",
>> std::chrono::system_clock::now().time_since_epoch());
>>
>> At issue is the encoding used by chrono formatters specified with
>> the L option to request a locale specific form. The example
>> above contains the %p specifier with the L option. In a Chinese
>> locale the desired translation of "PM" is "下午", but the locale
>> will provide the translation in the locale encoding. As
>> specified in P2093R6, if the literal encoding is UTF-8, then
>> std::print() will expect the translation to be provided in UTF-8,
>> but if the locale is not UTF-8-based (e.g., Big5; perhaps
>> Shift-JIS for the Japanese 午後 translation), then the result is
>> mojibake.
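The hazard above comes from a mismatch between two encodings: the ordinary literal encoding, fixed at compile time, and the locale encoding, selected at run time. The following is a minimal sketch of how a program might detect that mismatch. The function names are hypothetical. The compile-time check relies on U+00B5 MICRO SIGN being encoded as the two bytes 0xC2 0xB5 in UTF-8 (a similar check appears in the {fmt} library), and it will not compile if the literal encoding cannot represent U+00B5. The run-time check assumes POSIX nl_langinfo(CODESET); on Windows it falls back to GetACP(), which reports the ANSI code page (the one a UTF-8 manifest changes) rather than the C runtime locale.

```cpp
#include <clocale>
#include <cstring>
#ifdef _WIN32
#include <windows.h>
#else
#include <langinfo.h>
#endif

// Compile-time: is the ordinary literal encoding UTF-8?
constexpr bool literal_encoding_is_utf8() {
    constexpr unsigned char micro[] = "\u00B5";  // U+00B5 MICRO SIGN
    return sizeof(micro) == 3 && micro[0] == 0xC2 && micro[1] == 0xB5;
}

// Run time: does the current locale (or code page) use UTF-8?
bool locale_encoding_is_utf8() {
#ifdef _WIN32
    return GetACP() == CP_UTF8;
#else
    const char* codeset = nl_langinfo(CODESET);  // reflects the LC_CTYPE category
    return codeset != nullptr && std::strcmp(codeset, "UTF-8") == 0;
#endif
}

// Hypothetical predicate for the situation described above: text from the
// locale is treated as UTF-8 even though the locale does not produce UTF-8.
bool mojibake_risk() {
    return literal_encoding_is_utf8() && !locale_encoding_is_utf8();
}
```

Under a typical GCC or Clang build (UTF-8 literal encoding) running in the default "C" locale on a POSIX system, mojibake_risk() returns true; switching to a UTF-8 locale (for example via std::setlocale(LC_ALL, "C.UTF-8"), where available) would make it false.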
>>
>> The following are possible directions we could investigate to
>> resolve these encoding concerns:
>>
>> * Specialize std::locale facets
>> <https://en.cppreference.com/w/cpp/locale/locale> and related
>> I/O manipulators like std::put_time()
>> <https://en.cppreference.com/w/cpp/io/manip/put_time> for
>> char8_t. When the literal encoding is UTF-8, this would allow
>> std::print() to opt in to the UTF-8/char8_t facets and I/O
>> manipulators.
>> * When the literal encoding is UTF-8, stipulate that running
>> the program in a non-UTF-8-based locale is non-conforming.
>> This would effectively require MSVC programmers who build code
>> with the /utf-8 option to also force selection of a UTF-8 code
>> page via a manifest
>> <https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page>,
>> which in turn requires Windows 10 version 1903 or later.
>> * When the literal encoding is UTF-8, specify that non-UTF-8-based,
>> locale-dependent translations be implicitly transcoded to UTF-8
>> (such transcoding should never result in errors except
>> perhaps for memory allocation failures).
>> * Drop the special case handling for the literal encoding being
>> UTF-8 and specify that, when bypassing a stream to write
>> directly to the console, the output be implicitly transcoded
>> from the current locale-dependent encoding (whatever it is) to
>> the console encoding (UTF-8).
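The third and fourth directions both depend on implicitly transcoding locale-dependent text to UTF-8. A minimal sketch follows, under stated assumptions: the translation is available as a narrow string in the encoding of the current C locale (which need not match a std::locale object passed to a formatter), and std::mbrtoc32 yields Unicode code points, as it does on mainstream platforms. The names locale_to_utf8 and append_utf8 are hypothetical. Consistent with the expectation that such transcoding should not fail, an error is reported only when the input is not actually valid in the locale encoding.

```cpp
#include <clocale>
#include <cstddef>
#include <cuchar>
#include <cwchar>
#include <optional>
#include <string>
#include <string_view>

// Append the UTF-8 encoding of a Unicode scalar value to out.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Transcode text in the current C locale's multibyte encoding to UTF-8.
// Returns std::nullopt if the input is invalid or truncated in that encoding.
std::optional<std::string> locale_to_utf8(std::string_view in) {
    std::string out;
    std::mbstate_t state{};
    const char* p = in.data();
    const char* const end = p + in.size();
    while (p < end) {
        char32_t c32 = 0;
        std::size_t n = std::mbrtoc32(&c32, p, static_cast<std::size_t>(end - p), &state);
        if (n == static_cast<std::size_t>(-1) || n == static_cast<std::size_t>(-2))
            return std::nullopt;              // invalid (-1) or incomplete (-2) sequence
        if (n == static_cast<std::size_t>(-3)) {
            append_utf8(out, c32);            // extra output from an earlier sequence
            continue;                         // no input bytes were consumed
        }
        if (n == 0) n = 1;                    // an embedded null character was converted
        append_utf8(out, c32);
        p += n;
    }
    return out;
}
```

In a Big5 or Shift-JIS locale, the same function would convert the locale's translation of "PM" to UTF-8 before std::print() writes it; the design choice is to transcode at the boundary where locale-provided text enters the formatted output, rather than to constrain which locales a program may run in.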
>>
>> Tom.
>>
>>
>
> --
> SG16 mailing list
> SG16_at_[hidden] <mailto:SG16_at_[hidden]>
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
Received on 2021-05-25 10:19:03