sg16: Re: [SG16] Agenda for the 2021-05-12 SG16 telecon

From: Victor Zverovich <victor.zverovich_at_[hidden]>
Date: Thu, 27 May 2021 09:50:01 -0700

> I don't know that users can meaningfully use vprint_(non)unicode.

It depends on who you count as users. Like other vformat overloads, they
shouldn't be used by everyone, but they are definitely useful to library
writers who implement low-level formatting facilities.

- Victor

On Wed, May 26, 2021 at 5:55 AM Corentin Jabot <corentinjabot_at_[hidden]>
wrote:

>
>
> On Wed, May 26, 2021 at 2:49 PM Victor Zverovich <
> victor.zverovich_at_[hidden]> wrote:
>
>> > Do we have to publicly expose these?
>>
>> Yes, they are useful for library writers in the same way vformat
>> overloads are.
>>
>
> What if there is a single public overload?
>
> print
> vprint
> __vprint_unicode
> __vprint
>
> I don't know that users can meaningfully use vprint_(non)unicode.
>
>
>>
>> - Victor
>>
>> On Tue, May 25, 2021 at 11:59 AM Corentin Jabot via SG16 <
>> sg16_at_[hidden]> wrote:
>>
>>>
>>>
>>> On Thu, May 20, 2021 at 12:34 AM Tom Honermann via SG16 <
>>> sg16_at_[hidden]> wrote:
>>>
>>>> On 5/19/21 2:17 PM, Victor Zverovich via SG16 wrote:
>>>>
>>>> > I'm concerned that deployment experience might be limited to specific
>>>> environments.
>>>>
>>>> The great thing about the current implementation (and proposal) is that
>>>> it is consistent with printf's behavior on common C standard library
>>>> implementations on non-Windows platforms, so we have all the deployment
>>>> experience in the world. This is also why I am reluctant to innovate in
>>>> this area. There has been a lot of usage experience on Windows as well and
>>>> there is much less variation there.
>>>>
>>>> That is a benefit, but I don't think that is strongly relevant to
>>>> Hubert's question.
>>>>
>>>> The intent of the proposal is to grant additional permissions for
>>>> implementations to alter behavior based on where the output is directed.
>>>> Implementation experience only exists for that for Windows, but the wording
>>>> is (intentionally) written to be agnostic to implementation. Thus, we
>>>> don't have implementation experience for (all of) this feature outside of
>>>> Windows at the moment. I understand and appreciate that the proposal is
>>>> strongly intended to work around a well known Windows deficiency, but it
>>>> does have applicability elsewhere.
>>>>
>>>>
>>>> > I think the non-Unicode function is awkwardly named.
>>>>
>>>> Naming suggestions are welcome!
>>>>
>>>>
>>>> - vprint_unicode()
>>>>   - => vprint_utf8()
>>>>     "unicode" is ambiguous, but the specification is clear that
>>>>     UTF-8 is intended.
>>>>   - => u8vprint()
>>>>     => vu8print()
>>>>     I don't recommend these as they may imply char8_t association.
>>>> - vprint_nonunicode()
>>>>   - => vprint_mojibake()
>>>>     If we want to be honest.
>>>>   - => vprint_polyglot()
>>>>     This feels pretty accurate to me.
>>>>   - => vprint_narrow()
>>>>     This doesn't feel right to me since "narrow" within the standard
>>>>     includes UTF-8.
>>>>
>>>>
>>> Do we have to publicly expose these?
>>>
>>>
>>>>
>>>>
>>>> Tom.
>>>>
>>>>
>>>> Cheers,
>>>> Victor
>>>>
>>>> On Wed, May 12, 2021 at 12:51 PM Hubert Tong <
>>>> hubert.reinterpretcast_at_[hidden]> wrote:
>>>>
>>>>> On Wed, May 12, 2021 at 3:14 PM Victor Zverovich <
>>>>> victor.zverovich_at_[hidden]> wrote:
>>>>>
>>>>>> Hi Hubert,
>>>>>>
>>>>>> Thanks for the suggestions, I'll try incorporating them in the next
>>>>>> iteration of the paper.
>>>>>>
>>>>>> > I think it would help if the point was stated more explicitly ...
>>>>>>
>>>>>> Good idea, will clarify this.
>>>>>>
>>>>>> > The paper can at least acknowledge that "polyglot" string literals
>>>>>> exist ...
>>>>>>
>>>>>> Sure.
>>>>>>
>>>>>> > we'll end up with cases where the literal encoding is UTF-8 but the
>>>>>> user won't want the UTF-8 std::print behaviour to potentially kick in.
>>>>>>
>>>>>> I am a bit skeptical because I haven't seen any reports about cases
>>>>>> like this from the extensive usage experience of this feature. We can't fix
>>>>>> clearly broken things and be bug-to-bug compatible with legacy APIs at the
>>>>>> same time.
>>>>>>
>>>>>> > At least two cases come to mind.
>>>>>>
>>>>>> I don't think we can do much if users decide to lie about the
>>>>>> encoding. We should make the common case work rather than try making
>>>>>> everyone happy and support theoretical use cases not backed by actual
>>>>>> implementation and usage experience.
>>>>>>
>>>>>
>>>>> I'm concerned that deployment experience might be limited to specific
>>>>> environments. I expect the conditions for the second scenario are met very
>>>>> easily on *nix and are also very difficult to test for (this requires some sort of
>>>>> special test environment/harness).
>>>>>
>>>>>
>>>>>> That said, they can always use the nonunicode function or continue using
>>>>>> their legacy APIs in those cases.
>>>>>>
>>>>>
>>>>> I think the non-Unicode function is awkwardly named.
>>>>>
>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Victor
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, May 11, 2021 at 8:23 PM Hubert Tong <
>>>>>> hubert.reinterpretcast_at_[hidden]> wrote:
>>>>>>
>>>>>>> On Tue, May 11, 2021 at 8:41 PM Victor Zverovich via SG16 <
>>>>>>> sg16_at_[hidden]> wrote:
>>>>>>>
>>>>>>>> Dear Unicoders,
>>>>>>>>
>>>>>>>> Here is a link to a new revision of P2093:
>>>>>>>> https://isocpp.org/files/papers/D2093R6.html. It's essentially the
>>>>>>>> same as R5 but addresses the latest LEWG feedback and adds a few
>>>>>>>> clarifications. The only change to the wording is replacing <io> with
>>>>>>>> <print>.
>>>>>>>>
>>>>>>>
>>>>>>> Thanks Victor.
>>>>>>>
>>>>>>> With respect to the transcoding choices, it took me a while to catch
>>>>>>> on to the statement being made. I think it would help if the point was
>>>>>>> stated more explicitly: the choice to perform replacement during
>>>>>>> transcoding is made because it is consistent with the treatment of
>>>>>>> malformed UTF-8 on UTF-8-native terminals, and the choice not to
>>>>>>> transcode when the terminal is UTF-8 native is made because we expect
>>>>>>> the terminal to behave predictably, as if we had done the "transcoding".
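
To make "replacement during transcoding" concrete, here is a minimal sketch of a UTF-8 to UTF-16 conversion that substitutes U+FFFD for ill-formed input instead of failing. The function name utf8_to_utf16_lossy is hypothetical and the policy shown (one U+FFFD per ill-formed prefix, then resume at the offending byte) is an assumption for illustration; the paper does not mandate this exact algorithm.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: transcode UTF-8 to UTF-16, replacing each ill-formed
// sequence with U+FFFD rather than reporting an error.
std::u16string utf8_to_utf16_lossy(std::string_view in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t need = 0;                // continuation bytes expected
        unsigned char lo = 0x80, hi = 0xBF;  // valid range for the next byte
        if (b0 < 0x80) {
            cp = b0;
        } else if (b0 >= 0xC2 && b0 <= 0xDF) {
            cp = b0 & 0x1F; need = 1;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            cp = b0 & 0x0F; need = 2;
            if (b0 == 0xE0) lo = 0xA0;       // rejects overlong encodings
            if (b0 == 0xED) hi = 0x9F;       // rejects encoded surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            cp = b0 & 0x07; need = 3;
            if (b0 == 0xF0) lo = 0x90;       // rejects overlong encodings
            if (b0 == 0xF4) hi = 0x8F;       // rejects values above U+10FFFF
        } else {
            out.push_back(u'\uFFFD');        // byte cannot start a sequence
            ++i;
            continue;
        }
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < need; ++k) {
            if (i >= in.size()) { ok = false; break; }  // truncated input
            const unsigned char b = static_cast<unsigned char>(in[i]);
            if (b < lo || b > hi) { ok = false; break; }
            cp = (cp << 6) | (b & 0x3F);
            lo = 0x80; hi = 0xBF;            // later bytes use the plain range
            ++i;
        }
        if (!ok) {
            out.push_back(u'\uFFFD');        // offending byte is re-examined
            continue;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;                   // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this policy a stray byte such as 0xA4 between two ASCII letters yields a single replacement character between them, which mirrors what a UTF-8-native terminal typically displays for the same bytes.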
>>>>>>>
>>>>>>> I'm still not entirely convinced by the argument surrounding the
>>>>>>> choice of using the literal encoding though. The paper can at least
>>>>>>> acknowledge that "polyglot" string literals exist and that they partially
>>>>>>> obviate the insistence that the literal encoding being UTF-8 according to
>>>>>>> the build system/build mode means that the involvement of non-UTF-8
>>>>>>> strings in the vicinity of std::print constitutes "mixing encodings".
>>>>>>>
>>>>>>> I really think that, just for predictability surrounding the display
>>>>>>> of substitution text, we'll end up with cases where the literal encoding is
>>>>>>> UTF-8 but the user won't want the UTF-8 std::print behaviour to potentially
>>>>>>> kick in.
>>>>>>>
>>>>>>> At least two cases come to mind:
>>>>>>> (1) Printing using both legacy interfaces and std::print where the
>>>>>>> legacy interfaces are not using UTF-8 may appear fine on some terminals but
>>>>>>> would result, on redirect, in output with mixed encoding.
>>>>>>>
>>>>>>> (2) std::print where the literal encoding is UTF-8 but the literals
>>>>>>> are all "polyglot" and substitution strings that are not UTF-8 can appear
>>>>>>> to be okay when redirecting or printing to non-Unicode terminals; however,
>>>>>>> once deployed to a Unicode terminal, replacement characters show up (even
>>>>>>> if the output is properly encoded for the underlying C output interface).
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Victor
>>>>>>>>
>>>>>>>> On Tue, May 11, 2021 at 11:02 AM Tom Honermann via SG16 <
>>>>>>>> sg16_at_[hidden]> wrote:
>>>>>>>>
>>>>>>>>> Reminder that this meeting is taking place tomorrow.
>>>>>>>>>
>>>>>>>>> Per suggestion by Peter, the agenda order is being changed to
>>>>>>>>> review the updates in P2295R2 before D2372R1 and P2093R5 in the hopes that
>>>>>>>>> we can forward P2295R2 to EWG. We'll try to limit that discussion to 30
>>>>>>>>> minutes. The updated agenda is below. Again, we are unlikely to get to
>>>>>>>>> P2348R0 at all.
>>>>>>>>>
>>>>>>>>> - P2295R2: Support for UTF-8 as a portable source file encoding
>>>>>>>>>   <https://wg21.link/p2295r3>
>>>>>>>>>   - Review updates intended to address prior SG16 feedback.
>>>>>>>>> - D2372R1: Fixing locale handling in chrono formatters
>>>>>>>>>   <https://isocpp.org/files/papers/D2372R1.html>
>>>>>>>>>   - Affirm or rebut LEWG's position.
>>>>>>>>> - P2093R5: Formatted output <https://wg21.link/p2093r5>
>>>>>>>>>   - Discuss locale dependent character encoding concerns.
>>>>>>>>> - P2348R0: Whitespaces Wording Revamp
>>>>>>>>>   <https://wg21.link/p2348r0>
>>>>>>>>>
>>>>>>>>> Tom.
>>>>>>>>>
>>>>>>>>> On 5/4/21 12:06 AM, Tom Honermann via SG16 wrote:
>>>>>>>>>
>>>>>>>>> SG16 will hold a telecon on Wednesday, May 12th at 19:30 UTC (timezone
>>>>>>>>> conversion
>>>>>>>>> <https://www.timeanddate.com/worldclock/converter.html?iso=20210512T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>
>>>>>>>>> ).
>>>>>>>>>
>>>>>>>>> The agenda is:
>>>>>>>>>
>>>>>>>>> - D2372R1: Fixing locale handling in chrono formatters
>>>>>>>>>   <https://isocpp.org/files/papers/D2372R1.html>
>>>>>>>>>   - Affirm or rebut LEWG's position.
>>>>>>>>> - P2093R5: Formatted output <https://wg21.link/p2093r5>
>>>>>>>>>   - Discuss locale dependent character encoding concerns.
>>>>>>>>> - P2295R2: Support for UTF-8 as a portable source file
>>>>>>>>>   encoding <https://wg21.link/p2295r3>
>>>>>>>>>   - Review updates intended to address prior SG16 feedback.
>>>>>>>>> - P2348R0: Whitespaces Wording Revamp
>>>>>>>>>   <https://wg21.link/p2348r0>
>>>>>>>>>
>>>>>>>>> Our last telecon was consumed by discussion
>>>>>>>>> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README.md#april-28th-2021>
>>>>>>>>> of LWG3547 <https://cplusplus.github.io/LWG/issue3547> and
>>>>>>>>> possible remedies. Though we did not reach consensus on a direction
>>>>>>>>> forward during that telecon, Victor and Corentin, at the LEWG chair's
>>>>>>>>> request, drafted D2372R0, presented it at the LEWG telecon held
>>>>>>>>> 2021-05-03
>>>>>>>>> <https://wiki.edg.com/bin/view/Wg21telecons2021/P2372#2021-05-03>,
>>>>>>>>> and LEWG reached strong consensus for it. The D2372R0 revision will be
>>>>>>>>> submitted for the May mailing as P2372R0; and a D2372R1
>>>>>>>>> <https://isocpp.org/files/papers/D2372R1.html> revision
>>>>>>>>> addressing LEWG feedback will be submitted as P2372R1. Both revisions
>>>>>>>>> substantially match the proposed resolution that SG16 discussed. Since
>>>>>>>>> SG16 did not reach consensus on that direction, the LEWG chair has asked
>>>>>>>>> that we revisit it to either affirm or rebut the LEWG consensus. We will
>>>>>>>>> therefore (briefly) discuss and then poll that direction. Note that the
>>>>>>>>> poll taken in SG16 differs from the poll taken in LEWG. In SG16, we polled
>>>>>>>>> applying the proposed resolution to C++23 while LEWG polled applying the
>>>>>>>>> proposed resolution (with amendments to not change behavior for iostream
>>>>>>>>> manipulators) to C++23 *and* retroactively to C++20.
>>>>>>>>>
>>>>>>>>> Once we've dispatched D2372R1, we'll return to the original agenda
>>>>>>>>> for our last telecon; discussion of P2093R5
>>>>>>>>> <https://wg21.link/p2093r5> (Formatted output) and P2295R2
>>>>>>>>> <https://wg21.link/p2295r3> (Support for UTF-8 as a portable
>>>>>>>>> source file encoding). I've retained P2348R0
>>>>>>>>> <https://wg21.link/p2348r0> on the agenda, though I don't expect
>>>>>>>>> that we'll get to it.
>>>>>>>>>
>>>>>>>>> With regard to P2093R5 <https://wg21.link/p2093r5>, the current
>>>>>>>>> status is that LEWG has referred the paper back to SG16 for further
>>>>>>>>> discussion; please see the LEWG meeting minutes here
>>>>>>>>> <https://wiki.edg.com/bin/view/Wg21telecons2021/P2093#Library-Evolution-2021-04-06>.
>>>>>>>>> Specifically, LEWG would benefit from additional analysis of previously
>>>>>>>>> deferred questions
>>>>>>>>> <http://lists.isocpp.org/lib-ext/2021/03/18189.php> regarding
>>>>>>>>> character encoding concerns, transcoding requirements (or the lack
>>>>>>>>> thereof) and the ensuing consequences (or lack thereof).
>>>>>>>>>
>>>>>>>>> 1. How errors in transcoding should be handled. E.g., when
>>>>>>>>> transcoding from UTF-8 to a UTF-16 based console interface and the UTF-8
>>>>>>>>> input is not well-formed.
>>>>>>>>> 2. The choice to base behavior on the compile-time choice of
>>>>>>>>> literal encoding. An implication of the current proposal is that a program
>>>>>>>>> that contains only ASCII characters in string literals will change behavior
>>>>>>>>> depending on whether the literal encoding is UTF-8 vs ASCII (or some other
>>>>>>>>> ASCII derived encoding).
>>>>>>>>> 3. Whether transcoding to the console interface encoding
>>>>>>>>> should be performed when the literal encoding is not UTF-8.
>>>>>>>>> 4. What the implications are for future support of std::print("{}
>>>>>>>>> {} {} {}", L"Wide text", u8"UTF-8 text", u"UTF-16 text",
>>>>>>>>> U"UTF-32 text").
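
As a small illustration of point 2, the sketch below checks whether a string is ASCII-only, which is one reading of what the thread calls a "polyglot" literal: its bytes are identical whether the literal encoding is UTF-8 or another ASCII-derived encoding, so the bytes alone cannot tell the two builds apart. The name is_ascii_only is hypothetical, and the check deliberately ignores non-ASCII-based encodings such as EBCDIC.

```cpp
#include <string_view>

// Hypothetical helper: true if every byte is 7-bit ASCII, meaning the byte
// sequence is the same in UTF-8 and in any ASCII-derived encoding.
constexpr bool is_ascii_only(std::string_view s) {
    for (char c : s) {
        if (static_cast<unsigned char>(c) > 0x7F) {
            return false;  // a byte outside ASCII depends on the encoding
        }
    }
    return true;
}
```

A program whose literals all pass this check produces the same bytes under either literal encoding, yet under the proposal it could still take a different output path depending on which encoding the build declares.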
>>>>>>>>>
>>>>>>>>> I think these concerns will be easier to resolve if we first reach
>>>>>>>>> consensus regarding scenarios in which localized text may be provided in an
>>>>>>>>> unexpected encoding. The following is a slightly modified example of code
>>>>>>>>> Hubert previously provided. The example has been modified to explicitly
>>>>>>>>> opt into localized chrono formatting.
>>>>>>>>>
>>>>>>>>> std::print("{:L%p}\n",
>>>>>>>>> std::chrono::system_clock::now().time_since_epoch());
>>>>>>>>>
>>>>>>>>> At issue is the encoding used by locale sensitive chrono
>>>>>>>>> formatters. The example above contains the %p specifier and is
>>>>>>>>> locale sensitive because AM/PM designations may be localized. In a Chinese
>>>>>>>>> locale the desired translation of "PM" is "下午", but the locale will provide
>>>>>>>>> the translation in the locale encoding. As specified in P2093R5, if the
>>>>>>>>> literal encoding is UTF-8, then std::print() will expect the
>>>>>>>>> translation to be provided in UTF-8, but if the locale is not UTF-8-based
>>>>>>>>> (e.g., Big5; perhaps Shift-JIS for the Japanese 午後 translation), then the
>>>>>>>>> result is mojibake.
>>>>>>>>>
>>>>>>>>> I had previously suggested the following possible directions we
>>>>>>>>> can investigate to resolve the encoding concerns.
>>>>>>>>>
>>>>>>>>> - Specialize std::locale facets
>>>>>>>>> <https://en.cppreference.com/w/cpp/locale/locale> and related
>>>>>>>>> I/O manipulators like std::put_time()
>>>>>>>>> <https://en.cppreference.com/w/cpp/io/manip/put_time> for
>>>>>>>>> char8_t. This would allow std::print() to, when the literal
>>>>>>>>> encoding is UTF-8, opt-in to use of the UTF-8/char8_t facets
>>>>>>>>> and I/O manipulators.
>>>>>>>>> - When the literal encoding is UTF-8, stipulate that running
>>>>>>>>> the program in a non-UTF-8 based locale is non-conforming. This would
>>>>>>>>> effectively require MSVC programmers, when building code with the
>>>>>>>>> /utf-8 option, to also force selection of a UTF-8 code page
>>>>>>>>> via a manifest
>>>>>>>>> <https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page>
>>>>>>>>> and require use of Windows 10 build 1903 or later.
>>>>>>>>> - When the literal encoding is UTF-8, specify that non-UTF-8
>>>>>>>>> based locale dependent translations be implicitly transcoded (such
>>>>>>>>> transcoding should never result in errors except perhaps for memory
>>>>>>>>> allocation failures).
>>>>>>>>> - Drop the special case handling for the literal encoding
>>>>>>>>> being UTF-8 and specify that, when bypassing a stream to write directly to
>>>>>>>>> the console, the output be implicitly transcoded from the current
>>>>>>>>> locale dependent encoding (whatever it is) to the console encoding (UTF-8).
>>>>>>>>>
>>>>>>>>> If we get through all of that, we'll review Corentin's updates in
>>>>>>>>> P2295R2 <https://wg21.link/p2295r3> to address prior SG16
>>>>>>>>> feedback. Thank you to everyone that already provided additional feedback
>>>>>>>>> on the mailing list!
>>>>>>>>>
>>>>>>>>> Tom.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> SG16 mailing list
>>>>>>>>> SG16_at_[hidden]
>>>>>>>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>
>>>>
>>>
>>

Received on 2021-05-27 11:50:24