<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Apr 26, 2021 at 6:19 PM Tom Honermann via SG16 &lt;<a href="mailto:sg16@lists.isocpp.org">sg16@lists.isocpp.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <div>On 4/19/21 10:58 AM, Tom Honermann via
      SG16 wrote:<br>
    </div>
    <blockquote type="cite">
      
      <p>SG16 will hold a telecon on Wednesday, April 28th at 19:30 UTC
        (<a href="https://www.timeanddate.com/worldclock/converter.html?iso=20210428T193000&amp;p1=1440&amp;p2=tz_pdt&amp;p3=tz_mdt&amp;p4=tz_cdt&amp;p5=tz_edt&amp;p6=tz_cest" target="_blank">timezone
          conversion</a>).</p>
      <p>The agenda is:</p>
      <ul>
        <li><a href="https://wg21.link/p2093r5" target="_blank">P2093R5:
            Formatted output</a></li>
        <li><a href="https://isocpp.org/files/papers/P2348R0.pdf" target="_blank">P2348R0:
            Whitespaces Wording Revamp</a><br>
        </li>
      </ul>
      <p>LEWG discussed P2093R5 at their 2021-04-06 telecon and decided
        to refer the paper back to SG16 for further discussion.  LEWG
        meeting minutes are available <a href="https://wiki.edg.com/bin/view/Wg21telecons2021/P2093#Library-Evolution-2021-04-06" target="_blank">here</a>;
        please review them prior to the telecon.  LEWG reviewed the list
        of prior SG16 deferred questions posted to them <a href="http://lists.isocpp.org/lib-ext/2021/03/18189.php" target="_blank">here</a>. 
        Of those, they established consensus on an answer for #2 (they
        agreed not to block <tt>std::print()</tt> on a proposal for
        underlying terminal facilities), but referred the rest back to
        us.  My interpretation of their actions is that LEWG would like
        a revision of the paper to address these concerns based on SG16
        input (e.g., discuss design options and SG16 consensus or lack
        thereof).  We&#39;ll therefore focus on these questions at this
        telecon.</p>
      <p>Hubert provided the following very interesting example usage.</p>
      <p><tt>std::print(&quot;{:%r}\n&quot;,
          std::chrono::system_clock::now().time_since_epoch());</tt></p>
      <p>At issue is the encoding used by locale sensitive chrono
        formatters.  Search <a href="http://eel.is/c++draft/time.format" target="_blank">[time.format]</a>
        for &quot;locale&quot; to find example chrono format specifiers that are
        locale dependent.  The example above contains the <tt>%r</tt>
        specifier and is locale sensitive because AM/PM designations may
        be localized.  In a Chinese locale the desired translation of
        &quot;PM&quot; is &quot;下午&quot;, but the locale will provide the translation in the
        locale encoding.  As specified in P2093R5, if the execution
        (literal) encoding is UTF-8, then <tt>std::print()</tt> will
        expect the translation to be provided in UTF-8, but if the
        locale is not UTF-8-based (e.g., Big5; perhaps Shift-JIS for the
        Japanese 午後 translation), then the result is mojibake. This is a
        good example of how locale conflates translation and character
        encoding.</p>
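      <p>The mismatch described above can be checked mechanically. The following is a minimal sketch, not part of P2093R5: <tt>is_well_formed_utf8</tt>, <tt>kPmUtf8</tt> and <tt>kPmBig5</tt> are hypothetical names, and the Big5 byte values are an assumption taken from the usual Big5 mapping tables. It shows that the bytes a Big5 locale would supply for &quot;下午&quot; are not well-formed UTF-8, so a consumer that expects UTF-8 cannot interpret them as text.</p>

```cpp
#include <cstddef>
#include <string_view>

// "下午" (U+4E0B U+5348) encoded as UTF-8.
constexpr std::string_view kPmUtf8 = "\xE4\xB8\x8B\xE5\x8D\x88";
// Assumed Big5 encoding of the same two characters (A455, A4C8).
constexpr std::string_view kPmBig5 = "\xA4\x55\xA4\xC8";

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
bool is_well_formed_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (b0 < 0x80)                     { len = 1; cp = b0; }
        else if (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; cp = b0 & 0x1F; }
        else if (b0 >= 0xE0 && b0 <= 0xEF) { len = 3; cp = b0 & 0x0F; }
        else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte

        if (i + len > s.size()) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (len == 3 && cp < 0x800) return false;                      // overlong
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false; // out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;                // surrogate
        i += len;
    }
    return true;
}
```

      <p>With these definitions, <tt>is_well_formed_utf8(kPmUtf8)</tt> holds while <tt>is_well_formed_utf8(kPmBig5)</tt> does not, because <tt>0xA4</tt> cannot start a UTF-8 sequence.</p>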
      <p>Addressing the above will be our first order of business. 
        Please reserve some time to independently think about this
        problem (ignore responses to this message for a few days if you
        need to).  I am explicitly not listing possible approaches to
        address this concern in this message so as to avoid adding
        (further) bias in any specific direction.  I suspect the answers
        to the previously deferred SG16 questions will be easier to
        answer once this concern is resolved.</p>
    </blockquote>
    <p>Now that we&#39;ve all had some time to think about this issue, here
      are some possible directions we can pursue to resolve it.  These
      are presented in no particular order.<br>
    </p>
    <ul>
      <li>Specialize <a href="https://en.cppreference.com/w/cpp/locale/locale" target="_blank"><tt>std::locale</tt>
          facets</a> and related I/O manipulators like <a href="https://en.cppreference.com/w/cpp/io/manip/put_time" target="_blank"><tt>std::put_time()</tt></a>
        for <tt>char8_t</tt>.  This would allow <tt>std::print()</tt>
        to, when the literal encoding is UTF-8, opt-in to use of the
        UTF-8/<tt>char8_t</tt> facets and I/O manipulators.<br>
      </li>
      <li>When the literal encoding is UTF-8, stipulate that running the
        program in a non-UTF-8 based locale is non-conforming.  This
        would effectively require MSVC programmers, when building
        code with the <tt>/utf-8</tt> option, to also <a href="https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page" target="_blank">force
          selection of a UTF-8 code page via a manifest</a> and require
        use of Windows 10 build 1903 or later.</li>
      <li>When the literal encoding is UTF-8, specify that non-UTF-8
        based locale dependent translations be implicitly transcoded
        (such transcoding should never result in errors except perhaps
        for memory allocation failures).<br>
      </li>
      <li>Drop the special case handling for the literal encoding being
        UTF-8 and specify that, when bypassing a stream to write
        directly to the console, the output be implicitly
        transcoded from the current locale dependent encoding (whatever
        it is) to the console encoding (UTF-8). </li></ul></div></blockquote><div><br></div><div>We have 2 things to explain to LEWG for print. We do not need to change the design, just to explain things to them in terms they can understand (and they want to rely on our expertise, which implies consensus among ourselves).</div><div><br></div><div>1. It is always nonsense to interpret a string as being in encoding X when it is in fact in some other encoding.</div><div>2. From there, if a string literal is in UTF-8, we HAVE to assume the execution encoding is also UTF-8. Why rely on the literal encoding and not the execution encoding? It is resilient to calls to setlocale, and it is more efficient. Also, format strings are likely to be literals.</div><div>3. From there, if that string is displayed on a terminal/console/screen/tty, it is text, so it has to be rendered correctly. On one specific system (Windows) there is a way to enforce that, because Windows has a separate mechanism for Unicode display and console handling that exists independently of the C++ execution encoding.</div><div>4. &quot;We have to assume&quot; in 2. implies a precondition. That is true REGARDLESS of whether the encoding is UTF-8. In all cases the format string has to be interpreted as text, which assumes it is valid in the execution encoding. Cf. the Microsoft STL issue for braces in Shift-JIS.</div><div>5. This means that converting to UTF-16 on Windows for the purpose of console display is always valid (no &quot;transcoding&quot; error) within the contract of the function, and as such does not have to be specified. Precondition violations are UB within the standard library and we should keep it that way. In practice the implementation (which here is the terminal, not the STL) will do character replacement as best it can, or render something horrible.</div><div><br></div><div>The locale in there is a red herring. 
Changing the execution encoding is always dicey: all strings that were interpreted correctly before the locale change are potentially no longer correctly interpreted, because their encoding no longer matches the new execution encoding.</div><div>The existence of a setlocale function doesn&#39;t imply that calling it leads to sensible results if the locale change also changes the encoding :) </div><div><br></div><div><br></div><div>&gt; Specialize <a href="https://en.cppreference.com/w/cpp/locale/locale" target="_blank"><tt>std::locale</tt>
          facets</a> and related I/O manipulators like <a href="https://en.cppreference.com/w/cpp/io/manip/put_time" target="_blank"><tt>std::put_time()</tt></a>
        for <tt>char8_t</tt>.  This would allow <tt>std::print()</tt>
        to, when the literal encoding is UTF-8, opt-in to use of the
        UTF-8/<tt>char8_t</tt> facets and I/O manipulators.</div><div><br></div><div>This is a different issue, one Peter and I have discussed. We should not try to shove char into char8_t. Both char8_t and UTF-8 char are valid use cases. Also, the whole point of fmt::print is to avoid all of that :)</div><div><br></div><div>&gt; When the literal encoding is UTF-8, stipulate that running the
        program in a non-UTF-8 based locale is non-conforming.  This
        would effectively require MSVC programmers, when building
        code with the <tt>/utf-8</tt> option, to also <a href="https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page" target="_blank">force
          selection of a UTF-8 code page via a manifest</a> and require
        use of Windows 10 build 1903 or later.</div><div><br></div><div>If your program contains literals that are not correctly interpreted under the execution encoding, the behavior of your program cannot be correct &lt;insert scary U word&gt;. So they should probably do that, but it seems out of scope.</div><div>The encoding of the literals and the execution encoding should be consistent (each string literal should be correctly interpreted).</div><div><br></div><div>&gt; When the literal encoding is UTF-8, specify that non-UTF-8
        based locale dependent translations be implicitly transcoded</div><div>Sorry, can you detail what you mean? I do not understand.<br></div><div><br></div><div>&gt; Drop the special case handling for the literal encoding being UTF-8 and specify that, when bypassing a stream to write directly to the console, the output be implicitly transcoded from the current locale dependent encoding (whatever it is) to the console encoding (UTF-8). </div><div><br></div><div>Dropping the special case seems more difficult in terms of wording.</div><div>If everything else fails, Microsoft could do the sensible thing as a matter of QOL.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
    <p>Please feel free to comment on these, or additional, approaches
      before our meeting on Wednesday.</p>
    <p>I think it would benefit LEWG if a revision of the paper
      presented each of these possibilities, the consequences, and the
      rationale (and hopefully SG16 consensus) for the proposed
      direction.<br>
    </p>
    <p>Tom.<br>
    </p>
    <blockquote type="cite">
      <p>I do not intend to time limit discussion of P2093R5 as I
        believe this is an important matter to resolve.  If we are able
        to complete discussion of P2093R5, then we&#39;ll discuss P2348R0.<br>
      </p>
      <p>Tom.<br>
      </p>
    </blockquote>
    <p><br>
    </p>
  </div>

</blockquote></div></div>
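<div>To make point 5 of the reply concrete, here is a minimal, portable sketch of a UTF-8 to UTF-16 conversion that cannot fail: ill-formed input is replaced by U+FFFD rather than reported as an error. This is only an illustration of one possible quality-of-implementation behavior when the precondition is violated. It is not the P2093R5 wording, not what any particular standard library does, and not the Windows console API path; the name <tt>utf8_to_utf16_lossy</tt> is hypothetical, and replacing one byte at a time is a simplifying assumption.</div>

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: converts UTF-8 to UTF-16, substituting U+FFFD for
// every byte that is not part of a well-formed UTF-8 sequence.
std::u16string utf8_to_utf16_lossy(std::string_view in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;  // stays 0 for an invalid lead byte
        char32_t cp = 0;
        if (b0 < 0x80)                     { len = 1; cp = b0; }
        else if (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; cp = b0 & 0x1F; }
        else if (b0 >= 0xE0 && b0 <= 0xEF) { len = 3; cp = b0 & 0x0F; }
        else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; }

        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            ok = (b & 0xC0) == 0x80;  // must be a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, out-of-range values and surrogates.
        ok = ok && !(len == 3 && cp < 0x800)
                && !(len == 4 && (cp < 0x10000 || cp > 0x10FFFF))
                && !(cp >= 0xD800 && cp <= 0xDFFF);

        if (!ok) {
            out.push_back(static_cast<char16_t>(0xFFFD));
            ++i;  // resynchronize on the next byte
            continue;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

<div>Well-formed input round-trips exactly, while bytes in some other encoding (for example Big5) come out as a mix of replacement characters and whatever bytes happen to look like ASCII, which is the &quot;render something horrible&quot; case.</div>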

