<div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Dec 4, 2021, 01:04 Tom Honermann &lt;<a href="mailto:tom@honermann.net">tom@honermann.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div>
    <div>On 12/3/21 4:47 PM, Corentin Jabot
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="auto">
        <div><br>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">On Fri, Dec 3, 2021, 22:03
              Tom Honermann &lt;<a href="mailto:tom@honermann.net" target="_blank" rel="noreferrer">tom@honermann.net</a>&gt; wrote:<br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div>
                <div>On 12/1/21 2:28 PM, Corentin Jabot wrote:<br>
                </div>
                <blockquote type="cite">
                  <div dir="ltr">
                    <div dir="ltr"><br>
                    </div>
                    <br>
                    <div class="gmail_quote">
                      <div dir="ltr" class="gmail_attr">On Wed, Dec 1,
                        2021 at 8:13 PM Tom Honermann &lt;<a href="mailto:tom@honermann.net" rel="noreferrer noreferrer" target="_blank">tom@honermann.net</a>&gt;
                        wrote:<br>
                      </div>
                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                        <div>
                          <div>On 11/28/21 5:22 AM, Jens Maurer wrote:<br>
                          </div>
                          <blockquote type="cite">
                            <pre>On 28/11/2021 10.42, Corentin Jabot via SG16 wrote:
</pre>
                            <blockquote type="cite">
                              <pre>On Sun, Nov 28, 2021, 01:31 Tom Honermann via SG16 &lt;<a href="mailto:sg16@lists.isocpp.org" rel="noreferrer noreferrer" target="_blank">sg16@lists.isocpp.org</a> <a href="mailto:sg16@lists.isocpp.org" rel="noreferrer noreferrer" target="_blank">&lt;mailto:sg16@lists.isocpp.org&gt;</a>&gt; wrote:
</pre>
                            </blockquote>
                            <blockquote type="cite">
                              <pre>     2. If the estimated width of the fill character is greater than 1, then alignment to the end of the available space might not be possible. The choice here is whether to under-fill or over-fill the available space. This possibility is avoided if fill characters are restricted to characters with an estimated width of exactly 1.
        std::format(&quot;{:🤡&gt;4}&quot;, 123);


Is there value in specifying it? Neither solution is great or terrible; I think saying unspecified would be fine, and so would underfilling, I guess.

Hopefully, we are consistent and choose option 1 among those specified in the LWG issue.

    For P2286R3 <a href="https://wg21.link/p2286r3" rel="noreferrer noreferrer" target="_blank">&lt;https://wg21.link/p2286r3&gt;</a>, LEWG requested <a href="https://lists.isocpp.org/sg16/2021/11/2845.php" rel="noreferrer noreferrer" target="_blank">&lt;https://lists.isocpp.org/sg16/2021/11/2845.php&gt;</a> that SG16 consider the ramifications for support of user defined delimiters. We should also discuss the &quot;?&quot; specifier proposed to explicitly opt in to quoted and escaped formats for std::string, std::string_view, and arrays of char/wchar_t.

Not sure the quoted thing is in our purview.

For the delimiter, we should support codepoints, to be consistent with everything else. The issue is that we don&#39;t have experience with that, AFAIK.
</pre>
                            </blockquote>
                            <pre>But the compile-time format string parser might not necessarily understand
the details of the literal encoding, so it&#39;s unclear how codepoints map to
code units.  Or are you saying that the rest of std::format already requires
detailed understanding, anyway?</pre>
                          </blockquote>
                          <p>I believe the compile-time format string
                            parser is already required to understand
                            such details. For example, if the literal
                            encoding is Shift-JIS, then the parser would
                            need to be able to differentiate byte values
                            that appear as lead code units vs trailing
                            code units (since, for example, a 0x5C code
                            unit denotes the &#39;\&#39; character if it is a
                            lead code unit, but that value may also
                            appear as a trailing code unit for a double
                            byte character).</p>
                        </div>
                      </blockquote>
                      <div>I think Jens is right. MSVC does handle
                        Shift-JIS specifically, but I&#39;m not sure we
                        can/should mandate something that works
                        universally; the burden on implementations could
                        be high.</div>
                    </div>
                  </div>
                </blockquote>
                <p>Are you suggesting that we should revisit the
                  consensus for the proposed resolution for <a href="https://cplusplus.github.io/LWG/issue3576" rel="noreferrer noreferrer" target="_blank">LWG3576</a> from our <a href="https://github.com/sg16-unicode/sg16-meetings#august-25th-2021" rel="noreferrer noreferrer" target="_blank">2021-08-25 telecon</a>?</p>
              </div>
            </blockquote>
          </div>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">I am concerned about implementability</div>
        <div dir="auto">The current resolution calls for a compile-time
          mechanism to read a codepoint in an arbitrary encoding.</div>
        <div dir="auto">Such a mechanism currently doesn&#39;t exist.</div>
        <div dir="auto">For an implementation like GCC, the generic
          solution would be to expose iconv facilities through builtins
          (the equivalent of mblen or mbrtocX at least, I think, as
          Hubert pointed out).</div>
        <div dir="auto">This seems... A lot to ask in an issue
          resolution.</div>
        <div dir="auto">I don&#39;t remember if that was considered last
          time or if it constitute new information in anyway but we
          might want to bring that up again.</div>
      </div>
    </blockquote>
    <p>I believe this requirement is already the status quo. Let me
      provide a better example than I did previously.</p>
    <p><font face="monospace">std::format(&quot;&lt;text&gt;&quot;);</font></p>
    <p>If the literal encoding is not self-synchronizing then <font face="monospace">&lt;text&gt;</font> may contain code units that
      correspond to the (single) code unit for &#39;{&#39; but that do not
      encode the &#39;{&#39; character. This can happen due to DBCS or
      shift-state encoding. An implementation needs to be able to
      recognize this case (for affected encodings) in order to avoid
      incorrectly interpreting the text as containing an introducer for
      a replacement field.<br></p></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto">I am well aware.</div><div dir="auto">I wonder if we understood that fully (compile-time support and codepoint semantics were decisions taken at about the same time, independently of one another). I do not recall realizing that we were asking for full-blown constexpr codepoint decoding.</div><div dir="auto"><br></div><div dir="auto">I think I&#39;d like to get input from implementers.</div><div dir="auto">If I understand this MSVC PR correctly, compile-time support for non-UTF-8 multibyte encodings is limited:</div><div dir="auto"><a href="https://github.com/microsoft/STL/pull/2221">https://github.com/microsoft/STL/pull/2221</a></div><div dir="auto"><br></div><div dir="auto">To be clear, I am not opposed to the direction, but I am reluctant to go further down this road without implementers&#39; support. We are asking a lot.</div><div dir="auto"><br></div><div dir="auto">For example, the work to add EBCDIC support to Clang uses a home-grown encoder, because Clang cares about environments where iconv is not present.</div><div dir="auto">In addition to requiring constexpr builtins, this direction would likely mandate that someone write an EBCDIC -&gt; UTF decoder in Clang or libc++.</div><div dir="auto"><br></div><div dir="auto">It makes me wonder if some of these features should be restricted to u8 formatting strings 😅</div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto">It might turn out to be a non-issue, but it&#39;s worth making sure we are all on the same page.</div><div dir="auto"><br></div><div dir="auto">Charlie, opinion?</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><p>
    </p>
    <p>Tom.<br>
    </p>
    <blockquote type="cite">
      <div dir="auto">
        <div dir="auto"><br>
        </div>
        <div dir="auto">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div>
                <p>Tom.<br>
                </p>
                <blockquote type="cite">
                  <div dir="ltr">
                    <div class="gmail_quote">
                      <div> </div>
                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                        <div>
                          <p>I agree with Corentin that delimiters
                            should be restricted to code points. That is
                            consistent with the direction we have
                            already advocated for fill characters in <a href="https://cplusplus.github.io/LWG/issue3576" rel="noreferrer noreferrer" target="_blank">LWG3576</a>.<br>
                          </p>
                          <p>Tom.<br>
                          </p>
                        </div>
                      </blockquote>
                    </div>
                  </div>
                </blockquote>
                <p><br>
                </p>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <p><br>
    </p>
  </div>

</blockquote></div></div></div>

