<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Oct 10, 2022 at 6:01 AM Tom Honermann &lt;<a href="mailto:tom@honermann.net">tom@honermann.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <div>On 10/2/22 5:45 PM, Corentin Jabot via
      SG16 wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">On the second poll, I&#39;ll copy the message I sent
        before the meeting:
        <div><br>
        </div>
        <div>--</div>
        <div><br>
        </div>
        <div>There are further issues here.<br>
          The width of a grapheme is independent of the encoding.<br>
          We are just not forcing implementations not to decode. Is that
          what we want?<br>
          I don&#39;t think it is useful.<br>
          Most encodings cannot represent any of the wide code points,
          and the wideness of code points in Shift-JIS can be derived
          without doing a full decoding.<br>
          <br>
          Suggested resolution:<br>
          For a string decoded to a sequence of Unicode code points, its
          width is the sum of the estimated widths of the first code
          points in its extended grapheme clusters.<br>
          <br>
          If the intent is for implementers to throw their hands in the
          air when the encoding is not &quot;a Unicode encoding&quot;, then surely
          we want to support UTF-8/16/32 and that&#39;s it. UTF-EBCDIC isn&#39;t
          more important or special than Shift-JIS, and there is no
          reason for one encoding to have privileged handling over the
          other.<br>
        </div>
      </div>
    </blockquote>
    I think that is where we ended up; the intent is only to specify
    behavior for UTF-8, UTF-16, and UTF-32. I think the best we could do
    for encodings that are not defined by the C++ standard or one of its
    normative references would be to add normative guidance to do
    likewise for all implementation-defined encodings. In that case,
    there would be no need to restrict the guidance to Unicode encodings;
    we could simply specify widths for characters independently of how
    they are mapped to any specific encoding.<br>
    <blockquote type="cite">
      <div dir="ltr">
        <div><br>
          <br>
          More generally, any encoding through which Unicode can round
          trip should qualify as a Unicode encoding, but I don&#39;t think
          we have a definition of that anywhere.<br>
          Unicode defines Unicode Encoding Form as:<br>
          &gt; A character encoding form that assigns each Unicode
          scalar value to a unique code unit sequence<br>
        </div>
        <div><br>
        </div>
        <div>--</div>
        <div><br>
        </div>
        <div>I.e., I don&#39;t think the poll solves anything; it just uses
          different terminology to describe the same thing (nothing in
          ISO 10646 leads me to believe that &quot;UCS encoding scheme&quot; can
          only designate UCS encoding forms specified in ISO 10646 - in
          addition to being obscure terminology).</div>
      </div>
    </blockquote>
    It solves the issue that there is no definition for &quot;Unicode
    encoding&quot;. &quot;UCS encoding scheme&quot; at least has a definition. If the
    definition in ISO/IEC 10646 is not clear, then I would argue that is
    a concern to raise with WG2.<br></div></blockquote><div><br></div><div>It&#39;s clear, just not limited in the way that we want.</div><div>I think we would be much better off saying &quot;For UTF-8, UTF-16 and UTF-32&quot;. It doesn&#39;t leave much room for confusion.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
    <blockquote type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div>On GB 18030: it&#39;s a different character set, with its own
          set of encodings.</div>
      </div>
    </blockquote>
    <p>I&#39;ve been under the impression that, as of GB 18030-2022, use of
      the PUA is no longer required because all GB 18030 specified
      characters are now represented in Unicode. In other words, the
      Unicode repertoire is a superset of the GB 18030 repertoire. Is
      that not correct? Its specified encodings are, of course,
      distinct.</p></div></blockquote><div>Yes, I believe that, as of this year, all characters representable in GB 18030 can be represented in Unicode.</div><div>But the fact that Unicode is a superset makes the GB 18030 encodings unsuitable for representing Unicode. <br></div><div>Maybe I&#39;m being overly pedantic here.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>
    </p>
    <p>Tom.<br>
    </p>
    <blockquote type="cite"><br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Sun, Oct 2, 2022 at 10:51
          PM Tom Honermann via SG16 &lt;<a href="mailto:sg16@lists.isocpp.org" target="_blank">sg16@lists.isocpp.org</a>&gt;
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div>
            <p>The summary for the SG16 meeting held September 28th,
              2022 is now available.  For those that attended, please
              review and suggest corrections.</p>
            <ul>
              <li><a href="https://github.com/sg16-unicode/sg16-meetings/#september-28th-2022" target="_blank">https://github.com/sg16-unicode/sg16-meetings/#september-28th-2022</a></li>
            </ul>
            <p>Two polls were taken during this meeting.</p>
            <p>The first was for <a href="https://cplusplus.github.io/LWG/issue3767" target="_blank">LWG #3767
                (codecvt&lt;charN_t, char8_t, mbstate_t&gt; incorrectly
                added to locale</a>) to establish consensus on whether
              the <font face="monospace">codecvt</font> facets
              mentioned in the issue are intended to be locale
              sensitive. The established position has been conveyed to
              LWG via <a href="https://github.com/cplusplus/papers/issues/1310" target="_blank">GitHub issue 1310</a>.<br>
            </p>
            <p>The second was for <a href="https://cplusplus.github.io/LWG/issue3412" target="_blank">LWG #3412
                (§[format.string.std] references to &quot;Unicode encoding&quot;
                unclear</a>) to establish consensus on a direction for a
              proposed resolution.<br>
            </p>
            Tom. </div>
          -- <br>
          SG16 mailing list<br>
          <a href="mailto:SG16@lists.isocpp.org" target="_blank">SG16@lists.isocpp.org</a><br>
          <a href="https://lists.isocpp.org/mailman/listinfo.cgi/sg16" rel="noreferrer" target="_blank">https://lists.isocpp.org/mailman/listinfo.cgi/sg16</a><br>
        </blockquote>
      </div>
      <br>
    </blockquote>
  </div>

</blockquote></div></div>

