<div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, 7 Jan 2024, 02:14 Tom Honermann, &lt;<a href="mailto:tom@honermann.net">tom@honermann.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><u></u>

  
    
  
  <div>
    <div>On 1/6/24 2:23 PM, Jonathan Wakely via
      SG16 wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="auto">
        <div><br>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">On Sat, 6 Jan 2024, 17:37
              Jens Maurer, &lt;<a href="mailto:jens.maurer@gmx.net" target="_blank" rel="noreferrer">jens.maurer@gmx.net</a>&gt;
              wrote:<br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
              <br>
              On 06/01/2024 00.40, Jonathan Wakely wrote:<br>
              &gt;<br>
              &gt;<br>
              &gt; On Fri, 5 Jan 2024 at 20:46, Jens Maurer &lt;<a href="mailto:jens.maurer@gmx.net" rel="noreferrer noreferrer" target="_blank">jens.maurer@gmx.net</a>&gt;
              wrote:<br>
              &gt;<br>
              &gt;<br>
              &gt;<br>
              &gt;     On 05/01/2024 18.35, Jonathan Wakely via SG16
              wrote:<br>
              &gt;     &gt;<br>
              &gt;     &gt;<br>
              &gt;     &gt; On Fri, 5 Jan 2024, 16:47 Mark de Wever,
              &lt;<a href="mailto:koraq@xs4all.nl" rel="noreferrer noreferrer" target="_blank">koraq@xs4all.nl</a>&gt;
              wrote:<br>
              &gt;     &gt;<br>
              &gt;     &gt;     On Fri, Jan 05, 2024 at 04:26:49PM
              +0000, Jonathan Wakely via SG16 wrote:<br>
              &gt;     &gt;     &gt; Since the adoption of P2736, C++23
              and the current C++ working draft just<br>
              &gt;     &gt;     &gt; refer to &quot;the Unicode Standard&quot;,
              with a URL referring to the latest<br>
              &gt;     &gt;     &gt; version. We removed the
              bibliography entry for TR29 revision 35. P2736<br>
              &gt;     &gt;     &gt; gives the justification for this
              that the revision of #29 included in<br>
              &gt;     &gt;     &gt; Unicode 15 (revision 41) is just a
              bug fix, so there&#39;s no problem referring<br>
              &gt;     &gt;     &gt; to that instead.<br>
              &gt;     &gt;     &gt;<br>
              &gt;     &gt;     &gt; That might have been true last
              year, but the current Unicode Standard<br>
              &gt;     &gt;     &gt; (15.1.0) includes revision 43 of
              UAX #29, which makes significant changes<br>
              &gt;     &gt;     &gt; to the extended grapheme cluster
              breaking rules. A new state machine is<br>
              &gt;     &gt;     &gt; needed (and new lookup tables of
              properties) to implement rule GB9c. That&#39;s<br>
              &gt;     &gt;     &gt; not just a bug fix, is it?<br>
              &gt;     &gt;     &gt;<br>
              &gt;     &gt;     &gt; Are C++ implementations expected to
              implement rule GB9c, despite it not<br>
              &gt;     &gt;     &gt; being part of the standard when
              C++23 was published?<br>
              &gt;     &gt;<br>
              &gt;     &gt;     AFAIK this was indeed intended. The
              Unicode Standard moves at a faster<br>
              &gt;     &gt;     pace than the C++ Standard. This allows
              C++ to always use the latest<br>
              &gt;     &gt;     Unicode features and backport them to
              older language versions.<br>
              &gt;     &gt;<br>
              &gt;     &gt;<br>
              &gt;     &gt; Maybe the intent was to allow that, but the
              way I read it we *require* that. Is there wording that
              says that an implementation can choose which version to
              conform to?<br>
              &gt;     &gt;<br>
              &gt;     &gt; If not, what stops all existing
              implementations from becoming non-conforming when a new version
              of Unicode gets published?<br>
              &gt;<br>
              &gt;     Nothing, if the new version of Unicode changes
              behavior that C++<br>
              &gt;     refers to (as seems to be the case here).<br>
              &gt;<br>
              &gt;     My understanding is that this was intentional;
              ISO wants us to refer<br>
              &gt;     to undated standard if possible, too.<br>
              &gt;<br>
              &gt;     If we feel we should &quot;freeze&quot; the Unicode version
              for each C++ standard<br>
              &gt;     release, we could do that.  Implementer feedback
              is certainly welcome<br>
              &gt;     for that decision.<br>
              &gt;<br>
              &gt;<br>
              &gt; I think I&#39;d prefer if we just somehow say that
              implementations can define which Unicode standard they
              conform to. That way if a conforming C++23 implementation
              uses Unicode 15.1.0 (the latest version today) then it
              doesn&#39;t become non-conforming overnight when a new Unicode
              standard is published. We can recommend that
              implementations pin themselves to a recent Unicode
              standard, and even recommend that implementations should
              (if possible) update to use newer Unicode standards as
              they become available.<br>
              <br>
              Hm...  That&#39;s not how normative references are supposed to
              work in an ISO world,<br>
              I think (&quot;pick the version you want&quot; -- no), but we could
              certainly try that.<br>
            </blockquote>
          </div>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">I&#39;d be fine with &quot;C++23 refers to Unicode
          15.0.0&quot;, or &quot;it is implementation-defined which Unicode
          standard a C++23 implementation conforms to&quot;, but I don&#39;t like
          the idea of C++23 being a moving target that changes meaning
          after publication.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">How do I even know which code points I can refer
          to with a universal-character-name in a portable C++23
          program? Doesn&#39;t that depend on the Unicode version?</div>
      </div>
    </blockquote>
    <p>The code points that can be specified via <i><a href="http://eel.is/c++draft/lex.charset#nt:universal-character-name" target="_blank" rel="noreferrer">universal-character-name</a></i>
      don&#39;t change, but additional names may become available for use in
      <i><a href="http://eel.is/c++draft/lex.charset#nt:named-universal-character" target="_blank" rel="noreferrer">named-universal-character</a></i>.
</p></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto">Oops sorry, that&#39;s what I meant. The \N{FOO} form.</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><p>      The Unicode stability policy ensures that such names never go away
      (even when erroneously specified). See <a href="https://www.unicode.org/policies/stability_policy.html#Name" target="_blank" rel="noreferrer">https://www.unicode.org/policies/stability_policy.html#Name</a>.<br>
    </p>
    <blockquote type="cite">
      <div dir="auto">
        <div dir="auto"><br>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <br>
              &gt; But there&#39;s no way that a discontinued/EOL compiler
              version can get updated to a newer Unicode standard, which
              is what we seem to be requiring as a condition of being a
              conforming implementation.<br>
              <br>
              I don&#39;t think this problem arises in practice.  Do we have
              a conforming implementation<br>
              of C++ (which happens to be C++20 at this point in time)? 
              This will stop being conforming<br>
              in a few weeks when C++23 is published, at which point
              C++20 is considered withdrawn /<br>
              superseded.  And when C++23 is published, it will stay in
              force for about three years.<br>
            </blockquote>
          </div>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">But compilers still offer support for previous
          standards. We don&#39;t say &quot;sorry, C++23 is out, you can&#39;t use
          -std=c++17 now&quot;.</div>
      </div>
    </blockquote>
    Wouldn&#39;t that be nice though :)<br>
    <blockquote type="cite">
      <div dir="auto">
        <div dir="auto"><br>
        </div>
        <div dir="auto">Should I interpret &quot;C++23 requires you to use
          the latest Unicode standard&quot; as only being true until 2026?
          That makes it tempting to not even try to conform to C++23
          until 2026, when it stops being a moving target ;-)</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">More seriously, I think what you&#39;re saying is
          that an implementation&#39;s &quot;C++20 mode&quot; is already a
          non-standard thing that has impl-defined meaning, because the
          standard only defines one version of C++ at a time. So an
          implementation can choose what its &quot;C++20 mode&quot; means, and
          pinning it to a version of Unicode that was current in 2020 is
          OK. </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">But I still find it unsettling that the
          definition of &quot;C++&quot; will change under our feet between 2023
          and 2026. It effectively means that everything the Unicode
          Consortium does is immediately adopted as a DR against the
          current C++ standard with no involvement from WG21.</div>
      </div>
    </blockquote>
    <p>It is a fact that parts of the Unicode Standard will necessarily
      change as a byproduct of continually adding and improving support
      for the evolving collection of human languages. While we can
      choose to evolve C++ in some lockstep form with the Unicode
      Standard, users will nevertheless be exposed to differences in
      behavior at some point. It is far from clear to me that
      implementors and programmers benefit by having those changes
      happen at discrete points.</p>
    <p>From an implementation perspective, having C++23 mode use one
      Unicode version and C++26 mode use another version seems
      problematic, at least for implementations that don&#39;t provide
      distinct standard library implementations for each standard mode
      (as is the case for all major implementors).</p></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto">Indeed.</div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>
    <p>As a user, I would like and expect newer compiler versions to
      provide support for newer Unicode versions independent of whatever
      standard mode I happen to compile my code with.<br></p></div></blockquote></div></div><div dir="auto">Agreed, and recommending a minimum Unicode version for each C++ standard would work for that. </div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><p>
    </p>
    <p>ABI concerns are just as relevant for minor compiler upgrades as
      they are for major upgrades these days. Going forward, we should
      strive to ensure that Unicode features that don&#39;t have a strong
      stability policy are adequately hidden behind an ABI boundary. I
      don&#39;t recall having discussed use of the grapheme breaking
      algorithm in <font face="monospace">std::format</font> from an
      ABI perspective.</p></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto">In older versions of the algorithm (from 2015), you could detect a break just by inspecting two characters at a time. The current algorithm requires a state machine, or at least some additional state to be tracked (at a minimum, a pointer to the start of the current cluster), and hundreds of bytes of new lookup tables.</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>
    <p>I think it makes sense to specify a minimum Unicode version for
      each C++ standard and I would not be opposed to adding such
      specification. However, it is possible that the choice of Unicode
      version might not always remain a choice that implementors make.
      As we add additional Unicode features to the C++ standard,
      implementors might find it desirable to rely on system provided
      Unicode services (e.g., by an OS provided build of ICU), at least
      for some features. I think we might be best off having the choice
      of Unicode version be implementation-defined and use of a recent
      version a QoI matter.<br></p></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto">That sounds reasonable.</div><div dir="auto"><br></div><div dir="auto">In practice, implementations will not always be able to use the very latest Unicode standard, so we&#39;re just setting users up for disappointment if we say that the standard requires/guarantees it.</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><p>
    </p>
    <blockquote type="cite">
      <div dir="auto">
        <div dir="auto"><br>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              Is there a conforming implementation of C++23 already?<br>
            </blockquote>
          </div>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Are you suggesting that because an
          implementation doesn&#39;t conform 100% to the standard yet, that
          it doesn&#39;t matter if remaining conforming is
          difficult/impractical?</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">That feels like &quot;until you conform, you don&#39;t
          get to complain that it&#39;s hard to conform&quot; :-)</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              Are compiler versions EOL&#39;d in three years?  At least for
              gcc, that doesn&#39;t seem to be<br>
              the case.<br>
            </blockquote>
          </div>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Yes, it&#39;s just over 3 years of upstream support
          and fixes for each GCC release. GCC 10.1 was released in 2020-05
          and then went EOL with 10.5 in 2023-07. GCC 11 was released
          in 2021 and will be EOL late this year. But a close-to-EOL
          release is not going to receive major updates to make it use a
          new Unicode standard. In practice, I&#39;m probably not going to
          make such changes to a stable release branch at all. Once GCC
          14.1 is released in a few months, it might stick with Unicode
          15.1.0 for its three-year lifespan. So the window for making
          updates to a shipping release is smaller than 3 years.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Some vendors continue to support EOL releases
          past the end of upstream support (e.g. in an enterprise distro
          like RHEL). But they&#39;re unlikely to make significant code
          changes, like updating to use a new Unicode standard.</div>
      </div>
    </blockquote>
    <p>I agree this is what will happen in practice. However, it seems
      like a tangent. The real question is whether Unicode behavior will
      differ for <font face="monospace">-std=c++23</font> mode for gcc
      14.1 vs gcc 19.1. I sure hope that it would!<br></p></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto">I hope so too :-)</div><div dir="auto"><br></div><div dir="auto"><br></div></div>
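<div dir="auto">To illustrate the earlier point that older grapheme breaking rules could be decided from two adjacent characters while rule GB9c needs extra state, here is a minimal sketch in Python. It is not taken from any standard library implementation. The names <font face="monospace">TOY_PROPS</font>, <font face="monospace">pairwise_breaks</font> and <font face="monospace">stateful_breaks</font> are hypothetical, the property table is a hand-picked handful of Devanagari code points rather than the real Unicode data files, and the sketch deliberately merges the Grapheme_Cluster_Break and Indic_Conjunct_Break properties into one simplified category.</div>
<pre>
```python
# Toy categories; real implementations use the full Unicode property tables.
CONSONANT, LINKER, EXTEND, OTHER = "Consonant", "Linker", "Extend", "Other"

# Hand-picked subset for illustration only (an assumption of this sketch).
TOY_PROPS = {
    "\u0915": CONSONANT,  # DEVANAGARI LETTER KA
    "\u0937": CONSONANT,  # DEVANAGARI LETTER SSA
    "\u094D": LINKER,     # DEVANAGARI SIGN VIRAMA
    "\u093C": EXTEND,     # DEVANAGARI SIGN NUKTA
}


def prop(ch):
    """Look up the toy category of a single character."""
    return TOY_PROPS.get(ch, OTHER)


def pairwise_breaks(text):
    """Older-style rule: attach extending characters to the previous cluster.

    The decision at each boundary needs only the characters next to it.
    """
    clusters = []
    for ch in text:
        if clusters and prop(ch) in (LINKER, EXTEND):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def stateful_breaks(text):
    """Adds a GB9c-like rule: consonant, then extenders with at least one
    linker, then consonant stays in one cluster. This needs remembered state.
    """
    clusters = []
    state = "none"  # one of "none", "consonant", "linked"
    for ch in text:
        p = prop(ch)
        joins_conjunct = p == CONSONANT and state == "linked"
        if clusters and (p in (LINKER, EXTEND) or joins_conjunct):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        # Update the state that the pairwise version does not need.
        if p == CONSONANT:
            state = "consonant"
        elif p == LINKER and state in ("consonant", "linked"):
            state = "linked"
        elif p == EXTEND and state in ("consonant", "linked"):
            pass  # extenders keep whatever conjunct state we already have
        else:
            state = "none"
    return clusters
```
</pre>
<div dir="auto">With this toy table, the sequence KA, VIRAMA, SSA is split into two clusters by <font face="monospace">pairwise_breaks</font> but kept as one cluster by <font face="monospace">stateful_breaks</font>, which is the kind of observable behavior change under discussion. A real implementation would also have to handle the remaining UAX #29 rules, which this sketch omits.</div>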

