<div dir="ltr">This is happening now</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Apr 11, 2023 at 9:49 PM Tom Honermann via SG16 &lt;<a href="mailto:sg16@lists.isocpp.org">sg16@lists.isocpp.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <p>This is your friendly reminder that this meeting is taking place
      tomorrow.</p>
    <p>Tom.<br>
    </p>
    <div>On 4/7/23 3:01 PM, Tom Honermann via
      SG16 wrote:<br>
    </div>
    <blockquote type="cite">
      
      <p>SG16 will hold a telecon on Wednesday, April 12th, at 19:30 UTC
        (<a href="https://www.timeanddate.com/worldclock/converter.html?iso=20230412T193000&amp;p1=1440&amp;p2=tz_pt&amp;p3=tz_mt&amp;p4=tz_ct&amp;p5=tz_et&amp;p6=tz_cest" target="_blank">timezone
          conversion</a>).</p>
      <p><b>For those in central Europe, please note that daylight
          saving time has begun since we last met, so this telecon will
          begin one hour later in local time than the last telecon.</b></p>
      <p>The agenda follows.</p>
      <ul>
        <li><a href="https://wg21.link/p2728r0" target="_blank">P2728R0</a>:
          Unicode in the Library, Part 1: UTF Transcoding</li>
        <ul>
          <li>Continue discussion.</li>
        </ul>
      </ul>
      <p>Discussion during the <a href="https://github.com/sg16-unicode/sg16-meetings#march-22nd-2023" target="_blank">2023-03-22
          SG16 telecon</a> included the following topics:<br>
      </p>
      <ul>
        <li>Use of CTAD vs use of factory functions.</li>
        <li>View adapters that place constraints on the underlying range
          but don&#39;t otherwise apply any adaptation (e.g., <font face="monospace">as_utf8()</font>).<br>
        </li>
        <li>Lack of error handling policies for the transcoding
          algorithms.</li>
        <li>Lack of convenient interfaces for handling code unit
          sequences that straddle a buffer boundary (due to network
          provided or segmented data).</li>
        <li>Whether or how to expose the transcoding iterator type
          unpacking functionality.</li>
        <li>Use of <font face="monospace">char32_t</font> vs other
          types for holding Unicode code point values.</li>
        <li>Whether and how to optimize the design for types
          historically used for character data vs the <font face="monospace">charN_t</font> types.</li>
        <li>The lack of standard library support for <font face="monospace">charN_t</font> types and the impact on <font face="monospace">charN_t</font> adoption.<br>
        </li>
        <li>Designing for composability through the use of elementary
          building blocks.</li>
        <li>The possibility of removing the front, back, and insert
          iterators in favor of an iterator adapter.</li>
        <li>The possibility of removing the full set of UTF converting
          iterators.</li>
        <li>The need for first class support of UTF-8 data in <font face="monospace">char</font>-based storage, possibly
          contingent on the choice of literal encoding.<br>
        </li>
        <li>Locale considerations and Python&#39;s move to <font face="monospace">C.UTF-8</font> as its default locale.<br>
        </li>
      </ul>
      <p>Note that many of these topics are more LEWG concerns than they
        are SG16 concerns. I think that is ok; the designs we forward
        should be guided by our expectations of what LEWG will find
        agreeable.</p>
      <p>My impression of current consensus based on recent discussion
        is that we wish to be forward looking and focus on support for <font face="monospace">charN_t</font> types with support for other
        types provided by wrappers, adapters, casts, etc. I&#39;d like to
        poll this.</p>
      <p>With regard to segmented data and handling of partial code unit
        sequences at the end of a segment, there are at least two
        concerns: 1) how to cross the boundary without treating the
        partial sequence as an error, and 2) how to handle the
        transition efficiently. Network buffers or data structure
        segments may provide contiguous data that can be processed
        optimally, but such optimizations cannot be applied to the
        entire sequence due to the segmentation. JeanHeyd&#39;s work in <a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3095.htm" target="_blank">WG14
          N3095 (Restartable and Non-Restartable Functions for Efficient
          Character Conversions)</a> enables such data to be optimally
        processed by storing partial sequences in <font face="monospace">mbstate_t</font> instances and allowing for
        continuation with another buffer; these are not iterator-based
        interfaces. The interfaces proposed in <a href="https://wg21.link/p2728r0" target="_blank">P2728R0</a>
        cannot support such optimizations, at least not until support
        for segmented data concepts is added to the ranges library to
        allow for the identification of contiguous segments (we could
        recognize range-of-ranges designs, but not range designs where
        segmentation is an internal iterator detail). I&#39;d like to
        discuss whether we are comfortable with these limitations or
        whether we would prefer to wait for a partially-contiguous range
        specification so that maximally performant functionality can be
        provided in a range-based interface.</p>
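      <p>To make the boundary problem concrete, here is a minimal C++
        sketch of a restartable decoder that carries a partial UTF-8
        sequence from one segment to the next. The names <font face="monospace">utf8_state</font> and <font face="monospace">decode_segment</font> are hypothetical and
        are not taken from N3095 or P2728R0; the state object only plays
        the role that the text above attributes to <font face="monospace">mbstate_t</font>. The decoder is
        deliberately not fully validating (it does not reject overlong
        forms, surrogates, or values above U+10FFFF).</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical carry-over state for a UTF-8 sequence split across segments.
struct utf8_state {
    char32_t partial = 0;  // code point bits accumulated so far
    int      pending = 0;  // continuation bytes still expected
};

// Decodes one contiguous segment and appends code points to `out`.
// A sequence cut off by the end of the segment stays in `st` and is
// not treated as an error. Ill-formed input yields U+FFFD.
// Assumption: simplified validation, for illustration only.
void decode_segment(const std::string& seg, utf8_state& st, std::u32string& out) {
    for (unsigned char b : seg) {
        if (st.pending > 0) {
            if ((b & 0xC0) == 0x80) {
                st.partial = (st.partial << 6) | (b & 0x3F);
                if (--st.pending == 0) out.push_back(st.partial);
                continue;
            }
            // A continuation byte was expected: report the truncated
            // sequence, then reprocess `b` as a possible lead byte.
            out.push_back(U'\uFFFD');
            st = {};
        }
        if (b < 0x80)                { out.push_back(b); }
        else if ((b & 0xE0) == 0xC0) { st.partial = b & 0x1F; st.pending = 1; }
        else if ((b & 0xF0) == 0xE0) { st.partial = b & 0x0F; st.pending = 2; }
        else if ((b & 0xF8) == 0xF0) { st.partial = b & 0x07; st.pending = 3; }
        else                         { out.push_back(U'\uFFFD'); }
    }
}
```

      <p>Feeding the three code units of U+20AC as the segments E2 82
        and AC produces a single code point once the second segment
        arrives; nothing is reported at the boundary itself. Each call
        still sees a contiguous buffer, which is what leaves room for
        the fast paths discussed above.</p>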
      <p>I&#39;d like to spend time discussing the viability of transcoding
        output iterators like <font face="monospace">utf_8_to_32_out_iterator</font>
        and <font face="monospace">utf_16_to_32_out_iterator</font>.
        The issue is that writing a partial code unit sequence to them
        doesn&#39;t produce an output, so it isn&#39;t clear what happens if no
        further input is ever provided. Is the partial sequence silently
        lost? Does the iterator&#39;s destructor throw an exception or
        otherwise signal an error?<br>
      </p>
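      <p>The question can be illustrated with a minimal sketch of a
        buffering output iterator. The class <font face="monospace">utf8_to_utf32_out</font> below is
        hypothetical, is not the P2728R0 <font face="monospace">utf_8_to_32_out_iterator</font>, and is
        not fully validating; it only shows that code units written
        after the last complete sequence sit inside the iterator and
        never reach the destination unless the caller checks for them.
        The <font face="monospace">pending()</font> accessor is one
        assumed way to surface that condition, not a proposed interface.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical UTF-8 -> UTF-32 output iterator; an illustration only.
class utf8_to_utf32_out {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit utf8_to_utf32_out(std::u32string& out) : out_(&out) {}

    // Writing a code unit may or may not produce output.
    utf8_to_utf32_out& operator=(char c) {
        const unsigned char b = static_cast<unsigned char>(c);
        if (pending_ > 0) {
            if ((b & 0xC0) == 0x80) {
                partial_ = (partial_ << 6) | (b & 0x3F);
                if (--pending_ == 0) out_->push_back(partial_);
                return *this;
            }
            out_->push_back(U'\uFFFD');  // truncated sequence
            pending_ = 0;
        }
        if (b < 0x80)                { out_->push_back(b); }
        else if ((b & 0xE0) == 0xC0) { partial_ = b & 0x1F; pending_ = 1; }
        else if ((b & 0xF0) == 0xE0) { partial_ = b & 0x0F; pending_ = 2; }
        else if ((b & 0xF8) == 0xF0) { partial_ = b & 0x07; pending_ = 3; }
        else                         { out_->push_back(U'\uFFFD'); }
        return *this;
    }

    utf8_to_utf32_out& operator*() { return *this; }
    utf8_to_utf32_out& operator++() { return *this; }
    // Returns a reference so that `*it++ = c` does not lose buffered state.
    utf8_to_utf32_out& operator++(int) { return *this; }

    // Continuation bytes still expected; nonzero means input is
    // buffered in the iterator and has not been written anywhere.
    int pending() const { return pending_; }

private:
    std::u32string* out_;
    char32_t partial_ = 0;
    int pending_ = 0;
};
```

      <p>After writing the code units 41, E2, 82 the destination holds
        only U+0041 and <font face="monospace">pending()</font> is
        nonzero. If this iterator is destroyed at that point, the two
        buffered code units vanish without any signal, which is the
        silent-loss outcome asked about above.</p>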
      <p> Candidate polls:</p>
      <ol>
        <li>UTF transcoding interfaces provided by the C++ standard
          library should operate on <font face="monospace">charN_t</font>
          types with support for other types provided by adapters.</li>
        <li>The association of a UTF-8 encoding with a sequence of <font face="monospace">char</font> must be explicit in the source
          code unless the literal encoding is UTF-8.</li>
        <li>The association of a UTF-16 or UTF-32 encoding with a
          sequence of <font face="monospace">wchar_t</font> must be
          explicit in the source code unless the wide literal encoding
          is UTF-16 or UTF-32.<br>
        </li>
        <li><font face="monospace">char32_t</font> should be used as the
          Unicode code point type within the C++ standard library.<br>
        </li>
        <li>Low level transcoding facilities (<a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3095.htm" target="_blank">WG14
            N3095</a>) suffice for high speed handling of segmented data
          structures with contiguous segments; high level facilities can
          rely on iterators to abstract such structures.</li>
        <li><i>M</i>x<i>N</i> conversions where <i>M</i> is larger than
          <i>N</i> (e.g., UTF-8 -&gt; UTF-32) shall be performed by
          view/iterator input adapters, not by output adapters.<br>
        </li>
      </ol>
      <p>Tom.<br>
      </p>
      <br>
    </blockquote>
  </div>

-- <br>
SG16 mailing list<br>
<a href="mailto:SG16@lists.isocpp.org" target="_blank">SG16@lists.isocpp.org</a><br>
<a href="https://lists.isocpp.org/mailman/listinfo.cgi/sg16" rel="noreferrer" target="_blank">https://lists.isocpp.org/mailman/listinfo.cgi/sg16</a><br>
</blockquote></div>

