<div dir="ltr"><p>Thanks for providing this feedback. Here are my thoughts:</p>
<blockquote>
<p>But the general idea of a small cache is applicable; when the 
underlying range (only) models input range, the (specialized) iterator 
can hold a (4 byte) input buffer just as is done for the output code 
unit buffer. Unlike the output code unit buffer, there is no buffer 
index to maintain since base() would always return an iterator to the 
beginning of that buffer.</p>
</blockquote>
<p>The closest thing I can think of to a precedent that justifies this 
approach is that the views API can have range adaptors perform 
optimizations that result in <code>.base()</code> not returning an iterator to the view that was passed in to the range adaptor. For example, passing an instance of <code>std::ranges::reverse_view</code> to <code>std::views::reverse</code> yields the base of the <code>std::ranges::reverse_view</code> instead of a <code>reverse_view</code> of a <code>reverse_view</code>; so, when you invoke <code>.base()</code>, you don&#39;t get a <code>std::reverse_iterator</code>. P2728 takes advantage of this to enable double-transcode optimizations.</p>
<p>However, I don&#39;t think we have precedent for a view type giving out an iterator from <code>.base()</code>
 whose type is unrelated to the iterator type of the underlying view. 
That approach seems like it violates the expectations users might have 
of the way that <code>.base()</code> works.</p>
<p>In <a href="https://isocpp.org/files/papers/P2728R13.html">P2728R13</a> I just removed <code>.base()</code> for non-forward input ranges.</p>
<blockquote>
<p>The text_view iterators also expose a base_range() member that 
returns a range of the underlying code unit sequence corresponding to 
base() + code-unit-sequence-length (which I think is equivalent to 
to_increment_ in P2728). Is there a reason not to expose such a member? 
As is, it appears that obtaining that range would require constructing a
 subrange using base() from one iterator and base() from another 
iterator that has advanced to the next character. Such a subrange would 
not be valid in the case of specialized input iterators that use an 
input buffer cache as I suggested above (the two iterators would not 
point in to the same range).</p>
</blockquote>
<p>In a previous revision of the paper (P2728R7), rather than having <code>_or_error</code> views that give out <code>std::expected</code> as the <code>value_type</code>, I tried to address error handling with a <code>.success()</code> member function on the iterator that gave out <code>std::expected&lt;void, utf_transcoding_error&gt;</code>. I was advised by the chair at that time that adding member functions other than <code>.base()</code>
 was objectionable to SG9, because users now have built an expectation 
that they can implement classes that wrap views by providing a limited 
set of member functions, which includes <code>.base()</code> but which 
does not include any novel designs. Unfortunately, I can&#39;t point to the 
minutes, since I was given this advice during an &quot;unofficial&quot; session 
at the Wrocław meeting. I would worry about experiencing similar resistance to 
the idea of adding a <code>.base_range()</code> member function.</p>
<p>However, I haven&#39;t yet seen any use cases that <code>.base_range()</code> would enable that can&#39;t be implemented using <code>.base()</code>, other than those involving input ranges, of course. In the previous telecon, I presented examples of sophisticated use cases for <code>.base()</code>,
 which are now added to P2728R13 in cleaner form. These are the 
&quot;Transcoding into a buffer of a fixed number of code units without 
truncating code points&quot; and &quot;Performing code unit substitutions on 
cuneiform strings&quot; examples.</p>
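<p>For forward ranges, the code units of the current code point can be recovered from two <code>.base()</code> positions, roughly as sketched below. <code>utf8_cursor</code> and <code>current_code_units</code> are hypothetical stand-ins written for this email, not the P2728 iterator; the cursor assumes well-formed UTF-8 and a non-end position.</p>

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical forward cursor over UTF-8 code units. base() points at the
// first code unit of the current code point.
struct utf8_cursor {
    const char* pos;
    const char* end;

    const char* base() const noexcept { return pos; }

    // Advances past one code point. Precondition: pos != end.
    utf8_cursor& operator++() {
        unsigned char lead = static_cast<unsigned char>(*pos);
        std::size_t n = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        std::size_t remaining = static_cast<std::size_t>(end - pos);
        pos += (n < remaining ? n : remaining);
        return *this;
    }
};

// Builds the code unit range of the current code point from base() of the
// iterator and base() of a copy advanced by one code point.
inline std::string_view current_code_units(utf8_cursor it) {
    utf8_cursor next = it;
    ++next;
    return std::string_view(
        it.base(), static_cast<std::size_t>(next.base() - it.base()));
}
```

<p>This relies on both positions pointing into the same underlying range, which holds for forward ranges but not for a cached input buffer.</p>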
<p>Furthermore, another problem with giving out views into an internal 
buffer in the transcoding iterator is that users will inevitably try to 
do the following:</p>
<ul><li>Store a view or iterator pointing into the cache buffer</li><li>Increment the transcoding iterator, invalidating the aforementioned cache buffer view/iterator</li><li>Obtain a new view/iterator to the cache buffer of the incremented transcoding iterator</li><li>Try to compare the new view to the first one</li></ul>
<p>This is a footgun.</p>
<p>Ultimately, I just don&#39;t think supporting <code>.base()</code> on 
input views is feasible right now. The current paper design doesn&#39;t 
provide it for input views, which still leaves the door open to adding 
it in the future if we change our minds.</p>
<p>Thanks,</p>
<p>Eddie</p><br></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Tue, May 26, 2026 at 3:48 PM Tom Honermann &lt;<a href="mailto:tom@honermann.net">tom@honermann.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u>

  
    
  
  <div>
    <p>It sounds like we&#39;ll be continuing discussion of P2728R12
      tomorrow. I would like to discuss the approach suggested below as
      a resolution for the concerns raised last time regarding the
      behavior of <font face="monospace">base()</font> for input
      ranges. Please share any thoughts ahead of the meeting if
      possible.</p>
    <p>Tom.</p>
    <div>On 5/16/26 5:35 PM, Tom Honermann via
      SG16 wrote:<br>
    </div>
    <blockquote type="cite">
      
      <p>Thank you for the presentation on Wednesday, Eddie. I was glad
        for us to finally get back to this paper! I have a few comments
        now that I&#39;ve read through the latest revision.</p>
      <p>We briefly discussed what the behavior for <font face="monospace">base()</font> should be for transcoding
        iterators that work with an underlying range that only models <font face="monospace"><a>std::ranges::input_range</a></font>.
        The proposed wording has this note in 24.7.?.6
        ([range.transcoding.iterator]).</p>
      <blockquote>
        <p>[ <i>Note:</i> <font face="monospace">to_utf_<a>view::iterator</a></font> maintains
          invariants on <font face="monospace">base()</font> which
          differ depending on whether it’s an input iterator. In both
          cases, if <font face="monospace">*this</font> is at the end
          of the range being adapted, then <font face="monospace">base()
            == end()</font>. But if it’s not at the end of the adapted
          range, and it’s an input iterator, then the position of <font face="monospace">base()</font> is always at the end of the
          input subsequence corresponding to the current code point. On
          the other hand, for forward and bidirectional iterators, the
          position of <font face="monospace">base()</font> is always at
          the beginning of the input subsequence corresponding to the
          current code point. — <i>end note</i> ]</p>
      </blockquote>
      <p>When I was working on <font face="monospace"><a href="https://github.com/tahonermann/text_view/" target="_blank">text_view</a></font>
        many years ago, I addressed this concern for encoding and
        decoding iterator types with an underlying input iterator
        through specialization; partial specializations of those types
        substituted a <a href="https://github.com/tahonermann/text_view/blob/master/include/text_view_detail/caching_iterator.hpp" target="_blank">caching
          iterator</a> for the original underlying input iterator. The
        exact way that I went about this would not be appropriate for
        the P2728 design (the cache consists of a cooperatively managed
        look ahead buffer that is incrementally retired as iterators are
        advanced; we don&#39;t want that here). But the general idea of a
        small cache is applicable; when the underlying range (only)
        models input range, the (specialized) iterator can hold a (4
        byte) input buffer just as is done for the output code unit
        buffer. Unlike the output code unit buffer, there is no buffer
        index to maintain since <font face="monospace">base()</font>
        would always return an iterator to the beginning of that buffer.
        For consistency with forward (and better) iterators, it would be
        useful for the iterator returned by <font face="monospace">base()</font>
        to be comparable to the underlying (input) iterator for the
        purposes of comparison against <font face="monospace">end()</font>;
        but see an alternative approach below.</p>
      <p>The <font face="monospace">text_view</font> iterators also
        expose a <font face="monospace">base_range()</font> member that
        returns a range of the underlying code unit sequence
        corresponding to <font face="monospace">base()</font> + <i>code-unit-sequence-length</i> (which
        I think is equivalent to <font face="monospace">to_increment_</font>
        in P2728). Is there a reason not to expose such a member? As is,
        it appears that obtaining that range would require constructing
        a subrange using <font face="monospace">base()</font> from one
        iterator and <font face="monospace">base()</font> from another
        iterator that has advanced to the next character. Such a
        subrange would not be valid in the case of specialized input
        iterators that use an input buffer cache as I suggested above
        (the two iterators would not point in to the same range).</p>
      <p>I think it would be useful to differentiate access to the
        (complete) underlying range vs access to the input code unit
        sequence for the current character. Obviously, access to the
        complete underlying range isn&#39;t possible for input iterators,
        but access to the current input code unit sequence is (with the
        caching approach described above). The iterators could expose
        this interface:</p>
      <blockquote>
        <p>// Forward+ iterators only; returns an iterator into the
          underlying range.<br>
          constexpr const iterator_t&lt;Base&gt;&amp; <b>base()</b>
          const &amp; noexcept <b>requires forward_range&lt;Base&gt;</b>
          { ... }<br>
          constexpr iterator_t&lt;Base&gt; <b>base()</b> &amp;&amp; <b>requires
            forward_range&lt;Base&gt;</b> { ... }<br>
          <br>
          // Input+ iterators; returns a subrange containing the input
          code units for the current character.<br>
          // References the input code unit sequence cache for input
          iterators.<br>
          // References the underlying range otherwise.<br>
          constexpr subrange&lt;...&gt; <b>base_code_units()</b> const
          noexcept { ... }</p>
      </blockquote>
      <p>Unlike <font face="monospace">base()</font>, <font face="monospace">base_code_units()</font> would not
        necessarily contain iterators for the underlying range (e.g., in
        the case of a caching input iterator). Note that <font face="monospace">base()</font> could be used to modify the
        underlying range (likely ill-advised) while the subrange
        returned by <font face="monospace">base_code_units()</font>
        could restrict such writes thereby ensuring consistent behavior
        for input and forward+ iterators.</p>
      <p>Tom.</p>
      <br>
    </blockquote>
  </div>

</blockquote></div>

