Date: Wed, 24 Jun 2026 12:19:45 -0400
Assuming you mean P2728R12.
The state of the proposal has gotten ahead of the mailing. The current
revision is P2728R14: https://isocpp.org/files/papers/P2728R14.html
Since P2728R12, most of the changes have been adding examples, design
discussions, and wording fixes. The actual changes to the design are
(roughly excerpted from the changelog):
- Give utf_transcoding_error and to_utf_view_error_kind unspecified
underlying types, following the precedent of memory_order, launch, and
assertion_kind.
- Remove base() from the transcoding iterator when the base range is an
input range and not a forward range.
- Add views::as_char and views::as_wchar_t, since it’s useful to get
UTF-encoded strings from charN_t types back into char/wchar_t, not just
to go in the reverse direction.
- Loosen the wording to admit chunked (e.g. SIMD) implementations.
The last bullet, to allow SIMD implementations, is going to be pulled in
the next revision, due to this vote by SG9 in Brno:
*Poll: We want to leave implementations the freedom to eagerly buffer more
code units to allow a SIMD transcoding technique.*
SF  F  N  A  SA
 0  0  3  3   1
Attendance: 8. Author: N. Strong consensus against.
Full SG9 polling from Brno can be found here:
https://github.com/cplusplus/papers/issues/1422#issuecomment-4678313581
SG9’s requested changes (including reverting SIMD support and adding a new
.base_code_units() member function on the iterator) haven’t been applied
to a revision after P2728R14 yet because I haven’t had time. They’ll be in
the post-Brno mailing.
> - not having an unchecked mode (which of course should never be the
> default) leaves some performance on the table for SIMD/branchless
> decoding, especially given how... comprehensive the error checking in
> this proposal is. We should discuss that.
I can measure the performance impact of the error checking for the next
revision, but I suspect that it’s negligible.
In my testing and measurements, by far the most salient
performance-affecting characteristic of this API is that it’s a view
rather than an algorithm. Relative to an algorithm, its performance
suffers because, instead of memcpying the transcoding results directly
into an output buffer, we have a code-unit-by-code-unit, iterator-based
copy. That copy is inherent to the view design, and it degrades
performance regardless of how fast the actual transcoding logic runs.
This is true even if we enable SIMD. In my testing, with an ASCII corpus
<https://github.com/lemire/unicode_lipsum/blob/main/lipsum/Latin-Lipsum.utf16.txt>
transcoded from UTF-16 to UTF-8, the non-SIMD version of my view ran at
1.86 GiB/s, the SIMD version of it ran at 4.20 GiB/s, but the optimized
simdutf algorithm-based API ran at 60.4 GiB/s. You can see the full
performance table and design discussion in the “SIMD Support” section of
R14: https://isocpp.org/files/papers/P2728R14.html#simd-support
Based on this experience, SG9 pulled SIMD support.
We should make a similar call for this unchecked API idea. This view is
supposed to be as fast as a view can be, but it’s never going to be the
tool of choice for users who want the absolute highest possible
transcoding performance. A slightly faster unchecked view that’s still
much slower than an algorithm isn’t worth the cost of standardizing
something unsafe.
> - as_utfN allows narrowing conversion. I think we should put a lot more
> requirements on that, namely: preventing as_utfN over charX_t seems like
> the minimum, but I’m also concerned about conversion where the underlying
> type has a different sizeof.
I assume you mean as_charN_t (there’s no as_utfN), but this seems like a
reasonable point.
> - I do not understand the motivation for UTF Tags. They seem to be there
> to make a deduction guide for to_utf_view work, even though we probably
> don’t need one (use the adaptor objects), and I question the ergonomics
> of trying to use these tags in practice.
Yes, they are there to make the deduction guide for to_utf_view work.
SG9 has already covered this discussion. The poll result from Kona (2025)
<https://github.com/cplusplus/papers/issues/1422#issuecomment-3501873463>
was:
*Simplify the to_utfX_view classes by just having a single templated view
class similar to scan_view (with potentially annoying constructor tags to
make CTAD work) and focus on usability with the CPOs only.*
SF  F  N  A  SA
 0  7  1  1   0
Consensus in favor. Attendance: 9. Author: N.
Every standard C++ view has a deduction guide, and the guides are needed
because we need to deduce views::all_t<R> to make the
std::ranges::dangling machinery work. The tags are unergonomic, but that
doesn’t matter because people will use the CPOs. We also can’t just leave
out the deduction guide, because that would be inconsistent and could
break std::ranges::dangling.
> - reserve_hint() could be specified to return an implementation-defined
> value greater than or equal to that of the underlying range.
Sure.
On Wed, Jun 24, 2026 at 11:38 AM Hubert Tong via SG16 <sg16_at_[hidden]>
wrote:
> On Wed, Jun 24, 2026 at 9:50 AM Corentin Jabot via SG16 <
> sg16_at_[hidden]> wrote:
>
>>
>>
>> On Wed, Jun 24, 2026 at 3:17 AM Steve Downey via SG16 <
>> sg16_at_[hidden]> wrote:
>>
>>
>>> wg21: TODO [[https://github.com/cplusplus/papers/issues/1572][LWG2959]]
>>> char_traits<char16_t>::eof is a valid UTF-16 code unit
>>> :sg16:
>>>
>>
>> We should ask Louis/STL if they are happy to change int_type or,
>> alternatively, explain to LEWG that the only solution is to change
>> int_type and see if they care.
>>
>
> Only for char16_t? Go beyond that, and I think we'll be back in the
> "committee broke std::string ABI" land.
>
> -- HT
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
> Link to this post: http://lists.isocpp.org/sg16/2026/06/4763.php
>
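The LWG2959 item quoted above can be illustrated at compile time, assuming a platform where uint_least16_t is exactly 16 bits (true of the common ones). int_type then has no values left over beyond the char16_t range, so whatever eof() returns, at least one code unit cannot both survive a trip through int_type and stay distinguishable from eof(). The helper name below is hypothetical, and the check deliberately does not assume which way a given library resolves the collision.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

using traits16 = std::char_traits<char16_t>;

// Mandated by the standard: int_type for char16_t is uint_least16_t.
static_assert(std::is_same_v<traits16::int_type, std::uint_least16_t>);

// True if code unit c survives a trip through int_type and is distinguishable
// from eof(). With a 16-bit int_type, eof() must occupy one of the 65536
// values, so at least one code unit has to fail this check.
constexpr bool round_trips_distinct_from_eof(char16_t c) {
    const traits16::int_type i = traits16::to_int_type(c);
    return !traits16::eq_int_type(i, traits16::eof()) && traits16::to_char_type(i) == c;
}

// The code unit sharing eof()'s bit pattern is one such failure: it either
// compares equal to eof() or is mapped by to_int_type to some other value.
constexpr char16_t eof_bits = static_cast<char16_t>(traits16::eof());
static_assert(!round_trips_distinct_from_eof(eof_bits));
```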
Received on 2026-06-24 16:19:59
