> In my testing and measurements, by far the most salient performance-affecting characteristic of this API is the fact that it’s a view rather than an algorithm. The performance relative to the algorithm gets killed by the fact that, instead of being able to memcpy the transcoding results directly into an output buffer, we have a code-unit-by-code-unit iterator-based copy, which is inherent to the view design, and which degrades performance regardless of how fast the actual transcoding logic runs.

 

This is also my experience.

And from my experience the solution that works best is to separate the transcoding into 2 steps:

1. Validation and estimation

2. Transcoding (no validation required), assumes that you did 1 first
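To make the split concrete, here is a minimal sketch of the two steps for UTF-16 to UTF-8. It is not the code from the linked repository; the names utf16_to_utf8_size, utf16_to_utf8_unchecked and utf16_to_utf8 are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Step 1 (hypothetical name): validate the UTF-16 input and compute the exact
// number of UTF-8 bytes it needs. Returns std::nullopt on an unpaired surrogate.
inline std::optional<std::size_t> utf16_to_utf8_size(std::u16string_view in) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t u = in[i];
        if (u < 0x80) {
            bytes += 1;
        } else if (u < 0x800) {
            bytes += 2;
        } else if (u < 0xD800 || u > 0xDFFF) {
            bytes += 3;
        } else {
            // Must be a high surrogate immediately followed by a low surrogate.
            if (u > 0xDBFF || i + 1 == in.size()) return std::nullopt;
            const char16_t lo = in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF) return std::nullopt;
            bytes += 4;
            ++i;  // skip the low surrogate
        }
    }
    return bytes;
}

// Step 2 (hypothetical name): transcode with no validation at all.
// Precondition: step 1 succeeded and `out` has room for the size it reported.
inline char* utf16_to_utf8_unchecked(std::u16string_view in, char* out) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // Combine the surrogate pair; validity was established in step 1.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[++i] - 0xDC00);
        }
        if (cp < 0x80) {
            *out++ = static_cast<char>(cp);
        } else if (cp < 0x800) {
            *out++ = static_cast<char>(0xC0 | (cp >> 6));
            *out++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *out++ = static_cast<char>(0xE0 | (cp >> 12));
            *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *out++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            *out++ = static_cast<char>(0xF0 | (cp >> 18));
            *out++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *out++ = static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Convenience wrapper: one exact allocation between the two passes.
inline std::optional<std::string> utf16_to_utf8(std::u16string_view in) {
    const std::optional<std::size_t> size = utf16_to_utf8_size(in);
    if (!size) return std::nullopt;
    std::string out(*size, '\0');
    utf16_to_utf8_unchecked(in, out.data());
    return out;
}
```

The caller can also skip the wrapper and write into a buffer it already owns. The error is reported through the return value of step 1, so step 2 has no failure path.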

 

(see sample code here: https://github.com/tmiguelf/utilities/blob/master/CoreLib/include/CoreLib/string/core_string_encoding.hpp)

 

 

In order to solve transcoding you need to solve 4 problems:

1. Is the input valid?

2. How much buffer do I need?

3. Where do you put the transcoded value?

4. How to transcode?

 

1, 2 and 3 need to happen either strictly before transcoding takes place or simultaneously with transcoding.

 

The absolute baseline in terms of performance that you can possibly have is going to be bound by the performance of the transcoding itself.

Assuming ideal conditions where the given string to transcode is known to be valid and a given output buffer is guaranteed to fit the output, you cannot get any faster than a function that does only this; let’s call this time T1.

 

The following factors are going to affect performance considerations:

* Transcoding is O(n)

* The process of transcoding can give you the resulting output size for free

* You can over-estimate the required output size in O(1) if the input size is given

* You can create an algorithm that both validates the input and computes the exact number of output bytes in less time than it takes to transcode

* If you know the required output size before transcoding, you can do a single allocation in O(1)

* Hardware acceleration works best with contiguous arrays.

 

Solutions:

1. My solution of splitting into 2 steps (estimation + validation; and then transcoding) means that it will perform at worst 2x T1 + the cost of a single allocation (if you happen to use one)

2. The alternative 2-step solution of over-estimating the required output size in O(1) instead of computing it exactly, and then shrinking the buffer after transcoding, is faster than 1, but has the tradeoff that you might end up allocating more buffer than you need.

3. Allocating as you transcode is going to have the worst performance of all, as the allocation itself may be more costly than T1 for any reasonably sized string, and you will be doing it multiple times.
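As an illustration of solution 2, here is a minimal sketch for UTF-32 to UTF-8, where the O(1) over-estimate is 4 bytes per input code point. The function name utf32_to_utf8_overestimate is hypothetical, and in this sketch validation happens simultaneously with transcoding rather than in a separate pass.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: over-allocate in O(1), transcode once, then trim.
// Returns std::nullopt if the input contains a surrogate or a value above U+10FFFF.
inline std::optional<std::string> utf32_to_utf8_overestimate(std::u32string_view in) {
    // Upper bound: no code point needs more than 4 UTF-8 bytes.
    std::string out(in.size() * 4, '\0');
    char* p = out.data();
    for (const char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return std::nullopt;
        if (cp < 0x80) {
            *p++ = static_cast<char>(cp);
        } else if (cp < 0x800) {
            *p++ = static_cast<char>(0xC0 | (cp >> 6));
            *p++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *p++ = static_cast<char>(0xE0 | (cp >> 12));
            *p++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *p++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            *p++ = static_cast<char>(0xF0 | (cp >> 18));
            *p++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            *p++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *p++ = static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    // Trim the logical size to what was actually written.
    out.resize(static_cast<std::size_t>(p - out.data()));
    return out;
}
```

Note that resize() only changes the logical size. The extra capacity stays allocated unless the caller also asks for it to be released (for example with shrink_to_fit(), which is a non-binding request), which is the tradeoff mentioned in solution 2.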

 

Doing it with a range/view means you are stuck with number 3, and dubious hardware support. Plus, you have the problem of being stuck with exceptions to handle transcoding errors, while a function can use both exceptions and return codes (with return codes being preferred).

 

 

 


From: SG16 <sg16-bounces@lists.isocpp.org> on behalf of Eddie Nolan via SG16 <sg16@lists.isocpp.org>
Sent: Wednesday, 24 June 2026 18:19:45
To: sg16@lists.isocpp.org <sg16@lists.isocpp.org>
Cc: Eddie Nolan <eddiejnolan@gmail.com>; Steve Downey <sg16@sdowney.dev>; dascandy@gmail.com <dascandy@gmail.com>
Subject: Re: [isocpp-sg16] SG16 Meeting Tomorrow at 3:30 NYC Time

 

Assuming you mean p2728r12

The state of the proposal has gotten ahead of the mailing. The current revision is P2728R14: https://isocpp.org/files/papers/P2728R14.html

Since P2728R12, most of the changes have been adding examples, design discussions, and wording fixes. The actual changes to the design are (roughly excerpted from the changelog):

  • Give utf_transcoding_error and to_utf_view_error_kind unspecified underlying types, following the precedent of memory_order, launch, and assertion_kind
  • Remove base() from the transcoding iterator when the base range is an input range and not a forward range.
  • Add views::as_char and views::as_wchar_t; it’s also useful to get UTF-encoded strings from charN_t types back into char/wchar_t as well as having the reverse direction.
  • Loosen the wording to admit chunked (e.g. SIMD) implementations

The last bullet, to allow SIMD implementations, is going to be pulled in the next revision, due to this vote by SG9 in Brno:

Poll: We want to leave implementations the freedom to eagerly buffer more code units to allow a SIMD transcoding technique.

  SF   F   N   A   SA
   0   0   3   3    1

Attendance: 8. Author: N. Strong consensus against.

Full SG9 polling from Brno can be found here: https://github.com/cplusplus/papers/issues/1422#issuecomment-4678313581

SG9’s requested changes (including reverting SIMD support and adding a new .base_code_units() member function on the iterator) have not yet been applied to a new revision of P2728, since I haven’t had time yet. They’ll be in the post-Brno mailing.

  • not having an unchecked mode (which of course should never be the default) leaves some performance on the table for simd/ branchless decoding - especially given how.. comprehensive error checking is in this proposal. We should discuss that

I can measure the performance impact of the error checking for the next revision but I suspect that it’s negligible.

In my testing and measurements, by far the most salient performance-affecting characteristic of this API is the fact that it’s a view rather than an algorithm. The performance relative to the algorithm gets killed by the fact that, instead of being able to memcpy the transcoding results directly into an output buffer, we have a code-unit-by-code-unit iterator-based copy, which is inherent to the view design, and which degrades performance regardless of how fast the actual transcoding logic runs.

This is true even if we enable SIMD. In my testing, with an ASCII corpus transcoded from UTF-16 to UTF-8, the non-SIMD version of my view ran at 1.86 GiB/s, the SIMD version of it ran at 4.20 GiB/s, but the optimized simdutf algorithm-based API was 60.4 GiB/s. You can see the full performance table and design discussion in the “SIMD Support” section of R14: https://isocpp.org/files/papers/P2728R14.html#simd-support

Based on this experience, SG9 pulled SIMD support.

We should make a similar call for this unchecked API idea. This view is supposed to be as fast of a view as it can be, but it’s never going to be the tool of choice for users who want the absolute highest possible transcoding performance. Providing a slightly faster unchecked view that’s still much slower than an algorithm isn’t worth standardizing something unsafe.

  • as_utfN allows narrowing conversion. I think we should put a lot more requirements on that, namely. Preventing as_utfN over charX_t seems like the minimum, but I’m also concerned about conversion where the underlying type has a different sizeof.

I assume you mean as_charN_t (there’s no as_utfN), but this seems like a reasonable point.

  • I do not understand the motivation for UTF Tags. It seems to try to make a deduction guide for to_utf_view work - even if we probably don’t need one (use the adaptor objects), and I question the ergonomics of trying to use these tags in practice.

Yes, they are there to make the deduction guide for to_utf_view work.

This discussion has already been covered by SG9. The poll result from Kona (2025) was:

Simplify the to_utfX_view classes by just having a single templated view class similar to scan_view (with potentially annoying constructor tags to make CTAD work) and focus on usability with the CPOs only.

  SF   F   N   A   SA
   0   7   1   1    0

Consensus in favor. Attendance: 9. Author: N.

Every C++ standard view has a deduction guide, and the deduction guides are needed because we need to deduce views::all_t<R> to make the std::ranges::dangling machinery work. The tags are unergonomic, but it doesn’t matter because people will use the CPOs, and we also can’t just leave out the deduction guide, because that would be inconsistent and potentially break std::ranges::dangling.

  • reserve_hint() could be specified to return an implementation-defined value greater or equal to that of the underlying range

Sure.

 

 

On Wed, Jun 24, 2026 at 11:38AM Hubert Tong via SG16 <sg16@lists.isocpp.org> wrote:

On Wed, Jun 24, 2026 at 9:50AM Corentin Jabot via SG16 <sg16@lists.isocpp.org> wrote:

 

 

On Wed, Jun 24, 2026 at 3:17AM Steve Downey via SG16 <sg16@lists.isocpp.org> wrote:

 

  wg21:       TODO [[https://github.com/cplusplus/papers/issues/1572][LWG2959]] char_traits<char16_t>::eof is a valid UTF-16 code unit                                                                                    :sg16:

 

We should ask Louis/STL if they are happy to change int_type, or, alternatively explain LEWG the only solution is to change int_type, and see if they care.

 

Only for char16_t? Go beyond that, and I think we'll be back in the "committee broke std::string ABI" land.

 

-- HT 

--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16
Link to this post: http://lists.isocpp.org/sg16/2026/06/4763.php