Re: Performance requirements for Unicode views/types/algorithms

From: JeanHeyd Meneide <phdofthehouse_at_[hidden]>
Date: Thu, 2 Mar 2023 16:10:35 -0500
On Thu, Mar 2, 2023 at 4:08 PM Thiago Macieira via SG16 <
sg16_at_[hidden]> wrote:

> On Thursday, 2 March 2023 07:28:02 PST Steve Downey via SG16 wrote:
> > That ICU apparently has poor performance in this area indicates a little
> > that many users are OK with it. But it does also sound like there are use
> > cases for eager algorithms that could provide better performance. Probably
> > something like the ones that JeanHeyd was proposing for C?
>
> I don't know the actual internals of those functions, but the wrapping API
> with the needs of error code returning and the abstraction created by it do
> make it slower than a pure, raw implementation would be. It's a price one
> pays for having a consistent API for UTF-8 as well as Shift JIS or any other
> weird codec that ICU may have available and you needed to decode content.
>


This is not true.

If the API is designed with the concepts Alexander Stepanov had in mind
when he first designed the C++ STL, there is no material performance
difference between a raw C API and a fully-realized, deeply generic
STL-style API. I have thoroughly benchmarked exactly these use cases and
documented the results here:

https://ztdtext.readthedocs.io/en/latest/benchmarks.html
https://ztdtext.readthedocs.io/en/latest/benchmarks/transcoding%20-%20UTF.html
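
To make the shape of the argument concrete, here is a minimal sketch. It is
not code from ztd.text or ICU: every function name below is hypothetical, and
the decoder assumes well-formed UTF-8 and does no error handling. It shows the
same UTF-8 to UTF-32 loop written once as a raw pointer-and-length function
and once as an iterator-based template. When the template is instantiated with
plain pointers, the abstraction is resolved at compile time, so one would
expect an optimizing compiler to produce essentially the same loop for both.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: read a lead byte, store its payload bits in cp,
// and return how many continuation bytes follow (well-formed input assumed).
inline int utf8_lead(unsigned char b, char32_t& cp) {
    if (b < 0x80) { cp = b;        return 0; }
    if (b < 0xE0) { cp = b & 0x1F; return 1; }
    if (b < 0xF0) { cp = b & 0x0F; return 2; }
    cp = b & 0x07;
    return 3;
}

// Raw, C-style interface: pointer + length in, pointer out.
// dst must have room for n code points; returns the number written.
inline std::size_t utf8_to_utf32_raw(const unsigned char* src, std::size_t n,
                                     char32_t* dst) {
    std::size_t written = 0;
    std::size_t i = 0;
    while (i < n) {
        char32_t cp;
        int extra = utf8_lead(src[i++], cp);
        for (; extra > 0 && i < n; --extra)
            cp = (cp << 6) | (src[i++] & 0x3F);
        dst[written++] = cp;
    }
    return written;
}

// Generic, STL-style interface: any input iterator pair, any output iterator.
// Returns the output iterator one past the last code point written.
template <class InputIt, class OutputIt>
OutputIt utf8_to_utf32_generic(InputIt first, InputIt last, OutputIt out) {
    while (first != last) {
        char32_t cp;
        int extra = utf8_lead(static_cast<unsigned char>(*first++), cp);
        for (; extra > 0 && first != last; --extra)
            cp = (cp << 6) | (static_cast<unsigned char>(*first++) & 0x3F);
        *out++ = cp;
    }
    return out;
}

// Usage: decode the same bytes through both interfaces and compare.
inline bool both_agree(const unsigned char* src, std::size_t n) {
    std::vector<char32_t> a(n);  // at most one code point per input byte
    a.resize(utf8_to_utf32_raw(src, n, a.data()));
    std::vector<char32_t> b;
    utf8_to_utf32_generic(src, src + n, std::back_inserter(b));
    return a == b;
}
```

The generic version also accepts non-pointer iterators (for example a
std::back_inserter on the output side, as in both_agree), which the raw
version cannot do. Whether the two actually perform the same for a given
compiler and codec is something to measure rather than assume.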

Sincerely,
JeanHeyd

Received on 2023-03-02 21:10:49