ISOCPP sg16 List: Re: Performance requirements for Unicode views/types/algorithms

From: JeanHeyd Meneide <phdofthehouse_at_[hidden]>
Date: Thu, 2 Mar 2023 16:56:14 -0500

On Thu, Mar 2, 2023 at 4:37 PM Thiago Macieira <thiago_at_[hidden]> wrote:

> On Thursday, 2 March 2023 13:10:35 PST JeanHeyd Meneide wrote:
> > On Thu, Mar 2, 2023 at 4:08 PM Thiago Macieira via SG16
> > <sg16_at_[hidden]> wrote:
> > > On Thursday, 2 March 2023 07:28:02 PST Steve Downey via SG16 wrote:
> > > > That ICU apparently has poor performance in this area indicates a
> > > > little that many users are OK with it. But it does also sound like
> > > > there are use cases for eager algorithms that could provide better
> > > > performance. Probably something like the ones that JeanHeyd was
> > > > proposing for C?
> > >
> > > I don't know the actual internals of those functions, but the wrapping
> > > API with the needs of error code returning and the abstraction created
> > > by it do make it slower than a pure, raw implementation would be. It's
> > > a price one pays for having a consistent API for UTF-8 as well as
> > > Shift JIS or any other weird codec that ICU may have available and you
> > > needed to decode content.
> >
> > This is not true.
> >
> > If the API is designed with the concepts Alexander Stepanov had in mind
> > when he first designed the C++ STL, there is no material performance
> > difference between a raw C API and a fully-realized, deeply generic
> > STL-style API. I have
>
> To be clear: I was referring to the ICU API.
>
> This is not STL-style:
>
>     ucnv_toUnicode(icu_conv, &target, targetLimit, &source, sourceLimit,
>                    nullptr, flush, &err);
>
> In none of your graphs is ICU in the top half of performance. How much
> worse than the ideal it is varies depending on the benchmark in question.
> I also made no statements about it being meaningfully or measurably worse;
> I simply said that it has a wrapper API that abstracts and therefore the
> impact can't be zero.
>

I understand that: my objection was to the characterization that a
wrapper API which has to handle both Shift-JIS and UTF-8 must compromise
on its performance in any way. In particular, ztd.text (tracking WG21
P1629, which still needs an update) and cuneicode (tracking WG14 N3095)
both handle arbitrary encodings without compromising UTF performance
qualities or capabilities. At no point does anyone need to give up
performance in pursuit of a general API that handles these things; that ICU
did so is down to ICU's own engineering/compatibility choices, not a
fundamental limitation of the text APIs worth standardizing.
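
To make the shape of that argument concrete, here is a minimal C++17
sketch of a generic decode loop that takes the encoding as a template
parameter. The names latin1_sketch, utf8_sketch, decode_one and decode_all
are hypothetical and exist only for illustration; this is not the ztd.text
or cuneicode API, and it makes no measured performance claim. It only shows
that the generic entry point involves no runtime dispatch, so the compiler
is free to specialize and inline the loop for each encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical encoding objects (illustrative names, not a real library API).
// Each provides decode_one, which reads one code point at in[pos] and
// advances pos on success, or returns std::nullopt on malformed input.
struct latin1_sketch {
    static std::optional<char32_t> decode_one(std::string_view in, std::size_t& pos) {
        if (pos >= in.size()) return std::nullopt;
        // Every Latin-1 byte maps directly to the same code point value.
        return static_cast<char32_t>(static_cast<unsigned char>(in[pos++]));
    }
};

struct utf8_sketch {
    static std::optional<char32_t> decode_one(std::string_view in, std::size_t& pos) {
        if (pos >= in.size()) return std::nullopt;
        const unsigned char lead = static_cast<unsigned char>(in[pos]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return std::nullopt;  // stray continuation or invalid lead byte
        if (in.size() - pos < len) return std::nullopt;  // truncated sequence
        for (std::size_t i = 1; i < len; ++i) {
            const unsigned char c = static_cast<unsigned char>(in[pos + i]);
            if ((c & 0xC0) != 0x80) return std::nullopt;
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates, and values above U+10FFFF.
        static constexpr char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            return std::nullopt;
        }
        pos += len;
        return cp;
    }
};

// One generic algorithm for every encoding. The encoding is chosen at
// compile time, so there is no virtual call or function pointer in the loop.
template <typename Encoding>
std::optional<std::u32string> decode_all(std::string_view in) {
    std::u32string out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        const auto cp = Encoding::decode_one(in, pos);
        if (!cp) return std::nullopt;
        out.push_back(*cp);
    }
    return out;
}
```

Under these assumptions, decode_all<utf8_sketch>(bytes) and
decode_all<latin1_sketch>(bytes) share one interface, yet each
instantiation is compiled separately. A real library could go further and
supply a hand-tuned bulk UTF-8 path behind the same interface; that part
is not shown here.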

Sincerely,
JeanHeyd

Received on 2023-03-02 21:56:29