On Thu, Mar 2, 2023 at 4:37 PM Thiago Macieira <thiago@macieira.org> wrote:
On Thursday, 2 March 2023 13:10:35 PST JeanHeyd Meneide wrote:
> On Thu, Mar 2, 2023 at 4:08 PM Thiago Macieira via SG16
> <sg16@lists.isocpp.org> wrote:
> > On Thursday, 2 March 2023 07:28:02 PST Steve Downey via SG16 wrote:
> > > That ICU apparently has poor performance in this area indicates a little
> > > that many users are OK with it. But it does also sound like there are use
> > > cases for eager algorithms that could provide better performance. Probably
> > > something like the ones that JeanHeyd was proposing for C?
> >
> > I don't know the actual internals of those functions, but the wrapping API
> > with the needs of error code returning and the abstraction created by it do
> > make it slower than a pure, raw implementation would be. It's a price one
> > pays for having a consistent API for UTF-8 as well as Shift JIS or any
> > other weird codec that ICU may have available and you needed to decode
> > content.
> This is not true.
> If the API is designed with the concepts Alexander Stepanov had in mind when
> he first designed the C++ STL, there is no material performance difference
> between a raw C API and a fully-realized, deeply generic STL-style API. I have

To be clear: I was referring to the ICU API.

This is not STL-style:

        ucnv_toUnicode(icu_conv, &target, targetLimit, &source, sourceLimit,
                       nullptr, flush, &err);

In none of your graphs is ICU in the top half of performance. How much worse
than the ideal it is varies depending on the benchmark in question. I also
made no statements about it being meaningfully or measurably worse; I simply
said that it has a wrapper API that abstracts and therefore the impact can't 
be zero.

I understand that: my objection was to the characterization that a wrapper API which handles both Shift-JIS and UTF-8 has to compromise on performance at all. In particular, ztd.text (tracking WG21 P1629, which still needs an update) and cuneicode (tracking WG14 N3095) both handle arbitrary encodings without compromising UTF performance qualities or capabilities. At no point does anyone need to give up performance in pursuit of a general API that handles these things; that ICU did is down to ICU's own engineering and compatibility choices, not a fundamental limitation of the text APIs worth standardizing.
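
To make the point concrete, here is a minimal sketch of what an STL-style design can look like. Every name in it (latin1, utf8, decode_one, decode_all, decode) is hypothetical and chosen for illustration; this is not the ztd.text, cuneicode, or ICU API. The encoding is a template parameter, so the generic loop calls each decoder directly, with no runtime converter handle and no indirect call for the compiler to see through. The UTF-8 decoder is deliberately simplified: it substitutes U+FFFD for truncated or malformed sequences but does not reject overlong forms or surrogates. Latin-1 stands in for "some other codec" only because a Shift-JIS table would not fit in a short example.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical names throughout; not the ztd.text, cuneicode, or ICU API.
constexpr char32_t replacement = 0xFFFD;

// An "encoding" is any type with a static decode_one(first, last) that
// consumes one character from [first, last) and returns its code point.
struct latin1 {
    template <class It>
    static char32_t decode_one(It& first, [[maybe_unused]] It last) {
        assert(first != last);
        return static_cast<unsigned char>(*first++);
    }
};

struct utf8 {
    template <class It>
    static char32_t decode_one(It& first, It last) {
        assert(first != last);
        const auto lead = static_cast<unsigned char>(*first++);
        if (lead < 0x80) return lead;  // ASCII fast path
        // Number of continuation bytes implied by the lead byte.
        int extra = lead >= 0xF8 ? -1
                  : lead >= 0xF0 ? 3
                  : lead >= 0xE0 ? 2
                  : lead >= 0xC0 ? 1
                  : -1;  // stray continuation byte
        if (extra < 0) return replacement;
        char32_t cp = lead & (0x3F >> extra);  // payload bits of the lead byte
        for (; extra > 0; --extra) {
            if (first == last) return replacement;  // truncated sequence
            const auto c = static_cast<unsigned char>(*first);
            if ((c & 0xC0) != 0x80) return replacement;  // not a continuation
            ++first;
            cp = (cp << 6) | (c & 0x3F);
        }
        return cp;
    }
};

// One generic algorithm for every encoding. Encoding is resolved at
// compile time, so each instantiation is a plain loop over direct calls.
template <class Encoding, class In, class Out>
Out decode_all(In first, In last, Out out) {
    while (first != last) *out++ = Encoding::decode_one(first, last);
    return out;
}

// Convenience wrapper that collects the code points into a u32string.
template <class Encoding>
std::u32string decode(std::string_view bytes) {
    std::u32string result;
    decode_all<Encoding>(bytes.begin(), bytes.end(),
                         std::back_inserter(result));
    return result;
}
```

A caller writes decode<utf8>(bytes) or decode<latin1>(bytes) and gets the same interface for both. Whether a given compiler produces code as tight as a hand-written loop is something to measure, but nothing in this shape of API forces a cost on the UTF-8 path for the sake of the other encodings.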