ISOCPP sg16 List: Re: Performance requirements for Unicode views/types/algorithms

From: Niall Douglas <s_sourceforge_at_[hidden]>
Date: Wed, 01 Mar 2023 15:14:04 +0000

On 28/02/2023 18:43, Jens Maurer wrote:

>> I really wish SIMD had better support for UTF-8, only AVX-512 enables a
>> decent fraction of main memory bandwidth
>> (https://github.com/simdutf/simdutf).
>
> Thanks for the pointer. I was looking for a comparison like that.
>
> So, this means we do leave 5-10x performance on the table if we
> go for an interface that can deliver ICU-level performance (only).
>
> Sadness engulfs me.
>
>> I'd like to see as much of that
>> performance passed through by the standard library as possible, even if
>> it makes the API non-STL-like.
>
> So, it seems we need an idea how to employ SIMD with a ranges-based
> interface, or we go for eager transcoding algorithms (possibly
> in addition to the ranges-based ones).

I like to hold up <charconv> as the right design choice we ought to use
going forwards:

- Yes atoi() can parse numbers.

- Yes strtol() can parse numbers.

- Yes sscanf() can parse numbers.

- Yes iostreams can parse numbers.

One would have thought that number parsing were a done deal with such a
menu before us. However, none of the above were particularly fast, and
some were downright slow. This is mainly due to historical reasons,
especially around the required use and modification of global state.

Thus <charconv> was born, and it can be orders of magnitude faster than
any of the functions above because it was designed with the benefits of
hindsight and an understanding of how recent CPUs work.
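
For readers who have not used it, here is a minimal sketch of what the
<charconv> style looks like in practice. The helper name parse_int is
purely illustrative and not part of any library; the only standard
facility used is std::from_chars, which takes a pointer range, consults
no locale, does not touch errno, and does not allocate.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Illustrative helper (hypothetical name): parse a base-10 int that must
// span the whole of `text`. No locale, no errno, no allocation.
std::optional<int> parse_int(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last = first + text.size();

    // from_chars reports where it stopped (ptr) and an error code (ec).
    auto [ptr, ec] = std::from_chars(first, last, value);

    // Reject parse errors, out-of-range values, and trailing garbage.
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

The error is returned by value alongside the stop position, so the call
neither reads nor writes any global state.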

I'd ask for the same design thinking for UTF-8: a low-level,
maximum-performance API (ideally based on existing standard practice),
which then serves as the escape hatch out of the slower higher-level
APIs for those who need such a thing.
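
As a rough illustration only, a <charconv>-shaped eager UTF-8 entry
point might look like the sketch below. Everything named here
(utf8_result, validate_utf8) is hypothetical and is not a proposed or
existing API. The body is a plain scalar reference loop; the point is
that a pointer-range, whole-buffer signature like this leaves an
implementation free to substitute a SIMD routine behind it.

```cpp
#include <cstddef>
#include <system_error>

// Hypothetical result type, modelled on std::from_chars_result.
struct utf8_result {
    const char* ptr;  // start of the first ill-formed sequence, or `last`
    std::errc ec;     // std::errc{} on success
};

// Hypothetical eager validator over [first, last). Scalar reference
// version; a vectorised implementation could sit behind the same signature.
inline utf8_result validate_utf8(const char* first, const char* last) noexcept {
    constexpr std::errc bad = std::errc::illegal_byte_sequence;
    while (first != last) {
        const unsigned char b0 = static_cast<unsigned char>(*first);
        if (b0 < 0x80) { ++first; continue; }  // ASCII fast path

        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;    // allowed range of 2nd byte
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;         // reject overlong forms
            if (b0 == 0xED) hi = 0x9F;         // reject surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;         // reject overlong forms
            if (b0 == 0xF4) hi = 0x8F;         // reject > U+10FFFF
        } else {
            return {first, bad};               // stray continuation, C0/C1, F5+
        }

        if (static_cast<std::size_t>(last - first) < len) {
            return {first, bad};               // truncated sequence
        }
        for (std::size_t i = 1; i < len; ++i) {
            const unsigned char c = static_cast<unsigned char>(first[i]);
            const unsigned char min = (i == 1) ? lo : 0x80;
            const unsigned char max = (i == 1) ? hi : 0xBF;
            if (c < min || c > max) return {first, bad};
        }
        first += len;
    }
    return {last, std::errc{}};
}
```

A lazy, ranges-based view could still be layered on top for callers who
prefer it, with this kind of call remaining available as the fast path.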

There is always the argument that "why do we need such high performance
given XXX?"

I'd tend to answer: nobody sane chooses C++ to solve problems unless (i)
they are forced to by a legacy codebase or (ii) they need high performance.

This is why I find that WG21 working on high-level abstractions somewhat
misses the point for many of today's C++ users. They generally want more
performance before they want new high-level abstractions. I appreciate
that is not a popular thing to say around WG21 folk, still ...

Niall

Received on 2023-03-01 15:14:05