ISOCPP sg16 List: Re: Performance requirements for Unicode views/types/algorithms

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Tue, 28 Feb 2023 19:43:42 +0100

On 28/02/2023 16.18, Niall Douglas via SG16 wrote:
> On 26/02/2023 01:48, Steve Downey via SG16 wrote:
>
>> Much text processing is tied to IO and the performance is mostly
>> secondary. If we could make accidentally incorrect code harder to
>> write, that would be a win.
>
> My consumer hardware storage here does 14 GB/sec reads (two PCIe 4.0 SSDs
> in RAID0). Only a few years ago that was main memory speeds for a high
> end PC.
>
> I think you need to assume text processing, and especially Unicode
> parsing, is basically main memory speeds whether it is from i/o or not.
>
> I really wish SIMD had better support for UTF-8, only AVX-512 enables a
> decent fraction of main memory bandwidth
> (https://github.com/simdutf/simdutf).

Thanks for the pointer. I was looking for a comparison like that.

So, this means we leave 5-10x performance on the table if we
go for an interface that can deliver only ICU-level performance.

Sadness engulfs me.

> I'd like to see as much of that
> performance passed through by the standard library as possible, even if
> it makes the API non-STL-like.

So, it seems we need an idea of how to employ SIMD with a ranges-based
interface, or we go for eager transcoding algorithms (possibly
in addition to the ranges-based ones).
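
To make the "eager" option a bit more concrete, here is a rough sketch of
what such an algorithm might look like. All names (decode_one,
utf8_to_utf32_eager) are hypothetical and come from neither simdutf nor
any proposal. The 8-byte ASCII check is a portable stand-in for real SIMD,
and malformed input is simply replaced by U+FFFD, one byte at a time. The
point is only that an eager algorithm sees the whole buffer and can take a
bulk fast path, whereas a pull-one-code-point iterator cannot easily do so.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical helper: decode one code point starting at s[i] and advance i.
// Malformed input yields U+FFFD and consumes exactly one byte.
inline char32_t decode_one(std::string_view s, std::size_t& i) {
    auto b = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    const unsigned char c = b(i);
    if (c < 0x80) { ++i; return c; }

    std::size_t len = 0;
    char32_t cp = 0;
    if (c >= 0xC2 && c <= 0xDF)      { len = 2; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
    if (len == 0 || i + len > s.size()) { ++i; return 0xFFFD; }

    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char t = b(i + k);
        if ((t & 0xC0) != 0x80) { ++i; return 0xFFFD; }  // not a continuation byte
        cp = (cp << 6) | (t & 0x3F);
    }
    // Reject overlong encodings, surrogates, and values above U+10FFFF.
    static const char32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < min_cp[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
        ++i;
        return 0xFFFD;
    }
    i += len;
    return cp;
}

// Hypothetical eager transcoder: consumes the whole input in one call, so it
// is free to process several bytes per step when they are all ASCII.
inline std::u32string utf8_to_utf32_eager(std::string_view in) {
    std::u32string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        if (in.size() - i >= 8) {
            std::uint64_t w;
            std::memcpy(&w, in.data() + i, 8);
            if ((w & 0x8080808080808080ULL) == 0) {  // no high bit set: 8 ASCII bytes
                for (std::size_t k = 0; k < 8; ++k)
                    out.push_back(static_cast<unsigned char>(in[i + k]));
                i += 8;
                continue;
            }
        }
        out.push_back(decode_one(in, i));  // scalar fallback
    }
    return out;
}
```

A ranges-style view would instead call something like decode_one once per
increment of its iterator, which is what makes the bulk path hard to reach.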

Jens

Received on 2023-02-28 18:43:51