sg16

Re: Performance requirements for Unicode views/types/algorithms

From: Thiago Macieira <thiago_at_[hidden]>
Date: Wed, 01 Mar 2023 12:43:35 -0800
On Tuesday, 28 February 2023 07:18:07 PST Niall Douglas via SG16 wrote:
> I really wish SIMD had better support for UTF-8, only AVX-512 enables a
> decent fraction of main memory bandwidth
> (https://github.com/simdutf/simdutf)

I did talk to some CPU architects about this a few years ago and our
conclusion is that it wouldn't be worth it. The conversion was never a hot
path in any of the content we looked at, and the instructions this would
create would end up one of those complex beasts few people ever use because
they're not fast for anything except the narrow use-case they were designed
for.

You may be one of the few who would, but you're also one of the few who
probably remember the STTNI (STring and Text New Instructions) from SSE 4.2:
the PCMPxSTRx instructions[1]. You'll also note that those have never been
extended to 256- and 512-bit. 10 years ago, I rewrote the UTF16-to-Latin1
codec in Qt with PCMPESTRM to detect out-of-range characters[2]. About 5 years
ago I yanked it out and replaced it with a much faster check using PMINUW[3].
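For readers who haven't used it, here is a minimal sketch of how PMINUW (the SSE4.1 intrinsic `_mm_min_epu16`) can detect out-of-range characters when going from UTF-16 to Latin-1. This is not the Qt code from [3]; the helper name `fits_in_latin1` and the overall structure are hypothetical, and it assumes GCC or Clang on x86 (it falls back to a plain scalar loop elsewhere). The idea: clamp each 16-bit code unit to 0xFF with an unsigned minimum; a unit survives the clamp unchanged only if it already fits in Latin-1.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define LATIN1_CHECK_HAVE_SIMD 1
#endif

// Hypothetical helper: returns true if every UTF-16 code unit in
// [src, src + len) is <= 0xFF, i.e. representable in Latin-1.
#ifdef LATIN1_CHECK_HAVE_SIMD
__attribute__((target("sse4.1")))  // PMINUW (_mm_min_epu16) needs SSE4.1
#endif
bool fits_in_latin1(const char16_t *src, std::size_t len)
{
    assert(src != nullptr || len == 0);
    std::size_t i = 0;
#ifdef LATIN1_CHECK_HAVE_SIMD
    const __m128i limit = _mm_set1_epi16(0x00FF);
    for (; i + 8 <= len; i += 8) {
        // Load 8 UTF-16 code units (unaligned load).
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + i));
        // PMINUW: per-lane unsigned minimum, clamping each unit to 0xFF.
        const __m128i clamped = _mm_min_epu16(chunk, limit);
        // A lane is unchanged by the clamp only if it was already <= 0xFF.
        const __m128i same = _mm_cmpeq_epi16(clamped, chunk);
        // PMOVMSKB takes one bit per byte; 0xFFFF means all 8 lanes matched.
        if (_mm_movemask_epi8(same) != 0xFFFF)
            return false;
    }
#endif
    // Scalar tail (or the whole input when SIMD is unavailable).
    for (; i < len; ++i) {
        if (src[i] > 0xFF)
            return false;
    }
    return true;
}
```

Each step here (PMINUW, PCMPEQW, PMOVMSKB) is a simple SSE instruction, whereas PCMPESTRM is one of the multi-uop STTNI instructions listed at [1], which is the kind of gap that makes the plain-instruction version come out ahead.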

[1] https://uops.info/html-instr/PCMPESTRM_XMM_XMM_I8.html
[2] https://github.com/qt/qtbase/commit/83ba0d56f878409d3f758549c72e5d099cc71a07
[3] https://github.com/qt/qtbase/commit/a9074779cf1425b76c010b891403a0521e2cb4e4
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel DCAI Cloud Engineering

Received on 2023-03-01 20:43:37