On 2/28/23 1:43 PM, Jens Maurer via SG16 wrote:
> On 28/02/2023 16.18, Niall Douglas via SG16 wrote:
>> On 26/02/2023 01:48, Steve Downey via SG16 wrote:
>>> Much text processing is tied to IO and the performance is mostly secondary. If we could make accidentally incorrect harder to do that would be a win.
>>
>> My consumer hardware storage here does 14Gb/sec reads (two PCIe 4.0 SSDs in RAID0). Only a few years ago that was main memory speeds for a high end PC. I think you need to assume text processing, and especially Unicode parsing, is basically main memory speeds whether it is from i/o or not. I really wish SIMD had better support for UTF-8, only AVX-512 enables a decent fraction of main memory bandwidth (https://github.com/simdutf/simdutf).
>
> Thanks for the pointer. I was looking for a comparison like that. So, this means we do leave 5-10x performance on the table if we go for an interface that can deliver ICU-level performance (only). Sadness engulfs me.
>
>> I'd like to see as much of that performance passed through by the standard library as possible, even if it makes the API non-STL-like.
>
> So, it seems we need an idea how to employ SIMD with a ranges-based interface, or we go for eager transcoding algorithms (possibly in addition to the ranges-based ones).

The interfaces JeanHeyd is proposing for C in WG14 N3095 (Restartable Functions for Efficient Character Conversions) are intended to support implementation with SIMD. As a worst case fallback, that should be an option if/once it is adopted and deployed.

Tom.

> Jens
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16