Copying SG16 (the ISO WG21 C++ standard study group on Unicode and text processing).


On 5/18/21 7:20 PM, Nelson H. F. Beebe via Unicode wrote:
I recently recorded a BibTeX entry in

for a new paper that has just been published in a Wiley journal:

	Validating UTF-8 in less than one instruction per byte
	Software --- Practice and Experience 51(5) 950--964 May 2021

A preprint is available at

The authors exploit vector (SIMD) instructions in recent AMD/Intel
x86_64 and ARM v7 NEON processors to achieve high throughput. For
mostly ASCII sequences, the validator in some cases runs faster than
the Standard C library function memcpy(); for random UTF-8 sequences,
it runs at 1/4 to 1/2 the speed of memcpy().
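
For readers unfamiliar with what "validating UTF-8" involves, here is a
minimal scalar sketch in C++. It is not the paper's vectorized algorithm
and is not taken from the authors' code; the function name is_valid_utf8
is hypothetical. It only illustrates the rules any validator must enforce
(well-formed lead and continuation bytes, no overlong encodings, no
surrogates, nothing above U+10FFFF), plus a simple 8-byte ASCII fast path
that hints at why mostly ASCII input can be checked so cheaply.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical helper, not from the paper: scalar UTF-8 validation.
bool is_valid_utf8(std::string_view s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        // ASCII fast path: if none of the next 8 bytes has its high bit
        // set, all 8 are ASCII and can be skipped at once.
        if (i + 8 <= n) {
            std::uint64_t w;
            std::memcpy(&w, p + i, 8);
            if ((w & 0x8080808080808080ULL) == 0) { i += 8; continue; }
        }
        const unsigned char b = p[i];
        if (b < 0x80) { ++i; continue; }  // single ASCII byte

        // Decode the lead byte: sequence length, payload bits, and the
        // smallest code point that legitimately needs that length.
        std::size_t len;
        std::uint32_t cp, min;
        if      ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (i + len > n) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min) return false;                     // overlong encoding
        if (cp > 0x10FFFF) return false;                // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        i += len;
    }
    return true;
}
```

A caller would pass any byte buffer as a std::string_view, for example
is_valid_utf8(buffer). The byte-at-a-time branching in the non-ASCII path
is the kind of per-byte work that a vectorized validator aims to avoid.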

C++ code implementing their work is freely available at

and the paper's references contain links to earlier papers on fast
validation and transformation of Unicode character sequences.

- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail:  -
- 155 S 1400 E RM 233              -
- Salt Lake City, UT 84112-0090, USA    URL: -