I recently recorded a BibTeX entry in

        http://www.math.utah.edu/pub/tex/bib/unicode.html#Keiser:2021:VUL

for a new paper that has just been published in a Wiley journal:

        Validating UTF-8 in less than one instruction per byte
        Software --- Practice and Experience 51(5) 950--964 May 2021
        https://doi.org/10.1002/spe.2920

A preprint is available at

        https://arxiv.org/abs/2010.03090

The authors exploit vector instructions in recent AMD/Intel x86_64 and ARM v7 NEON processors to achieve high throughput. For mostly ASCII sequences, their validator in some cases exceeds the speed of the Standard C library function memcpy(); for random UTF-8 sequences, it runs at 1/4 to 1/2 the speed of memcpy().

C++ code implementing their work is freely available at

        https://github.com/lemire/validateutf8-experiments

and the paper's references contain links to earlier papers on fast validation and transformation of Unicode character sequences.

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
- 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
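For readers who want a baseline for what "validating UTF-8" has to check, here is a minimal scalar sketch in C. It assumes the well-formedness rules of RFC 3629 (Unicode Table 3-7: no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF, no truncated sequences). It is NOT the vectorized algorithm from the paper, which works on whole vector registers at a time. The function name utf8_valid is hypothetical. The only nod toward the speed claims above is an 8-bytes-at-a-time ASCII fast path, which suggests why mostly ASCII input is so cheap to validate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the paper or its repository):
 * returns true if buf[0..len) is well-formed UTF-8 per RFC 3629. */
bool utf8_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        /* ASCII fast path: if the next 8 bytes all have the high bit
         * clear, they are plain ASCII and can be skipped together. */
        if (len - i >= 8) {
            uint64_t word;
            memcpy(&word, buf + i, sizeof word);
            if ((word & UINT64_C(0x8080808080808080)) == 0) {
                i += 8;
                continue;
            }
        }
        unsigned char c = buf[i];
        if (c < 0x80) {               /* single ASCII byte */
            i++;
            continue;
        }
        size_t n;                     /* number of continuation bytes */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of the 1st continuation byte */
        if (c >= 0xC2 && c <= 0xDF) n = 1;
        else if (c == 0xE0) { n = 2; lo = 0xA0; }  /* reject overlong 3-byte forms */
        else if (c == 0xED) { n = 2; hi = 0x9F; }  /* reject surrogates U+D800..U+DFFF */
        else if (c >= 0xE1 && c <= 0xEF) n = 2;
        else if (c == 0xF0) { n = 3; lo = 0x90; }  /* reject overlong 4-byte forms */
        else if (c >= 0xF1 && c <= 0xF3) n = 3;
        else if (c == 0xF4) { n = 3; hi = 0x8F; }  /* reject code points above U+10FFFF */
        else return false;  /* stray continuation, overlong C0/C1, or F5..FF */

        if (len - i < n + 1)          /* sequence truncated by end of buffer */
            return false;
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= n; k++)  /* remaining bytes must be 10xxxxxx */
            if (buf[i + k] < 0x80 || buf[i + k] > 0xBF)
                return false;
        i += n + 1;
    }
    return true;
}
```

For example, utf8_valid((const unsigned char *)"caf\xC3\xA9", 5) accepts "café", while the overlong encoding "\xC0\xAF" and the surrogate "\xED\xA0\x80" are rejected. A byte-at-a-time loop like this spends several branches per non-ASCII byte, which is exactly the cost the paper's vector approach is designed to avoid.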