The folding not only converts uppercase to lowercase, but also expands ligatures (e.g. the ligature ﬀ, U+FB00, to f f) and ß (sharp S, to s s), and maps look-alike characters to ordinary letters (e.g. Kelvin sign to k, Ohm sign to lowercase omega).
If we combine runs of consecutive code points that map to consecutive code points (e.g. A-Z -> a-z) into intervals for easier storage, we have
463 intervals with single code point -> single code point
86 intervals with single code point -> multi code point
34 alternative intervals with single code point -> single code point, for use if single -> multi is not acceptable
(the code points in the other 52 intervals, mostly ligatures, would have to be kept in their original form)
So one needs two sets of functions for case folding (upper->lower):
single->single
single->possibly multi
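
A minimal sketch of how the two variants could look as constexpr functions over interval tables. The layout and all names (FoldInterval, Folded, MultiEntry, fold_simple, fold_full) are made up for illustration and are not from any proposal; the tables hold only a handful of entries from CaseFolding.txt, not the full 463 + 86 intervals, and the linear search would be a binary search in practice.

```cpp
#include <array>
#include <cstddef>

// Hypothetical interval record: first..last fold to
// target_first..target_first + (last - first).
struct FoldInterval {
    char32_t first;         // first code point of the source run
    char32_t last;          // last code point of the source run (inclusive)
    char32_t target_first;  // folded value of `first`
};

// Small excerpt of single -> single intervals (illustration only).
constexpr FoldInterval kSimpleIntervals[] = {
    {0x0041, 0x005A, 0x0061},  // LATIN A-Z -> a-z
    {0x00C0, 0x00D6, 0x00E0},  // LATIN A WITH GRAVE .. O WITH DIAERESIS
    {0x0391, 0x03A1, 0x03B1},  // GREEK ALPHA .. RHO
    {0x0410, 0x042F, 0x0430},  // CYRILLIC A .. YA
    {0x2126, 0x2126, 0x03C9},  // OHM SIGN -> small omega
    {0x212A, 0x212A, 0x006B},  // KELVIN SIGN -> k
};

// Result of a full fold: up to three code points, `size` of them valid.
struct Folded {
    std::array<char32_t, 3> cp;
    std::size_t size;
};

struct MultiEntry {
    char32_t from;
    Folded to;
};

// Small excerpt of single -> multi mappings (illustration only).
constexpr MultiEntry kMultiEntries[] = {
    {0x00DF, {{0x0073, 0x0073, 0}, 2}},       // sharp s -> "ss"
    {0xFB00, {{0x0066, 0x0066, 0}, 2}},       // ligature ff -> "ff"
    {0xFB01, {{0x0066, 0x0069, 0}, 2}},       // ligature fi -> "fi"
    {0xFB03, {{0x0066, 0x0066, 0x0069}, 3}},  // ligature ffi -> "ffi"
};

// Variant 1: single -> single. Code points without a mapping in the
// table are returned unchanged.
constexpr char32_t fold_simple(char32_t c) {
    for (const FoldInterval& iv : kSimpleIntervals) {
        if (c >= iv.first && c <= iv.last) {
            return static_cast<char32_t>(iv.target_first + (c - iv.first));
        }
    }
    return c;
}

// Variant 2: single -> possibly multi. Falls back to fold_simple when
// there is no multi code point mapping.
constexpr Folded fold_full(char32_t c) {
    for (const MultiEntry& e : kMultiEntries) {
        if (e.from == c) {
            return e.to;
        }
    }
    return Folded{{fold_simple(c), 0, 0}, 1};
}

// Both variants are usable in constant expressions.
static_assert(fold_simple(0x0041) == 0x0061, "A folds to a");
static_assert(fold_full(0x00DF).size == 2, "sharp s folds to two code points");
```

With this excerpt, fold_simple leaves ß (U+00DF) unchanged, which is the "kept in their original form" case, while fold_full expands it to s s.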
-----Original Message-----
From: JJ Marr via Std-Proposals <std-proposals@lists.isocpp.org>
Sent: Tue 08.07.2025 18:50
Subject: Re: [std-proposals] constexpr tolower, toupper, isalpha
To: std-proposals@lists.isocpp.org;
CC: JJ Marr <jjmarr@gmail.com>;
The authoritative tables for case mapping are found at https://www.unicode.org/Public/16.0.0/ucd/, specifically CaseFolding.txt for case folding. It's all premade.
CaseFolding.txt is an 87 KB text file, most of which is comments. It's about 1654 lines, so assuming two UTF-32 characters a line, that's a little under 13 KiB.
Of course, this includes complex case mappings of one codepoint to multiple codepoints. If we drop those, we can make the table a bit smaller.