Case folding not only converts uppercase to lowercase, it also expands ligatures (e.g. ﬀ to f f) and ß (sharp S, to s s), and maps compatibility characters to plain lowercase letters (e.g. Kelvin sign to k, Ohm sign to ω).

 

If, for easier storage, we combine runs of consecutive code points that map to consecutive code points into intervals (e.g. A-Z -> a-z), we have

 

463 intervals with single code point -> single code point

 

86 intervals with single code point -> multi code point

34 of these have an alternative single -> single mapping, for use if single -> multi is not acceptable

(the other 52, mostly ligatures, would have to be kept in their original form)
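To make the interval idea concrete, here is a minimal sketch of how such a table could be stored and searched at compile time. The names FoldInterval, kFoldIntervals and fold_simple are my own, not from any proposal, and the table is only a ten-entry excerpt; a real one would be generated from CaseFolding.txt.

```cpp
#include <cstddef>

// One run of consecutive code points that folds to another run of
// consecutive code points, e.g. U+0041..U+005A -> U+0061..U+007A.
struct FoldInterval {
    char32_t first;   // first source code point of the run
    char32_t last;    // last source code point of the run (inclusive)
    char32_t target;  // folded value of `first`
};

// Illustrative excerpt only (hypothetical table), sorted by `first`.
constexpr FoldInterval kFoldIntervals[] = {
    {0x0041, 0x005A, 0x0061},  // A-Z -> a-z
    {0x00B5, 0x00B5, 0x03BC},  // micro sign -> Greek small mu
    {0x00C0, 0x00D6, 0x00E0},  // Latin-1 capitals, first run
    {0x00D8, 0x00DE, 0x00F8},  // Latin-1 capitals, second run
    {0x0391, 0x03A1, 0x03B1},  // Greek Alpha..Rho
    {0x03A3, 0x03AB, 0x03C3},  // Greek Sigma..Upsilon with dialytika
    {0x0400, 0x040F, 0x0450},  // Cyrillic, first run
    {0x0410, 0x042F, 0x0430},  // Cyrillic, second run
    {0x2126, 0x2126, 0x03C9},  // Ohm sign -> small omega
    {0x212A, 0x212A, 0x006B},  // Kelvin sign -> k
};

// Binary search over the sorted intervals; code points that fall in no
// interval are returned unchanged.
constexpr char32_t fold_simple(char32_t c) {
    std::size_t lo = 0;
    std::size_t hi = sizeof(kFoldIntervals) / sizeof(kFoldIntervals[0]);
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        const FoldInterval& iv = kFoldIntervals[mid];
        if (c < iv.first) {
            hi = mid;
        } else if (c > iv.last) {
            lo = mid + 1;
        } else {
            return static_cast<char32_t>(iv.target + (c - iv.first));
        }
    }
    return c;
}

static_assert(fold_simple(U'A') == U'a', "start of the A-Z run");
static_assert(fold_simple(U'Z') == U'z', "end of the A-Z run");
static_assert(fold_simple(0x212A) == U'k', "Kelvin sign folds to k");
static_assert(fold_simple(0x00D7) == 0x00D7, "multiplication sign is unchanged");
```

With a few hundred intervals, a lookup needs about nine comparisons, and the whole table is usable in constant expressions.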

 

So one needs two sets of functions for case folding (upper->lower):

single->single

single->possibly multi
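As a rough sketch of what the two variants could look like (all names here are hypothetical, the single->single part covers ASCII only to keep it short, and the fixed capacity of 3 is an assumption about the longest expansion of one code point):

```cpp
#include <cstddef>

// Result of a fold that may expand into several code points.
struct FoldResult {
    char32_t cp[3];    // assumption: 3 slots are enough for one folded code point
    std::size_t size;  // number of valid entries in `cp`
};

struct MultiFold {
    char32_t from;
    FoldResult to;
};

// Illustrative excerpt of the single -> multi mappings (hypothetical table).
constexpr MultiFold kMultiFolds[] = {
    {0x00DF, {{0x0073, 0x0073, 0}, 2}},       // sharp s -> s s
    {0xFB00, {{0x0066, 0x0066, 0}, 2}},       // ff ligature -> f f
    {0xFB03, {{0x0066, 0x0066, 0x0069}, 3}},  // ffi ligature -> f f i
};

// single -> single: never changes the length (ASCII only in this sketch).
constexpr char32_t fold_single(char32_t c) {
    return (c >= U'A' && c <= U'Z') ? static_cast<char32_t>(c + 0x20) : c;
}

// single -> possibly multi: falls back to the single -> single mapping.
constexpr FoldResult fold_multi(char32_t c) {
    for (const MultiFold& m : kMultiFolds) {
        if (m.from == c) {
            return m.to;
        }
    }
    return FoldResult{{fold_single(c), 0, 0}, 1};
}

static_assert(fold_single(0x00DF) == 0x00DF, "sharp s is kept as-is");
static_assert(fold_multi(0x00DF).size == 2, "sharp s expands to two code points");
static_assert(fold_multi(U'A').size == 1 && fold_multi(U'A').cp[0] == U'a',
              "plain letters still fold to one code point");
```

The single->single version can fold a buffer in place; the possibly-multi version needs an output that can grow, which is the main reason to keep the two apart.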

 

 


 

-----Original Message-----
From: JJ Marr via Std-Proposals <std-proposals@lists.isocpp.org>
Sent: Tue 08.07.2025 18:50
Subject: Re: [std-proposals] constexpr tolower, toupper, isalpha
To: std-proposals@lists.isocpp.org;
CC: JJ Marr <jjmarr@gmail.com>;
The authoritative tables for case mapping are found at https://www.unicode.org/Public/16.0.0/ucd/, specifically CaseFolding.txt for case folding. It's all premade. 
 
CaseFolding.txt is an 87 KB text file, most of which is comments. It's about 1654 lines, so assuming two UTF-32 code points (8 bytes) per line, that's a little under 13 KiB.
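The estimate can be checked at compile time. This is just the arithmetic from the paragraph above; the constant names are made up, and a 4-byte char32_t is assumed.

```cpp
#include <cstddef>

constexpr std::size_t kLines = 1654;           // approximate line count quoted above
constexpr std::size_t kCodePointsPerLine = 2;  // source and folded code point
constexpr std::size_t kTableBytes = kLines * kCodePointsPerLine * sizeof(char32_t);

static_assert(sizeof(char32_t) == 4, "this estimate assumes a 4-byte char32_t");
static_assert(kTableBytes == 13232, "1654 * 2 * 4 bytes");
static_assert(kTableBytes < 13 * 1024, "a little under 13 KiB (13312 bytes)");
```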
 
Of course, this includes complex case mappings of one codepoint to multiple codepoints. If we drop those, we can make the table a bit smaller.
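Dropping the complex mappings amounts to keeping only the lines whose status field is C (common) or S (simple) and skipping F (full) and T (Turkic). A minimal C++17 sketch of such a filter follows; the names SimpleFold, parse_hex and parse_simple_fold are made up, and it assumes the fields are separated by exactly "; " as in the published file.

```cpp
#include <cstddef>
#include <string_view>

struct SimpleFold {
    bool valid;     // false for comments, blank lines, and F/T entries
    char32_t from;  // source code point
    char32_t to;    // folded code point
};

// Parses uppercase hex digits starting at `pos` and advances `pos` past them.
constexpr char32_t parse_hex(std::string_view s, std::size_t& pos) {
    char32_t value = 0;
    while (pos < s.size()) {
        const char ch = s[pos];
        unsigned digit = 0;
        if (ch >= '0' && ch <= '9') {
            digit = static_cast<unsigned>(ch - '0');
        } else if (ch >= 'A' && ch <= 'F') {
            digit = static_cast<unsigned>(ch - 'A') + 10u;
        } else {
            break;
        }
        value = static_cast<char32_t>(value * 16 + digit);
        ++pos;
    }
    return value;
}

// Parses one line of the form "<code>; <status>; <mapping>; # <name>" and
// keeps it only if the status is C or S (single code point mappings).
constexpr SimpleFold parse_simple_fold(std::string_view line) {
    const SimpleFold none{false, 0, 0};
    if (line.empty() || line[0] == '#') {
        return none;
    }
    std::size_t pos = 0;
    const char32_t from = parse_hex(line, pos);
    // Need room for "; X; " after the source code point.
    if (pos == 0 || pos + 4 >= line.size()) {
        return none;
    }
    if (line[pos] != ';' || line[pos + 1] != ' ' ||
        line[pos + 3] != ';' || line[pos + 4] != ' ') {
        return none;
    }
    const char status = line[pos + 2];
    if (status != 'C' && status != 'S') {
        return none;  // F (full) and T (Turkic) entries are dropped
    }
    pos += 5;
    const std::size_t start = pos;
    const char32_t to = parse_hex(line, pos);
    if (pos == start) {
        return none;
    }
    return SimpleFold{true, from, to};
}

static_assert(parse_simple_fold("0041; C; 0061; # LATIN CAPITAL LETTER A").to == 0x0061,
              "common mapping is kept");
static_assert(!parse_simple_fold("00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S").valid,
              "full mapping is dropped");
```

Feeding every line of the file through this kind of filter would give the input for the single -> single table only.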