Date: Tue, 8 Jul 2025 19:14:40 +0200
The folding not only converts uppercase to lowercase, but also expands ligatures (e.g. ﬀ to f f) and ß (sharp S) to ss, and maps similar characters to their ordinary counterparts (e.g. the Kelvin sign to k, the Ohm sign to ω).
If we merge mappings from consecutive code points to consecutive code points into intervals (e.g. A-Z -> a-z) for more compact storage, we get:
463 intervals mapping a single code point -> a single code point
86 intervals mapping a single code point -> multiple code points
34 alternative single -> single intervals, if single -> multi is not acceptable
(the other 52 would have to be kept in their original form, i.e. left unfolded; these are mostly ligatures)
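The interval merging described above could be sketched like this. It is a minimal C++17 sketch, not the method actually used to get the counts above, and it assumes the single -> single mappings have already been extracted from CaseFolding.txt and sorted by source code point; `Mapping`, `Interval`, `compress` and `kSample` are hypothetical names. A run is extended only when both source and target advance by one, so alternating pairs such as U+0100 -> U+0101, U+0102 -> U+0103 each become their own interval in this scheme.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One single -> single folding entry (hypothetical input format).
struct Mapping {
    char32_t from;
    char32_t to;
};

// A run of consecutive source code points mapping to consecutive targets,
// e.g. U+0041..U+005A -> U+0061..U+007A is stored as {0x41, 0x5A, +0x20}.
struct Interval {
    char32_t first;      // first source code point of the run
    char32_t last;       // last source code point of the run (inclusive)
    std::int32_t delta;  // target = source + delta for every code point in the run
};

// Merges mappings into intervals and returns how many were written to `out`.
// Assumes `in` is sorted by `from` without duplicates; `out` has room for
// N intervals (the worst case, where nothing can be merged).
template <std::size_t N>
constexpr std::size_t compress(const std::array<Mapping, N>& in,
                               std::array<Interval, N>& out) {
    std::size_t count = 0;
    for (const Mapping& m : in) {
        const std::int32_t delta =
            static_cast<std::int32_t>(m.to) - static_cast<std::int32_t>(m.from);
        // Extend the previous run only if source and target both continue by one.
        if (count > 0 && m.from == out[count - 1].last + 1 &&
            delta == out[count - 1].delta) {
            out[count - 1].last = m.from;
        } else {
            out[count++] = Interval{m.from, m.from, delta};
        }
    }
    return count;
}

// Sample input: A..C -> a..c merges into one run; KELVIN SIGN -> k stays separate.
constexpr std::array<Mapping, 4> kSample{{
    {0x0041, 0x0061}, {0x0042, 0x0062}, {0x0043, 0x0063}, {0x212A, 0x006B}}};
```

Because `compress` is `constexpr`, a generator could run it at compile time and emit only the merged table.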
So one needs two sets of functions for case folding (upper -> lower):
single -> single
single -> possibly multi
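The two function families might look roughly like this. This is a minimal C++17 sketch with hypothetical names (`fold_simple`, `fold_full`, `Folded`) and tiny hand-written table excerpts; a real implementation would generate complete tables from CaseFolding.txt (statuses C + S for simple folding, C + F for full folding) and would ignore or separately handle the Turkic T entries. The full variant returns at most three code points, which is the longest mapping in CaseFolding.txt.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct Interval {
    char32_t first;      // first source code point of the run
    char32_t last;       // last source code point of the run (inclusive)
    std::int32_t delta;  // target = source + delta
};

// Hypothetical excerpt of the simple (C + S) table, sorted by `first`.
inline constexpr std::array<Interval, 5> kSimple{{
    {0x0041, 0x005A, 0x20},             // A-Z -> a-z
    {0x00C0, 0x00D6, 0x20},             // À-Ö -> à-ö
    {0x1E9E, 0x1E9E, 0x00DF - 0x1E9E},  // ẞ -> ß (status S)
    {0x2126, 0x2126, 0x03C9 - 0x2126},  // OHM SIGN -> ω
    {0x212A, 0x212A, 0x006B - 0x212A},  // KELVIN SIGN -> k
}};

// single -> single: binary search over the intervals; unmapped code points fold to themselves.
constexpr char32_t fold_simple(char32_t c) {
    std::size_t lo = 0, hi = kSimple.size();
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (c < kSimple[mid].first) {
            hi = mid;
        } else if (c > kSimple[mid].last) {
            lo = mid + 1;
        } else {
            return static_cast<char32_t>(static_cast<std::int32_t>(c) + kSimple[mid].delta);
        }
    }
    return c;
}

// Result of full folding: one to three code points.
struct Folded {
    std::array<char32_t, 3> cp{};
    std::size_t size = 0;
};

struct MultiEntry {
    char32_t from;
    Folded to;
};

// Hypothetical excerpt of the status F (single -> multi) entries.
inline constexpr std::array<MultiEntry, 3> kMulti{{
    {0x00DF, {{0x0073, 0x0073}, 2}},  // ß -> ss
    {0x1E9E, {{0x0073, 0x0073}, 2}},  // ẞ -> ss
    {0xFB00, {{0x0066, 0x0066}, 2}},  // ﬀ -> ff
}};

// single -> possibly multi: F entries take precedence, otherwise fall back to the
// simple table. The fallback is valid because S entries only exist where an F
// entry exists too, so any code point with an S entry is caught by kMulti first.
constexpr Folded fold_full(char32_t c) {
    for (const MultiEntry& e : kMulti) {
        if (e.from == c) return e.to;
    }
    Folded r{};
    r.cp[0] = fold_simple(c);
    r.size = 1;
    return r;
}
```

Note that ﬀ has no S entry, so `fold_simple` leaves it unchanged, which matches the ligatures that "would have to be kept in their original form".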
-----Original Message-----
From: JJ Marr via Std-Proposals <std-proposals_at_[hidden]>
Sent: Tue 08.07.2025 18:50
Subject: Re: [std-proposals] constexpr tolower, toupper, isalpha
To: std-proposals_at_[hidden];
CC: JJ Marr <jjmarr_at_[hidden]>;
The authoritative tables for case mapping are found at https://www.unicode.org/Public/16.0.0/ucd/, specifically CaseFolding.txt for case folding. They are all pre-generated.
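Each data line in CaseFolding.txt has the form `<code>; <status>; <mapping>; # <name>`, with status C, F, S or T. A table generator could parse those lines along these lines; this is a minimal C++17 sketch with hypothetical names (`FoldingLine`, `parse_line`), and it assumes one line per call with comments introduced by `#`.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string_view>

// One data line of CaseFolding.txt, e.g. "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S".
struct FoldingLine {
    char32_t code = 0;
    char status = 0;                    // 'C', 'F', 'S' or 'T'
    std::array<char32_t, 3> mapping{};  // one to three code points
    std::size_t mapping_size = 0;
};

constexpr bool is_space(char ch) { return ch == ' ' || ch == '\t' || ch == '\r'; }

constexpr std::string_view trim(std::string_view s) {
    while (!s.empty() && is_space(s.front())) s.remove_prefix(1);
    while (!s.empty() && is_space(s.back())) s.remove_suffix(1);
    return s;
}

// Parses a bare hexadecimal code point such as "00DF"; nullopt on bad input.
constexpr std::optional<char32_t> parse_hex(std::string_view s) {
    if (s.empty() || s.size() > 6) return std::nullopt;
    char32_t value = 0;
    for (char ch : s) {
        int digit = 0;
        if (ch >= '0' && ch <= '9') digit = ch - '0';
        else if (ch >= 'A' && ch <= 'F') digit = ch - 'A' + 10;
        else if (ch >= 'a' && ch <= 'f') digit = ch - 'a' + 10;
        else return std::nullopt;
        value = value * 16 + static_cast<char32_t>(digit);
    }
    return value;
}

// Returns nullopt for blank, comment-only or malformed lines.
constexpr std::optional<FoldingLine> parse_line(std::string_view line) {
    constexpr auto npos = std::string_view::npos;
    if (const auto hash = line.find('#'); hash != npos) line = line.substr(0, hash);
    line = trim(line);
    if (line.empty()) return std::nullopt;

    const auto s1 = line.find(';');
    const auto s2 = s1 == npos ? npos : line.find(';', s1 + 1);
    if (s2 == npos) return std::nullopt;

    FoldingLine out{};
    const auto code = parse_hex(trim(line.substr(0, s1)));
    const std::string_view status = trim(line.substr(s1 + 1, s2 - s1 - 1));
    if (!code || status.size() != 1) return std::nullopt;
    out.code = *code;
    out.status = status.front();

    // Mapping field: space-separated code points up to the next ';'.
    std::string_view rest = line.substr(s2 + 1);
    rest = trim(rest.substr(0, rest.find(';')));
    while (!rest.empty()) {
        const auto space = rest.find(' ');
        const auto cp = parse_hex(rest.substr(0, space));
        if (!cp || out.mapping_size == out.mapping.size()) return std::nullopt;
        out.mapping[out.mapping_size++] = *cp;
        rest = space == npos ? std::string_view{} : trim(rest.substr(space));
    }
    if (out.mapping_size == 0) return std::nullopt;
    return out;
}
```

Since everything is `constexpr`, the same parser could in principle run on an embedded copy of the file at compile time, though a separate generator step is the more conventional choice.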
CaseFolding.txt is an 87 KB text file, most of which is comments. It has about 1654 lines, so assuming two UTF-32 code points (4 bytes each) per line, that's a little under 13 KiB.
Of course, this includes complex case mappings of one code point to multiple code points. If we drop those, we can make the table somewhat smaller.
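The size estimate can be checked at compile time. The line count and the two-code-points-per-line assumption come from the message; the interval comparison at the end additionally assumes each interval is stored as three 4-byte fields (first, last, delta) with no padding.

```cpp
#include <cstddef>

// About 1654 lines, two UTF-32 code points per line, 4 bytes per code point.
constexpr std::size_t kLines = 1654;
constexpr std::size_t kBytesPerCodePoint = 4;
constexpr std::size_t kTableBytes = kLines * 2 * kBytesPerCodePoint;  // 13232 bytes

// 13 KiB = 13 * 1024 = 13312 bytes, so the estimate is indeed just under 13 KiB.
static_assert(kTableBytes < 13 * 1024);

// For comparison (assumption: first, last, delta as three 4-byte fields),
// the 463 single -> single intervals counted at the top of this mail
// would need 463 * 12 = 5556 bytes.
constexpr std::size_t kIntervalBytes = 463 * 3 * 4;
```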
Received on 2025-07-08 17:23:54