std-proposals

Re: [std-proposals] constexpr tolower, toupper, isalpha

From: Sebastian Wittmeier <wittmeier_at_[hidden]>
Date: Tue, 8 Jul 2025 19:14:40 +0200
Case folding not only converts uppercase to lowercase, it also expands ligatures (e.g. the ff ligature to f f) and ß (sharp s), and maps similar characters to their base letters (e.g. Kelvin sign to K, Ohm sign to Omega).

If we combine runs of consecutive code points that map to consecutive code points into intervals (e.g. A-Z -> a-z) for easier storage, we get:

  463 intervals with single code point -> single code point
  86 intervals with single code point -> multiple code points
  34 alternative single -> single intervals, for use if single -> multi is not acceptable (the other 52 would have to be kept in their original form; they are mostly ligatures)

So one needs two sets of functions for case folding (upper -> lower):

  single -> single
  single -> possibly multi

-----Original Message-----
From: JJ Marr via Std-Proposals <std-proposals_at_[hidden]>
Sent: Tue 08.07.2025 18:50
Subject: Re: [std-proposals] constexpr tolower, toupper, isalpha
To: std-proposals_at_[hidden];
CC: JJ Marr <jjmarr_at_[hidden]>;

The authoritative tables for case mapping are found at https://www.unicode.org/Public/16.0.0/ucd/, specifically CaseFolding.txt for case folding. It's all premade.

CaseFolding.txt is an 87 KB text file, most of which is comments. It's about 1654 lines, so assuming two UTF-32 characters per line, that's a little under 13 KiB.

Of course, this includes complex case mappings of one code point to multiple code points. If we drop those, we can make the table a bit smaller.
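The two function families named in the reply (single -> single and single -> possibly multi) could be sketched roughly as follows. This is only an illustration under stated assumptions, not proposed wording: every name here (FoldInterval, Folded, FullFold, kSimpleFold, kFullFold, fold_simple, fold_full) is hypothetical, and the tables are a tiny hand-picked excerpt rather than the 463 and 86 intervals counted above. A real table would be generated from CaseFolding.txt. The sketch assumes C++17.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical record: every code point in [first, last] folds to
// (code point + delta). One record covers a whole run such as A-Z -> a-z.
struct FoldInterval {
    char32_t first;
    char32_t last;
    std::int32_t delta;
};

// Illustrative excerpt only, sorted by `first`.
constexpr std::array<FoldInterval, 6> kSimpleFold{{
    {0x0041, 0x005A, 32},     // A-Z -> a-z
    {0x00C0, 0x00D6, 32},     // Latin-1 capitals before the multiplication sign
    {0x00D8, 0x00DE, 32},     // Latin-1 capitals after the multiplication sign
    {0x0391, 0x03A1, 32},     // Greek Alpha-Rho -> alpha-rho
    {0x2126, 0x2126, -7517},  // Ohm sign -> U+03C9 (small omega)
    {0x212A, 0x212A, -8383},  // Kelvin sign -> U+006B (k)
}};

// Result of a "possibly multi" fold: up to three code points.
struct Folded {
    char32_t cp[3];
    unsigned size;
};

// Hypothetical record for single -> multi mappings.
struct FullFold {
    char32_t from;
    Folded to;
};

// Illustrative excerpt only.
constexpr std::array<FullFold, 2> kFullFold{{
    {0x00DF, {{U's', U's', 0}, 2}},  // sharp s -> "ss"
    {0xFB00, {{U'f', U'f', 0}, 2}},  // ff ligature -> "ff"
}};

// single -> single: linear scan for brevity; a generated table of a few
// hundred intervals would more likely use binary search.
constexpr char32_t fold_simple(char32_t c) {
    for (std::size_t i = 0; i < kSimpleFold.size(); ++i) {
        const FoldInterval& iv = kSimpleFold[i];
        if (c < iv.first) break;  // sorted table: no later interval can match
        if (c <= iv.last)
            return static_cast<char32_t>(static_cast<std::int32_t>(c) + iv.delta);
    }
    return c;  // not in the table: folds to itself
}

// single -> possibly multi: check the expansion table first, then fall
// back to the single -> single mapping.
constexpr Folded fold_full(char32_t c) {
    for (std::size_t i = 0; i < kFullFold.size(); ++i) {
        if (kFullFold[i].from == c) return kFullFold[i].to;
    }
    return Folded{{fold_simple(c), 0, 0}, 1};
}

// Both functions are usable in constant expressions.
static_assert(fold_simple(U'A') == U'a');
static_assert(fold_full(0x00DF).size == 2);
```

With this split, a caller that cannot accept a change in length uses fold_simple and leaves ß and the ligatures unchanged, while fold_full covers the expanding cases at the cost of a variable-length result.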

Received on 2025-07-08 17:23:54