On July 6, 2025 9:13:28 a.m. EDT, David Brown via Std-Proposals <std-proposals@lists.isocpp.org> wrote:
On 06/07/2025 14:38, Frederick Virchanza Gotham via Std-Proposals wrote:
On Thu, Jul 3, 2025 at 10:01 AM Jonathan Wakely wrote:
Meaning that this would fail:
setlocale(LC_ALL, "de_DE.iso8859-1");
const char uuml = 0xFC; // lowercase u with umlaut
char c = std::toupper(uuml);
constexpr char cc = std::toupper(uuml);
assert( c == cc );
With regard to Unicode:
* The Standard mentions Unicode and allows for Unicode escape
sequences (e.g. "\u00f1")
* The first 128 characters in Unicode (up to 0x7F) are ASCII
* The remaining characters up to 0xFF are ISO-8859-1 (aka Latin-1)
Therefore it makes sense that the Standard would provide inline
constexpr functions like:
namespace std {
namespace unicode {
inline constexpr char32_t tolower(char32_t) { . . . }
}
}
You might think that, and it seems reasonable at first sight to people who are used to only a limited subset of Latin-script usage. The reality of capitalisation is very much more complicated, however. Issues include:
The Latin letter "i" capitalises to "I" in most languages - but in Turkish and related Turkic languages, it capitalises to "İ" while "I" is the capital form of "ı".
In German, "ß" is sometimes capitalised to "ẞ", sometimes to the digraph (two letters) "SS".
Many languages have letter combinations that are sometimes capitalised together, sometimes not. The Dutch name for the country "Iceland" is "IJsland", as the digraph "ij" is treated as a single letter for capitalisation purposes.
Converting from capitals to lower case is typically even more complicated - the lowercase of the Greek capital "Σ" can be "ς" or "σ" depending on its position in the word. The handling of the iota subscript in case changes is a typographer's nightmare. Even in some names in plain ASCII, there are complications - try converting "MacBride" to capitals and back again.
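To make the mismatch with a char32_t-to-char32_t signature concrete, here is a rough sketch covering only three of the cases above. Every name in it (Lang, to_upper_sketch, to_lower_sketch, is_greek_capital) is made up for illustration and is not part of any proposal or library, and the final-sigma test is a crude approximation that only recognises basic Greek capitals as letters. The point is merely that the result can be longer than the input, can depend on the language, and can depend on neighbouring characters, so the functions have to work on whole strings and take extra context.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical names throughout; nothing here is proposed or standard API.
enum class Lang { Default, Turkish };

constexpr char32_t kSharpS         = 0x00DF; // ß
constexpr char32_t kCapitalDottedI = 0x0130; // İ
constexpr char32_t kDotlessI       = 0x0131; // ı
constexpr char32_t kCapitalSigma   = 0x03A3; // Σ
constexpr char32_t kFinalSigma     = 0x03C2; // ς
constexpr char32_t kSmallSigma     = 0x03C3; // σ

// Basic Greek capitals U+0391..U+03A9 (U+03A2 is unassigned).
constexpr bool is_greek_capital(char32_t ch)
{
    return ch >= 0x0391u && ch <= 0x03A9u && ch != 0x03A2u;
}

// Upper-cases ASCII letters plus a few hand-picked special cases.
std::u32string to_upper_sketch(std::u32string_view in, Lang lang = Lang::Default)
{
    std::u32string out;
    for (char32_t ch : in) {
        if (ch == kSharpS) {
            out += U"SS";                 // one code point becomes two
        } else if (ch == U'i' && lang == Lang::Turkish) {
            out += kCapitalDottedI;       // language-dependent result
        } else if (ch == kDotlessI) {
            out += U'I';
        } else if (ch >= U'a' && ch <= U'z') {
            out += static_cast<char32_t>(ch - U'a' + U'A');
        } else {
            out += ch;                    // everything else is left alone
        }
    }
    return out;
}

// Lower-cases ASCII and basic Greek capitals; sigma depends on its position.
std::u32string to_lower_sketch(std::u32string_view in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t ch = in[i];
        if (ch == kCapitalSigma) {
            // Simplified rule: final form if preceded but not followed by a letter.
            const bool after_letter  = i > 0 && is_greek_capital(in[i - 1]);
            const bool before_letter = i + 1 < in.size() && is_greek_capital(in[i + 1]);
            out += (after_letter && !before_letter) ? kFinalSigma : kSmallSigma;
        } else if (is_greek_capital(ch)) {
            out += static_cast<char32_t>(ch + 0x20u);
        } else if (ch >= U'A' && ch <= U'Z') {
            out += static_cast<char32_t>(ch - U'A' + U'a');
        } else {
            out += ch;
        }
    }
    return out;
}

// A few checks showing the three behaviours.
void check_examples()
{
    assert(to_upper_sketch(U"stra\u00DFe") == U"STRASSE");
    assert(to_upper_sketch(U"i") == U"I");
    assert(to_upper_sketch(U"i", Lang::Turkish) == U"\u0130");
    assert(to_lower_sketch(U"\u039F\u0394\u039F\u03A3") == U"\u03BF\u03B4\u03BF\u03C2");
}
```

Even this toy version needs a string in, a string out, and a language parameter, and it still says nothing about "IJsland", the iota subscript, or "MacBride".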
It would be very nice if it were possible to make universal capitalisation functions such as the ones you suggest, but it is not possible.