Date: Tue, 8 Jul 2025 23:13:34 +0000
Yeah, but... Why do you want it?
Most cases of localization in applications can be achieved with prepared phrase books, no case-conversion required.
Tempting as it may be to use it to deal with "case-insensitive" systems (domain names, databases, or file systems), the reality is that they don't implement the full Unicode spec. They are not truly "case-insensitive"; it's more "whatever code ranges that system considers to be the same" (and sometimes that can only be determined at runtime, as with NTFS), and the answer is not going to be the same from one system to the next. So using the standard to provide a "to_lower" to do that job, instead of an "is_same_path", is misguided.
The real application I see for this would require:
1. dealing with words that aren't prepared (and are thus unlikely to be known at compile time, making constexpr useless at best).
2. having an actual need to do the conversion.
You are likely dealing with user input and transforming text; your application is probably a word processor like LibreOffice. Or you are dealing with finicky government databases that need to case-convert names for some reason. Or possibly you want to implement search in a database.
All of those would generally have their own solutions and deal with the problem specifically for that application, and there will probably be a need for a localization expert on the team.
It feels like there should be a large use case for this, but when you look into it, it's really hard to find examples where you actually need it and it was the right tool.
________________________________
From: Std-Proposals <std-proposals-bounces_at_[hidden]> on behalf of Jason McKesson via Std-Proposals <std-proposals_at_[hidden]>
Sent: Tuesday, July 8, 2025 7:20:41 PM
To: std-proposals_at_[hidden] <std-proposals_at_[hidden]>
Cc: Jason McKesson <jmckesson_at_[hidden]m>
Subject: Re: [std-proposals] constexpr tolower, toupper, isalpha
On Tue, Jul 8, 2025 at 1:55 PM Thiago Macieira via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> On Tuesday, 8 July 2025 09:49:33 Pacific Daylight Time JJ Marr via Std-
> Proposals wrote:
> > CaseFolding.txt is a 87 KB text file, most of which is comments. It's about
> > 1654 lines, so assuming two UTF-32 characters a line that's a little under
> > 13 KiB.
>
> They also appear in ranges with predictable changes, like adding or
> subtracting 0x20. That means the codegen can be significantly better than one
> 13 kB table.
>
> > Of course, this includes complex case mappings of one codepoint to multiple
> > codepoints. If we drop those, we can make the table a bit smaller.
>
> You can't drop them. Case-mapping is a string operation.
But *simple* case folding is not. The term "simple case folding" is a
specific Unicode-defined subset of general case folding that is
locale-independent and only provides 1:1 mapping of codepoints.
Though not 1:1 mapping of any particular *encoding* of codepoints. A
UTF-8 string after simple case folding may not be the same encoded
length as one before.
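
To make the encoded-length point concrete, here is a minimal sketch. The function names (utf8_length, simple_fold_excerpt) are made up for illustration, and the fold function covers only ASCII plus two hand-picked entries from CaseFolding.txt; it is nowhere near a complete simple case folding table.

```cpp
#include <cstddef>

// Number of bytes needed to encode one code point in UTF-8.
// Assumes cp is a valid Unicode scalar value.
constexpr std::size_t utf8_length(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Hypothetical excerpt of simple case folding: the ASCII range plus
// two individual mappings. Everything else is returned unchanged,
// which is NOT what a real implementation would do.
constexpr char32_t simple_fold_excerpt(char32_t cp) {
    if (cp >= U'A' && cp <= U'Z') {
        return static_cast<char32_t>(cp + 0x20); // range with a fixed offset
    }
    switch (cp) {
        case 0x023A: return 0x2C65; // LATIN CAPITAL LETTER A WITH STROKE
        case 0x212A: return 0x006B; // KELVIN SIGN folds to 'k'
        default:     return cp;
    }
}

// One code point in, one code point out, but the UTF-8 size changes.
static_assert(utf8_length(0x023A) == 2);
static_assert(utf8_length(simple_fold_excerpt(0x023A)) == 3); // grows
static_assert(utf8_length(0x212A) == 3);
static_assert(utf8_length(simple_fold_excerpt(0x212A)) == 1); // shrinks
```

So even with a strictly 1:1 code point mapping, an in-place fold over a UTF-8 buffer is not possible in general; the output needs its own storage.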
--
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
Received on 2025-07-08 23:13:42