std-proposals

Re: [std-proposals] constexpr tolower, toupper, isalpha

From: David Brown <david.brown_at_[hidden]>
Date: Tue, 8 Jul 2025 09:23:53 +0200
On 08/07/2025 00:45, Thiago Macieira via Std-Proposals wrote:
> On Sunday, 6 July 2025 15:41:29 Pacific Daylight Time JJ Marr via Std-Proposals
> wrote:
>> If you want your characters to not have
>> capitalization, you can call `std::toCasefold` before doing comparisons,
>> which implements simple case folding from the Unicode standard.
>
> No, you call a case-ignoring comparison function, instead of case-folding a
> string before comparing.
>

Yes and no.

There are plenty of situations where you would want to do the
case-folding first. In particular, if you want to make multiple
comparisons, it would be more efficient to case-fold and then compare.
You may also want to case-fold, then hash, such as to make a
case-insensitive map.
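
As a rough sketch of the fold-then-hash idea: the map below folds each key
once on insert and once on lookup, so the hash and the equality check both
run on already-folded strings. `fold_ascii` and `FoldedMap` are hypothetical
names, and the fold is an ASCII-only stand-in for a real Unicode case-folding
function, which this sketch does not provide.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for a real Unicode case-folding function.
// It folds only ASCII 'A'..'Z'; every other byte passes through unchanged.
inline std::string fold_ascii(std::string_view s) {
    std::string out(s);
    for (char& c : out) {
        if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>(c - 'A' + 'a');
        }
    }
    return out;
}

// Case-insensitive map: keys are stored in folded form, so the standard
// hash and equality of std::string are enough.
class FoldedMap {
public:
    void insert(std::string_view key, int value) {
        map_[fold_ascii(key)] = value;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(std::string_view key) const {
        auto it = map_.find(fold_ascii(key));
        return it == map_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return map_.size(); }

private:
    std::unordered_map<std::string, int> map_;
};
```

With this, inserting "Hello" and then "HELLO" leaves a single entry, and a
lookup of "hElLo" finds it. The cost of folding is paid once per key rather
than once per comparison.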

As I have said in other posts, I think it is clear that you cannot have
a locale-independent "toLower" or "toUpper" other than for very
restricted cases (basically, plain ASCII). But I don't think it should
be infeasible to implement the standard Unicode case-folding functions
as constexpr functions in the C++ standard. The key argument (and I
think perhaps an overriding argument) against it, as I see it, would be
the changeability of the functions - as more characters are added to
Unicode, the Unicode folk will update their github with new tables and
functions, and C++ programmers should not have to wait for a new C++
standard version to be implemented before updating their code.

Certainly it should be clear that such a case-folding function is a
Unicode case-folding function, /not/ the correct way to compare strings
in a given language or locale. Full string comparison is not only
locale dependent, but also context dependent - sometimes names will be
compared in a slightly different manner from other words, for example.
As has been said by others, for big pieces of software that take this
seriously, you need dedicated and specialist developers - not a couple
of functions in the standard library.

On the other hand, there can be cases where you, as the developer,
control the use of language - the Unicode simple case-folding system
could be good enough and provide a consistent, efficient and constexpr
solution that is independent of any locale settings on the host system.
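
To illustrate what a constexpr, locale-independent simple fold could look
like, here is a minimal sketch. The names `FoldEntry`, `kFoldTable`,
`fold_simple` and `equal_folded` are made up for this example, and the table
is only a five-entry excerpt; a real implementation would generate the full
table from the Unicode CaseFolding.txt data file. It assumes C++17 or later.

```cpp
#include <cstddef>
#include <string_view>

// One simple case-folding mapping: a single code point to a single code point.
struct FoldEntry {
    char32_t from;
    char32_t to;
};

// Hypothetical excerpt only; not the full Unicode table.
inline constexpr FoldEntry kFoldTable[] = {
    {U'\u00C5', U'\u00E5'},  // A WITH RING ABOVE -> a with ring above
    {U'\u0391', U'\u03B1'},  // GREEK CAPITAL ALPHA -> small alpha
    {U'\u03A3', U'\u03C3'},  // GREEK CAPITAL SIGMA -> small sigma
    {U'\u03C2', U'\u03C3'},  // GREEK SMALL FINAL SIGMA -> small sigma
    {U'\u212B', U'\u00E5'},  // ANGSTROM SIGN -> a with ring above
};

// Folds one code point; code points not covered are returned unchanged.
constexpr char32_t fold_simple(char32_t c) {
    if (c >= U'A' && c <= U'Z') {
        return static_cast<char32_t>(c - U'A' + U'a');
    }
    for (const FoldEntry& e : kFoldTable) {
        if (e.from == c) {
            return e.to;
        }
    }
    return c;
}

// Caseless comparison under simple folding. Simple folding maps one code
// point to one code point, so strings of different length cannot match.
constexpr bool equal_folded(std::u32string_view a, std::u32string_view b) {
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (fold_simple(a[i]) != fold_simple(b[i])) {
            return false;
        }
    }
    return true;
}

// Evaluated entirely at compile time, with no dependence on the host locale.
static_assert(fold_simple(U'Q') == U'q');
static_assert(equal_folded(U"\u03A3\u03A3", U"\u03C3\u03C2"));
```

The linear table scan is only for brevity. The point is that nothing here
consults a locale, so the result is the same on every host and is usable in
constant expressions.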

Received on 2025-07-08 07:23:58