std-proposals

Re: [std-proposals] constexpr tolower, toupper, isalpha

From: Jan Schultke <janschultke_at_[hidden]>
Date: Tue, 8 Jul 2025 09:51:38 +0200
> As I have said in other posts, I think it is clear that you cannot have
> a locale-independent "toLower" or "toUpper" other than for very
> restricted cases (basically, plain ASCII).

This seems to have gotten lost in the conversation; I have a proposal
for exactly that:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3688r0.html
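
As a rough illustration of what ASCII-only, locale-independent functions
can look like, here is a minimal sketch. The names (is_ascii_upper,
ascii_to_lower, and so on) are hypothetical and are not taken from the
paper; the sketch also assumes an ASCII-compatible execution character
set, where 'A'..'Z' and 'a'..'z' are contiguous.

```cpp
// Hypothetical helper names for illustration only; not the interface of P3688.
// Assumes an ASCII-compatible execution character set.

constexpr bool is_ascii_upper(char c) noexcept { return c >= 'A' && c <= 'Z'; }
constexpr bool is_ascii_lower(char c) noexcept { return c >= 'a' && c <= 'z'; }

constexpr bool is_ascii_alpha(char c) noexcept {
    return is_ascii_upper(c) || is_ascii_lower(c);
}

// Maps 'A'..'Z' to 'a'..'z'; every other value is returned unchanged.
constexpr char ascii_to_lower(char c) noexcept {
    return is_ascii_upper(c) ? static_cast<char>(c - 'A' + 'a') : c;
}

// Maps 'a'..'z' to 'A'..'Z'; every other value is returned unchanged.
constexpr char ascii_to_upper(char c) noexcept {
    return is_ascii_lower(c) ? static_cast<char>(c - 'a' + 'A') : c;
}

// Usable in constant expressions, with no dependency on the global locale.
static_assert(ascii_to_lower('Q') == 'q');
static_assert(ascii_to_upper('q') == 'Q');
static_assert(ascii_to_lower('1') == '1');
static_assert(!is_ascii_alpha('\xE9'));  // a non-ASCII byte is never "alpha" here
```

Unlike std::tolower, these take char directly, so there is no undefined
behavior for negative values, and the result never changes with
setlocale.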

> But I don't think it should
> be infeasible to implement the standard Unicode case-folding functions
> as constexpr functions in the C++ standard. The key argument (and I
> think perhaps an overriding argument) against it, as I see it, would be
> the changeability of the functions - as more characters are added to
> Unicode, the Unicode folk will update their github with new tables and
> functions, and C++ programmers should not have to wait for a new C++
> standard version to be implemented before updating their code.

This isn't really a problem right now because the C++ standard has a
floating dependency on the Unicode standard, so implementers are free
to pull in the latest changes regardless of the C++ standard.

> Certainly it should be clear that such a code-folding function is a
> Unicode code-folding function, /not/ the correct way to compare strings
> in a given language or locale. Full string comparison is not only
> locale dependent, but also context dependent - sometimes names will be
> compared in a slightly different manner from other words, for example.
> As has been said by others, for big pieces of software that take this
> seriously, you need dedicated and specialist developers - not a couple
> of functions in the standard library.

Agreed, full locale-aware string comparison is honestly too complex and
domain-specific to be in the C++ standard. This is more within the scope
of ICU, for those who need fully-fledged Unicode case
transformations/comparisons.

> On the other hand, there can be case where you, as the developer,
> control the use of language - the Unicode simple case-folding system
> could be good enough and provide a consistent, efficient and constexpr
> solution that is independent of any locale settings on the host system.

The people who don't care enough about the details can probably get
away with ASCII-only support, and the people who need full Unicode
support may not be okay with "half-baked" solutions.

Someone would need to do the research and figure out what is done in
practice. Any software already using the ICU would see very little
benefit from a few standard functions that have 1% of its
functionality. To be fair, people use CSS case transformations on
websites all the time, and those also just use the locale-independent
case mappings.
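
To make "locale-independent simple case folding" a bit more concrete,
here is a minimal constexpr sketch (C++17). It is only an assumption about
how such a function could be structured: the name simple_fold and the
FoldEntry type are hypothetical, and the table is a tiny hand-picked
excerpt. A real implementation would generate the full table from the
status C and S entries of the Unicode Character Database file
CaseFolding.txt and would use a faster lookup than a linear scan.

```cpp
#include <array>

// Hypothetical type and names, for illustration only.
struct FoldEntry {
    char32_t from;
    char32_t to;
};

// Tiny excerpt of simple case folding mappings (one code point to one code point).
inline constexpr std::array<FoldEntry, 4> fold_table{{
    {U'\u00C0', U'\u00E0'},  // LATIN CAPITAL LETTER A WITH GRAVE
    {U'\u0391', U'\u03B1'},  // GREEK CAPITAL LETTER ALPHA
    {U'\u0410', U'\u0430'},  // CYRILLIC CAPITAL LETTER A
    {U'\u1E9E', U'\u00DF'},  // LATIN CAPITAL LETTER SHARP S
}};

// Returns the folded code point, or c itself if no mapping is known.
constexpr char32_t simple_fold(char32_t c) noexcept {
    // ASCII fast path.
    if (c >= U'A' && c <= U'Z') {
        return static_cast<char32_t>(c + (U'a' - U'A'));
    }
    // Linear scan is fine for a four-entry excerpt.
    for (const FoldEntry& e : fold_table) {
        if (e.from == c) {
            return e.to;
        }
    }
    return c;
}

static_assert(simple_fold(U'Z') == U'z');
static_assert(simple_fold(U'\u0391') == U'\u03B1');
// Simple folding never changes string length, so U+00DF stays as it is;
// only full case folding would expand it to "ss".
static_assert(simple_fold(U'\u00DF') == U'\u00DF');
```

The last assertion shows the kind of limitation that makes this a
"half-baked" solution for some users: results are consistent and
independent of the host locale, but they are not the language-specific
behavior that ICU provides.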

Received on 2025-07-08 07:51:52