Re: [std-proposals] constexpr tolower, toupper, isalpha

From: Jason McKesson <jmckesson_at_[hidden]>
Date: Wed, 9 Jul 2025 16:48:54 -0400
On Wed, Jul 9, 2025 at 3:13 PM Oliver Hunt via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> I re-read this reply and it’s unreasonably mean/antagonistic so I apologize for that.
>
> The core issue here is that normalization is much more complex than just capitalization. It’s easy to think of capitalization as the quintessential normalization that everything needs if you’re used to languages where it is essentially the only “no op” normalization. This is an issue I’ve repeatedly encountered over the years, so I think I reflexively responded unreasonably.

I think it is important to point this out: "normalization" has a
*very* specific definition in Unicode (technically four: NFC, NFD,
NFKC, and NFKD), and case operations have basically nothing to do
with any of them. I just want to make sure we're all on the same page.

> I think a more reasonable response would have been:
>
> Real-world normalization is non-trivial, and it’s easy for people used to a single form of normalization to forget that other normalization schemes exist that are equally important for other languages. Given that languages like C and C++ have historically made mistakes in assuming Eurocentric (or, more realistically, Anglo-centric) approaches to language, it’s important for us to avoid repeating the mistakes of the past.
>
> The reason ICU has such a complex API is that such an API is necessary for dealing with both real-world language and Unicode (whether these are strictly the same is up for debate :D). *If* we were to integrate Unicode directly into C++, it would realistically need to incorporate C++-appropriate interfaces for the core Unicode operations, not the actual tables or specific Unicode releases. This would permit the standard library to forward to its own implementation of the Unicode standard, the OS implementation, etc. (which would allow a binary to be sure its interpretation of Unicode matches the OS).

Does ICU actually have some way to back-door its own Unicode tables,
to replace its tables with the OS's tables or user-defined ones?

Because if it doesn't, I don't know why the C++ standard Unicode
support would need to.

Also, it's not clear what this has to do with implementing Unicode
simple case folding. It's a thing in the Unicode standard. It's a
thing that ICU explicitly supports
(https://github.com/unicode-org/icu/blob/main/docs/userguide/transforms/casemappings.md).
If the standard library is going to support Unicode, why wouldn't we
want it to support simple case folding?

It seems to me that your overall point is that simple case folding is
a bad idea and therefore the C++ standard library shouldn't support
it. But it's still part of Unicode, so it feels incomplete to provide
support for Unicode in the standard library without it.
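
For illustration, here is a minimal sketch of what a constexpr simple case
folding lookup might look like. It assumes C++14 or later. The names
FoldEntry, kFoldExcerpt, and simple_fold are hypothetical and not part of any
proposal or of ICU, and the table is a small hand-picked excerpt of the
single-code-point mappings in Unicode's CaseFolding.txt. A real
implementation would be generated from the full data file.

```cpp
// Hypothetical sketch: simple (one code point to one code point) case folding.
// The table below is a tiny illustrative excerpt, NOT the full Unicode data.
struct FoldEntry {
    char32_t from;
    char32_t to;
};

constexpr FoldEntry kFoldExcerpt[] = {
    {U'\u00C0', U'\u00E0'},  // À -> à
    {U'\u03A3', U'\u03C3'},  // Σ -> σ
    {U'\u03C2', U'\u03C3'},  // ς (final sigma) -> σ
    {U'\u0410', U'\u0430'},  // Cyrillic А -> а
    {U'\u1E9E', U'\u00DF'},  // ẞ -> ß (simple mapping; full folding gives "ss")
    {U'\u212A', U'k'},       // KELVIN SIGN -> k
};

constexpr char32_t simple_fold(char32_t c) {
    // ASCII uppercase letters fold to their lowercase counterparts.
    if (c >= U'A' && c <= U'Z') {
        return static_cast<char32_t>(c + (U'a' - U'A'));
    }
    // Linear search is enough for a sketch; real tables would use a
    // sorted or multi-stage lookup.
    for (const FoldEntry& e : kFoldExcerpt) {
        if (e.from == c) {
            return e.to;
        }
    }
    // Code points without a mapping in this excerpt fold to themselves.
    return c;
}

// Everything is evaluated at compile time.
static_assert(simple_fold(U'Q') == U'q', "ASCII folds to lowercase");
static_assert(simple_fold(U'\u03A3') == simple_fold(U'\u03C2'),
              "both sigmas fold to the same code point");
static_assert(simple_fold(U'\u00DF') == U'\u00DF',
              "simple folding never expands one code point into several");
```

Because simple folding maps exactly one code point to one code point, it fits
a constexpr function over char32_t without any allocation or locale state.
The language-dependent and length-changing cases are left to full or tailored
case mappings.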

Received on 2025-07-09 20:49:06