std-proposals

Re: [std-proposals] constexpr tolower, toupper, isalpha

From: Thiago Macieira <thiago_at_[hidden]>
Date: Wed, 09 Jul 2025 09:09:37 -0700
On Wednesday, 9 July 2025 08:26:37 Pacific Daylight Time Thiago Macieira via
Std-Proposals wrote:
> Normalising, case-folding, could come in handy if you're trying to store a
> dictionary for example (in the original meaning of "dictionary") so that
> look ups of equivalent inputs only need to normalise the user's runtime
> content. But this quickly degenerates into needing to create a hashing
> table, then a perfect hashing function, for which I think a proper
> generator is more indicated. Then you don't need #embed in the first place.

BTW, I should point out that there's something better than normalised text for
equivalence comparisons in Unicode: collation. That's unicode/ucol.h in ICU.
You can get a sort key that can be used for both sorting and equivalence
checking with a simple memcmp().

Embedding that into your binary could be useful... so long as the algorithm to
transform text into the sort key is stable. I don't think ICU guarantees it
will be.
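
To make the embedding idea concrete, here is a minimal sketch, assuming C++17.
It does not use ICU: fold_ascii(), make_key(), runtime_key(), kStoredKey and
equivalent_to_stored() are hypothetical names, and ASCII-only case folding
stands in for a real collation sort key. The point is only the shape: one key
is computed at compile time and stored in the binary, the user's input is
transformed at runtime, and the two are compared with memcmp(). It also shows
why stability matters: the comparison is only meaningful if both sides were
produced by the same transformation.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a collation transform: ASCII-only case folding.
// A real sort key would come from a collation library, not from this function.
constexpr unsigned char fold_ascii(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c - 'A' + 'a')
                                  : static_cast<unsigned char>(c);
}

// Build the key at compile time from a string literal (N includes the NUL).
template <std::size_t N>
constexpr std::array<unsigned char, N> make_key(const char (&text)[N]) {
    std::array<unsigned char, N> key{};
    for (std::size_t i = 0; i < N; ++i)
        key[i] = fold_ascii(text[i]);
    return key;
}

// This key is computed by the compiler and embedded in the binary.
constexpr auto kStoredKey = make_key("Dictionary");

// Apply the same transformation to runtime input, NUL terminator included.
std::vector<unsigned char> runtime_key(std::string_view text) {
    std::vector<unsigned char> key;
    key.reserve(text.size() + 1);
    for (char c : text)
        key.push_back(fold_ascii(c));
    key.push_back(0);
    return key;
}

// Equivalence check: same length and byte-identical keys.
bool equivalent_to_stored(std::string_view input) {
    const std::vector<unsigned char> key = runtime_key(input);
    return key.size() == kStoredKey.size() &&
           std::memcmp(key.data(), kStoredKey.data(), key.size()) == 0;
}
```

With this sketch, equivalent_to_stored("DICTIONARY") is true and
equivalent_to_stored("dictionaries") is false. If the runtime transformation
ever diverged from the one used at build time, the stored key would silently
stop matching, which is the stability concern above.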

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Platform & System Engineering

Received on 2025-07-09 16:09:44