
std-proposals


Re: [std-proposals] constexpr tolower, toupper, isalpha

From: Jason McKesson <jmckesson_at_[hidden]>
Date: Tue, 8 Jul 2025 14:20:23 -0400
On Tue, Jul 8, 2025 at 1:55 PM Thiago Macieira via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> On Tuesday, 8 July 2025 09:49:33 Pacific Daylight Time JJ Marr via
> Std-Proposals wrote:
> > CaseFolding.txt is an 87 KB text file, most of which is comments. It's about
> > 1654 lines, so assuming two UTF-32 characters per line, that's a little under
> > 13 KiB.
>
> The mappings also appear in ranges with predictable offsets, such as adding or
> subtracting 0x20. That means the codegen can be significantly better than one
> 13 KiB table.
>
> > Of course, this includes complex case mappings of one codepoint to multiple
> > codepoints. If we drop those, we can make the table a bit smaller.
>
> You can't drop them. Case-mapping is a string operation.
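To illustrate the quoted point about complex mappings: under full case folding one code point can fold to several, so the result has to be a string rather than a single `char32_t`. A minimal sketch; `full_fold` is a hypothetical name, and it handles only ASCII A-Z plus U+00DF and U+1E9E, which CaseFolding.txt maps (status F) to U+0073 U+0073:

```cpp
#include <string>
#include <string_view>

// Hypothetical sketch of *full* case folding over a UTF-32 string.
// Covered entries only:
//   U+0041..U+005A (A-Z)                -> +0x20
//   U+00DF LATIN SMALL LETTER SHARP S   -> U+0073 U+0073 ("ss"), status F
//   U+1E9E LATIN CAPITAL LETTER SHARP S -> U+0073 U+0073 ("ss"), status F
// Every other code point is passed through unchanged.
std::u32string full_fold(std::u32string_view in) {
    std::u32string out;
    out.reserve(in.size());
    for (char32_t c : in) {
        if (c == 0x00DF || c == 0x1E9E) {
            out += U"ss";  // one code point in, two code points out
        } else if (c >= 0x41 && c <= 0x5A) {
            out += static_cast<char32_t>(c + 0x20);
        } else {
            out += c;
        }
    }
    return out;
}
```

For example, `full_fold(U"Stra\u00DFe")` yields `U"strasse"`: six code points in, seven out. A real implementation would be driven by the complete CaseFolding.txt data; the sketch only shows why the output length is not fixed.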

But *simple* case folding is not a string operation. "Simple case
folding" is a specific Unicode-defined subset of full case folding
that is locale-independent and provides only 1:1 mappings between
codepoints.
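Because simple case folding is 1:1 on code points and locale-independent, it fits a plain `constexpr char32_t(char32_t)` signature. Combined with the quoted observation that mappings come in ranges with regular offsets, it can be stored as a short list of ranges instead of one entry per code point. A minimal sketch under stated assumptions: `FoldRange`, `kFoldRanges` and `simple_fold` are hypothetical names, and the eight ranges are a deliberately incomplete excerpt, not the full CaseFolding.txt data and not how any particular standard library does it.

```cpp
#include <array>

// Hypothetical excerpt of simple (1:1) case-folding data, NOT complete.
// Every `step`-th code point in [first, last], counting from `first`,
// folds to (code point + delta).
struct FoldRange {
    char32_t first;
    char32_t last;
    char32_t step;  // 1 = every code point, 2 = alternating upper/lower pairs
    int delta;      // offset added when folding
};

constexpr std::array<FoldRange, 8> kFoldRanges{{
    {0x0041, 0x005A, 1, 0x20},  // A..Z -> a..z
    {0x00C0, 0x00D6, 1, 0x20},  // Latin-1 capitals (U+00D7 is excluded)
    {0x00D8, 0x00DE, 1, 0x20},
    {0x0100, 0x012E, 2, 1},     // Latin Extended-A pairs: even -> odd
    {0x0391, 0x03A1, 1, 0x20},  // Greek capitals (U+03A2 is unassigned)
    {0x03A3, 0x03AB, 1, 0x20},
    {0x0400, 0x040F, 1, 0x50},  // Cyrillic U+0400..U+040F -> U+0450..U+045F
    {0x0410, 0x042F, 1, 0x20},  // Cyrillic U+0410..U+042F -> U+0430..U+044F
}};

// One code point in, one code point out, no locale parameter.
constexpr char32_t simple_fold(char32_t c) {
    for (const FoldRange& r : kFoldRanges) {
        if (c >= r.first && c <= r.last && (c - r.first) % r.step == 0) {
            return static_cast<char32_t>(static_cast<long long>(c) + r.delta);
        }
    }
    return c;  // not covered by this excerpt: returned unchanged
}

static_assert(simple_fold(0x0041) == 0x0061);  // 'A' -> 'a'
static_assert(simple_fold(0x0100) == 0x0101);  // U+0100 -> U+0101
static_assert(simple_fold(0x0101) == 0x0101);  // already folded
static_assert(simple_fold(0x0416) == 0x0436);  // Cyrillic capital ZHE -> small zhe
static_assert(simple_fold(0x00D7) == 0x00D7);  // MULTIPLICATION SIGN: no mapping
```

The eight entries cover 154 code points. A complete table would still need singleton entries, for example U+00B5 MICRO SIGN, which folds to U+03BC.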

Simple case folding is not, however, a 1:1 mapping of any particular
*encoding* of codepoints. A UTF-8 string after simple case folding
may not have the same encoded length as it had before.
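A concrete case of the encoded length changing, assuming the simple (status C) mappings U+212A KELVIN SIGN to U+006B, U+2126 OHM SIGN to U+03C9, and U+023A LATIN CAPITAL LETTER A WITH STROKE to U+2C65 from CaseFolding.txt. The helper names are hypothetical, and only these three pairs are covered:

```cpp
#include <cstddef>

// Number of bytes needed to encode a Unicode scalar value in UTF-8.
constexpr std::size_t utf8_length(char32_t c) {
    if (c < 0x80) return 1;
    if (c < 0x800) return 2;
    if (c < 0x10000) return 3;
    return 4;
}

// Hypothetical excerpt of simple case folding: only three pairs handled.
constexpr char32_t simple_fold_excerpt(char32_t c) {
    switch (c) {
        case 0x212A: return 0x006B;  // KELVIN SIGN -> 'k'
        case 0x2126: return 0x03C9;  // OHM SIGN -> GREEK SMALL LETTER OMEGA
        case 0x023A: return 0x2C65;  // A WITH STROKE -> small a with stroke
        default:     return c;
    }
}

// 1:1 on code points, but not on UTF-8 bytes:
static_assert(utf8_length(0x212A) == 3 && utf8_length(simple_fold_excerpt(0x212A)) == 1);
static_assert(utf8_length(0x2126) == 3 && utf8_length(simple_fold_excerpt(0x2126)) == 2);
static_assert(utf8_length(0x023A) == 2 && utf8_length(simple_fold_excerpt(0x023A)) == 3);
```

So a UTF-8 buffer can shrink (U+212A, U+2126) or grow (U+023A) under simple case folding. An in-place, fixed-size UTF-8 API therefore cannot assume that the output has the same length as the input.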

Received on 2025-07-08 18:20:36