Date: Tue, 08 Jul 2025 10:55:37 -0700
On Tuesday, 8 July 2025 09:49:33 Pacific Daylight Time JJ Marr via Std-
Proposals wrote:
> CaseFolding.txt is an 87 KB text file, most of which is comments. It's about
> 1654 lines, so assuming two UTF-32 characters a line that's a little under
> 13 KiB.
The mappings also appear in ranges with predictable changes, like adding or
subtracting 0x20. That means the codegen can be significantly smaller than one
flat 13 kB table.
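
As a rough sketch of what range-based codegen could look like: each entry
stores a first/last code point and the delta to add. The names FoldRange,
kFoldRanges and fold_simple are made up for illustration, and the table is a
hand-picked excerpt of simple mappings, not generated from CaseFolding.txt.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical range entry: every code point in [first, last] folds to
// (code point + delta).
struct FoldRange {
    char32_t first;
    char32_t last;
    std::int32_t delta;
};

// Hand-picked excerpt for illustration only, sorted by `first`.
constexpr FoldRange kFoldRanges[] = {
    {0x0041, 0x005A, 0x20},  // Latin A..Z
    {0x00C0, 0x00D6, 0x20},  // Latin-1 capitals before the multiplication sign
    {0x00D8, 0x00DE, 0x20},  // Latin-1 capitals after the multiplication sign
    {0x0391, 0x03A1, 0x20},  // Greek Alpha..Rho
    {0x03A3, 0x03AB, 0x20},  // Greek Sigma..Upsilon with dialytika
    {0x0400, 0x040F, 0x50},  // Cyrillic, first block
    {0x0410, 0x042F, 0x20},  // Cyrillic A..YA
};

// Simple (one-to-one) folding only. A linear scan keeps the sketch short;
// a real table would more likely use a binary search.
constexpr char32_t fold_simple(char32_t c) {
    for (const FoldRange& r : kFoldRanges) {
        if (c < r.first)
            break;  // table is sorted, so no later range can match
        if (c <= r.last)
            return static_cast<char32_t>(static_cast<std::int64_t>(c) + r.delta);
    }
    return c;  // no mapping: the code point folds to itself
}

inline void fold_simple_examples() {
    assert(fold_simple(U'A') == U'a');
    assert(fold_simple(0x0416) == 0x0436);  // Cyrillic ZHE
    assert(fold_simple(0x00D7) == 0x00D7);  // multiplication sign, unchanged
}
```

These seven entries cover 130 code points. A full table would need more than
this one shape: for instance, stretches where every other code point maps to
the one after it would want a stride field or a separate entry kind.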
> Of course, this includes complex case mappings of one codepoint to multiple
> codepoints. If we drop those, we can make the table a bit smaller.
You can't drop them. Case-mapping is a string operation.
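
To illustrate why: with full case folding one code point can fold to
several, so the output can be longer than the input and a char32_t-to-char32_t
function cannot express it. A minimal sketch follows, with made-up names
(FullFold, kFullFolds, fold_full), a four-entry excerpt of the multi-code-point
mappings, and an ASCII-only fallback standing in for the simple mappings. It
assumes C++17.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical entry for a one-to-many folding.
struct FullFold {
    char32_t from;
    std::u32string_view to;
};

// Small excerpt for illustration only, not the complete list.
constexpr FullFold kFullFolds[] = {
    {0x00DF, U"ss"},        // LATIN SMALL LETTER SHARP S
    {0x0130, U"i\u0307"},   // LATIN CAPITAL LETTER I WITH DOT ABOVE
    {0xFB00, U"ff"},        // LATIN SMALL LIGATURE FF
    {0xFB03, U"ffi"},       // LATIN SMALL LIGATURE FFI
};

// Folds a whole string. The result may be longer than the input.
inline std::u32string fold_full(std::u32string_view in) {
    std::u32string out;
    out.reserve(in.size());
    for (char32_t c : in) {
        bool expanded = false;
        for (const FullFold& f : kFullFolds) {
            if (f.from == c) {
                out += f.to;  // one code point in, several out
                expanded = true;
                break;
            }
        }
        if (expanded)
            continue;
        // Stand-in for the simple mappings: ASCII only in this sketch.
        if (c >= U'A' && c <= U'Z')
            c += 0x20;
        out += c;
    }
    return out;
}

inline void fold_full_examples() {
    // Six code points in, seven out.
    assert(fold_full(U"Stra\u00DFe") == U"strasse");
}
```

Comparing fold_full(a) == fold_full(b) is then a caseless comparison for the
characters this excerpt covers, which is only possible because the function
works on strings.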
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Platform & System Engineering
Received on 2025-07-08 17:55:39