Date: Fri, 23 May 2025 15:34:05 +0100
On Fri, 23 May 2025 at 09:24, Jan Schultke <janschultke_at_[hidden]> wrote:
>>
>> It would add about 2MB to the on-disk and in-memory footprint of every C++ application, for something most programs will never use.
>>
>> The data file is publicly available, if your application needs to translate U+NNNN to names then it can figure out how to do that as a post-processing step. I don't think everybody needs this functionality.
>
>
> Firstly, I think 2MB is extremely pessimistic. There are existing implementations that take a small fraction of that like https://godbolt.org/z/4arrY6hjv The existing implementations are still brute-forcish in that they don't exploit much Unicode-specific knowledge. For example, code points within the same block often have almost identical names, and so it would seem much better to divide and conquer than to treat all names as one big string to compress. I suspect you can get it sub 100 KiB or 50 KiB, but the burden of proof is obviously on me.
>
> The argument of "not everybody needs this functionality" is also not particularly strong. Not everybody needs mathematical special functions, <simd>, executors, filesystems, <linalg>, multi-threading, parallel algorithms, std::hive and various other rarely-used containers, and many more features. Admittedly, not all of these contribute to binary or memory size,
Exactly. If you don't use them, they don't add anything to your program.

> I'm just pointing out that features are useful and appropriate for standardization even if a fraction of developers use them.

I suspect the fraction who use multi-threading is several orders of
magnitude larger than the fraction that would want to print out
Unicode names using std::format.

I think this would make more sense as a transformation that could be
used to adapt a utf_view, rather than something built in to
std::format. And it wouldn't need to be in the standard then.
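
To make the "post-processing step" idea concrete, here is a minimal sketch of a lookup that lives entirely in user code, so only programs that want it pay for the table. Everything named here (NameEntry, kNames, lookup_name, name_or_hex) is hypothetical and not part of any library. The four-entry table is a hand-written excerpt for illustration; a real one would be generated from the published data file.

```cpp
#include <algorithm>
#include <cstdio>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical table entry: one code point and its Unicode character name.
struct NameEntry {
    char32_t code_point;
    std::string_view name;
};

// Hand-written excerpt, sorted by code point. Assumption: a real program
// would generate the full table from the Unicode data file at build time.
inline constexpr NameEntry kNames[] = {
    {0x0041, "LATIN CAPITAL LETTER A"},
    {0x00E9, "LATIN SMALL LETTER E WITH ACUTE"},
    {0x20AC, "EURO SIGN"},
    {0x1F600, "GRINNING FACE"},
};

// Binary search over the sorted table; returns an empty view if the
// code point is not present.
inline std::string_view lookup_name(char32_t cp) {
    auto it = std::lower_bound(
        std::begin(kNames), std::end(kNames), cp,
        [](const NameEntry& e, char32_t value) { return e.code_point < value; });
    if (it != std::end(kNames) && it->code_point == cp) {
        return it->name;
    }
    return {};
}

// Returns the name if the table has one, otherwise a "U+NNNN" fallback.
inline std::string name_or_hex(char32_t cp) {
    if (auto name = lookup_name(cp); !name.empty()) {
        return std::string(name);
    }
    char buf[16];
    std::snprintf(buf, sizeof buf, "U+%04lX", static_cast<unsigned long>(cp));
    return buf;
}
```

An application would call name_or_hex on each code point it wants to describe, before or after formatting, without std::format having to know about names at all.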
Received on 2025-05-23 14:34:25