Date: Sat, 18 May 2024 22:46:34 +0000
>> 2. Add support for charN_t in std::from_chars() and std::to_chars().
> This is easily doable, but I think we can do better. There is something else in this area that I’m experimenting with right now, and I think we can go one step further with “encoding-agnostic numerical conversions”. Instead of to_chars and from_chars, we would have to_digits and from_digits, which, instead of converting directly to/from a character encoding, convert to/from an intermediate numerical representation of the individual digits that can then easily be converted to any encoding, be it charN_t or otherwise. I expect a minimal performance penalty (<10%), or even parity, compared with converting directly to charN_t. I will be working on this and will provide a playground repo soon.
> The “hard” part of the algorithm is figuring out which digits go where; the encoding itself is trivial, and I think we can gain by separating these two things.
Just as a follow up: https://github.com/tmiguelf/utilities/blob/a56f19e10c374e87978e173853820f2ceb456a2c/CoreLib/benchmarks/CoreLib_benchmark_charconv/src/CoreLib_benchmark_charconv.cpp#L1175
I’m measuring a performance penalty of between 30% and 100% from going through an intermediate, encoding-agnostic form. Adding an offset and making a copy is relatively expensive. I’ll keep experimenting to see whether it can be improved, but the conversion algorithms are already so efficient that the copy and the offset account for a significant share of the total time, so I may not be able to improve on this by much ☹
The idea may still be conceptually valid, but it has its costs.
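
To make the two stages concrete, here is a minimal sketch of the split described above, for unsigned decimal integers only. It is not the code from the linked benchmark: the name to_digits comes from the proposal, but its signature, the helper encode_digits, and the fixed 20-digit buffer are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Assumption: 20 digits are enough for any 64-bit unsigned value.
using digit_buffer = std::array<std::uint8_t, 20>;

// Hypothetical stage 1: find which digit goes where.
// Writes raw digit values 0..9 (most significant first) into `out`
// and returns how many digits were written. No character encoding is involved.
inline std::size_t to_digits(std::uint64_t value, digit_buffer& out)
{
    digit_buffer reversed{};
    std::size_t count = 0;
    do {
        reversed[count++] = static_cast<std::uint8_t>(value % 10);
        value /= 10;
    } while (value != 0);

    for (std::size_t i = 0; i < count; ++i)
        out[i] = reversed[count - 1 - i];
    return count;
}

// Hypothetical stage 2: the "offset and copy" step.
// Adds the code unit of '0' to each raw digit and copies it into a
// string of the requested character type (char, char16_t, char32_t, ...).
template <typename CharT>
std::basic_string<CharT> encode_digits(const digit_buffer& digits, std::size_t count)
{
    assert(count <= digits.size());
    std::basic_string<CharT> result(count, CharT{});
    for (std::size_t i = 0; i < count; ++i)
        result[i] = static_cast<CharT>('0' + digits[i]);
    return result;
}
```

With this shape, encode_digits<char16_t>(buf, to_digits(12345, buf)) would yield u"12345". The second loop is the extra pass over the data that a direct to_chars avoids by writing encoded characters as it computes each digit, which is presumably where the measured overhead comes from.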
Received on 2024-05-18 22:46:38