ISOCPP std-proposals List: Re: [std-proposals] Low-level float parsing functions

From: Tiago Freire <tmiguelf_at_[hidden]>
Date: Mon, 28 Apr 2025 14:23:59 +0000

Funny you should mention that. I'm currently working on an implementation of this, and I go a step further.
Dealing with characters and encodings is way too specific.
I'm trying to create a library built on the concept of digits: the input is a list of numbers from 0 to 9 that represents your digits, with no reference to any encoding.

The reason is to completely separate text parsing and encoding from the numeric conversion to and from digits. Converting from any text encoding into digits is a trivial task that anyone can easily write, while converting the digits into floating point or vice versa is where the meat of the problem is.
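To illustrate how small the text-to-digits step can be, here is a minimal sketch. The names digit_value and to_digits are hypothetical and not part of any library mentioned in this thread, and the choice of ASCII plus Arabic-Indic digits is only an example of supporting more than one script.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper: maps one code point to its digit value 0..9,
// or returns nullopt if the code point is not a recognized digit.
// Two scripts are handled here purely as an example.
constexpr std::optional<std::uint8_t> digit_value(char32_t c) {
    constexpr char32_t ascii_zero = 0x0030;         // '0'
    constexpr char32_t arabic_indic_zero = 0x0660;  // ARABIC-INDIC DIGIT ZERO
    if (c >= ascii_zero && c <= ascii_zero + 9) {
        return static_cast<std::uint8_t>(c - ascii_zero);
    }
    if (c >= arabic_indic_zero && c <= arabic_indic_zero + 9) {
        return static_cast<std::uint8_t>(c - arabic_indic_zero);
    }
    return std::nullopt;
}

// Hypothetical helper: converts decoded text into a list of digit values.
// Fails (nullopt) as soon as a non-digit code point is encountered.
inline std::optional<std::vector<std::uint8_t>> to_digits(std::u32string_view text) {
    std::vector<std::uint8_t> digits;
    digits.reserve(text.size());
    for (char32_t c : text) {
        const auto d = digit_value(c);
        if (!d) {
            return std::nullopt;
        }
        digits.push_back(*d);
    }
    return digits;
}
```

Under these assumptions, to_digits(U"2025") yields the list 2, 0, 2, 5, and the numeric conversion that consumes it never needs to know which encoding or script the text came from.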

And to answer the question of why you would want to do this, and whether it is faster:
The real reason isn't so much speed as the ability to do specialized formatting at all.
Although you can do that by "reparsing" the output of the current <charconv> implementation, it is a lot of work, and I can confirm it is indeed much faster to do it in a more direct way.
Plus, to_chars can't do shortest round-trip representation correctly, since which representation ends up being the shortest depends on how you represent it in text.

________________________________
From: Std-Proposals <std-proposals-bounces_at_[hidden]> on behalf of Jan Schultke via Std-Proposals <std-proposals_at_[hidden]>
Sent: Monday, April 28, 2025 3:41:57 PM
To: C++ Proposals <std-proposals_at_[hidden]>
Cc: Jan Schultke <janschultke_at_[hidden]>
Subject: [std-proposals] Low-level float parsing functions

Hey folks,

I am not a big fan of the interface of std::from_chars for floating-point numbers we have currently. The issue is that it conflates the "text-based parsing" of floating-point numbers with the numeric conversion. The latter part is vastly more complicated, and it would be nice if we had more direct access to it.

To clarify, it's possible that due to localization, we have digit separators, radix points other than a period (such as a comma), etc. It's possible that we use "p", "P", "d", "D", "+10^" or exponent separators other than "e"/"E", which is what std::from_chars accepts for decimal input. There are also use cases where we know that no exponent part exists, and it would be a waste of time to look for it during parsing. Some of these problems are covered by the std::chars_format options, but the world of floating-point formats is more complex than what fits into four enum constants.
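For context, this is roughly the workaround such formats require today: rewrite the text into the form std::from_chars understands, then parse it a second time. It is a minimal sketch; number_format and parse_localized are hypothetical names, and it assumes a standard library that implements the floating-point overloads of std::from_chars.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical description of a localized number format.
struct number_format {
    char radix_point = ',';
    char digit_separator = '.';
    std::string_view exponent_mark = "E";
};

// Hypothetical helper: normalizes the text, then defers to std::from_chars.
// The input is scanned once here and once more inside from_chars.
inline std::optional<double> parse_localized(std::string_view text,
                                             const number_format& fmt) {
    std::string normalized;
    normalized.reserve(text.size());
    std::size_t i = 0;
    while (i < text.size()) {
        if (!fmt.exponent_mark.empty() &&
            text.substr(i, fmt.exponent_mark.size()) == fmt.exponent_mark) {
            normalized += 'e';  // custom exponent marker becomes "e"
            i += fmt.exponent_mark.size();
        } else if (text[i] == fmt.digit_separator) {
            ++i;  // digit separators are dropped
        } else if (text[i] == fmt.radix_point) {
            normalized += '.';  // localized radix point becomes "."
            ++i;
        } else {
            normalized += text[i];
            ++i;
        }
    }
    double value = 0.0;
    const char* first = normalized.data();
    const char* last = first + normalized.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;  // not a number, out of range, or trailing junk
    }
    return value;
}
```

With the default format, parse_localized("1.234,5", number_format{}) gives 1234.5. The extra allocation and second scan are exactly the overhead that a lower-level interface could avoid.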

A possible solution is to introduce a lower-level set of std::parse_float functions, with an interface like:

template<floating_point F>
struct parse_float_result {
    F result;
    errc ec;
};

// Note: F cannot be deduced from the arguments and must be specified explicitly.

// decimal float, only integer part
template<floating_point F>
parse_float_result<F>
parse_float_integer(string_view integer, int exponent = 0);
template<floating_point F>
parse_float_result<F>
parse_float_integer(string_view integer, string_view exponent);
// decimal float, only fractional part
template<floating_point F>
parse_float_result<F>
parse_float_fraction(string_view fraction, int exponent = 0);
template<floating_point F>
parse_float_result<F>
parse_float_fraction(string_view fraction, string_view exponent);
// decimal float, integer and fractional part
template<floating_point F>
parse_float_result<F>
parse_float(string_view integer, string_view fraction, int exponent = 0);
template<floating_point F>
parse_float_result<F>
parse_float(string_view integer, string_view fraction, string_view exponent);

// hex float, only integer part
template<floating_point F>
parse_float_result<F>
parse_float_hex_integer(string_view integer, int exponent = 0);
template<floating_point F>
parse_float_result<F>
parse_float_hex_integer(string_view integer, string_view exponent);
// hex float, only fractional part
template<floating_point F>
parse_float_result<F>
parse_float_hex_fraction(string_view fraction, int exponent = 0);
template<floating_point F>
parse_float_result<F>
parse_float_hex_fraction(string_view fraction, string_view exponent);
// hex float, integer and fractional part
template<floating_point F>
parse_float_result<F>
parse_float_hex(string_view integer, string_view fraction, int exponent = 0);
template<floating_point F>
parse_float_result<F>
parse_float_hex(string_view integer, string_view fraction, string_view exponent);

The implementation effort is relatively low. The same underlying floating-point parser can be used as for std::from_chars; we're just bypassing the initial steps of looking for the radix point, exponent separator, etc. and jumping straight into the computational part.

Note that this may have benefits even in cases where we could have used std::from_chars. Floating-point parsing is often preceded by tokenization, possibly with regex matching, and we get the subdivision into integer/fraction/exponent "for free" as a by-product.
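As a rough illustration of that last point, a tokenizer built on std::regex already yields the three parts as capture groups. This is only a sketch: float_parts and tokenize_float are hypothetical names, and the pattern covers a simple decimal syntax rather than any particular language's grammar.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type: the three pieces a lower-level parser would take.
struct float_parts {
    std::string integer;
    std::string fraction;  // empty if the token has no fractional part
    std::string exponent;  // empty if the token has no exponent part
};

// Hypothetical tokenizer: accepts tokens such as "42", "12.5" or "12.5e-3".
// The capture groups of the match are the integer, fraction and exponent parts.
inline std::optional<float_parts> tokenize_float(const std::string& token) {
    static const std::regex pattern(
        R"(([0-9]+)(?:\.([0-9]+))?(?:[eE]([+-]?[0-9]+))?)");
    std::smatch match;
    if (!std::regex_match(token, match, pattern)) {
        return std::nullopt;
    }
    // Groups that did not participate in the match convert to empty strings.
    return float_parts{match[1].str(), match[2].str(), match[3].str()};
}
```

For "12.5e-3" the parts are "12", "5" and "-3". With the proposed interface they could be passed on directly, whereas with std::from_chars the original token has to be scanned again to rediscover the same boundaries.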

Received on 2025-04-28 14:24:02