Re: [std-proposals] Low-level float parsing functions

From: Tiago Freire <tmiguelf_at_[hidden]>
Date: Mon, 28 Apr 2025 20:57:26 +0000
To answer the question of why you would want to do this:
<charconv> is very good for configuration files.
But once you start working with localization, <charconv> is extremely lacking and doesn't deal very well with specific nuances.

Let me give you an example. Let's say that you are working on a system that shows numbers in the following form:
1,23×10⁻²³

Not only do you have a comma instead of a period as the decimal separator, but the "power of 10 separator" is represented as "×10" instead of "E", and the characters denoting the power value (⁻²³) are Unicode superscripts that do not map to the standard ASCII "-23".
There are also embedded systems with their own custom code pages that don't map to ASCII; I have seen this quite commonly in aircraft systems that you have probably flown in.

<charconv> can't deal with any of that.
This is also why I think that just dealing with "where the decimal place is" is not low-level enough.

But this is not a difficult problem to solve at all; it looks complex, but it isn't.
If I know that my text could be represented in this way, it is very easy for me to write an algorithm to extract the digits (and even deal with ⁻²³). The hard part is, once you have those digits, how do you convert them to a float? (Or vice versa: if you have the float, how do you get the digits?)
Those low-level functions could do that job.
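
To make the example concrete, here is a minimal sketch of the workaround that is available today: extract the digits from text such as 1,23×10⁻²³, rebuild an ASCII string, and hand it to std::from_chars. The names parse_localized and take_superscript are hypothetical, invented for this illustration. The sketch assumes UTF-8 input, C++17, and a standard library whose std::from_chars supports double.

```cpp
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: if `s` starts with a UTF-8 superscript sign or digit,
// removes it from `s` and returns the ASCII counterpart; otherwise returns '\0'.
inline char take_superscript(std::string_view& s) {
    struct Entry { std::string_view utf8; char ascii; };
    static constexpr Entry table[] = {
        {"\xE2\x81\xB0", '0'}, {"\xC2\xB9", '1'},     {"\xC2\xB2", '2'},
        {"\xC2\xB3", '3'},     {"\xE2\x81\xB4", '4'}, {"\xE2\x81\xB5", '5'},
        {"\xE2\x81\xB6", '6'}, {"\xE2\x81\xB7", '7'}, {"\xE2\x81\xB8", '8'},
        {"\xE2\x81\xB9", '9'}, {"\xE2\x81\xBA", '+'}, {"\xE2\x81\xBB", '-'},
    };
    for (const Entry& e : table) {
        if (s.substr(0, e.utf8.size()) == e.utf8) {
            s.remove_prefix(e.utf8.size());
            return e.ascii;
        }
    }
    return '\0';
}

// Hypothetical parser for text of the form  [-]digits[,digits][×10<superscripts>].
std::optional<double> parse_localized(std::string_view text) {
    std::string ascii;  // rebuilt in the syntax that std::from_chars accepts

    if (!text.empty() && text.front() == '-') {
        ascii += '-';
        text.remove_prefix(1);
    }
    // Mantissa: ASCII digits, with ',' as the decimal separator.
    while (!text.empty()) {
        const char c = text.front();
        if (c >= '0' && c <= '9') ascii += c;
        else if (c == ',')        ascii += '.';
        else break;
        text.remove_prefix(1);
    }
    // Exponent marker: U+00D7 (multiplication sign) followed by "10".
    constexpr std::string_view times_ten = "\xC3\x97" "10";
    if (text.substr(0, times_ten.size()) == times_ten) {
        text.remove_prefix(times_ten.size());
        ascii += 'e';
        while (const char c = take_superscript(text)) ascii += c;
    }
    if (!text.empty()) return std::nullopt;  // unrecognized trailing input

    double value = 0;
    const char* const last = ascii.data() + ascii.size();
    const auto [ptr, ec] = std::from_chars(ascii.data(), last, value);
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return value;
}
```

Note that the easy part (recognizing the digits) is followed by an allocation and a second pass, only so that std::from_chars can split the rebuilt string into digits and exponent again.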

However, they do present a challenge:
std::from_chars is quite easy to use: you can just give it to a user, provide a short description, and they would be able to use it.
These low-level implementations that I think are required, not so much. If I just give them to you, it is not immediately clear how you would use them; you will need to know a fair amount about floating point and text encoding, and there's a lot of supporting code you need to write yourself before you even get to the point where you can use them.
Not beginner friendly. But if you know what you are doing, it's the ultimate tool, and in some situations the only tool (unless you like to write those converters over and over yourself).


-----Original Message-----
From: Std-Proposals <std-proposals-bounces_at_[hidden]cpp.org> On Behalf Of Jason McKesson via Std-Proposals
Sent: Monday, April 28, 2025 5:22 PM
To: std-proposals_at_lists.isocpp.org
Cc: Jason McKesson <jmckesson_at_[hidden]>
Subject: Re: [std-proposals] Low-level float parsing functions

On Mon, Apr 28, 2025 at 10:23 AM Jan Schultke <janschultke_at_googlemail.com> wrote:
>
> > Do you have evidence that this extra parsing time is a significant
> > problem that your solution would resolve?
>
> I haven't collected the data on this yet, but in any case, it's ultimately not what makes or breaks the idea. You seem to have misunderstood the idea as being entirely motivated by performance, which it isn't.
>
> The issue with the std::from_chars interface is first and foremost that it's inconvenient to build higher-level parsing functions on top of std::from_chars. For example, Fortran has floats in the format "1.23D+3" and Ada uses the "16#" prefix for hexadecimal numbers. Wolfram Alpha accepts a "_16" base suffix. It's also possible that we obtain the integer and exponent separately from two GUI fields. If we wanted to process numbers in these formats, we would have to concatenate them back into a buffer and convert them into the format that std::from_chars accepts. This isn't entirely trivial; we either allocate by using std::string, or have to pick a buffer size large enough for integer, fraction, and exponent, and it's not obvious what that buffer size is.
>
> The first thing that std::from_chars does is split the buffer back into integer, fraction, and exponent. It is utterly idiotic that we have to concatenate these parts to use std::from_chars, just so that they're immediately split back up.
>
> And it's still utterly idiotic even if the performance impact is relatively low.
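
As an illustration of the workaround described in the quoted message above, here is a minimal sketch that parses a Fortran-style literal such as "1.23D+3" by copying it into a scratch buffer in the syntax std::from_chars accepts. The function name parse_fortran_double and the 64-byte buffer size are assumptions made for this example; the arbitrary buffer size is exactly the kind of guess the message complains about. The sketch assumes C++17 and a standard library whose std::from_chars supports double.

```cpp
#include <array>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses text such as "1.23D+3" by rewriting the
// Fortran exponent letter 'D' to 'e' in a fixed-size scratch buffer.
std::optional<double> parse_fortran_double(std::string_view text) {
    std::array<char, 64> buffer{};  // size chosen arbitrarily for this sketch
    if (text.size() > buffer.size()) return std::nullopt;  // input too long

    std::size_t n = 0;
    for (const char c : text) {
        buffer[n++] = (c == 'D' || c == 'd') ? 'e' : c;
    }

    double value = 0;
    const char* const last = buffer.data() + n;
    const auto [ptr, ec] = std::from_chars(buffer.data(), last, value);
    // Require that the whole input was consumed.
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return value;
}
```

The copy exists only to change one character, after which std::from_chars splits the buffer into integer, fraction, and exponent again.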

OK, so what you want is to create a generic parsing framework that doesn't really do much on its own, but can be used to *build* number parsers that could handle arbitrary formats of numbers. That's interesting.

But why does this need to be in the *standard*?

The primary compelling motivation for `to_chars`/`from_chars` is this: there is a lingua-franca number format, a de-facto standard for numeric interchange between programs. This standard is supported by most textual interchange formats: JSON, YAML, CSV (to the extent that this can be considered a format), etc. Almost all of these formats use the lingua-franca encoding for textual interchange (UTF-8), and they all basically agree on how numbers should be formatted. Lots of programs generate such numbers, and lots of programs have to read such numbers.

Because of the substantial widespread use of textual numbers in such formats, there is an obvious benefit to having a tool that can rapidly handle numbers in those formats.

Once you start talking about an omni-number parser, one which is basically a set of parsing tools that you compose to parse any format, the justification for standardization becomes a lot less clear cut.
Such a tool can be very useful for particular users. But the question remains: why does it need to be in the C++ standard? Is it useful enough to be worth adding, or can people just use a library?
--
Std-Proposals mailing list
Std-Proposals_at_lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2025-04-28 20:57:29