std-proposals

Re: [std-proposals] Low-level float parsing functions

From: Jan Schultke <janschultke_at_[hidden]>
Date: Wed, 30 Apr 2025 23:29:41 +0200
> In any case there are only 3 main numerical bases that you want to use for
> floats: decimal, hexadecimal, binary. The algorithms used for those 3 are
> all different. There's not going to be floating point conversion to a
> generic base.
>

The initial part seems to always be the same, regardless of the base or of
whether the floating-point type is decimal or binary: forming an integer
mantissa from the integer and fractional digits, and parsing the exponent
separately.

Since this initial part is always the same, and it is the lowest level one
can provide before the algorithm-specific parts begin, I think it makes
sense to expose it.



> I see no point in passing in Ints to represent the digits, that is not
> going to be efficient. It is just better to pass in a span of individual
> digits (because that is probably the form that they are going to be in
> anyway).
> This can be done by either having the digits mapped to ASCII (which is
> the form the data will most likely be in anyway) or transcoded to binary digit
> representation.
>

It's going to be efficient, because that's what Dragonbox and fast_float do
anyway. The first part is computing the mantissa and exponent; only then
does the interesting and difficult part begin.
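For a decimal input, the second stage only ever needs the pair produced by the first. As an illustration, here is a minimal sketch of Clinger's fast path, one well-known second-stage shortcut for the easy cases. The type and function names are hypothetical. The general case, which needs an exact algorithm such as Eisel-Lemire or big-integer arithmetic, is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical output of the first stage: value == mantissa * 10^exponent.
struct float_parts {
    std::uint64_t mantissa;
    std::int64_t exponent;
};

static_assert(std::numeric_limits<double>::is_iec559, "sketch assumes IEEE-754 binary64");

// Clinger's fast path: if the mantissa is exactly representable as a double
// (<= 2^53) and so is 10^|exponent| (|exponent| <= 22), a single correctly
// rounded multiply or divide yields the correctly rounded result, assuming
// round-to-nearest and no excess precision. Other inputs return nullopt;
// a full parser would fall back to a slower exact algorithm for them.
std::optional<double> fast_path_to_double(float_parts p) {
    constexpr std::uint64_t max_exact_mantissa = std::uint64_t{1} << 53;
    if (p.mantissa > max_exact_mantissa || p.exponent < -22 || p.exponent > 22)
        return std::nullopt;
    const std::int64_t magnitude = p.exponent < 0 ? -p.exponent : p.exponent;
    double power = 1.0;
    for (std::int64_t i = 0; i < magnitude; ++i)
        power *= 10.0; // exact: 10^k is representable in binary64 for k <= 22
    const double m = static_cast<double>(p.mantissa); // exact, since mantissa <= 2^53
    return p.exponent < 0 ? m / power : m * power;
}
```

Note that nothing in this function looks at characters, digit separators or the text encoding; it works purely on the mantissa and exponent.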


> Just because you can do something, it doesn't always mean it's a good
> idea, and in this case it is premature pessimization.
> The only reason we do this is because humans can't understand binary. So,
> we have to transform back and forth to a form that monkeys can read with
> their eyes, that is the problem space.
>

The point isn't to optimize, but to provide an interface that is separated
from the hard part of the problem and is independent of text encoding.
Obtaining the mantissa and exponent seems like something that always needs
to be done, and with that interface you can support digit separators too,
since the caller filters them out.

As I've said before, we should propose multiple levels of convenience on
top of that lowest-level function. The function that takes sequences of
ASCII digits would still exist in addition.

Received on 2025-04-30 21:29:57