Date: Mon, 28 Apr 2025 11:22:09 -0400
On Mon, Apr 28, 2025 at 10:23 AM Jan Schultke
<janschultke_at_[hidden]> wrote:
>
> > Do you have evidence that this extra parsing time is a significant
> > problem that your solution would resolve?
>
> I haven't collected the data on this yet, but in any case, it's ultimately not what makes or breaks the idea. You seem to have misunderstood the idea as being entirely motivated by performance, which it isn't.
>
> The issue with the std::from_chars interface is first and foremost that it's inconvenient to build higher-level parsing functions on top of std::from_chars. For example, Fortran has floats in the format "1.23D+3" and Ada uses the "16#" prefix for hexadecimal numbers. Wolfram Alpha accepts a "_16" base suffix. It's also possible that we obtain the integer and exponent separately from two GUI fields. If we wanted to process numbers in these formats, we would have to concatenate them back into a buffer and convert them into the format that std::from_chars accepts. This isn't entirely trivial; we either allocate by using std::string, or have to pick a buffer size large enough for integer, fraction, and exponent, and it's not obvious what that buffer size is.
>
> The first thing that std::from_chars does is split the buffer back into integer, fraction, and exponent. It is utterly idiotic that we have to concatenate these parts to use std::from_chars, just so that they're immediately split back up.
>
> And it's still utterly idiotic even if the performance impact is relatively low.
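To make the quoted workaround concrete, here is a minimal sketch of what it looks like today for the Fortran-style "1.23D+3" case. The helper name `parse_fortran_double` and the 64-byte buffer limit are my own assumptions for illustration, not part of any proposal. The text is copied into a scratch buffer, the exponent marker is rewritten, and only then is `std::from_chars` called:

```cpp
#include <algorithm>
#include <array>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses a Fortran-style literal such as "1.23D+3".
// The 64-byte scratch buffer is an arbitrary assumption; as the quoted
// message notes, it is not obvious what the right size would be.
std::optional<double> parse_fortran_double(std::string_view text) {
    std::array<char, 64> buffer{};
    if (text.size() > buffer.size()) {
        return std::nullopt; // input does not fit into the scratch buffer
    }

    // Copy the input so that it can be modified.
    std::copy(text.begin(), text.end(), buffer.begin());
    char* const last = buffer.data() + text.size();

    // Rewrite the Fortran exponent marker into the 'e' that from_chars accepts.
    std::replace(buffer.data(), last, 'D', 'e');
    std::replace(buffer.data(), last, 'd', 'e');

    double value = 0.0;
    const auto [ptr, ec] = std::from_chars(buffer.data(), last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt; // parse error or trailing characters
    }
    return value;
}
```

The copy and the rewrite exist only to satisfy the input format; `std::from_chars` then has to locate the exponent again on its own.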
OK, so what you want is to create a generic parsing framework that
doesn't really do much on its own, but can be used to *build* number
parsers that could handle arbitrary formats of numbers. That's
interesting.

But why does this need to be in the *standard*?

The primary compelling motivation for `to_chars`/`from_chars` is this: there
is a lingua-franca number format, a de-facto standard for numeric
interchange between programs. This standard is supported by most
textual interchange formats: JSON, YAML, CSV (to the extent that this
can be considered a format), etc. Almost all of these formats use the
lingua-franca encoding for textual interchange (UTF-8), and they all
basically agree on how numbers should be formatted. Lots of programs
generate such numbers, and lots of programs have to read such numbers.
Because of the substantial widespread use of textual numbers in such
formats, there is an obvious benefit to having a tool that can rapidly
handle numbers in those formats.
Once you start talking about an omni-number parser, one which is
basically a set of parsing tools that you compose to parse any format,
the justification for standardization becomes a lot less clear cut.
Such a tool can be very useful for particular users. But the question
is why it needs to be in the C++ standard. Is it useful enough to be
worth adding, or can people just use a library?
Received on 2025-04-28 15:22:22