Date: Thu, 16 Apr 2026 14:12:30 -0400
On 4/16/2026 5:26, Jan Schultke wrote:
> We ought to have some portable way to classify a floating-point type
> according to its ISO/IEC 60559 format. For example, one may want to
> detect at compile time whether float has the same representation as
> std::float32_t, and other such things.
>
> Merely checking whether the type is 32 bits large and is_iec559 is true
> doesn't necessarily work. The floating-point type could also be in an
> extended 16-bit format, or in an interchange format that also has 32
> bits but is not binary32, or it could be binary16 with 16 bits of padding.
The way I see it, there are basically two approaches you can take here.
Either you go for the very concrete approach, where you have a
comprehensive list of formats it may be, or you go for the very abstract
approach where you describe the type by the Cartesian product of its
capabilities. If you compare to integers, the latter approach is
generally fully describable with bit width * signed representation (in
practice, {unsigned, 2's complement} or just unsigned or signed)--so
simple that the abstract approach is arguably simpler than the concrete
approach.
I've been working on a project where I'm trying to collect and model the
behavior of every floating-point format I can get my hands on. I don't
have enough coverage yet to be fully confident in the completeness of my
description of the abstract properties of a generic floating-point
format. Radix * exponent range * bits in the significand (the model that
C uses when it discusses model floating-point numbers) is a good start,
but some formats just really
defy description that way (the PPC long double and posits being the most
notable examples).
Turning to the concrete model, where you instead identify the
"real" underlying type from a mostly closed set, seems easier at first
glance, but does have some pitfalls. Looking at formats supported in CPU
hardware that was made in the 21st century, the complete list of formats
seems to be IEEE 754's binary16/32/64/128, decimal32/64/128, IBM's hex
float 32/64/128, VAX F/G floating-point (available on Alpha), bfloat,
the 80-bit long double used on x86 [1], and the PPC long double type.
There's possibly a few more types supported via software emulation
(e.g., VAX H floating-point), but I can't confirm these. Accelerators as
of late have been introducing all sorts of weird tiny floating-point
types, but I think the likelihood of these being categorized as extended
floating-point types in C++ terminology is low, and I'm disinclined to
proactively support them before loud user demand for them exists.
But there is another small wrinkle in the concrete model, and that is
that some of these formats have multiple encodings. On MIPS, there is a
legacy choice for distinguishing sNaN from qNaN, so the standard binary
formats basically have IEEE and legacy MIPS NaN encoding rules. The
decimal floating-point formats have BID and DPD encodings. It is a
question worth explicitly answering whether or not the goal is to query
the logical format or the bitwise encoding format.
At the end of the day, your compiler implementation is going to boil the
floating-point types down into its own "real" IR types, which consist of
binary16/32/64/128, and maybe weirdS/D/X (single, double, extended) for
some platform-specific floating-point type. If weird types are present, then
you probably have a switch indicating whether to map float/double to
weird or IEEE types, and you may well provide dedicated weird types in
IEEE floating-point mode anyways. Beyond these types, any extra types
would only have a single, consistent spelling: a hypothetical
std::decimal32_t only refers to one format, and that format would have
no other spelling that constitutes a distinct type [2].
So a simple enum that consists of binaryN values and otherS/D/X values
probably suffices to distinguish all of the floating-point types
supported in the same compilation mode that might be shared.
> binary_arithmetic, // e.g. x87 80-bit long double;
>                    // has inf, qNaN, sNaN, and can represent numbers,
>                    // but is not an interchange format
>
Actually, x87 80-bit long double is a binary64x format.
[1] The m68k FPUs also had an 80-bit floating-point type identical in
format, though I don't know if the semantics match. However,
Motorola stopped producing them in the 1990s, from what I can tell, and
the NXP ColdFire has picked up the ISA but dropped support for the
80-bit floats in their products made this century, so technically this
doesn't meet my criteria for floating-point formats.
[2] Okay, you might come up with extended precision aliases. But if I
heard the 754 liaison correctly, the next revision of IEEE 754 is likely
to drop extended precision formats, and I'm not aware of any
implementation of an extended precision for anything other than a weirdD
or binary64 format.
Received on 2026-04-16 18:12:42
