Date: Thu, 28 Aug 2025 21:30:40 +0200
On 28/08/2025 18:41, Thiago Macieira via Std-Proposals wrote:
> On Thursday, 28 August 2025 00:14:07 Pacific Daylight Time David Brown via Std-
> Proposals wrote:
>> With 32-bit doubles, the compiler is not standards conforming
>
> In what way?
>
> The C and C++ Standards do not require double to have a better precision than
> float. See https://cigix.me/c17#E.p4.
>
>

The range for doubles can be the same (DBL_MAX need only be 1e37, the
same as FLT_MAX), but the precision has to be greater (DBL_DIG must be
at least 10, requiring roughly 34 bits for the mantissa, and
DBL_EPSILON must be 1e-9 or smaller).

Doubles don't have to be 64-bit to be conforming - I think you can get
away with about 48 bits, but I am not an expert on the matter.

Doubles don't have to be better than floats - /if/ you have floats that
satisfy the requirements for doubles. So 64-bit floats and doubles are
fine, but 32-bit doubles are not.

32-bit doubles are, however, more useful on these small microcontrollers
in practice - you are not going to get far with 64-bit software floating
point routines that take up all the flash code space for many devices!
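
For anyone who wants to check the arithmetic, here is a rough sketch in C.
It assumes a radix-2 format and uses the <float.h> definition of the *_DIG
macros, floor((p - 1) * log10(2)), where p counts the significand bits
including any implicit leading bit. The function names are my own,
purely for illustration.

```c
#include <float.h>

/* Decimal digits guaranteed by a binary format with p significand bits
   (counting the implicit leading bit): floor((p - 1) * log10(2)).
   The value is never negative for p >= 1, so truncation equals floor. */
static int decimal_digits(int p)
{
    const double log10_2 = 0.30102999566398120;
    return (int)((p - 1) * log10_2);
}

/* Smallest p for which decimal_digits(p) >= dig (hypothetical helper). */
static int min_precision_bits(int dig)
{
    int p = 1;
    while (decimal_digits(p) < dig)
        ++p;
    return p;
}

/* Nonzero if this implementation's double meets the minimum limits
   discussed above: DBL_DIG >= 10, DBL_EPSILON <= 1e-9, DBL_MAX >= 1e37. */
static int double_meets_minimums(void)
{
    return DBL_DIG >= 10 && DBL_EPSILON <= 1e-9 && DBL_MAX >= 1e37;
}
```

With these, decimal_digits(24) gives 6 (an IEEE 32-bit format) and
min_precision_bits(10) gives 35, which is why a 32-bit format cannot
reach DBL_DIG >= 10.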
Received on 2025-08-28 19:30:47