Date: Wed, 6 Dec 2023 09:25:55 +0100
On 05/12/2023 23:55, Bongo Ferno via Std-Discussion wrote:
> Should this be a new feature for the C++ language, or is it already
> implemented?
No, this should not be part of the language (or standard library) - this
is compiler implementation territory. The C++ language does not work at
a level of registers and instructions - it's a high level language,
specified for an abstract machine. And that's good. After all, the
same language runs on 8-bit microcontrollers, 64-bit Threadrippers, and
everything in between - including lots of devices few people have ever
heard of.
A compiler /implementation/ can have extensions for this sort of thing,
if it is important enough and useful enough, and can't be expressed well
in normal code. So it is possible to imagine a gcc "builtin" called
"__builtin_mux(a, b, c)" which has the effect of returning "b" or "c"
depending on the value of "a", in as efficient a manner as possible on
the target platform. But of course that's unnecessary as the semantics
can be expressed easily in C++ as a ternary operation.
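
To make that concrete, here is a minimal sketch of such a "mux" written
in ordinary C++. The name "mux" is hypothetical, echoing the imagined
builtin above (it is not a real gcc extension), and "map_flag" is an
illustrative helper name. How this compiles (branch, conditional move,
SIMD blend, something else) is left entirely to the compiler.

```cpp
#include <cstdint>

// Hypothetical stand-in for the imagined "__builtin_mux(a, b, c)":
// returns b when a is true, otherwise c. Just a ternary in a wrapper.
template <typename T>
constexpr T mux(bool a, T b, T c) {
    return a ? b : c;
}

// The mapping from the original question: 0 -> 1.0, 1 -> -1.0.
// (Any non-zero flag is treated as 1 here; that is an assumption.)
constexpr double map_flag(std::uint64_t uintFlag) {
    return mux(uintFlag != 0, -1.0, 1.0);
}

static_assert(map_flag(0) == 1.0);
static_assert(map_flag(1) == -1.0);
```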
Thus it is a matter of compiler quality of implementation to generate
as good code as possible for the ternary operation - and the other ways
you have expressed the same functionality.
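
As a sketch, the three formulations from the original post can be
written as small functions and checked against each other. The function
names are illustrative, and all three assume the flag is only ever 0 or
1 (the table version is out of bounds otherwise). They state the same
semantics, so a compiler is free to generate the same code for any of
them.

```cpp
#include <cstdint>

// 1. Ternary operation.
constexpr double map_ternary(std::uint64_t uintFlag) {
    return uintFlag ? -1.0 : 1.0;
}

// 2. Arithmetic expression, with an integer-to-double conversion.
constexpr double map_arith(std::uint64_t uintFlag) {
    return 1 - 2 * static_cast<int>(uintFlag);
}

// 3. Constant array lookup; valid only for uintFlag in {0, 1}.
constexpr double mapArray[2] = {1.0, -1.0};
constexpr double map_table(std::uint64_t uintFlag) {
    return mapArray[uintFlag];
}

// All three agree on the only two inputs of interest.
static_assert(map_ternary(0) == 1.0 && map_ternary(1) == -1.0);
static_assert(map_arith(0) == 1.0 && map_arith(1) == -1.0);
static_assert(map_table(0) == 1.0 && map_table(1) == -1.0);
```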
As for how this can best be implemented, I can pretty much guarantee
that whatever you think is the best choice, is wrong. It will be wrong
because there are many ways to handle this in generated code, and it all
depends on many factors. It will depend on the target, and the compiler
flags (especially optimisation choices and flags to specify the exact
target - there are vast differences between early 64-bit cpus and modern
ones).
Most importantly, the implementation will depend on surrounding code and
loops. (And if it is not in a loop of some sort, the speed is
irrelevant.) Inside a loop, the compiler could hold the constants -1.0
and 1.0 in SIMD registers throughout - no need to load them later. Maybe
the way the result is used in other calculations means there is no need
to calculate the actual "mapped" value. Maybe the branch can be pulled
further out - maybe all or part of the function can be cloned based on
"uintFlag" being 0 or 1, leading to other simplifications. There are a
/lot/ of "maybes".
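
As an illustration of why context matters, here is a hypothetical loop
(the function "signed_sum" and its parameters are made up for this
sketch, not taken from the original post) in which the mapped value is
only an intermediate. A compiler may never materialise "mapped" at all,
for example by choosing between an add and a subtract, or by flipping a
sign bit; what it actually does depends on the target and flags.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: sum x[i], negating each term whose flag is 1.
// Each flags[i] is assumed to be 0 or 1.
constexpr double signed_sum(const double* x,
                            const std::uint64_t* flags,
                            std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Written as the plain ternary; the optimiser sees the whole
        // loop and decides how (or whether) to compute this value.
        const double mapped = flags[i] ? -1.0 : 1.0;
        total += mapped * x[i];
    }
    return total;
}
```

Looking at the generated assembly for a loop like this, on the real
target and with the real flags, says far more than reasoning about the
single expression in isolation.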
(This applies to all sorts of code efficiency questions. It's only for
high-level efficiency issues that it makes sense to think about adding
to the C++ language or library.)
David
>
> I need to map a 64 bit unsigned integer from {0,1} to 64 bit double {1,-1}
>
> The available solutions are
>
> 1. An if
>
> double mapped = uintFlag ? double(-1.0) : double(1.0);
> // It's slow, because it introduces a branch
>
> 2. An arithmetic expression
> double mapped = 1 - 2 * int(uintFlag);
> // It's slow, because it needs type conversion
>
> 3. A constant array
> constexpr double mapArray[2] = {1.0, -1.0};
> double answer = mapArray[uintFlag];
> // It's slow, because it needs 10 cycles to fetch from memory, but
> tends to be faster, because it's branchless, and doesn't convert types.
>
> Some bitwise hacks are possible, but they depend on formats for negative
> integers, which are not portable between architectures.
>
> Is it possible to create, or does there already exist, a smallMap instruction that
> emulates a little array inside the ALU, to select a small number of
> constants from a small integer index, or at least a boolean flag?
>
> For example, two constants are in 2 registers, and a flag decides which
> one is copied.
>
> Or else, an instruction to convert a boolean flag into a sign switcher
> operator?
>
> Or maybe a GOTO to an integer label.
>
> It would be useful in physics, geometry, graphics, and geometric
> algebra, where bitwise hacks over unsigned integers translate to a
> multitude of floating point sign switches, but the mapping of boolean to
> floating point is unnecessarily expensive.
>
> Should this be a new feature for the C++ language, or is it already
> implemented?
Received on 2023-12-06 08:26:03