Date: Fri, 16 Aug 2024 04:28:25 +0000
I can see a case for a facility that ensures that all parts of the statement are executed before the condition is evaluated.
However, I'm a bit skeptical that you could win this argument on performance, considering how these things are implemented. Because:
a) the results of most operations that you would be interested in looking at are stored in very ephemeral CPU flags
b) these operations are generally done with a single conditional jump.
If you need a move instruction to store the result of your flag (likely to the stack), you have already lost. If you need to insert any instruction at all, you might as well take the jump, on the chance that the first condition is enough to decide the result and no more work has to be done at all.
Good luck beating that.
Anything simple enough to beat that is going to be simple enough for your compiler to optimize.
You need a very good example where code generation is going to be simpler, or where branch prediction has a predictable and significant enough impact to make a difference.
Otherwise, it would just be premature pessimization.
Not impossible, but very, very unlikely.
Guaranteeing that all statements are executed is a much better argument.
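To make that last point concrete, here is a minimal sketch of the difference in evaluation. `Probe` and `and_all_evaluated` are hypothetical names used only for illustration; they are not an existing or proposed standard facility. The sketch relies on the fact that function arguments are all evaluated before the call, in an unspecified order relative to each other.

```cpp
// Hypothetical probe that counts how many times check() was called.
struct Probe {
    int calls = 0;
    bool check(bool result) { ++calls; return result; }
};

// Hypothetical helper (not a standard facility): both arguments are fully
// evaluated before the body runs, though their relative order is unspecified.
constexpr bool and_all_evaluated(bool lhs, bool rhs) {
    return lhs & rhs;  // bitwise AND on bool operands, no short-circuit
}

// Built-in &&: the second check() is skipped because the first one is false.
inline int calls_with_short_circuit() {
    Probe p;
    const bool r = p.check(false) && p.check(true);
    static_cast<void>(r);
    return p.calls;  // 1
}

// Helper call: both check() calls run even though the first one is false.
inline int calls_with_full_evaluation() {
    Probe p;
    const bool r = and_all_evaluated(p.check(false), p.check(true));
    static_cast<void>(r);
    return p.calls;  // 2
}
```

This only shows the evaluation guarantee. It says nothing about how many branches a given compiler emits for either form.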
________________________________
From: Std-Proposals <std-proposals-bounces_at_[hidden]> on behalf of Robin Savonen Söderholm via Std-Proposals <std-proposals_at_[hidden]>
Sent: Thursday, August 15, 2024 8:39:51 PM
To: std-proposals_at_[hidden] <std-proposals_at_[hidden]>
Cc: Robin Savonen Söderholm <robinsavonensoderholm_at_[hidden]>
Subject: [std-proposals] No shortcut logical operands-functions
Hi!
I saw a CppCon talk a while ago about how short-circuiting logical operations may generate several branches that in some cases may hurt performance. E.g. if we have a function like:
```cpp
inline int foo(int a, int b) {
    if (a > 0 && b != 0) { return 0; } else { return 1; }
}
```
, the compiler may be forced to create 2 rather than 1 branch here (a jump for the condition for "a" and a second jump for the condition of "b").
If we have this in a tight loop we may get various problems with performance. (I am aware of the branch predictor in modern processors, but still, the predictor has its limitations, and I also work on embedded processors that may not even have a branch predictor, not to mention real-time systems...) I am looking for a relatively easy way to say "just make one branch of this".
For example, we could have something along the lines of:
```cpp
// Note: boolean-testable is an exposition-only concept in the standard,
// so this constraint is pseudocode.
template <boolean-testable... Ts>
constexpr bool no_shortcut_and(Ts&&... vs) {
    constexpr auto bool_to_int = [] (auto&& v) {
        // If we know our compiler, we can guarantee that this is generated without a branch.
        return v ? 1u : 0u;
    };
    // Once again, if we know our compiler, we can assume that this bit-wise operation is done without any extra branches.
    return (bool_to_int(std::forward<Ts>(vs)) & ... & 1u) == 1u;
}
// To show its usage:
inline int foo(int a, int b) {
    if (std::no_shortcut_and(a > 0, b != 0)) { return 0; } else { return 1; }
}
```
, and similar things for or and xor (disclaimer: I have not tested the code above, it is simply a prototype of what I think should work). But we could maybe figure out some way using generator expressions or similar to make it easier/more readable to use.
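For reference, a minimal sketch of how the "and", "or" and "xor" variants might look so that they compile under C++17. It is not a tested or definitive implementation. It assumes a plain `typename` in place of the exposition-only `boolean-testable` constraint, keeps the functions outside namespace `std`, and uses hypothetical names (`to_bit`, `no_shortcut_or`, `no_shortcut_xor`). Whether the result is branch-free still depends on the compiler.

```cpp
#include <utility>

// Hypothetical helper: maps any value convertible to bool onto 0u or 1u.
template <typename T>
constexpr unsigned to_bit(T&& v) {
    return static_cast<bool>(std::forward<T>(v)) ? 1u : 0u;
}

// All arguments are evaluated before the call; the fold combines them
// with bitwise AND, using 1u as the identity value.
template <typename... Ts>
constexpr bool no_shortcut_and(Ts&&... vs) {
    return (to_bit(std::forward<Ts>(vs)) & ... & 1u) == 1u;
}

// Same idea with bitwise OR, using 0u as the identity value.
template <typename... Ts>
constexpr bool no_shortcut_or(Ts&&... vs) {
    return (to_bit(std::forward<Ts>(vs)) | ... | 0u) == 1u;
}

// Bitwise XOR: true when an odd number of arguments are true.
template <typename... Ts>
constexpr bool no_shortcut_xor(Ts&&... vs) {
    return (to_bit(std::forward<Ts>(vs)) ^ ... ^ 0u) == 1u;
}
```

For two plain `bool` operands, the built-in bitwise operators already give the same evaluation behaviour, e.g. `(a > 0) & (b != 0)`; the variadic helpers mainly add readability and the conversion to `bool`.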
This is barely a proposal but rather a "concept" of what I would like to have, which should not be too hard to implement, but I feel no need to research deeply into the subject before I know that nothing similar already exists or is on its way into the standard.
// With best regards, Robin
Received on 2024-08-16 04:28:28