Date: Sat, 7 Jun 2025 17:47:31 -0400
On Sat, Jun 7, 2025 at 5:15 PM Avi Kivity via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> Of course I could implement it in my own optional (and other classes - I gave optional as an example but it works for other use cases as well), but then it wouldn't work in the trillions of other optionals in my code (that I can't change) and others' code.
>
>
> I wrote a demonstration [1]. It is written to penalize the standard implementation by having the worst mix of optional and non-optional (but is not cheating in any way). If your machine has a better branch predictor than mine, you may need to increase the vector size (10,000).
>
>
> std::optional run:
>
> 4.2ns/iteration, 4.6% branch misses, 17% cycles idle, 1.56 IPC.
>
> Optional assuming std::string's default constructor is a dummy value:
>
> 3.3ns/iteration, 2.1% branch misses, 11% cycles idle, 2.32 IPC.
>
>
> Note it executed more instructions, but instructions are cheap compared to branches, so the increased IPC compensated.
>
> This change won't double your program's performance. It's a small change intended to shave a few branches from the binary. I expect anyone working with business code will agree that branches are the bane of performance.
>
>
> Raw data: first std, then mine.
>
> Performance counter stats for './optional-dummy':
>
> 42,368.58 msec task-clock # 1.000 CPUs utilized
> 477 context-switches # 11.258 /sec
> 49 cpu-migrations # 1.157 /sec
> 423 page-faults # 9.984 /sec
> 314,441,962,886 instructions # 1.56 insn per cycle
> # 0.11 stalled cycles per insn
> 202,073,359,408 cycles # 4.769 GHz
> 35,633,149,897 stalled-cycles-frontend # 17.63% frontend cycles idle
> 94,770,734,694 branches # 2.237 G/sec
> 4,347,401,750 branch-misses # 4.59% of all branches
>
> 42.377120812 seconds time elapsed
>
> 42.175060000 seconds user
> 0.002988000 seconds sys
>
>
> Performance counter stats for './optional-dummy a':
>
> 32,803.96 msec task-clock # 1.000 CPUs utilized
> 284 context-switches # 8.657 /sec
> 37 cpu-migrations # 1.128 /sec
> 419 page-faults # 12.773 /sec
> 360,082,292,647 instructions # 2.32 insn per cycle
> # 0.05 stalled cycles per insn
> 155,116,568,268 cycles # 4.729 GHz
> 16,296,909,094 stalled-cycles-frontend # 10.51% frontend cycles idle
> 89,856,063,602 branches # 2.739 G/sec
> 1,865,582,453 branch-misses # 2.08% of all branches
>
> 32.809259346 seconds time elapsed
>
> 32.659071000 seconds user
> 0.001994000 seconds sys
>
>
> PS. I'm proposing to change std::optional (so it won't be harder to use).
>
>
> [1] https://gist.github.com/avikivity/b22aba5684edd09a5842df18b1954870
>
> On Sat, 2025-06-07 at 20:44 +0200, Robin Savonen Söderholm via Std-Proposals wrote:
>
> You can easily implement your own version of optional that does exactly that (it would even be a simpler class to implement, although not to use). The question then is why? If you somehow think that you would gain performance out of it, then my suggestion is: implement this type of optional and show both a benchmark where it makes a significant difference and tell us the use case where this performance gain can be leveraged.
>
> My 2 c.
>
> P.S. you may already "pay" for a branch in the case of std::string and the like due to SSO, I'm sure the compiler can do something smart with the extra branch in optional for both move and destruction.
> D.S
>
> // Robin
>
> On Sat, Jun 7, 2025, 19:48 Avi Kivity via Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> On Sat, 2025-06-07 at 19:15 +0200, Jan Schultke wrote:
> > > Why wouldn't it work with fundamental types?
> > > I explicitly marked them as supporting the protocol. Maybe I did a
> > > bad job of explaining myself.
> >
> > Because every possible value of type int is not an empty
> > std::optional
> > when you put it into std::optional already. You cannot define int(-1)
> > to be an empty std::optional after the fact, so you're stuck with
> > effectively storing an extra bool.
> >
> > This behavior can obviously not be changed at this point.
>
>
> But that is not my proposal. std::optional keeps its bool flag. The
> difference is in how the special member functions are generated:
>
>
> std::optional<T>::~optional() {
> if constexpr (uses the dummy protocol) {
> _value->~T();
> } else {
> if (_engaged) {
> _value->~T();
> }
> }
> }
>
> We avoided a conditional. For fundamental types, we didn't win
> anything, but (say) the move constructor becomes simpler:
>
>
> std::optional<T>::optional(optional&& o) {
> // omitting exception safety
> if constexpr (uses the dummy protocol) {
> _engaged = std::exchange(o._engaged, false);
> _value = std::move(o._value);
> } else {
> _engaged = std::exchange(o._engaged, false);
> if (_engaged) {
> // Can be replaced with std::relocate_at()
> std::construct_at(&_value, std::move(o._value));
> std::destroy_at(&o._value);
> }
> }
> }
>
> For fundamental types, and for trivially relocatable types, moving an
> optional becomes just data movement with no conditionals.
Fundamental types don't gain any performance from this whatsoever
because they don't have a conditional destructor call at all. `int`
doesn't really have a destructor, so the compiler should emit zero
code for `if(engaged) val.~int();` Indeed, for any type whose
destructor clearly does nothing, there should not be any performance
to be gained.
In any case, given that the performance difference is not especially
huge even in a tight loop (the best possible case for seeing a
performance gain), I'd say that this is too specialized and fiddly to
go into the standard. If business users need this performance in some
code, they can use a type specially designed for it. And indeed, such
a type could actively forbid the use of types that aren't able to do
this in order to avoid misuse.
<std-proposals_at_[hidden]> wrote:
>
> Of course I could implement it in my own optional (and other classes - I gave optional as an example but it works for other use cases as well), but then it wouldn't work in the trillions of other optionals in my code (that I can't change) and others' code.
>
>
> I wrote a demonstration [1]. It is written to penalize the standard implementation by having the worst mix of optional and non-optional (but is not cheating in any way). If your machine has a better branch predictor than mine, you may need to increase the vector size (10,000).
>
>
> std::optional run:
>
> 4.2ns/iteration, 4.6% branch misses, 17% cycles idle, 1.56 IPC.
>
> Optional assuming std::string's default constructor is a dummy value:
>
> 3.3ns/iteration, 2.1% branch misses, 11% cycles idle, 2.32 IPC.
>
>
> Note it executed more instructions, but instructions are cheap compared to branches, so the increased IPC compensated.
>
> This change won't double your program's performance. It's a small change intended to shave a few branches from the binary. I expect anyone working with business code will agree that branches are the bane of performance.
>
>
> Raw data: first std, then mine.
>
> Performance counter stats for './optional-dummy':
>
> 42,368.58 msec task-clock # 1.000 CPUs utilized
> 477 context-switches # 11.258 /sec
> 49 cpu-migrations # 1.157 /sec
> 423 page-faults # 9.984 /sec
> 314,441,962,886 instructions # 1.56 insn per cycle
> # 0.11 stalled cycles per insn
> 202,073,359,408 cycles # 4.769 GHz
> 35,633,149,897 stalled-cycles-frontend # 17.63% frontend cycles idle
> 94,770,734,694 branches # 2.237 G/sec
> 4,347,401,750 branch-misses # 4.59% of all branches
>
> 42.377120812 seconds time elapsed
>
> 42.175060000 seconds user
> 0.002988000 seconds sys
>
>
> Performance counter stats for './optional-dummy a':
>
> 32,803.96 msec task-clock # 1.000 CPUs utilized
> 284 context-switches # 8.657 /sec
> 37 cpu-migrations # 1.128 /sec
> 419 page-faults # 12.773 /sec
> 360,082,292,647 instructions # 2.32 insn per cycle
> # 0.05 stalled cycles per insn
> 155,116,568,268 cycles # 4.729 GHz
> 16,296,909,094 stalled-cycles-frontend # 10.51% frontend cycles idle
> 89,856,063,602 branches # 2.739 G/sec
> 1,865,582,453 branch-misses # 2.08% of all branches
>
> 32.809259346 seconds time elapsed
>
> 32.659071000 seconds user
> 0.001994000 seconds sys
>
>
> PS. I'm proposing to change std::optional (so it won't be harder to use).
>
>
> [1] https://gist.github.com/avikivity/b22aba5684edd09a5842df18b1954870
>
> On Sat, 2025-06-07 at 20:44 +0200, Robin Savonen Söderholm via Std-Proposals wrote:
>
> You can easily implement your own version of optional that does exactly that (it would even be a simpler class to implement, although not to use). The question then is why? If you somehow think that you would gain performance out of it, then my suggestion is: implement this type of optional and show both a benchmark where it makes a significant difference and tell us the use case where this performance gain can be leveraged.
>
> My 2 c.
>
> P.S. you may already "pay" for a branch in the case of std::string and the like due to SSO, I'm sure the compiler can do something smart with the extra branch in optional for both move and destruction.
> D.S
>
> // Robin
>
> On Sat, Jun 7, 2025, 19:48 Avi Kivity via Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> On Sat, 2025-06-07 at 19:15 +0200, Jan Schultke wrote:
> > > Why wouldn't it work with fundamental types?
> > > I explicitly marked them as supporting the protocol. Maybe I did a
> > > bad job of explaining myself.
> >
> > Because every possible value of type int is not an empty
> > std::optional
> > when you put it into std::optional already. You cannot define int(-1)
> > to be an empty std::optional after the fact, so you're stuck with
> > effectively storing an extra bool.
> >
> > This behavior can obviously not be changed at this point.
>
>
> But that is not my proposal. std::optional keeps its bool flag. The
> difference is in how the special member functions are generated:
>
>
> std::optional<T>::~optional() {
> if constexpr (uses the dummy protocol) {
> _value->~T();
> } else {
> if (_engaged) {
> _value->~T();
> }
> }
> }
>
> We avoided a conditional. For fundamental types, we didn't win
> anything, but (say) the move constructor becomes simpler:
>
>
> std::optional<T>::optional(optional&& o) {
> // omitting exception safety
> if constexpr (uses the dummy protocol) {
> _engaged = std::exchange(o._engaged, false);
> _value = std::move(o._value);
> } else {
> _engaged = std::exchange(o._engaged, false);
> if (_engaged) {
> // Can be replaced with std::relocate_at()
> std::construct_at(&_value, std::move(o._value));
> std::destroy_at(&o._value);
> }
> }
> }
>
> For fundamental types, and for trivially relocatable types, moving an
> optional becomes just data movement with no conditionals.
Fundamental types don't gain any performance from this whatsoever
because they don't have a conditional destructor call at all. `int`
doesn't really have a destructor, so the compiler should emit zero
code for `if(engaged) val.~int();` Indeed, for any type whose
destructor clearly does nothing, there should not be any performance
to be gained.
In any case, given that the performance difference is not especially
huge even in a tight loop (the best possible case for seeing a
performance gain), I'd say that this is too specialized and fiddly to
go into the standard. If business users need this performance in some
code, they can use a type specially designed for it. And indeed, such
a type could actively forbid the use of types that aren't able to do
this in order to avoid misuse.
Received on 2025-06-07 21:47:45