Date: Sun, 08 Jun 2025 11:53:31 +0300
On Sat, 2025-06-07 at 17:47 -0400, Jason McKesson via Std-Proposals wrote:
> On Sat, Jun 7, 2025 at 5:15 PM Avi Kivity via Std-Proposals
> <[std-proposals_at_[hidden]](mailto:std-proposals_at_[hidden])> wrote:
>
> >
> > Of course I could implement it in my own optional (and other classes - I gave optional as an example but it works for other use cases as well), but then it wouldn't work in the trillions of other optionals in my code (that I can't change) and others' code.
> >
> >
> > I wrote a demonstration [1]. It is written to penalize the standard implementation by having the worst mix of optional and non-optional (but is not cheating in any way). If your machine has a better branch predictor than mine, you may need to increase the vector size (10,000).
> >
> >
> > std::optional run:
> >
> > 4.2ns/iteration, 4.6% branch misses, 17% cycles idle, 1.56 IPC.
> >
> > Optional assuming std::string's default constructor is a dummy value:
> >
> > 3.3ns/iteration, 2.1% branch misses, 11% cycles idle, 2.32 IPC.
> >
> >
> > Note it executed more instructions, but instructions are cheap compared to branches, so the increased IPC compensated.
> >
> > This change won't double your program's performance. It's a small change intended to shave a few branches from the binary. I expect anyone working with business code will agree that branches are the bane of performance.
> >
> >
> > Raw data: first std, then mine.
> >
> > Performance counter stats for './optional-dummy':
> >
> > 42,368.58 msec task-clock # 1.000 CPUs utilized
> > 477 context-switches # 11.258 /sec
> > 49 cpu-migrations # 1.157 /sec
> > 423 page-faults # 9.984 /sec
> > 314,441,962,886 instructions # 1.56 insn per cycle
> > # 0.11 stalled cycles per insn
> > 202,073,359,408 cycles # 4.769 GHz
> > 35,633,149,897 stalled-cycles-frontend # 17.63% frontend cycles idle
> > 94,770,734,694 branches # 2.237 G/sec
> > 4,347,401,750 branch-misses # 4.59% of all branches
> >
> > 42.377120812 seconds time elapsed
> >
> > 42.175060000 seconds user
> > 0.002988000 seconds sys
> >
> >
> > Performance counter stats for './optional-dummy a':
> >
> > 32,803.96 msec task-clock # 1.000 CPUs utilized
> > 284 context-switches # 8.657 /sec
> > 37 cpu-migrations # 1.128 /sec
> > 419 page-faults # 12.773 /sec
> > 360,082,292,647 instructions # 2.32 insn per cycle
> > # 0.05 stalled cycles per insn
> > 155,116,568,268 cycles # 4.729 GHz
> > 16,296,909,094 stalled-cycles-frontend # 10.51% frontend cycles idle
> > 89,856,063,602 branches # 2.739 G/sec
> > 1,865,582,453 branch-misses # 2.08% of all branches
> >
> > 32.809259346 seconds time elapsed
> >
> > 32.659071000 seconds user
> > 0.001994000 seconds sys
> >
> >
> > PS. I'm proposing to change std::optional (so it won't be harder to use).
> >
> >
> > [1] [https://gist.github.com/avikivity/b22aba5684edd09a5842df18b1954870](https://gist.github.com/avikivity/b22aba5684edd09a5842df18b1954870)
> >
> > On Sat, 2025-06-07 at 20:44 +0200, Robin Savonen Söderholm via Std-Proposals wrote:
> >
> > You can easily implement your own version of optional that does exactly that (it would even be a simpler class to implement, although not to use). The question then is why? If you somehow think that you would gain performance out of it, then my suggestion is: implement this type of optional and show both a benchmark where it makes a significant difference and tell us the use case where this performance gain can be leveraged.
> >
> > My 2 c.
> >
> > P.S. you may already "pay" for a branch in the case of std::string and the like due to SSO, I'm sure the compiler can do something smart with the extra branch in optional for both move and destruction.
> > D.S
> >
> > // Robin
> >
> > On Sat, Jun 7, 2025, 19:48 Avi Kivity via Std-Proposals <[std-proposals_at_[hidden]](mailto:std-proposals_at_[hidden])> wrote:
> >
> > On Sat, 2025-06-07 at 19:15 +0200, Jan Schultke wrote:
> >
> > >
> > > > Why wouldn't it work with fundamental types?
> > > > I explicitly marked them as supporting the protocol. Maybe I did a
> > > > bad job of explaining myself.
> > >
> > >
> > > Because every possible value of type int is not an empty
> > > std::optional
> > > when you put it into std::optional already. You cannot define int(-1)
> > > to be an empty std::optional after the fact, so you're stuck with
> > > effectively storing an extra bool.
> > >
> > > This behavior can obviously not be changed at this point.
> >
> >
> >
> > But that is not my proposal. std::optional keeps its bool flag. The
> > difference is in how the special member functions are generated:
> >
> >
> > std::optional<T>::~optional() {
> > if constexpr (uses the dummy protocol) {
> > _value->~T();
> > } else {
> > if (_engaged) {
> > _value->~T();
> > }
> > }
> > }
> >
> > We avoided a conditional. For fundamental types, we didn't win
> > anything, but (say) the move constructor becomes simpler:
> >
> >
> > std::optional<T>::optional(optional&& o) {
> > // omitting exception safety
> > if constexpr (uses the dummy protocol) {
> > _engaged = std::exchange(o._engaged, false);
> > _value = std::move(o._value);
> > } else {
> > _engaged = std::exchange(o._engaged, false);
> > if (_engaged) {
> > // Can be replaced with std::relocate_at()
> > std::construct_at(&_value, std::move(o._value));
> > std::destroy_at(&o._value);
> > }
> > }
> > }
> >
> > For fundamental types, and for trivially relocatable types, moving an
> > optional becomes just data movement with no conditionals.
>
>
> Fundamental types don't gain any performance from this whatsoever
> because they don't have a conditional destructor call at all. `int`
> doesn't really have a destructor, so the compiler should emit zero
> code for `if(engaged) val.~int();` Indeed, for any type whose
> destructor clearly does nothing, there should not be any performance
> to be gained.
>
Look at the quoted code. Yes, std::destroy_at() compiles to nothing, but std::construct_at() compiles to a memory copy, and it is under an if statement. The compiler by itself cannot elide the conditional under most circumstances.
Of course, the library could implement this optimization for its own types without a new dummy value protocol, that's only needed for the user to improve their own types.
It's similar (again) to trivial relocation - the library could optimize for trivial relocation of its own types, but needs some protocol to know it's safe for user types.
> In any case, given that the performance difference is not especially
> huge even in a tight loop (the best possible case for seeing a
> performance gain), I'd say that this is too specialized and fiddly to
> go into the standard. If business users need this performance in some
> code, they can use a type specially designed for it. And indeed, such
> a type could actively forbid the use of types that aren't able to do
> this in order to avoid misuse.
This isn't worth a unique type by itself (which would also break templates that match std::optional<T>).
Just like we didn't add a new std::vector for trivial relocation, we (or the user) shouldn't add a new std::optional. And this isn't limited to std::optional! I thought of it while improving an interval class.
> On Sat, Jun 7, 2025 at 5:15 PM Avi Kivity via Std-Proposals
> <[std-proposals_at_[hidden]](mailto:std-proposals_at_[hidden])> wrote:
>
> >
> > Of course I could implement it in my own optional (and other classes - I gave optional as an example but it works for other use cases as well), but then it wouldn't work in the trillions of other optionals in my code (that I can't change) and others' code.
> >
> >
> > I wrote a demonstration [1]. It is written to penalize the standard implementation by having the worst mix of optional and non-optional (but is not cheating in any way). If your machine has a better branch predictor than mine, you may need to increase the vector size (10,000).
> >
> >
> > std::optional run:
> >
> > 4.2ns/iteration, 4.6% branch misses, 17% cycles idle, 1.56 IPC.
> >
> > Optional assuming std::string's default constructor is a dummy value:
> >
> > 3.3ns/iteration, 2.1% branch misses, 11% cycles idle, 2.32 IPC.
> >
> >
> > Note it executed more instructions, but instructions are cheap compared to branches, so the increased IPC compensated.
> >
> > This change won't double your program's performance. It's a small change intended to shave a few branches from the binary. I expect anyone working with business code will agree that branches are the bane of performance.
> >
> >
> > Raw data: first std, then mine.
> >
> > Performance counter stats for './optional-dummy':
> >
> > 42,368.58 msec task-clock # 1.000 CPUs utilized
> > 477 context-switches # 11.258 /sec
> > 49 cpu-migrations # 1.157 /sec
> > 423 page-faults # 9.984 /sec
> > 314,441,962,886 instructions # 1.56 insn per cycle
> > # 0.11 stalled cycles per insn
> > 202,073,359,408 cycles # 4.769 GHz
> > 35,633,149,897 stalled-cycles-frontend # 17.63% frontend cycles idle
> > 94,770,734,694 branches # 2.237 G/sec
> > 4,347,401,750 branch-misses # 4.59% of all branches
> >
> > 42.377120812 seconds time elapsed
> >
> > 42.175060000 seconds user
> > 0.002988000 seconds sys
> >
> >
> > Performance counter stats for './optional-dummy a':
> >
> > 32,803.96 msec task-clock # 1.000 CPUs utilized
> > 284 context-switches # 8.657 /sec
> > 37 cpu-migrations # 1.128 /sec
> > 419 page-faults # 12.773 /sec
> > 360,082,292,647 instructions # 2.32 insn per cycle
> > # 0.05 stalled cycles per insn
> > 155,116,568,268 cycles # 4.729 GHz
> > 16,296,909,094 stalled-cycles-frontend # 10.51% frontend cycles idle
> > 89,856,063,602 branches # 2.739 G/sec
> > 1,865,582,453 branch-misses # 2.08% of all branches
> >
> > 32.809259346 seconds time elapsed
> >
> > 32.659071000 seconds user
> > 0.001994000 seconds sys
> >
> >
> > PS. I'm proposing to change std::optional (so it won't be harder to use).
> >
> >
> > [1] [https://gist.github.com/avikivity/b22aba5684edd09a5842df18b1954870](https://gist.github.com/avikivity/b22aba5684edd09a5842df18b1954870)
> >
> > On Sat, 2025-06-07 at 20:44 +0200, Robin Savonen Söderholm via Std-Proposals wrote:
> >
> > You can easily implement your own version of optional that does exactly that (it would even be a simpler class to implement, although not to use). The question then is why? If you somehow think that you would gain performance out of it, then my suggestion is: implement this type of optional and show both a benchmark where it makes a significant difference and tell us the use case where this performance gain can be leveraged.
> >
> > My 2 c.
> >
> > P.S. you may already "pay" for a branch in the case of std::string and the like due to SSO, I'm sure the compiler can do something smart with the extra branch in optional for both move and destruction.
> > D.S
> >
> > // Robin
> >
> > On Sat, Jun 7, 2025, 19:48 Avi Kivity via Std-Proposals <[std-proposals_at_[hidden]](mailto:std-proposals_at_[hidden])> wrote:
> >
> > On Sat, 2025-06-07 at 19:15 +0200, Jan Schultke wrote:
> >
> > >
> > > > Why wouldn't it work with fundamental types?
> > > > I explicitly marked them as supporting the protocol. Maybe I did a
> > > > bad job of explaining myself.
> > >
> > >
> > > Because every possible value of type int is not an empty
> > > std::optional
> > > when you put it into std::optional already. You cannot define int(-1)
> > > to be an empty std::optional after the fact, so you're stuck with
> > > effectively storing an extra bool.
> > >
> > > This behavior can obviously not be changed at this point.
> >
> >
> >
> > But that is not my proposal. std::optional keeps its bool flag. The
> > difference is in how the special member functions are generated:
> >
> >
> > std::optional<T>::~optional() {
> > if constexpr (uses the dummy protocol) {
> > _value->~T();
> > } else {
> > if (_engaged) {
> > _value->~T();
> > }
> > }
> > }
> >
> > We avoided a conditional. For fundamental types, we didn't win
> > anything, but (say) the move constructor becomes simpler:
> >
> >
> > std::optional<T>::optional(optional&& o) {
> > // omitting exception safety
> > if constexpr (uses the dummy protocol) {
> > _engaged = std::exchange(o._engaged, false);
> > _value = std::move(o._value);
> > } else {
> > _engaged = std::exchange(o._engaged, false);
> > if (_engaged) {
> > // Can be replaced with std::relocate_at()
> > std::construct_at(&_value, std::move(o._value));
> > std::destroy_at(&o._value);
> > }
> > }
> > }
> >
> > For fundamental types, and for trivially relocatable types, moving an
> > optional becomes just data movement with no conditionals.
>
>
> Fundamental types don't gain any performance from this whatsoever
> because they don't have a conditional destructor call at all. `int`
> doesn't really have a destructor, so the compiler should emit zero
> code for `if(engaged) val.~int();` Indeed, for any type whose
> destructor clearly does nothing, there should not be any performance
> to be gained.
>
Look at the quoted code. Yes, std::destroy_at() compiles to nothing, but std::construct_at() compiles to a memory copy, and it is under an if statement. The compiler by itself cannot elide the conditional under most circumstances.
Of course, the library could implement this optimization for its own types without a new dummy value protocol, that's only needed for the user to improve their own types.
It's similar (again) to trivial relocation - the library could optimize for trivial relocation of its own types, but needs some protocol to know it's safe for user types.
> In any case, given that the performance difference is not especially
> huge even in a tight loop (the best possible case for seeing a
> performance gain), I'd say that this is too specialized and fiddly to
> go into the standard. If business users need this performance in some
> code, they can use a type specially designed for it. And indeed, such
> a type could actively forbid the use of types that aren't able to do
> this in order to avoid misuse.
This isn't worth a unique type by itself (which would also break templates that match std::optional<T>).
Just like we didn't add a new std::vector for trivial relocation, we (or the user) shouldn't add a new std::optional. And this isn't limited to std::optional! I thought of it while improving an interval class.
Received on 2025-06-08 08:53:38