Date: Sun, 08 Jun 2025 00:15:03 +0300
Of course I could implement it in my own optional (and other classes -
I gave optional as an example but it works for other use cases as
well), but then it wouldn't work in the trillions of other optionals in
my code (that I can't change) and others' code.
I wrote a demonstration [1]. It is written to penalize the standard
implementation by having the worst mix of optional and non-optional
(but is not cheating in any way). If your machine has a better branch
predictor than mine, you may need to increase the vector size (10,000).
std::optional run:
4.2ns/iteration, 4.6% branch misses, 17% cycles idle, 1.56 IPC.
Optional assuming std::string's default constructor is a dummy value:
3.3ns/iteration, 2.1% branch misses, 11% cycles idle, 2.32 IPC.
Note it executed more instructions, but instructions are cheap compared
to branches, so the increased IPC compensated.
This change won't double your program's performance. It's a small
change intended to shave a few branches from the binary. I expect
anyone working with business code will agree that branches are the bane
of performance.
Raw data: first std, then mine.
Performance counter stats for './optional-dummy':
42,368.58 msec task-clock # 1.000 CPUs utilized
477 context-switches # 11.258 /sec
49 cpu-migrations # 1.157 /sec
423 page-faults # 9.984 /sec
314,441,962,886 instructions # 1.56 insn per cycle
# 0.11 stalled cycles per insn
202,073,359,408 cycles # 4.769 GHz
35,633,149,897 stalled-cycles-frontend # 17.63% frontend cycles idle
94,770,734,694 branches # 2.237 G/sec
4,347,401,750 branch-misses # 4.59% of all branches
42.377120812 seconds time elapsed
42.175060000 seconds user
0.002988000 seconds sys
Performance counter stats for './optional-dummy a':
32,803.96 msec task-clock # 1.000 CPUs utilized
284 context-switches # 8.657 /sec
37 cpu-migrations # 1.128 /sec
419 page-faults # 12.773 /sec
360,082,292,647 instructions # 2.32 insn per cycle
# 0.05 stalled cycles per insn
155,116,568,268 cycles # 4.729 GHz
16,296,909,094 stalled-cycles-frontend # 10.51% frontend cycles idle
89,856,063,602 branches # 2.739 G/sec
1,865,582,453 branch-misses # 2.08% of all branches
32.809259346 seconds time elapsed
32.659071000 seconds user
0.001994000 seconds sys
PS. I'm proposing to change std::optional (so it won't be harder to
use).
[1] https://gist.github.com/avikivity/b22aba5684edd09a5842df18b1954870
On Sat, 2025-06-07 at 20:44 +0200, Robin Savonen Söderholm via Std-
Proposals wrote:
> You can easily implement your own version of optional that does
> exactly that (it would even be a simpler class to implement, although
> not to use). The question then is why? If you somehow think that you
> would gain performance out of it, then my suggestion is: implement
> this type of optional and show both a benchmark where it makes a
> significant difference and tell us the use case where this
> performance gain can be leveraged.
>
> My 2 c.
>
> P.S. you may already "pay" for a branch in the case of std::string
> and the like due to SSO, I'm sure the compiler can do something smart
> with the extra branch in optional for both move and destruction.
> D.S
>
> // Robin
>
> On Sat, Jun 7, 2025, 19:48 Avi Kivity via Std-Proposals
> <std-proposals_at_[hidden]> wrote:
> > On Sat, 2025-06-07 at 19:15 +0200, Jan Schultke wrote:
> > > > Why wouldn't it work with fundamental types?
> > > > I explicitly marked them as supporting the protocol. Maybe I
> > did a
> > > > bad job of explaining myself.
> > >
> > > Because every possible value of type int is not an empty
> > > std::optional
> > > when you put it into std::optional already. You cannot define
> > int(-1)
> > > to be an empty std::optional after the fact, so you're stuck with
> > > effectively storing an extra bool.
> > >
> > > This behavior can obviously not be changed at this point.
> >
> >
> > But that is not my proposal. std::optional keeps its bool flag. The
> > difference is in how the special member functions are generated:
> >
> >
> > std::optional<T>::~optional() {
> > if constexpr (uses the dummy protocol) {
> > _value->~T();
> > } else {
> > if (_engaged) {
> > _value->~T();
> > }
> > }
> > }
> >
> > We avoided a conditional. For fundamental types, we didn't win
> > anything, but (say) the move constructor becomes simpler:
> >
> >
> > std::optional<T>::optional(optional&& o) {
> > // omitting exception safety
> > if constexpr (uses the dummy protocol) {
> > _engaged = std::exchange(o._engaged, false);
> > _value = std::move(o._value);
> > } else {
> > _engaged = std::exchange(o._engaged, false);
> > if (_engaged) {
> > // Can be replaced with std::relocate_at()
> > std::construct_at(&_value, std::move(o._value));
> > std::destroy_at(&o._value);
> > }
> > }
> > }
> >
> > For fundamental types, and for trivially relocatable types, moving
> > an
> > optional becomes just data movement with no conditionals.
> >
> >
> > > On Sat, 7 Jun 2025 at 19:12, Avi Kivity <avi_at_[hidden]> wrote:
> > > >
> > > > Why wouldn't it work with fundamental types? I explicitly
> > marked
> > > > them as supporting the protocol. Maybe I did a bad job of
> > > > explaining myself.
> > > >
> > > > And again some raises std::expected, it's not related. This is
> > for
> > > > every container that can conditionally carry a value,
> > std::optional
> > > > is the best example but I encountered the problem while working
> > on
> > > > a different class (an interval type).
> > > >
> > > > On Sat, 2025-06-07 at 13:24 +0200, Jan Schultke wrote:
> > > >
> > > > I don't think it's worth the effort if it cannot work for
> > > > fundamental
> > > > types, and the ABI for std::optional is set in stone. Ideally,
> > we
> > > > would be able to say that std::optional<X*> where nullptr
> > stores
> > > > the
> > > > empty value, or std::optional<int> where -1 represents an empty
> > > > value,
> > > > etc.
> > > >
> > > > Those things seem feasible with std::expected with a bit of
> > library
> > > > support. We could have something like
> > > >
> > > > std::expected<int, std::nonvalue<-1>>
> > > >
> > > >
> > > > Which would use the value "-1" to represent an unexpected
> > object.
> > > > This
> > > > would also not break any existing code since it would use novel
> > > > types
> > > > and a novel protocol, even in the case of fundamental types.
> > > >
> > > >
I gave optional as an example but it works for other use cases as
well), but then it wouldn't work in the trillions of other optionals in
my code (that I can't change) and others' code.
I wrote a demonstration [1]. It is written to penalize the standard
implementation by having the worst mix of optional and non-optional
(but is not cheating in any way). If your machine has a better branch
predictor than mine, you may need to increase the vector size (10,000).
std::optional run:
4.2ns/iteration, 4.6% branch misses, 17% cycles idle, 1.56 IPC.
Optional assuming std::string's default constructor is a dummy value:
3.3ns/iteration, 2.1% branch misses, 11% cycles idle, 2.32 IPC.
Note it executed more instructions, but instructions are cheap compared
to branches, so the increased IPC compensated.
This change won't double your program's performance. It's a small
change intended to shave a few branches from the binary. I expect
anyone working with business code will agree that branches are the bane
of performance.
Raw data: first std, then mine.
Performance counter stats for './optional-dummy':
42,368.58 msec task-clock # 1.000 CPUs utilized
477 context-switches # 11.258 /sec
49 cpu-migrations # 1.157 /sec
423 page-faults # 9.984 /sec
314,441,962,886 instructions # 1.56 insn per cycle
# 0.11 stalled cycles per insn
202,073,359,408 cycles # 4.769 GHz
35,633,149,897 stalled-cycles-frontend # 17.63% frontend cycles idle
94,770,734,694 branches # 2.237 G/sec
4,347,401,750 branch-misses # 4.59% of all branches
42.377120812 seconds time elapsed
42.175060000 seconds user
0.002988000 seconds sys
Performance counter stats for './optional-dummy a':
32,803.96 msec task-clock # 1.000 CPUs utilized
284 context-switches # 8.657 /sec
37 cpu-migrations # 1.128 /sec
419 page-faults # 12.773 /sec
360,082,292,647 instructions # 2.32 insn per cycle
# 0.05 stalled cycles per insn
155,116,568,268 cycles # 4.729 GHz
16,296,909,094 stalled-cycles-frontend # 10.51% frontend cycles idle
89,856,063,602 branches # 2.739 G/sec
1,865,582,453 branch-misses # 2.08% of all branches
32.809259346 seconds time elapsed
32.659071000 seconds user
0.001994000 seconds sys
PS. I'm proposing to change std::optional (so it won't be harder to
use).
[1] https://gist.github.com/avikivity/b22aba5684edd09a5842df18b1954870
On Sat, 2025-06-07 at 20:44 +0200, Robin Savonen Söderholm via Std-
Proposals wrote:
> You can easily implement your own version of optional that does
> exactly that (it would even be a simpler class to implement, although
> not to use). The question then is why? If you somehow think that you
> would gain performance out of it, then my suggestion is: implement
> this type of optional and show both a benchmark where it makes a
> significant difference and tell us the use case where this
> performance gain can be leveraged.
>
> My 2 c.
>
> P.S. you may already "pay" for a branch in the case of std::string
> and the like due to SSO, I'm sure the compiler can do something smart
> with the extra branch in optional for both move and destruction.
> D.S
>
> // Robin
>
> On Sat, Jun 7, 2025, 19:48 Avi Kivity via Std-Proposals
> <std-proposals_at_[hidden]> wrote:
> > On Sat, 2025-06-07 at 19:15 +0200, Jan Schultke wrote:
> > > > Why wouldn't it work with fundamental types?
> > > > I explicitly marked them as supporting the protocol. Maybe I
> > did a
> > > > bad job of explaining myself.
> > >
> > > Because every possible value of type int is not an empty
> > > std::optional
> > > when you put it into std::optional already. You cannot define
> > int(-1)
> > > to be an empty std::optional after the fact, so you're stuck with
> > > effectively storing an extra bool.
> > >
> > > This behavior can obviously not be changed at this point.
> >
> >
> > But that is not my proposal. std::optional keeps its bool flag. The
> > difference is in how the special member functions are generated:
> >
> >
> > std::optional<T>::~optional() {
> > if constexpr (uses the dummy protocol) {
> > _value->~T();
> > } else {
> > if (_engaged) {
> > _value->~T();
> > }
> > }
> > }
> >
> > We avoided a conditional. For fundamental types, we didn't win
> > anything, but (say) the move constructor becomes simpler:
> >
> >
> > std::optional<T>::optional(optional&& o) {
> > // omitting exception safety
> > if constexpr (uses the dummy protocol) {
> > _engaged = std::exchange(o._engaged, false);
> > _value = std::move(o._value);
> > } else {
> > _engaged = std::exchange(o._engaged, false);
> > if (_engaged) {
> > // Can be replaced with std::relocate_at()
> > std::construct_at(&_value, std::move(o._value));
> > std::destroy_at(&o._value);
> > }
> > }
> > }
> >
> > For fundamental types, and for trivially relocatable types, moving
> > an
> > optional becomes just data movement with no conditionals.
> >
> >
> > > On Sat, 7 Jun 2025 at 19:12, Avi Kivity <avi_at_[hidden]> wrote:
> > > >
> > > > Why wouldn't it work with fundamental types? I explicitly
> > marked
> > > > them as supporting the protocol. Maybe I did a bad job of
> > > > explaining myself.
> > > >
> > > > And again some raises std::expected, it's not related. This is
> > for
> > > > every container that can conditionally carry a value,
> > std::optional
> > > > is the best example but I encountered the problem while working
> > on
> > > > a different class (an interval type).
> > > >
> > > > On Sat, 2025-06-07 at 13:24 +0200, Jan Schultke wrote:
> > > >
> > > > I don't think it's worth the effort if it cannot work for
> > > > fundamental
> > > > types, and the ABI for std::optional is set in stone. Ideally,
> > we
> > > > would be able to say that std::optional<X*> where nullptr
> > stores
> > > > the
> > > > empty value, or std::optional<int> where -1 represents an empty
> > > > value,
> > > > etc.
> > > >
> > > > Those things seem feasible with std::expected with a bit of
> > library
> > > > support. We could have something like
> > > >
> > > > std::expected<int, std::nonvalue<-1>>
> > > >
> > > >
> > > > Which would use the value "-1" to represent an unexpected
> > object.
> > > > This
> > > > would also not break any existing code since it would use novel
> > > > types
> > > > and a novel protocol, even in the case of fundamental types.
> > > >
> > > >
Received on 2025-06-07 21:15:08