std-proposals

Re: [std-proposals] Extension to runtime polymorphism proposed

From: Simon Schröder <dr.simon.schroeder_at_[hidden]>
Date: Sat, 4 Apr 2026 19:20:34 +0200
I feel like you are misunderstanding how compilers work (though I am not a professional in this field either). From my understanding, modern compilers don’t do much optimization on the AST (though this is different for JIT compilers). LLVM has several frontends for different languages, but the backend is the same. In a first step, the source language is translated to IR (intermediate representation), which is much like an architecture-independent assembly language. (Almost?) all optimizations are done on this IR. Some optimizer passes look for patterns that stem from the way a particular programming language is typically written. BTW, if a totally new paradigm is introduced that no optimizer recognizes, it will either be slow or take time away from improving some existing optimization, e.g. the one for std::visit. There are not magically more compiler developers just because you introduce a new feature.

You are also mixing some concepts: virtual member functions are for runtime polymorphism. I’m not sure if you are aware that compilers can optimize the vtable lookup away if they can figure out the type of an object at compile time. std::variant with std::visit is polymorphism over a set of types that is fixed at compile time. Virtual member functions don’t matter in this context at all. And std::move also works with std::variant. As with many things in life, you need to weigh the pros and cons of a certain solution. Yes, currently we don’t have a solution that has all the advantages of both runtime and compile time polymorphism. But in 99% of the cases this is fine.
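
To make the remark about std::move and std::variant concrete, here is a minimal sketch. The alias `Value` and the helpers `take` and `element_count` are hypothetical names chosen for illustration; they do not come from the thread or from any proposal.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical closed set of alternatives, chosen only for illustration.
using Value = std::variant<int, std::string, std::vector<int>>;

// Moving a variant move-constructs whichever alternative is currently active.
// The source keeps the same alternative, in a moved-from state.
inline Value take(Value& source) {
    return std::move(source);
}

// std::visit dispatches on the active alternative at run time, while the
// generic lambda is instantiated once per alternative at compile time.
inline std::size_t element_count(const Value& v) {
    return std::visit(
        [](const auto& x) -> std::size_t {
            using T = std::decay_t<decltype(x)>;
            if constexpr (std::is_same_v<T, int>) {
                return 1;  // a lone int counts as one element
            } else {
                return x.size();
            }
        },
        v);
}
```

Calling `take` on a variant that holds a std::string moves the string, so no virtual "move constructor" is needed for this closed set of types.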

You also mentioned expression templates. You are correct that they don’t work with runtime polymorphism. However, I don’t see why they wouldn’t work with std::variant (and I also don’t know any example where we would need polymorphism with expression templates at all). Unless you mean something different by expression templates than I do (I know them from Blitz++ and Eigen), they are still not something normal. I have yet to encounter them outside of linear algebra libraries. I would claim few C++ programmers know about this technique or are even able to implement something like it. Just as I said: Not normal! (And certainly not necessary in most contexts.)

I noticed that now we are restricting your proposal to heterogeneous lists known at compile time (basically something like std::tuple). I need to make clear that the examples I have given for motivation (JSON, XML, SQL queries) are for heterogeneous lists created at runtime! I didn’t find anything comparable to your problem in the implementation of your Turing virtual machine when searching for heterogeneous lists. This gives the ball back to you to motivate why we actually need the proposed feature. It needs to be useful in more than just one single place to justify standardizing it.

One thing should also be clear for standardization: syntactic sugar, especially if you don’t like the look of existing solutions, is not a sole reason for standardization. Neither is verbosity; C++ programmers are used to not having the nicest syntax around (the main reason being backwards compatibility).

The only acceptable reason I see so far is that std::visit is (supposedly) not optimized well enough in one out of three major compilers. This does not prove that the problem is inherent to std::visit and cannot be optimized away. It also does not matter that you don’t like how switch statements look. I’m sure that if you provide an enum class instead of indexes, compilers will use an indexed approach if they figure out it is the optimal solution (I’m quite certain they have this optimization built in). template for has already been mentioned as a way to reduce the boilerplate code you need to write. And contrary to what you claimed, reflection might make this even easier. In the long run we will get something like code injection. Currently, you most likely could use reflection to generate source code in a separate file (see Herb Sutter’s last talk about C++26’s reflection features).

I do know the narrative that providing more context to the compiler should make it easier to optimize. It is said that this is why we write in high level languages. From what little I do understand about compilers, this statement is quite theoretical and academic; it is not how real compilers work. You should show how we write code today and how it could be improved with a new syntax. And it should be in real world situations and not just some tiny toy examples (which you already provided). So far, there is just one real world use case: your Turing virtual machine. We need more than that to understand that this feature is useful in general.

On Apr 4, 2026, at 5:53 PM, Muneem via Std-Proposals <std-proposals_at_[hidden]> wrote:


Hi!
I couldn't sleep because when I close my eyes, tears come, which makes the pain flare up, so I spent a few hours thinking and putting together an answer to all your welcome questions and feedback in one email:
(The answer to Sebastian and Thiago is at the end)
I will first try to summarize my construct in its latest form and the issue it solves, before answering individual emails and naming everyone who was kind enough to send them:
****Short form****
What is the problem:
The problem with classic polymorphism as the implementation, i.e.:
std::vector<base*>
is that it:
1. Does not provide a new expression value type that can have its own unique semantics at the language level.
2. Does not let the user move the polymorphic objects (constructors can't be virtual).
3. Is too verbose.
4. Does not allow expression templates (a feature considered normal even in books written in 2013).
5. Does not give the compiler a free hand, because it does not convey the intent that I am trying to index; this might make the abstract syntax tree too abstract (hiding intent) to be optimized.
There are many more points, but I don't want to make this long (plus my left eye hurts, so I want to write this fast).

Philosophically: the problem is that the solution is too incomplete, and tries to be too general.

Solution:
My construct, which provides:
1. A new expression value type for the set.
2. A new expression value type for each element in the set.
*The second one would introduce the special features*:
1. Just as rvalues introduced move semantics, this new value type can introduce user-defined optimizations, and it would appear in the abstract syntax tree, making the tree easier to optimize.
2. The compiler has to produce the code it thinks is best based on how expressions of this value type are used. Variables can be of this type; in fact, we can have a whole class of optimizations, just like move semantics, that revolve around this new expression type.


My full detailed answers to Simon and Mr Marcin:
Thank you very, very much for your feedback!!!!
(All answers are ordered by number: if you numbered your point "1)" or "1", I number my answer "1"; if there is no numbering, I don't number mine either.)
> I believe the picture is now getting clearer (Muneem, please correct me if I am misrepresenting your ideas). At compile time the type of x is not known. This is one of the reasons why Muneem suggested to use a JIT: just include the source code/AST/intermediate representation in the executable and finish compilation (using the JIT) when running the program. I see several problems with this: 1) C++ programmers love their performance. Waiting for JIT compilation is going to be slow. And every time we hit this place we might have to look up if the function has already been compiled for this type (which is a branch, of course, that we wanted to avoid). 2) If I want to call x.capacity() the member function needs to be linked in the executable. Linkers currently will only link the functions that are used by the program and eliminate all other functions. The reason is that we want to reduce the size of executables. 3) This is even exacerbated if we are using template types like std::vector: Its member functions are not available in any library that could be linked, but templates are instantiated when used. This means that we can’t even pre-generate all possible instantiations of std::vector::capacity (which through recursive use of std::vector are infinite) to be linked into the executable. At least with this approach it is highly questionable if the advantages of the proposal weigh heavier than the disadvantages I have mentioned. Most people most likely prefer smaller binaries and shorter compile time (less unnecessary template instantiations). It also is in contrast to the zero overhead principle: in order for this to work we would have to link everything every time even if we are not using this feature.
> Now, I’m taking a turn to heterogeneous lists: I didn’t know what these were and I’m not sure I understand them well enough to make a fully qualified judgement.
***********Answer***********
1. My original proposal (the one with JIT) was very general, and meant to be narrowed down through feedback and criticism.
2. The same can be said about std::variant:
The compiler would instantiate capacity for all containers, but the ****difference in my case****:
3. I am introducing two new expression types to optimize this: one for the set, and the other for the element type of the set.
****Why is this important****:
For every indexing point, the compiler can decide whether to use vtables or whether to instantiate. This might sound the same as std::visit, but in this case:
As described in the short form above, the new expression value type lets users provide optimizations, as rvalues do through move semantics.
The compiler can also decide whether to apply extra optimizations (using the information it has in the abstract syntax tree thanks to the new expression value types) without impractical compile-time overhead at the intermediate representation level. That isn't the case with std::visit, where the calls either have to be traced at impractical cost or are left unoptimized.

***********Answer***********
>
> First, for the problems they are trying to solve: From what I can find they could be helpful when parsing JSON or XML or when returning results from SQL queries. However, if I’m not mistaken, all these formats have a known number of possible types. In general, in C++ we could use either std::variant or subtyping polymorphism to restrict the number of possible types in the heterogeneous list.
***********Answer***********
 Let me try to clarify: 
std::visit proved to be slow in my discussion with Mr. Thiago, who said that we should wait for compiler backends to become faster. That is irrelevant, because C++ is supposed to have constructs that make it fast at the intermediate representation level. std::visit can't even be optimized at the abstract syntax tree level, and/or it doesn't provide std::move(variant_obj).
***********Answer***********
>
> Muneem mentioned the programming language Go as an example. Go does not have subtyping. Instead, you can define interfaces. Traditionally, heterogeneous arrays have been defined as type []interface{} which is an array of an empty interface (disclaimer: I have never written a single line of Go). The empty interface somehow allows to store any type. The modern version of this seems to be to use an array of the ‘any’ type. This would perfectly map to std::vector<std::any> in C++. The example that I have seen is that it is still possible to at least println() every single entry inside an []interface{}. So, there needs to be some sort of runtime polymorphism. This again has the aforementioned problem of instantiating all templates and linking all functions.
***********Answer***********
Yes, std::visit has the problem (described in the last sentence of your paragraph) of instantiating all templates, as shown in my answer to your paragraph 2. Go was honestly a very bad example, but I used it because I learned Go a few years ago (when I was 14) and the experience I had was good (easy to learn), so I didn't think twice before using it. I stopped using it because it isn't enough to get things done (only C++ is).

***********Answer***********

> Let’s think about how we could achieve runtime polymorphism for std::any (I don’t know how Go is actually doing it): First version that comes to mind would be to use runtime reflection and reflect on the string of the function name (e.g. “capacity”). This gets more complicated if function overloading is involved. (BTW, from what I have read about heterogeneous lists in C# it is convenient to use them, but slow!) Still, this does not achieve the goal of Muneem for this solution to be fast. We would need string comparisons and branching. I do believe that a decently fast solution would be possible as the Objective C runtime from Apple has shown (they use some clever tricks with caching, etc.). Still, it would add a performance overhead.
***********Answer**********
C# is a bad example because C# doesn't support move semantics (among other things) and instead relies on garbage collection. The problem doesn't arise in C# because it is inherently slower due to the lack of these features (rvalues, copy elision, and probably countless more).
***********Answer***********
>
> Muneem also mentioned something about indexing. What I could imagine is a function pointer for each entry in the heterogeneous list. We could easily achieve something like this with std::vector<std::pair<std::any,std::function>> (or directly a function pointer instead of std::function). Then we could use the function pointer through the same index as the object. This gets more complicated if we want to have more than just a single function pointer. And we would need to make it easier to create these kinds of heterogeneous lists without explicitly mentioning which functions we want to have function pointers to. Also, if you have objects of the same type inside the heterogeneous list we might be able to reuse a set of function pointers and thus optimize storage. This sounds a lot like vtables. So, with this approach we have reinvented regular C++ runtime polymorphism.
***********Answer***********
But this technique has also fixed the problem of bringing the optimizations that C++ already has (move semantics, copy elision) to "regular polymorphism".
***********Answer***********
> As I have demonstrated before, std::any has problems with template instantiations and linking. An efficient solution (both in space and time) would need to know the exact types possible in the heterogeneous list. This kind of brings us back to std::variant. I would say that the advantage of the current solution, std::visit, over Muneem’s proposal is that I don’t have to figure out/know the index of the function I want to call on an object. std::visit is doing the job for me depending on the type. We could add some syntactic sugar and allow operator-> on std::variant to basically do the same job as std::visit with an automatically templated lambda (or even better with operator. if we get that one into the standard). The compiler might actually create a list of function pointers (even now with std::visit) that can be indexed based on the type. I don’t know if function pointers are better in any way for std::visit than the solution we have right now.
***********Answer***********
Compilers can optimize at the abstract syntax tree level, but they can't do so for std::visit without significant overhead. At least not enough to make up for the missing move semantics (constructors can't be virtual).
***********Answer***********
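
The quoted point above, that std::visit with a generic ("automatically templated") lambda already forwards a member call to whichever type is active, can be sketched as follows. The alias `Container` and the function `capacity_of` are hypothetical names for illustration; the operator-> sugar mentioned in the quote does not exist in the standard library and is not shown.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical closed set of container types; both provide capacity().
using Container = std::variant<std::vector<int>, std::string>;

// The generic lambda is instantiated once per alternative, so the caller never
// needs to know the index of the active type or of the function to call.
inline std::size_t capacity_of(const Container& c) {
    return std::visit(
        [](const auto& x) -> std::size_t { return x.capacity(); },
        c);
}
```

Only `capacity()` is instantiated for the listed alternatives here; how well the resulting dispatch is optimized depends on the compiler and standard library in use.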
>
> One thing I observe in Muneem’s examples is that the heterogeneous list is known at compile time. And the difference is that std::visit does polymorphism on function calls and what Muneem seems to envision is polymorphism on the object (type), i.e. when accessing an object from a heterogeneous list I don’t get a polymorphic object, but the object of its exact type. The biggest hurdle in this is how to get a runtime index into a compile time template parameter (preferably automatically): std::get<i>() on a std::tuple only works for compile time known values of i (some examples might work with the new ‘template for’ from C++26 reflection when iterating over all entries of the heterogeneous list if using a std::tuple). I’m currently stuck at this point (especially with Muneem’s actual problem with his Turing virtual machine).
> Muneem, one last thing: I’m not sure if heterogeneous lists are currently fast in any language. If you know of a truly efficient implementation worthy of C++ in any language, let us know. And with the way you are currently describing it I’m not sure if it works with a static type system. We would need to map runtime known indexes to compile time known types. This is practically impossible.
***********Answer***********
I don't think any other language relies on expression value types for 90% of its optimizations. We rely on rvalue references so much that I can't imagine life without them, yet we are hesitant to add a new expression value type.
***********Answer***********
> --

But isn't this a classic XY problem?
If `std::variant` vs `switch` matters, why do any check at this point in the code?
Move the check a couple of layers higher to avoid polluting tight loops with it.

Like:
```
void foo(int i)
{
    switch(i) { ... }
}
void bar(int i)
{
    for (auto& x: data) foo(i);
}
```

change into:

```
template<int i>
void foo()
{
    //code based on i, like `std::get<i>`
}
void bar(int i)
{
     switch (i){
          case 0:
               for (auto& x: data)
               {
                    foo<0>();
               }
               break;
          case 1:
               for (auto& x: data)
               {
                    foo<1>();
               }
               break;
          //etc.
    }
}
```

Now the compiler has MORE info and an easier time to optimize each loop.

The only problem I have is that `switch (i){` is tedious to write, but it
could be easily fixed if `case:` could be expanded in `template for`:

```
template<auto Indexes>
void bar(int i)
{
     switch (i){
          template for (constexpr int Index : Indexes)
          {
               case Index: for (auto& x: data) foo<Index>();
               break;
          }
    }
}
```

This will allow easy transition from runtime value `i` into template
value `Index`.
This need comes up often, e.g. in questions like "how to deserialize `std::variant`?"
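
Until `case` labels can be expanded by `template for`, the same transition from a runtime value to a template value can be approximated in C++17 with a fold expression over an index sequence. This is only a sketch; `dispatch_impl`, `dispatch_index` and `get_as_double` are hypothetical helper names, not part of any library or proposal.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Compares the runtime index against each compile-time index Is in turn and
// calls f with the matching std::integral_constant. The || fold stops at the
// first match; the result is false if nothing matched.
template <class F, std::size_t... Is>
bool dispatch_impl(std::size_t index, F& f, std::index_sequence<Is...>) {
    return ((index == Is
                 ? (f(std::integral_constant<std::size_t, Is>{}), true)
                 : false) || ...);
}

// Maps a runtime index in [0, N) to a compile-time constant passed to f.
template <std::size_t N, class F>
bool dispatch_index(std::size_t index, F&& f) {
    return dispatch_impl(index, f, std::make_index_sequence<N>{});
}

// Example use: read element `index` of a tuple as a double.
// Returns 0.0 if the index is out of range (a choice made for this sketch).
template <class... Ts>
double get_as_double(const std::tuple<Ts...>& t, std::size_t index) {
    double result = 0.0;
    dispatch_index<sizeof...(Ts)>(index, [&](auto I) {
        result = static_cast<double>(std::get<decltype(I)::value>(t));
    });
    return result;
}
```

Whether the chain of comparisons is lowered to a jump table is up to the optimizer, which is the same open question as for the hand-written `switch`.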
***********Answer***********
The technique you showed would almost always backfire without context that only the abstract syntax tree can provide. To have that context, the compiler needs to see the whole abstract syntax tree to make the best possible decision, and for that, it needs a value type representing this index.
***********Answer***********

>Please learn how to properly reply to emails instead of duplicating it
two times as it is impossible to properly answer them on a mailing
list.
***********Answer***********
I am really, really sorry.
Why I do it by mistake:
I am just a 17 year old guy who is trying to make his code shorter and make fun of his brother, who uses AI to write code, by saying:
"I use metaprogramming instead" (proceeds to say "code that writes code" to his brother).
When I get bombarded with fun questions, I get too excited, so I try to respond fast.

***********Answer***********
>I did not use std::variant or std::visit in my “standard approach”. Please look again and identify your issue(s).
***********Answer***********
I am really really sorry, I updated my email to clearly explain the problems
***********Answer***********
>It is a heterogeneous collection, so it is more like std::tuple.
***********Answer***********
Yes and no. It's a no because it introduces new value types at the abstract syntax tree level.
***********Answer***********

>I think others on this list have understood your idea faster than me, and already gave feedback, when I still tried (am trying) to understand what it was about.

After the problem statement, to understand your idea for speedup: you talked about cache lines and fewer indirect accesses, and therefore rejected a vtable-based approach.

Now I am trying to understand your low-level solution:

If in your vector (of containers or other types) we store the following

[1]
pointer to Type1.op1()
pointer to Type1.op2()
instance of Type1 (directly stored in full length, not as pointer)

[2]
pointer to Type2.op1()
pointer to Type2.op2()
instance of Type2 (directly stored in full length, not as pointer)

[3]
pointer to Type3.op1()
pointer to Type3.op2()
instance of Type3 (directly stored in full length, not as pointer)

So instead of the indirection over a vtable, you directly get the function pointers, but with the cost of the entries being longer, as with each entry all the used operations have to be stored for that type?

Type1.op1()
Type2.op1()
Type3.op1()

all would have the same call signature. So no switch or branch is needed.

Is that what you want under the hood?
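
The layout asked about above (function pointers stored next to the object itself, no vtable indirection) could look roughly like the following. Everything here is an assumption made for illustration: the names `Entry`, `make`, `sum_op1`, the 32-byte capacity, the `int` return type, and the restriction to trivially copyable types are not from the thread.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>
#include <vector>

// One list entry: the operations for the stored type, then the object inline.
struct Entry {
    static constexpr std::size_t kCapacity = 32;  // assumed inline buffer size

    int (*op1)(const void*) = nullptr;
    int (*op2)(const void*) = nullptr;
    alignas(std::max_align_t) unsigned char storage[kCapacity] = {};

    template <class T>
    static Entry make(const T& value) {
        static_assert(sizeof(T) <= kCapacity, "object does not fit inline");
        static_assert(alignof(T) <= alignof(std::max_align_t), "over-aligned type");
        // Keeps the sketch simple: entries can be copied bytewise, no destructor.
        static_assert(std::is_trivially_copyable_v<T>, "sketch handles trivial types only");
        Entry e;
        ::new (static_cast<void*>(e.storage)) T(value);
        // Captureless lambdas convert to plain function pointers that all
        // share one call signature, so calling needs no switch on the type.
        e.op1 = [](const void* p) -> int {
            return std::launder(static_cast<const T*>(p))->op1();
        };
        e.op2 = [](const void* p) -> int {
            return std::launder(static_cast<const T*>(p))->op2();
        };
        return e;
    }
};

// Hypothetical element types with the two operations from the layout above.
struct Type1 {
    int v;
    int op1() const { return v; }
    int op2() const { return v * 2; }
};
struct Type2 {
    double a;
    double b;
    int op1() const { return static_cast<int>(a + b); }
    int op2() const { return -1; }
};

// Calls op1 on every entry through the pointer stored in the entry itself.
inline int sum_op1(const std::vector<Entry>& entries) {
    int total = 0;
    for (const Entry& e : entries) total += e.op1(e.storage);
    return total;
}
```

The trade-off named in the question is visible in the struct: every entry carries two pointers plus a fixed-size buffer, so entries are larger than a single vtable pointer would make them.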
***********Answer***********

Hi, Mr. Sebastian,
I was hoping you would join this wonderful discussion. What's under the hood would be determined by the semantics of this new value type, which I am willing to discuss and debate.

***********Answer***********
> ```
> void foo(int i)
> {
> switch(i) { ... }
> }
> void bar(int i)
> {
> for (auto& x: data) foo(i);
> }
> ```
>
> change into:
>
> ```
> template<int i>
> void foo()
> {
> //code based on i, like `std::get<i>`
> }
> void bar(int i)
> {
> switch (i){
> case 0:
> for (auto& x: data)
> {
> foo<0>();
> }
> break;
> case 1:
> for (auto& x: data)
> {
> foo<1>();
> }
> break;
> //etc.
> }
> }
> ```

Compilers already do that on their own.

They may not be doing that all the time, though. So if you do need to enforce
this, implementing manually may be necessary.

Do we need a new language construct for this? Especially in the presence of
template for?

Please provide the proof that you've used to make the assertion that it is
hard to optimise at the intermediate representation level.

In addition, you've said this problem is widespread. I've asked before and
will ask again: please provide links to 10 different cases where it is visible
or was fixed/worked around in existing, open source codebases. You should have
no problem finding at least 3.

I'm asking this so you realise that you may be making unfounded claims. If you
want to argue that it must be so, prove it. Otherwise, please rephrase your
statements to indicate that the problem may not exist at all, except in your
one case.
***********Answer***********
Your example shows what I mean: compilers convert them into switch-case statements at the assembly level, not at the abstract syntax tree level. We need a new value type to change the way abstract syntax trees work.

Try moving a polymorphic object using std::move, and you will see how widespread the problem is.
***********Answer***********




On Sat, 4 Apr 2026, 8:41 pm Thiago Macieira via Std-Proposals, <std-proposals_at_[hidden]> wrote:
On Friday, 3 April 2026 21:39:04 Pacific Daylight Time Muneem via Std-Proposals
wrote:
> The current techniques/(std::variant) are too verbose and too hard to
> optimize at an intermediate representation level.

Please provide the proof that you've used to make the assertion that it is
hard to optimise at the intermediate representation level.

In addition, you've said this problem is widespread. I've asked before and
will ask again: please provide links to 10 different cases where it is visible
or was fixed/worked around in existing, open source codebases. You should have
no problem finding at least 3.

I'm asking this so you realise that you may be making unfounded claims. If you
want to argue that it must be so, prove it. Otherwise, please rephrase your
statements to indicate that the problem may not exist at all, except in your
one case.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Data Center - Platform & Sys. Eng.
--
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2026-04-04 17:20:50