std-proposals

Re: [std-proposals] Fwd: Extension to runtime polymorphism proposed

From: Muneem <itfllow123_at_[hidden]>
Date: Sat, 4 Apr 2026 04:02:54 +0500
> Context for readers: the second one was a class of problems fixed by
> heterogeneous lists.
It's the second one, and I am really, really sorry for the confusion.


On Sat, 4 Apr 2026, 3:58 am Steve Weinrich via Std-Proposals, <
std-proposals_at_[hidden]> wrote:

> If I may, a heterogeneous list is not a problem, it is a solution to a
> problem. So there are now two possibilities:
>
> 1. You are trying to create a new std container to solve a class of
> problems.
> 2. You have some other problem that you have used a heterogeneous list
> to solve.
>
>
>
> What say you?
>
>
>
> Cheers,
> Steve
>
>
>
> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On Behalf
> Of *Muneem via Std-Proposals
> *Sent:* Friday, April 3, 2026 4:54 PM
> *To:* std-proposals_at_[hidden]
> *Cc:* Muneem <itfllow123_at_[hidden]>
> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
> proposed
>
>
>
> Sorry, I meant "heterogeneous lists are the problem". Really, really
> sorry for that mistake; it was an autocorrect error.
>
>
>
> On Sat, 4 Apr 2026, 3:52 am Muneem, <itfllow123_at_[hidden]> wrote:
>
> Heterogeneous problems are the problem that leads to the code bloat or
> verbosity that I described, but it's hard to say whether the chicken or
> the egg came first in this regard.
>
>
>
> Regards, Muneem
>
>
>
> On Sat, 4 Apr 2026, 3:50 am Steve Weinrich via Std-Proposals, <
> std-proposals_at_[hidden]> wrote:
>
> Hi Muneem,
>
>
>
> If you don’t mind, I would like to limit this portion of our interaction
> to simply describing the problem. I want to get a 100% understanding of
> the problem, unfettered by any assumptions (including C++ limitations) or
> previous solutions. Is a heterogeneous list actually the problem, or is that
> simply a solution that you think fits the problem at hand?
>
>
>
> Cheers,
> Steve
>
>
>
> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On Behalf
> Of *Muneem via Std-Proposals
> *Sent:* Friday, April 3, 2026 4:40 PM
> *To:* std-proposals_at_[hidden]
> *Cc:* Muneem <itfllow123_at_[hidden]>
> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
> proposed
>
>
>
> Hi!
>
> Thanks for your response!!!
>
>
>
> Yes, your understanding of the problem is mostly correct, with a few
> details left out. The alternative to using classes would be switch
> statements, which, as discussed, would lead to code bloat and might backfire.
> The goal is to make a construct that makes indexing into heterogeneous
> lists easier. There is, however, one thing that I would like to clarify:
>
> The goal isn't classic polymorphism but the ability to return an object
> of any type through a specified interface (indexing heterogeneous lists),
> which your example doesn't do. Basically, just as Go and many other
> compiled languages support heterogeneous lists, I want C++ to do so as well.
> Say you want to index a bunch of containers (to use a top() function): std::visit
> fails because it can't return any return type that you would wish for, so
> you can either use a switch statement to write the top() expression for each
> container or this long, verbose technique:
>
> #include <deque>
> #include <memory>
> #include <vector>
>
> template<typename T>
> struct Base {
>     virtual ~Base() = default;
>     virtual T top_wrapper() = 0;
> };
>
> template<typename T>
> struct Derived_1 : Base<T>, std::vector<T> {
>     T top_wrapper() override {
>         // std::vector has back(), not top(); T{} is just to show that it
>         // works even if back() had some other return type
>         return T{this->back()};
>     }
> };
>
> template<typename T>
> struct Derived_2 : Base<T>, std::deque<T> {
>     T top_wrapper() override {
>         // std::deque likewise has back(), not top()
>         return T{this->back()};
>     }
> };
>
> int main() {
>     // Store pointers: a std::vector<Base<int>> would slice the derived
>     // objects (and cannot hold an abstract Base at all)
>     std::vector<std::unique_ptr<Base<int>>> a;
>     // The compiler would probably optimize this example, but not an
>     // example where you index the vector using run-time input
>     return 0;
> }
>
> // A vector of std::variant only works if I am willing to write a helper
> // top(std::variant<Args...> obj) function that uses std::visit to call
> // top(), which in itself is not only verbose but obscures intent and
> // context.
>
>
>
>
>
> Regards, Muneem.
>
>
>
> On Sat, 4 Apr 2026, 2:43 am Steve Weinrich via Std-Proposals, <
> std-proposals_at_[hidden]> wrote:
>
> Hi Muneem,
>
>
>
> I would like to make sure that I understand this problem before going on.
>
>
>
> I think there are several classes that have a portion (or all) of their
> interface in common. Each method of the interface returns a constant value:
>
>
>
> class First
> {
> public:
>     int funcA () const { return 1123; }
>     int funcB () const { return 1234; }
>     int funcC () const { return 1456; }
> };
>
> class Second
> {
> public:
>     int funcA () const { return 2123; }
>     int funcB () const { return 2234; }
>     int funcC () const { return 2456; }
> };
>
> class Third
> {
> public:
>     int funcA () const { return 3123; }
>     int funcB () const { return 3234; }
>     int funcC () const { return 3456; }
> };
>
>
>
> 1. We would like a means to be able to add more classes easily.
> 2. We would like a means to be able to add to the shared interface
> easily.
> 3. We would like to be able to use the shared interface in a
> polymorphic way (like a virtual method).
> 4. Performance is of the utmost importance.
>
>
>
> Is my understanding correct?
>
>
>
> Steve
>
> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On Behalf
> Of *Muneem via Std-Proposals
> *Sent:* Friday, April 3, 2026 1:54 PM
> *To:* std-proposals_at_[hidden]
> *Cc:* Muneem <itfllow123_at_[hidden]>
> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
> proposed
>
>
>
> Sorry for sending two emails at once!
>
> I just wanted to reiterate that the point of the whole proposal is
> to express intent. The code that Mr. Macieira was kind enough to bring
> forward proves my exact point: with enough intent, the compiler can
> optimize anything, and these optimizations grow larger as the
> program grows larger. Microbenchmarks might show only a single example, but
> even that single example should get us asking why it is so slow in this
> one case. Does this overhead force people to write switch
> statements that can lead to code bloat, which can again backfire in terms of
> performance?
>
> Regards, Muneem.
>
>
>
> On Sat, 4 Apr 2026, 12:48 am Muneem, <itfllow123_at_[hidden]> wrote:
>
> Hi!
>
> Thanks again for your feedback, Macieira. 👍
>
> >micro benchmark is misleading
>
> 1. The reason I gave you microbenchmarks is that someone asked for them,
> and even I was reluctant to use them, despite Bjarne Stroustrup's advice
> "Don't assume, measure", because in this case the goal is to make either
> the compiler smaller or the runtime faster, both of which my new
> proposal targets.
>
> 2. You are right that the compiler might have folded the loop in half,
> but the point is that it still shows that the observable behaviour is the
> same. In fact, if the loop body indexed into a heterogeneous set (using
> the proposed construct) and did some operation, then the compiler would
> optimize the indexing if the index comes from a single source. This proves
> that intent can help the compiler do wonders:
>
> 1. Fold loops even when I used volatile to prevent it.
>
> 2. Avoid the entire indexing operation (if in a loop, with minimal
> compile-time overhead).
>
> 3. Store the result into some memory location immediately after it takes
> input (if that solution is the fastest).
>
> 4. Optimize a single expression for the sake of the whole program.
>
> Currently, the optimizer might in fact be able to optimize checks in a
> loop, but it's not as easy or as guaranteed, because there are no semantic
> promises that we can make with the existing constructs to make it happen.
>
> 4. My main point isn't whether my benchmark is correct or wrong, but rather
> that expressing intent is better. The benchmark was merely to show that
> std::visit is slower (according to programs compiled with g++ and Microsoft
> Visual Studio 2026, measured with std::chrono and the Visual Studio 2026 CPU
> usage tools). But even if some compiler, or all compilers, improve its
> performance, we still have compile-time overhead for taking std::visit and
> making it faster, and the optimization might backfire, since it would
> optimize single statements independently of what's in the rest of the
> program. Why? Because, unlike my proposed construct, std::visit does not
> carry enough context and intent to tell the compiler what's going on, so
> that it can generate code with exactly the "book keeping" data and access
> code that fits the entire program.
>
>
>
> 3. In case someone thinks a few nanoseconds in a single example isn't a
> big deal, rethink it: if my construct is accepted, then yes, it would not
> be a big deal, because the compiler could optimize many indexing
> operations into a single heterogeneous set and maybe cache the result
> somewhere afterwards. The issue is that this can't be done with the current
> techniques because of the lack of intent. Compilers are much smarter than
> we could ever be, because they are the work of many people's entire careers,
> not just one very smart guy from Intel, so we should not blame or restrict
> compilers, whose job is to be general for the sake of the whole program.
>
> 4. >I suppose it decided to unroll the loop a bit
> >and made two calls to sink() per loop:
>
> >template <typename T> void sink(const T &) { asm volatile("" ::: "memory"); }
>
> It optimized the switch statement even with asm volatile("" ::: "memory"),
> but not std::visit. My point isn't that switch is magically faster, but
> rather that the compiler has more room to cheat and skip things. In fact,
> the standard gives it a lot of freedom as long as the observable behaviour
> is the same, even more so by permitting sets of allowed observable
> behaviours (unspecified behaviour).
>
> 5. The microbenchmark wasn't meant to show that std::visit is inherently
> slower, but rather that the compiler can, and should, make mistakes in
> optimizing it in order to avoid massive compile-time overhead.
>
>
>
>
>
>
>
> On Fri, 3 Apr 2026, 8:33 pm Thiago Macieira via Std-Proposals, <
> std-proposals_at_[hidden]> wrote:
>
> On Thursday, 2 April 2026 19:15:42 Pacific Daylight Time Thiago Macieira
> via
> Std-Proposals wrote:
> > Even in this case, I have profiled the code above (after fixing it and
> > removing the std::cout itself) and found that overall, the switched case
> > ran 2x faster, at 0.113 ns per iteration, while the variant case required
> > 0.227 ns per iteration. Looking at the CPU performance counters, the
> > std::variant code has 2 branches per iteration and takes 1 cycle per
> > iteration, running at 5 IPC (thus, 5 instructions per iteration).
> > Meanwhile, the switched case has 0.5 branch per iteration and takes 0.5
> > cycle per iteration, running at 2 IPC. The half cycle numbers make sense
> > because I believe the two instructions are getting macrofused together and
> > execute as a single uop, which causes confusing numbers.
>
> This half a cycle and ninth of a nanosecond problem has been on my mind for a
> while. The execution time of anything needs to be a multiple of the cycle
> time, so a CPU running at 4.5 GHz like mine was shouldn't have a difference
> of one ninth of a nanosecond. One explanation would be that somehow the CPU
> was executing two iterations of the loop at the same time, pipelining.
>
> But disassembling the binary shows a simpler explanation. The switch loop was:
>
> 40149f: mov $0x3b9aca00,%eax
> 4014a4: nop
> 4014a5: data16 cs nopw 0x0(%rax,%rax,1)
> 4014b0: sub $0x2,%eax
> 4014b3: jne 4014b0
>
> [Note how there is no test for what was being indexed in the loop!]
>
> Here's what I had missed: sub $2. I'm not entirely certain what GCC was
> thinking here, but it's subtracting 2 instead of 1, so this looped half a
> billion times (0x3b9aca00 / 2). I suppose it decided to unroll the loop a bit
> and made two calls to sink() per loop:
>
> template <typename T> void sink(const T &) { asm volatile("" ::: "memory"); }
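The arithmetic behind the half-cycle puzzle can be written out using only the figures quoted in the message (a 4.5 GHz clock, a 0x3b9aca00 loop count, and 0.113 ns vs 0.227 ns per iteration):

```latex
\begin{aligned}
\texttt{0x3b9aca00} &= 10^{9}\ \text{nominal iterations}, \qquad \tfrac{10^{9}}{2} = 5\times10^{8}\ \text{loop passes (sub \$2)}\\
t_{\text{cycle}} &= \frac{1}{4.5\ \text{GHz}} \approx 0.222\ \text{ns}\\
\text{variant: } 0.227\ \text{ns per iteration} &\approx 1\ \text{cycle per iteration}\\
\text{switch: } 5\times10^{8} \times 0.222\ \text{ns} &\approx 0.111\ \text{s} \;\Rightarrow\; \frac{0.111\ \text{s}}{10^{9}} \approx 0.111\ \text{ns per nominal iteration}
\end{aligned}
```

So the measured 0.113 ns is consistent with one whole cycle per loop pass, each pass covering two nominal iterations, rather than with a fractional cycle.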
>
> But that expanded to nothing in the output. I could add "nop" so we'd see
> what happened and the CPU would be obligated to retire those instructions,
> increasing the instruction executed counter (I can't quickly find how many
> the TGL processor / WLC core can retire per cycle, but I recall it's 6, so
> adding 2 more instructions shouldn't affect the execution time). But I don't
> think I need to further benchmark this to prove my point:
>
> The microbenchmark is misleading.
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Principal Engineer - Intel Data Center - Platform & Sys. Eng.
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>

Received on 2026-04-03 23:03:09