
Re: [std-proposals] Fwd: Extension to runtime polymorphism proposed

From: Muneem <itfllow123_at_[hidden]>
Date: Sat, 4 Apr 2026 03:53:31 +0500
Sorry, I meant "heterogeneous lists is the problem". Really sorry for that
mistake; it was an autocorrect error.

On Sat, 4 Apr 2026, 3:52 am Muneem, <itfllow123_at_[hidden]> wrote:

> Heterogeneous problems is the problem that leads to the code bloat or
> verbosity that I described, but it's hard to choose whether the chicken came
> first or the egg in this regard.
>
> Regards, Muneem
>
> On Sat, 4 Apr 2026, 3:50 am Steve Weinrich via Std-Proposals, <
> std-proposals_at_[hidden]> wrote:
>
>> Hi Muneem,
>>
>>
>>
>> If you don’t mind, I would like to limit this portion of our interaction
>> to simply describing the problem. I want to get a 100% understanding of
>> the problem, unfettered by any assumptions (including C++ limitations) or
>> previous solutions. Is a heterogeneous list actually the problem or is that
>> simply a solution that you think fits the problem at hand?
>>
>>
>>
>> Cheers,
>> Steve
>>
>>
>>
>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>> Behalf Of *Muneem via Std-Proposals
>> *Sent:* Friday, April 3, 2026 4:40 PM
>> *To:* std-proposals_at_[hidden]
>> *Cc:* Muneem <itfllow123_at_[hidden]>
>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>> proposed
>>
>>
>>
>> Hi!
>>
>> Thanks for your response!!!
>>
>>
>>
>> Yes, your understanding of the problem is mostly correct, with a few
>> details left out. The alternative to using classes would be switch
>> statements, which, as discussed, would lead to code bloat and might
>> backfire. The goal is to make a construct that makes indexing into
>> heterogeneous lists easier. There is, however, one thing I would like to
>> clarify:
>>
>> The goal isn't classic polymorphism but the ability to return an object
>> of any type through a specified interface (indexing heterogeneous lists),
>> which your example doesn't do. Basically, just as Go and many other
>> compiled languages support heterogeneous lists, I want C++ to do so as
>> well. Say you want to index a bunch of containers (to use a top()
>> function): std::visit fails because it can't return whatever return type
>> you would wish for, so you can either use a switch to write a top
>> expression for each container, or use this long, verbose technique:
>>
>> #include <deque>
>> #include <memory>
>> #include <vector>
>>
>> template<typename T>
>> struct Base{
>>     virtual ~Base() = default;
>>     virtual T top_wrapper() = 0;
>> };
>>
>> template<typename T>
>> struct Derived_1: Base<T>, std::vector<T>{
>>     T top_wrapper() override{
>>         return T{this->back()};//T{} is just to show that it works even
>>                                //if the operation had some other return type
>>     }
>> };
>>
>> template<typename T>
>> struct Derived_2: Base<T>, std::deque<T>{
>>     T top_wrapper() override{
>>         return T{this->back()};
>>     }
>> };
>>
>> int main(){
>>     //pointers, since storing Base<int> by value would slice the objects
>>     std::vector<std::unique_ptr<Base<int>>> a;
>>     //The compiler would probably optimize this example, but not one
>>     //where you index the vector using runtime input
>>     return 0;
>> }
>>
>> //A vector of std::variant only works if I am willing to write a helper
>> //top(std::variant<Args...> obj) function that uses std::visit to call
>> //top(); that in and of itself is not only verbose but obscures intent and
>> //context.
>>
>>
>>
>>
>>
>> Regards, Muneem.
>>
>>
>>
>> On Sat, 4 Apr 2026, 2:43 am Steve Weinrich via Std-Proposals, <
>> std-proposals_at_[hidden]> wrote:
>>
>> Hi Muneem,
>>
>>
>>
>> I would like to make sure that I understand this problem before going on.
>>
>>
>>
>> I think there are several classes that have a portion (or all) of their
>> interface in common. Each method of the interface returns a constant value:
>>
>>
>>
>> class First
>>
>> {
>>
>> public:
>>
>> int funcA () const { return 1123; }
>>
>> int funcB () const { return 1234; }
>>
>> int funcC () const { return 1456; }
>>
>> };
>>
>>
>>
>> class Second
>>
>> {
>>
>> public:
>>
>> int funcA () const { return 2123; }
>>
>> int funcB () const { return 2234; }
>>
>> int funcC () const { return 2456; }
>>
>> };
>>
>>
>>
>> class Third
>>
>> {
>>
>> public:
>>
>> int funcA () const { return 3123; }
>>
>> int funcB () const { return 3234; }
>>
>> int funcC () const { return 3456; }
>>
>> };
>>
>>
>>
>> 1. We would like a means to be able to add more classes easily.
>> 2. We would like a means to be able to add to the shared interface
>> easily.
>> 3. We would like to be able to use the shared interface in a
>> polymorphic way (like a virtual method).
>> 4. Performance is of the utmost importance.
>>
>>
>>
>> Is my understanding correct?
>>
>>
>>
>> Steve
>>
>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>> Behalf Of *Muneem via Std-Proposals
>> *Sent:* Friday, April 3, 2026 1:54 PM
>> *To:* std-proposals_at_[hidden]
>> *Cc:* Muneem <itfllow123_at_[hidden]>
>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>> proposed
>>
>>
>>
>> Sorry for sending two emails at once!
>>
>> I just wanted to reiterate that the point of the whole proposal is to
>> provide intent. The code that Mr. Macieira was kind enough to bring
>> forward proves my exact point: with enough intent, the compiler can
>> optimize anything, and these optimizations grow larger as the scale of
>> the program grows. Microbenchmarks might show only a single example, but
>> even that single example should get us thinking: why is it so slow in
>> this one case? Does this overhead force people to write switch
>> statements that can lead to code bloat, which can again backfire in
>> terms of performance?
>>
>> Regards, Muneem.
>>
>>
>>
>> On Sat, 4 Apr 2026, 12:48 am Muneem, <itfllow123_at_[hidden]> wrote:
>>
>> Hi!
>>
>> Thanks again for your feedback, Macieira. 👍
>>
>> >micro benchmark is misleading
>>
>> 1. The reason I gave you microbenchmarks is that some people asked for
>> them, even though I was reluctant to use them despite Bjarne
>> Stroustrup's quote,
>>
>> "Don't assume, measure", because in this case the goal is to either make
>> the compiler smaller or the runtime faster, both of which are targeted
>> by my new proposal.
>>
>> 2. You are right that the compiler might have folded the loop in half,
>> but it still shows that the observable behaviour is the same. In fact,
>> if the loop body were to index into a heterogeneous set (using the
>> proposed construct) and do some operation, the compiler would optimize
>> the indexing if there is a single source of the index. This shows that
>> intent can help the compiler do wonders:
>>
>> 1. Fold loops even when I used volatile to avoid it.
>>
>> 2. Avoid the entire indexing operation (if in a loop) with the most
>> minimal compile-time overhead.
>>
>> 3. Store the result into some memory location immediately after it takes
>> input (if that solution is the fastest).
>>
>> 4. Optimize a single expression for the sake of the whole program.
>>
>> Currently, the optimizer might in fact be able to optimize checks in a
>> loop, but it's not as easy or as guaranteed, because there are no
>> semantic promises we can make with the existing constructs to make it
>> happen.
>>
>> 4. My main point isn't whether my benchmark is right or wrong, but
>> rather that expressing intent is better. The benchmark was merely to
>> show that std::visit is slower (according to programs compiled with g++
>> and Microsoft Visual Studio 2026, using std::chrono and the Visual
>> Studio 2026 CPU usage measurement tools), but even if some or all
>> compilers optimize its performance, we still pay compile-time overhead
>> for taking std::visit and making it faster, and the optimization might
>> backfire since it would optimize single statements independently of
>> what's in the rest of the program. Why? Because, unlike my proposed
>> construct, std::visit does not give the compiler enough context and
>> intent about what's going on, so it can't generate code with exactly the
>> "bookkeeping" data and access code that fits the entire program.
>>
>>
>>
>> 3. In case someone thinks a few nanoseconds in a single example isn't a
>> big deal, reconsider: if my construct is adopted, then yes, it would not
>> be a big deal, because the compiler could optimize many indexing
>> operations into a single heterogeneous set and maybe cache the result
>> somewhere afterwards. The issue is that this can't be done with the
>> current techniques because of the lack of intent. Compilers are much
>> smarter than we could ever be because they are the work of many people's
>> entire careers, not just one very smart guy from Intel, so we shouldn't
>> blame or restrict compilers, whose job is to stay general, for the sake
>> of the whole program.
>>
>> 4. >I suppose it decided to unroll the loop a bit
>>
>> >and made two calls to sink() per loop:
>>
>> >template <typename T> void sink(const T &) { asm volatile("" :::
>> >"memory"); }
>>
>> Even if it optimized the switch statement containing asm volatile("" :::
>> "memory"); but not std::visit, that's my point: it isn't that switch is
>> magically faster, but rather that the compiler has more room to cheat
>> and skip things. In fact, the standard allows it a lot of free room as
>> long as the observable behaviour is the same, and even more so by
>> granting it free room via sets of observable behaviours (unspecified
>> behaviour).
>>
>> 5. Microbenchmarking wasn't meant to show that std::visit is inherently
>> slower, but rather that the compiler can and will make mistakes in
>> optimizing it, in order to avoid massive compile-time overhead.
>>
>>
>>
>>
>>
>>
>>
>> On Fri, 3 Apr 2026, 8:33 pm Thiago Macieira via Std-Proposals, <
>> std-proposals_at_[hidden]> wrote:
>>
>> On Thursday, 2 April 2026 19:15:42 Pacific Daylight Time Thiago Macieira
>> via
>> Std-Proposals wrote:
>> > Even in this case, I have profiled the code above (after fixing it and
>> > removing the std::cout itself) and found that overall, the switched case
>> > ran 2x faster, at 0.113 ns per iteration, while the variant case
>> required
>> > 0.227 ns per iteration. Looking at the CPU performance counters, the
>> > std::variant code has 2 branches per iteration and takes 1 cycle per
>> > iteration, running at 5 IPC (thus, 5 instructions per iteration).
>> > Meanwhile, the switched case has 0.5 branch per iteration and takes 0.5
>> > cycle per iteration, running at 2 IPC. The half cycle numbers make sense
>> > because I believe the two instructions are getting macrofused together
>> and
>> > execute as a single uop, which causes confusing numbers.
>>
>> This half a cycle and ninth of a nanosecond problem has been on my mind
>> for a
>> while. The execution time of anything needs to be a multiple of the cycle
>> time, so a CPU running at 4.5 GHz like mine was shouldn't have a
>> difference of
>> one ninth of a nanosecond. One explanation would be that somehow the CPU
>> was
>> executing two iterations of the loop at the same time, pipelining.
>>
>> But disassembling the binary shows a simpler explanation. The switch loop
>> was:
>>
>> 40149f: mov $0x3b9aca00,%eax
>> 4014a4: nop
>> 4014a5: data16 cs nopw 0x0(%rax,%rax,1)
>> 4014b0: sub $0x2,%eax
>> 4014b3: jne 4014b0
>>
>> [Note how there is no test for what was being indexed in the loop!]
>>
>> Here's what I had missed: sub $2. I'm not entirely certain what GCC was
>> thinking here, but it's subtracting 2 instead of 1, so this looped half a
>> billion times (0x3b9aca00 / 2). I suppose it decided to unroll the loop a
>> bit
>> and made two calls to sink() per loop:
>>
>> template <typename T> void sink(const T &) { asm volatile("" :::
>> "memory"); }
>>
>> But that expanded to nothing in the output. I could add "nop" so we'd see
>> what
>> happened and the CPU would be obligated to retire those instructions,
>> increasing the instruction executed counter (I can't quickly find how
>> many the
>> TGL processor / WLC core can retire per cycle, but I recall it's 6, so
>> adding
>> 2 more instructions shouldn't affect the execution time). But I don't
>> think I
>> need to further benchmark this to prove my point:
>>
>> The microbenchmark is misleading.
>>
>> --
>> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>> Principal Engineer - Intel Data Center - Platform & Sys. Eng.
>> --
>> Std-Proposals mailing list
>> Std-Proposals_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>
>>
>

Received on 2026-04-03 22:53:48