Date: Sat, 4 Apr 2026 06:43:25 +0500
I am really, really sorry if my response was not satisfactory.
In short, all I want is to solve the problem that we can't tell the
compiler to "render an object of any type in place at the intermediate code
level". This is a problem because we have to do it ourselves, unlike in
languages like Go. Once this problem is solved, heterogeneous lists would
be possible.
Regards, Muneem
On Sat, 4 Apr 2026, 6:24 am Steve Weinrich via Std-Proposals, <
std-proposals_at_[hidden]> wrote:
> Hi Muneem,
>
> I had hoped by asking you to explain the problem you are trying to solve,
> I might be able to help you describe things in a way that would make it
> easier for you to describe the exact language feature you are proposing.
>
> Either I am not smart enough to understand you (a good probability) or you
> are incapable of describing what you seek in C++ language terms. Keep in
> mind that the committee deals only with the language!
>
> I wish you all the best.
>
> Cheers,
> Steve
>
> On Fri, Apr 3, 2026, 19:09 Muneem via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
>
>> Thank you for your response!
>> What is the problem (the question you asked)?
>> There are many possible answers to what your question might have
>> meant (I hope one of these answers it):
>>
>> 1. Before C++ existed, and before it had its static type system, this
>> problem could be quantified as branching overhead and verbosity, which is
>> why we needed better techniques that avoid it.
>> This Wikipedia article explains when branching first became a
>> problem (all credit to Wikipedia:
>> https://en.wikipedia.org/wiki/Branch_table):
>> Use of branch tables and other raw data encoding was common in the early
>> days of computing when memory was expensive, CPUs were slower and compact
>> data representation and efficient choice of alternatives were important.
>> This is truer now than ever. In particular, the issue in branching is
>> pointer indirection through any implementation, such as vtables or
>> function pointers. Sometimes branching may in fact be faster, but again,
>> constructs that give the compiler no context or intent make it hard for
>> the compiler to decide what to do. This is why C++ provides if
>> statements, switch statements, the conditional (ternary) operator, and
>> more, so that it can produce the best intermediate code representation
>> possible. Each kind of branch statement isn't just new syntax: it makes
>> the user write code a certain way, and lets the compiler optimize before
>> the compiler backend reads the code.
>>
>> 2. The problem is the lack of a language-level construct for type
>> erasure that can result in an optimized intermediate representation of
>> the code.
>>
>> 3. The problem is that virtual functions don't do it, switch statements
>> don't do it, manual type-erasure code doesn't do it; nothing does. The
>> "it" is type-erasure patterns. Switch statements (and the other branching
>> constructs) fail completely unless you write code for each object again
>> and again.
>>
>> 4. The problem is the verbosity of current branching/polymorphism
>> techniques for type erasure. Not only that, but you can't even overload a
>> polymorphic function to return a different type based on an argument
>> (unless the return type is a pointer or reference to a polymorphic class,
>> known as "Return Type Relaxation" in The C++ Programming Language, 4th
>> edition, by Bjarne Stroustrup, section 20.3.6) in order to solve this
>> problem with the visitor pattern or some other double-dispatch pattern.
>>
>> 5. The problem is the lack of a clear way to express type erasure.
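For reference, the "Return Type Relaxation" cited in point 4 is what the standard calls covariant return types: an overriding function may return a pointer or reference to a class derived from the class returned by the function it overrides. A minimal sketch with hypothetical `Shape`/`Circle` classes; note that the rule does not let an override return an unrelated type (say `int` instead of `std::string`), which is the restriction point 4 runs into.

```cpp
#include <cassert>
#include <memory>

// Base class whose clone() returns a Shape*.
struct Shape {
    virtual ~Shape() = default;
    virtual Shape* clone() const = 0;
};

// The override returns Circle* instead of Shape*: allowed because Circle
// derives from Shape (a covariant, or "relaxed", return type).
struct Circle : Shape {
    double radius = 1.0;
    Circle* clone() const override { return new Circle(*this); }
};

// Callers that know the static type get a Circle* back with no cast.
inline std::unique_ptr<Circle> copyCircle(const Circle& c) {
    std::unique_ptr<Circle> copy(c.clone());
    assert(copy->radius == c.radius);  // the copy carries the same data
    return copy;
}
```

Through a `const Shape&`, the same call still dispatches to `Circle::clone`, but the static result type is `Shape*`.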
>>
>>
>> 2. I don't want to make a heterogeneous list in the traditional sense,
>> but rather a list of const references of any type, so it isn't against
>> C++ philosophy; it just automates the process of manual type erasure and
>> leaves it to the compiler to produce the optimal intermediate code
>> representation based on the specific program, the context of every
>> subscripting operation, and the source of every subscripting operation.
>> That is as C++ as one can get. It would not be C++ if I were to just ask
>> for a heterogeneous list that is completely up to the implementation;
>> rather, I want type erasure for const lvalue references at the language
>> level (with an optimized intermediate code representation). Think of it
>> like templates, if that makes it easier to reason about. In fact, this is
>> a form of "Return Type Relaxation" (The C++ Programming Language, 4th
>> edition, by Bjarne Stroustrup, section 20.3.6), but instead of pointers
>> and references in general, I only want to use const references.
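Today, a "list of const references of any type" has to be built by hand. Below is a minimal sketch of that manual type erasure. It assumes every stored type supports one operation, here a hypothetical `to_text` overload set. `ConstRef` and `text_at` are illustrative names only, not a proposed API. As with ordinary const references, the referenced objects must outlive the list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-type operation: one overload per type that may be stored.
inline std::string to_text(int value) { return std::to_string(value); }
inline std::string to_text(const std::string& value) { return value; }

// Hand-written type erasure for a const reference: an untyped pointer to the
// object plus a function pointer that remembers the object's real type.
class ConstRef {
public:
    template <typename T>
    explicit ConstRef(const T& object)
        : object_(&object),
          text_([](const void* p) { return to_text(*static_cast<const T*>(p)); }) {}

    // Refuse temporaries, which would leave object_ dangling.
    template <typename T>
    ConstRef(const T&&) = delete;

    std::string text() const { return text_(object_); }

private:
    const void* object_;
    std::string (*text_)(const void*);
};

// Runtime subscripting over a list of erased const references.
inline std::string text_at(const std::vector<ConstRef>& items, std::size_t index) {
    assert(index < items.size());
    return items[index].text();
}
```

Each additional operation needs another function-pointer member, and each additional type needs another `to_text` overload. That boilerplate is what the message asks the compiler to generate.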
>>
>>
>>
>> On Sat, 4 Apr 2026, 5:22 am Steve Weinrich via Std-Proposals, <
>> std-proposals_at_[hidden]> wrote:
>>
>>> Hi Muneem,
>>>
>>>
>>>
>>> I am not trying to be difficult, but the “problems” that you have
>>> described are (potentially) the result of implementation choices and/or
>>> C++ limitations. What I am currently interested in is the problem that
>>> was presented before making those choices.
>>>
>>>
>>>
>>> Let me see if I can give an example. Someone says, I have a problem.
>>> Every time I insert an object into std::vector, I have to re-sort the
>>> vector. Obviously, they are using the wrong container. *But even if
>>> they switch to a std::map, we don’t know if the data truly needs to be
>>> sorted. Or how frequently it needs to be sorted.*
>>>
>>>
>>>
>>> You say, “You want to choose between three container objects but they
>>> don't have different types.”
>>>
>>>
>>>
>>> That could look like:
>>>
>>>
>>>
>>> std::vector<int> alpha;
>>> std::vector<int> beta;
>>> std::vector<int> gamma;
>>>
>>>
>>>
>>> This immediately raises the question, “Why three vectors?”
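For the first case (same type), no new feature is needed: an array of pointers can be subscripted at runtime in place of a switch. This is a minimal sketch. It assumes the three vectors outlive the table, and `pick` is a hypothetical helper name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: select one of several same-typed containers by a
// runtime index, without a switch statement.
inline std::vector<int>& pick(const std::array<std::vector<int>*, 3>& table,
                              std::size_t index) {
    assert(index < table.size());  // only 0, 1 and 2 are valid
    return *table[index];
}
```

`std::reference_wrapper<std::vector<int>>` would work equally well as the element type.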
>>>
>>>
>>>
>>> You say, “You want to choose between any three objects but they have
>>> different types.”
>>>
>>>
>>>
>>> That could look like:
>>>
>>>
>>>
>>> class Alpha
>>> {
>>> public:
>>>     int funcA ();
>>>
>>> private:
>>>     // Some Alpha specific data
>>> };
>>>
>>> class Beta
>>> {
>>> public:
>>>     int funcA ();
>>>
>>> private:
>>>     // Some Beta specific data
>>> };
>>>
>>> class Gamma
>>> {
>>> public:
>>>     int funcA ();
>>>
>>> private:
>>>     // Some Gamma specific data
>>> };
>>>
>>>
>>>
>>> That immediately raises the question, “Will virtual functions work?”
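For the second case, here is a sketch of the virtual-function answer to that question. It assumes a hypothetical abstract base `Interface` and uses placeholder return values in place of the class-specific data. `funcA` is made `const` here, which the declarations above do not specify.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical common interface; the three classes share only funcA().
struct Interface {
    virtual ~Interface() = default;
    virtual int funcA() const = 0;
};

// Placeholder results stand in for "Some X specific data".
struct Alpha : Interface { int funcA() const override { return 1; } };
struct Beta  : Interface { int funcA() const override { return 2; } };
struct Gamma : Interface { int funcA() const override { return 3; } };

// Runtime index selects the object; the vtable selects the implementation.
inline int callFuncA(const std::array<const Interface*, 3>& objects,
                     std::size_t index) {
    assert(index < objects.size());
    return objects[index]->funcA();
}
```

This works as long as every alternative returns the same type (`int` here). Returning a different type per class is the case the thread goes on to discuss.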
>>>
>>>
>>>
>>> So I ask again, what is the problem (not the solution)?
>>>
>>>
>>>
>>> Cheers,
>>> Steve
>>>
>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>> Behalf Of *Muneem via Std-Proposals
>>> *Sent:* Friday, April 3, 2026 6:03 PM
>>> *To:* std-proposals_at_[hidden]
>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>> proposed
>>>
>>>
>>>
>>> A further reason this class of problems (described in the previous
>>> emails) exists is that the current constructs don't give enough context
>>> to intermediate code generation and rely solely on LLVM or other backends
>>> being advanced enough, which doesn't make sense for languages, because
>>> languages are meant to be less verbose and more explicit than compiler
>>> backend code-generation tools.
>>>
>>>
>>>
>>> (Really sorry for sending two emails at once.)
>>>
>>>
>>>
>>> Regards, Muneem.
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 4:54 am Muneem, <itfllow123_at_[hidden]> wrote:
>>>
>>> The actual class of problems is code repetition and obscured intent:
>>>
>>> 1. Say you want to choose between three container objects but they don't
>>> have different types.
>>>
>>> 2. Say you want to choose between any three objects but they have
>>> different types.
>>>
>>>
>>>
>>> The current fix for both is to store those objects in a
>>> std::vector/std::array with a std::variant element type, or to use switch
>>> statements. All of these obscure intent, making it hard to optimize code
>>> during intermediate code generation. This is actually more common than we
>>> think; in fact, we could destroy 99% of germs (branch statements) with
>>> function objects that can be indexed.
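Here is a sketch of "function objects that can be indexed" used in place of a switch. It assumes all alternatives share one signature, `int(int)`, chosen only for illustration. The operations themselves are placeholders. Captureless lambdas convert to plain function pointers, so a fixed array of function pointers is enough and no `std::function` is needed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Every alternative must have this one signature.
using Op = int (*)(int);

// Captureless lambdas convert implicitly to Op; the table replaces a switch.
const std::array<Op, 3> ops{
    [](int x) { return x + 1; },
    [](int x) { return x * 2; },
    [](int x) { return -x; },
};

inline int run_op(std::size_t index, int value) {
    assert(index < ops.size());
    return ops[index](value);
}
```

The table only works because all entries have the same signature. Alternatives with different result types are what the thread wants to index the same way.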
>>>
>>>
>>>
>>> In short, the class of problems is using too many branch statements for
>>> everything that can't be indexed by current containers.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 4:24 am Steve Weinrich via Std-Proposals, <
>>> std-proposals_at_[hidden]> wrote:
>>>
>>> Hi Muneem,
>>>
>>>
>>>
>>> You mention “this class of problems”; I would like to know what that
>>> is. Please forget about a heterogeneous list. What is the root problem
>>> that the heterogeneous list solves? Please describe an actual problem, not
>>> an “it would be nice to solve.”
>>>
>>>
>>>
>>> To make this a little easier and to involve less people, here is my
>>> email: weinrich.steve_at_[hidden]
>>>
>>>
>>>
>>> Cheers,
>>> Steve
>>>
>>>
>>>
>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>> Behalf Of *Muneem via Std-Proposals
>>> *Sent:* Friday, April 3, 2026 5:17 PM
>>> *To:* std-proposals_at_[hidden]
>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>> proposed
>>>
>>>
>>>
>>> Thanks for your interest ❤️❤️❤️
>>>
>>> If you believe that any other solution may be a better option for this
>>> class of problems, then please let me know. In fact, we could collaborate
>>> on the proposal. This is not at odds with C++ because the (recommended)
>>> semantics is a construct that captures values using const references, as
>>> opposed to storing them directly as a struct would, and lets the compiler
>>> access the value by reference, inline code for each branch, or do what it
>>> thinks best. It is similar to the heterogeneous lists provided by Go, but
>>> it isn't the same, because it captures by const reference, and references
>>> aren't the same as non-const pointers, in that they can be optimized in
>>> ways that pointers can't. Sorry for writing so much; I just got too
>>> excited by someone openly taking an interest in this proposal ❤️❤️❤️❤️.
>>>
>>>
>>>
>>> Regards, Muneem
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 4:11 am Steve Weinrich via Std-Proposals, <
>>> std-proposals_at_[hidden]> wrote:
>>>
>>> Hi Muneem,
>>>
>>>
>>>
>>> Thanks. I am interested in the original problem. I am curious if a
>>> heterogeneous list is the optimal solution for the problem at hand. Over
>>> the last 50+ years of programming, I have encountered many questions like
>>> yours, which presume a particular solution. Sometimes that solution is at
>>> odds with the language at hand (or other issues). I find that
>>> understanding the original problem allows me to better understand why one
>>> is suggesting a particular language enhancement.
>>>
>>>
>>>
>>> Cheers,
>>> Steve
>>>
>>>
>>>
>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>> Behalf Of *Muneem via Std-Proposals
>>> *Sent:* Friday, April 3, 2026 5:03 PM
>>> *To:* std-proposals_at_[hidden]
>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>> proposed
>>>
>>>
>>>
>>> > Context for readers: the second one was a class of problems fixed by
>>> > heterogeneous lists.
>>>
>>> It's the second one, and I am really, really sorry for the confusion.
>>>
>>>
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 3:58 am Steve Weinrich via Std-Proposals, <
>>> std-proposals_at_[hidden]> wrote:
>>>
>>> If I may, a heterogeneous list is not a problem, it is a solution to a
>>> problem. So there are now two possibilities:
>>>
>>> 1. You are trying to create a new std container to solve a class of
>>> problems.
>>> 2. You have some other problem that you have used a heterogeneous
>>> list to solve.
>>>
>>>
>>>
>>> What say you?
>>>
>>>
>>>
>>> Cheers,
>>> Steve
>>>
>>>
>>>
>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>> Behalf Of *Muneem via Std-Proposals
>>> *Sent:* Friday, April 3, 2026 4:54 PM
>>> *To:* std-proposals_at_[hidden]
>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>> proposed
>>>
>>>
>>>
>>> Sorry, I meant "heterogeneous lists is the problem"; really, really
>>> sorry for that mistake. It was an autocorrect mistake.
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 3:52 am Muneem, <itfllow123_at_[hidden]> wrote:
>>>
>>> Heterogeneous problems are the problem that leads to the code bloat or
>>> verbosity that I described, but it's hard to say whether the chicken or
>>> the egg came first in this regard.
>>>
>>>
>>>
>>> Regards, Muneem
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 3:50 am Steve Weinrich via Std-Proposals, <
>>> std-proposals_at_[hidden]> wrote:
>>>
>>> Hi Muneem,
>>>
>>>
>>>
>>> If you don’t mind, I would like to limit this portion of our interaction
>>> to simply describing the problem. I want to get a 100% understanding of
>>> the problem, unfettered by any assumptions (including C++ limitations) or
>>> previous solutions. Is a heterogeneous list actually the problem or is
>>> that simply a solution that you think fits the problem at hand?
>>>
>>>
>>>
>>> Cheers,
>>> Steve
>>>
>>>
>>>
>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>> Behalf Of *Muneem via Std-Proposals
>>> *Sent:* Friday, April 3, 2026 4:40 PM
>>> *To:* std-proposals_at_[hidden]
>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>> proposed
>>>
>>>
>>>
>>> Hi!
>>>
>>> Thanks for your response!!!
>>>
>>>
>>>
>>> Yes, your understanding of the problem is mostly correct, with a few
>>> details left out. The alternative to using classes would be switch
>>> statements, which, as discussed, would lead to code bloat and might
>>> backfire. The goal is to make a construct that makes indexing into
>>> heterogeneous lists easier. There is, however, one thing that I would
>>> like to clarify:
>>>
>>> The goal isn't classic polymorphism but being able to return an object of
>>> any type through a specified interface (indexing heterogeneous lists),
>>> which your example doesn't do. Basically, just as Go and many other
>>> compiled languages support heterogeneous lists, I want C++ to do so as
>>> well. Say you want to index a bunch of containers (to use a top()
>>> function): std::visit fails because it can't return whatever return type
>>> you would wish for, so you can either use a switch statement to write a
>>> top() expression for each container or use this long, verbose technique:
>>>
>>> #include <deque>
>>> #include <memory>
>>> #include <vector>
>>>
>>> // Common interface; std::vector and std::deque have no top(), so back()
>>> // plays that role here.
>>> template<typename T>
>>> struct Base {
>>>     virtual ~Base() = default;   // objects are deleted through Base*
>>>     virtual T top_wrapper() = 0;
>>> };
>>>
>>> template<typename T>
>>> struct Derived_1 : Base<T>, std::vector<T> {
>>>     T top_wrapper() override {
>>>         // T{} is just to show that it works even if back() had some
>>>         // other return type
>>>         return T{this->back()};
>>>     }
>>> };
>>>
>>> template<typename T>
>>> struct Derived_2 : Base<T>, std::deque<T> {
>>>     T top_wrapper() override {
>>>         return T{this->back()};  // same wrapper, repeated per container
>>>     }
>>> };
>>>
>>> int main() {
>>>     // Held by pointer: a std::vector<Base<int>> would slice (and Base is
>>>     // abstract).
>>>     std::vector<std::unique_ptr<Base<int>>> a;
>>>     a.push_back(std::make_unique<Derived_1<int>>());
>>>     a.push_back(std::make_unique<Derived_2<int>>());
>>>     // The compiler would probably optimize this example, but not an
>>>     // example where you index the vector using real-time input
>>>     return 0;
>>> }
>>>
>>> // A vector of std::variant only works if I am willing to write a helper
>>> // top(std::variant<Args...> obj) function that uses std::visit to call
>>> // top(), which in itself is not only verbose but obscures intent and
>>> // context.
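For comparison, the `std::variant` helper described in the comment above can be written once with a generic lambda, as long as every alternative yields the same type. This sketch assumes `std::vector<int>` and `std::deque<int>` as the alternatives. Neither container has `top()`, so `back()` stands in. The helper name `top` follows the message.

```cpp
#include <cassert>
#include <deque>
#include <variant>
#include <vector>

using AnyContainer = std::variant<std::vector<int>, std::deque<int>>;

// One generic lambda covers every alternative; both back() calls yield int,
// so std::visit can deduce a single return type.
inline int top(const AnyContainer& c) {
    return std::visit(
        [](const auto& container) {
            assert(!container.empty());  // back() on an empty container is UB
            return container.back();
        },
        c);
}
```

If the alternatives produced different types, this call would be ill-formed, because `std::visit` requires every invocation to have the same return type. C++20's `std::visit<R>` only converts each result to one explicitly named `R`. This is the limitation the message points to.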
>>>
>>>
>>>
>>>
>>>
>>> Regards, Muneem.
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 2:43 am Steve Weinrich via Std-Proposals, <
>>> std-proposals_at_[hidden]> wrote:
>>>
>>> Hi Muneem,
>>>
>>>
>>>
>>> I would like to make sure that I understand this problem before going on.
>>>
>>>
>>>
>>> I think there are several classes that have a portion (or all) of their
>>> interface in common. Each method of the interface returns a constant value:
>>>
>>>
>>>
>>> class First
>>> {
>>> public:
>>>     int funcA () const { return 1123; }
>>>     int funcB () const { return 1234; }
>>>     int funcC () const { return 1456; }
>>> };
>>>
>>> class Second
>>> {
>>> public:
>>>     int funcA () const { return 2123; }
>>>     int funcB () const { return 2234; }
>>>     int funcC () const { return 2456; }
>>> };
>>>
>>> class Third
>>> {
>>> public:
>>>     int funcA () const { return 3123; }
>>>     int funcB () const { return 3234; }
>>>     int funcC () const { return 3456; }
>>> };
>>>
>>>
>>>
>>> 1. We would like a means to be able to add more classes easily.
>>> 2. We would like a means to be able to add to the shared interface
>>> easily.
>>> 3. We would like to be able to use the shared interface in a
>>> polymorphic way (like a virtual method).
>>> 4. Performance is of the utmost importance.
>>>
>>>
>>>
>>> Is my understanding correct?
>>>
>>>
>>>
>>> Steve
>>>
>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>> Behalf Of *Muneem via Std-Proposals
>>> *Sent:* Friday, April 3, 2026 1:54 PM
>>> *To:* std-proposals_at_[hidden]
>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>> proposed
>>>
>>>
>>>
>>> Sorry for sending two emails at once!
>>>
>>> I just wanted to restate that the point of the whole proposal is to
>>> provide intent. The code that Mr. Macieira was kind enough to bring
>>> forward proves my exact point: with enough intent, the compiler can
>>> optimize anything, and these optimizations grow larger as the program
>>> grows larger. A microbenchmark might show only a single example, but even
>>> that single example should get us thinking: why is it so slow in this
>>> one case? Does this overhead force people to write switch statements
>>> that can lead to code bloat, which can again backfire in terms of
>>> performance?
>>>
>>> Regards, Muneem.
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 12:48 am Muneem, <itfllow123_at_[hidden]> wrote:
>>>
>>> Hi!
>>>
>>> Thanks again for your feedback, Macieira. 👍
>>>
>>> > The microbenchmark is misleading.
>>>
>>> 1. The reason I gave you microbenchmarks is that someone asked for them,
>>> and even I was reluctant to use them, despite Bjarne Stroustrup's advice
>>> "Don't assume, measure", because in this case the goal is either to make
>>> the compiler smaller or the runtime faster, both of which my new proposal
>>> targets.
>>>
>>> 2. You are right that the compiler might have folded the loop in half,
>>> but the point is that the observable behaviour is still the same. In
>>> fact, if the loop body were to index into a heterogeneous set (using the
>>> proposed construct) and do some operation, then the compiler would
>>> optimize the indexing if the index comes from a single source. This
>>> proves that intent can help the compiler do wonders:
>>>
>>> 1. Fold loops even when I used volatile to prevent it.
>>>
>>> 2. Avoid the indexing operations entirely (if in a loop, with minimal
>>> compile-time overhead).
>>>
>>> 3. Store the result in some memory location immediately after it takes
>>> input (if that solution is the fastest).
>>>
>>>
>>>
>>>
>>>
>>> 3. Optimize a single expression for the sake of the whole program.
>>>
>>> Currently, the optimizer might in fact be able to optimize checks in a
>>> loop, but it's not as easy or as guaranteed, because there are no
>>> semantic promises we can make with the existing constructs to make it
>>> happen.
>>>
>>> 4. My main point isn't whether my benchmark is correct or wrong, but
>>> rather that expressing intent is better. The benchmark was merely to show
>>> that std::visit is slower (according to programs compiled with g++ and
>>> Microsoft Visual Studio 2026, measured with std::chrono and the Visual
>>> Studio 2026 CPU usage tools), but even if some or all compilers improve
>>> its performance, we still pay compile-time overhead for taking std::visit
>>> and making it faster, and the optimization might backfire, since it would
>>> optimize single statements independently of what's in the rest of the
>>> program. Why? Because, unlike my proposed construct, std::visit does not
>>> carry enough context and intent to tell the compiler what's going on so
>>> that it can generate code with exactly the "bookkeeping" data and access
>>> code that fits the entire program.
>>>
>>>
>>>
>>> 3. In case someone thinks a few nanoseconds in a single example isn't a
>>> big deal, reconsider: if my construct were adopted, then yes, it would not
>>> be a big deal, because the compiler could optimize many indexing
>>> operations into a single heterogeneous set and maybe cache the result
>>> somewhere afterwards. The issue is that this can't be done with the
>>> current techniques because of the lack of intent. Compilers are much
>>> smarter than we could ever be, because they are the work of many people's
>>> entire careers, not just one very smart guy from Intel, so blaming or
>>> restricting compilers, whose job is to be as general as possible for the
>>> sake of the whole program, misses the point.
>>>
>>> 4. > I suppose it decided to unroll the loop a bit
>>> > and made two calls to sink() per loop:
>>> > template <typename T> void sink(const T &) { asm volatile("" ::: "memory"); }
>>>
>>> Even so, it optimized the switch statement despite asm volatile("" :::
>>> "memory"), but not std::visit.
>>>
>>> My point isn't that switch is magically faster, but rather that the
>>> compiler has more room to cheat and skip things. In fact, the standard
>>> allows it a lot of free room as long as the observable behaviour is the
>>> same, even more so by giving it sets of permitted observable behaviours
>>> (unspecified behaviour).
>>>
>>> 5. The microbenchmark wasn't meant to show that std::visit is inherently
>>> slower, but rather that the compiler can, and should, make mistakes in
>>> optimizing it, in order to avoid massive compile-time overhead.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 3 Apr 2026, 8:33 pm Thiago Macieira via Std-Proposals, <
>>> std-proposals_at_[hidden]> wrote:
>>>
>>> On Thursday, 2 April 2026 19:15:42 Pacific Daylight Time Thiago Macieira
>>> via Std-Proposals wrote:
>>> > Even in this case, I have profiled the code above (after fixing it and
>>> > removing the std::cout itself) and found that overall, the switched case
>>> > ran 2x faster, at 0.113 ns per iteration, while the variant case required
>>> > 0.227 ns per iteration. Looking at the CPU performance counters, the
>>> > std::variant code has 2 branches per iteration and takes 1 cycle per
>>> > iteration, running at 5 IPC (thus, 5 instructions per iteration).
>>> > Meanwhile, the switched case has 0.5 branch per iteration and takes 0.5
>>> > cycle per iteration, running at 2 IPC. The half cycle numbers make sense
>>> > because I believe the two instructions are getting macrofused together and
>>> > execute as a single uop, which causes confusing numbers.
>>>
>>> This half a cycle and ninth of a nanosecond problem has been on my mind
>>> for a while. The execution time of anything needs to be a multiple of the
>>> cycle time, so a CPU running at 4.5 GHz like mine was shouldn't have a
>>> difference of one ninth of a nanosecond. One explanation would be that
>>> somehow the CPU was executing two iterations of the loop at the same
>>> time, pipelining.
>>>
>>> But disassembling the binary shows a simpler explanation. The switch
>>> loop was:
>>>
>>> 40149f: mov $0x3b9aca00,%eax
>>> 4014a4: nop
>>> 4014a5: data16 cs nopw 0x0(%rax,%rax,1)
>>> 4014b0: sub $0x2,%eax
>>> 4014b3: jne 4014b0
>>>
>>> [Note how there is no test for what was being indexed in the loop!]
>>>
>>> Here's what I had missed: sub $2. I'm not entirely certain what GCC was
>>> thinking here, but it's subtracting 2 instead of 1, so this looped half a
>>> billion times (0x3b9aca00 / 2). I suppose it decided to unroll the loop a
>>> bit and made two calls to sink() per loop:
>>>
>>> template <typename T> void sink(const T &) { asm volatile("" ::: "memory"); }
>>>
>>> But that expanded to nothing in the output. I could add "nop" so we'd see
>>> what happened and the CPU would be obligated to retire those
>>> instructions, increasing the instruction executed counter (I can't
>>> quickly find how many the TGL processor / WLC core can retire per cycle,
>>> but I recall it's 6, so adding 2 more instructions shouldn't affect the
>>> execution time). But I don't think I need to further benchmark this to
>>> prove my point:
>>>
>>> The microbenchmark is misleading.
>>>
>>> --
>>> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>>> Principal Engineer - Intel Data Center - Platform & Sys. Eng.
>>> --
>>> Std-Proposals mailing list
>>> Std-Proposals_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>>
In short, all I want is to solve the problem where we can't tell the
compiler to "render an object of any type in place at the intermediate code
level". This problem is a problem because we have to do it our selves
unlike in languages like GO. One this problem is solved then heterogenous
lists would be possible.
Regards, Muneem
On Sat, 4 Apr 2026, 6:24 am Steve Weinrich via Std-Proposals, <
std-proposals_at_[hidden]> wrote:
> Hi Maneem,
>
> I had hoped by asking you to explain the problem you are trying to solve,
> I might be able to help you describe things in a way that would make it
> easier for you to describe the exact language feature you are proposing.
>
> Either am not smart enough to understand you (a good probability) or you
> are incapable of describing what you seek in C++ language terms. Keep in
> mind that the committee only deals on the language!
>
> I wish you all the best.
>
> Cheers,
> Steve
>
> On Fri, Apr 3, 2026, 19:09 Muneem via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
>
>> Thank you for your response!
>> what is the problem(the question you asked)?
>> There are many possible answers to what your question might have
>> meant(hope any of these answer you):
>>
>> 1.This problem before c++ existed and before it had this static type
>> system could be quantified as branching overhead and verbosity, and why we
>> needed better techniques that avoided it.
>> This wilkipedia article explains the first time that branching was a
>> problem(all credits to wilkipedia:
>> https://en.wikipedia.org/wiki/Branch_table):
>> Use of branch tables and other raw data encoding was common in the early
>> days of computing when memory was expensive, CPUs were slower and compact
>> data representation and efficient choice of alternatives were important.
>> This fact is truer now more than ever. In particular the issue in
>> branching: pointer redirection using any implementation like vtables or
>> function pointers. Sometimes, branching may infact be faster, but again,
>> constructs that give no context and intent to the compiler are hard for the
>> compiler to decide what to do with. This is why c++ came and provides if
>> statements, switch statements, ternary statements, and many more so that it
>> can provide the best intermediate code representation possible. Each type
>> of branch statements isn't just a new syntax but it makes a user write a
>> certain way, and let's the compiler do optimizations before the compiler
>> backend reads the code.
>>
>> 2. The problem is the lack of a construct for describing a language level
>> construct for type erasure that can result in optimized intermediate
>> representation of the code.
>>
>> 3.The problem is virtual functions don't do it, switch case statements
>> don't do it, nothing does, manual type erasure code dosent so it. The "it"
>> is type erasure patterns. Switch case (and others) statements fails
>> completely if you don't write code for each object again and again.
>>
>> 4. The problem is verbosity of current branching/polymorphism techniques
>> for type erasure. Not only that but you can't even overload a polymorphic
>> function to return a different type based on an argument(unless the return
>> type is a polymorphic class(known as "Return Type Relaxation" in C++ 4th by
>> Bajrne Stroustrup" section 20.3.6)) in order to fix this problem by the
>> visitor pattern or some other double dispatch pattern.
>>
>> 5. The problem is lack of clear expression of type erasure.
>>
>>
>> 2. I don't want make an heterogeneous list in the traditional sense, but
>> rather a list of const references of any type, so it isnt against c++
>> philosophies, it's just trying to automate the process of manual type
>> erasure and leave it to the compiler to produce optimal intermediate code
>> representation based on the specific program, context of every
>> subscripting, and the source of every subscripting operation. That is as
>> c++ as one can get. It would not be c++ if I were to just ask for
>> heterogeneous list that is completely up to the implementation, but rather
>> I want type erasure for const lvalue references at a language
>> level(optimized intermediate code representation). Think of it like
>> templates, if it helps making it easy to reason. In fact, this is a form of
>> "Return Type Relaxation"(in C++ 4th by Bajrne Stroustrup" section 20.3.6),
>> but instead of pointers and references, but I want only want to use const
>> references.
>>
>>
>>
>> On Sat, 4 Apr 2026, 5:22 am Steve Weinrich via Std-Proposals, <
>> std-proposals_at_[hidden]> wrote:
>>
>>> Hi Muneem,
>>>
>>>
>>>
>>> I am not trying to be difficult, but the “problems” that you have
>>> described are {potentially} the result of implementation choices and/or C++
>>> limitations. What I am currently interested in is the problem that was
>>> presented before making those choices.
>>>
>>>
>>>
>>> Let me see if I can give an example. Someone says, I have a problem.
>>> Every time I insert on object into std::vector, I have to re-sort the
>>> vector. Obviously, they are using the wrong container. *But even if
>>> they switch to a std::map, we don’t know if the data truly needs to be
>>> sorted. Or how frequently it needs to be sorted.*
>>>
>>>
>>>
>>> You say, “You want to choose between three container objects but they
>>> don't have different type.”
>>>
>>>
>>>
>>> That could look like:
>>>
>>>
>>>
>>> std::vector<int> alpha;
>>>
>>> std::vector<int> beta;
>>>
>>> std::vector<int> gamma;
>>>
>>>
>>>
>>> This immediately raises the question, “Why three vectors?”
>>>
>>>
>>>
>>> You say, “You want to choose between any three objects but they have
>>> different types.”
>>>
>>>
>>>
>>> That could look like:
>>>
>>>
>>>
>>> class Alpha
>>>
>>> {
>>>
>>> public:
>>>
>>> int funcA ();
>>>
>>>
>>>
>>> private:
>>>
>>> // Some Alpha specific data
>>>
>>> };
>>>
>>>
>>>
>>> class Beta
>>>
>>> {
>>>
>>> public:
>>>
>>> int funcA ();
>>>
>>>
>>>
>>> private:
>>>
>>> // Some Beta specific data
>>>
>>> };
>>>
>>>
>>>
>>> class Gamma
>>>
>>> {
>>>
>>> public:
>>>
>>> int funcA ();
>>>
>>>
>>>
>>> private:
>>>
>>> // Some Gamma specific data
>>>
>>> };
>>>
>>>
>>>
>>> That immediately raises the question, “Will virtual functions work?”
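One way to answer that question with virtual functions is sketched below. It assumes funcA can be made const and that each class returns a fixed placeholder value (both assumptions); `Interface` and `call_funcA` are hypothetical names. The shared interface becomes an abstract base, and selecting by a runtime index costs one indirect call rather than a switch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Common interface; each class keeps its own data and overrides funcA.
struct Interface {
    virtual ~Interface() = default;
    virtual int funcA() const = 0;
};

struct Alpha : Interface { int funcA() const override { return 1; } };
struct Beta  : Interface { int funcA() const override { return 2; } };
struct Gamma : Interface { int funcA() const override { return 3; } };

// Runtime selection through the vtable: index, then one virtual call.
int call_funcA(const std::array<std::unique_ptr<Interface>, 3>& objs, std::size_t i) {
    assert(i < objs.size());
    return objs[i]->funcA();
}
```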
>>>
>>>
>>>
>>> So I ask again, what is the problem (not the solution)?
>>>
>>>
>>>
>>> Cheers,
>>> Steve
>>>
>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>> Behalf Of *Muneem via Std-Proposals
>>> *Sent:* Friday, April 3, 2026 6:03 PM
>>> *To:* std-proposals_at_[hidden]
>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>> proposed
>>>
>>>
>>>
>>> A further reason why the class of problems described in the previous
>>> emails exists is that the current constructs don't give enough context to
>>> intermediate code generation and instead rely solely on LLVM or other
>>> backends being advanced enough. That doesn't make sense for a language,
>>> because a language is meant to be less verbose and more explicit than
>>> compiler backend code generation tools.
>>>
>>>
>>>
>>> (Really sorry for sending two emails at once.)
>>>
>>>
>>>
>>> Regards, Muneem.
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 4:54 am Muneem, <itfllow123_at_[hidden]> wrote:
>>>
>>> The actual class of problems is code repetition and obscured intent:
>>>
>>> 1. Say you want to choose between three container objects that don't
>>> have different types.
>>>
>>> 2. Say you want to choose between any three objects that have
>>> different types.
>>>
>>>
>>>
>>> The current fix for both is to store those objects in a std::vector/array
>>> whose element type is std::variant, or to use switch statements. Both
>>> obscure intent, making it hard to optimize code during intermediate code
>>> generation. This is actually more common than we think; in fact, we could
>>> destroy 99% of germs (branch statements) with function objects that can be
>>> indexed.
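The std::variant approach described above can be sketched as follows. The container choice (std::vector, std::deque, std::list), the alias `AnyContainer`, and the helper `size_at` are illustrative assumptions; the point is that the selected operation must be wrapped in std::visit, and every alternative must produce the same return type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <list>
#include <variant>
#include <vector>

// Differently-typed containers stored as alternatives of one std::variant.
using AnyContainer = std::variant<std::vector<int>, std::deque<int>, std::list<int>>;

// Index at run time, then dispatch on the active alternative with std::visit.
// Every alternative's size() returns std::size_t, so the visitor is valid.
std::size_t size_at(const std::array<AnyContainer, 3>& items, std::size_t i) {
    assert(i < items.size());
    return std::visit([](const auto& c) { return c.size(); }, items[i]);
}
```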
>>>
>>>
>>>
>>> In short, the class of problems is having to use too many branch statements
>>> for everything that can't be indexed with current containers.
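The "function objects that can be indexed" idea can be approximated today with a branch table: an array of function pointers indexed at run time. A minimal sketch with hypothetical arithmetic operations, next to the equivalent switch. Compilers often lower a dense switch like this one to a jump table on their own, so the table mainly changes how intent is expressed rather than guaranteeing a speedup.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// A switch over a runtime index...
int apply_switch(std::size_t op, int a, int b) {
    switch (op) {
        case 0: return a + b;
        case 1: return a - b;
        case 2: return a * b;
    }
    return 0;  // not reached for valid op
}

// ...and the same dispatch as an indexable table of function pointers.
// Captureless lambdas convert to plain function pointers (constexpr since C++17).
using BinaryOp = int (*)(int, int);
constexpr std::array<BinaryOp, 3> kOps{
    [](int a, int b) { return a + b; },
    [](int a, int b) { return a - b; },
    [](int a, int b) { return a * b; },
};

int apply_table(std::size_t op, int a, int b) {
    assert(op < kOps.size());
    return kOps[op](a, b);  // one indexed load plus an indirect call
}
```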
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 4:24 am Steve Weinrich via Std-Proposals, <
>>> std-proposals_at_[hidden]> wrote:
>>>
>>> Hi Muneem,
>>>
>>>
>>>
>>> You mention “this class of problems”; I would like to know what that
>>> is. Please forget about a heterogeneous list. What is the root problem
>>> that the heterogeneous list solves? Please describe an actual problem, not
>>> an “it would be nice to solve.”
>>>
>>>
>>>
>>> To make this a little easier and to involve fewer people, here is my
>>> email: weinrich.steve_at_[hidden]
>>>
>>>
>>>
>>> Cheers,
>>> Steve
>>>
>>>
>>>
>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>> Behalf Of *Muneem via Std-Proposals
>>> *Sent:* Friday, April 3, 2026 5:17 PM
>>> *To:* std-proposals_at_[hidden]
>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>> proposed
>>>
>>>
>>>
>>> Thanks for your interest ❤️❤️❤️
>>>
>>> If you believe that some other solution may be a better option for this
>>> class of problems, then please let me know. In fact, we could collaborate
>>> on the proposal. This is not at odds with C++, because the (recommended)
>>> semantics is a construct that captures values by const reference, as
>>> opposed to storing them directly as a struct would, and lets the compiler
>>> access the value by reference, inline code for each branch, or do whatever
>>> it thinks best. It is similar to the heterogeneous lists provided by Go,
>>> but not the same, because it captures by const reference, and references
>>> aren't the same as pointers: they can be optimized in ways that even const
>>> pointers can't be. Sorry for writing too much; I just got too excited by
>>> someone taking an open interest in this proposal ❤️❤️❤️❤️.
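A library-level approximation of "capture by const reference, then index at run time" is possible today with std::tuple. This is a sketch, not the proposed language feature: `visit_at`, `visit_at_impl`, and `call_at` are hypothetical names, the caller must name the common result type `R`, and dispatch goes through a table of function pointers built from an index sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>

// Calls f on element I of the tuple; one instantiation per index.
template <std::size_t I, typename R, typename Tuple, typename F>
R call_at(const Tuple& t, F& f) {
    return f(std::get<I>(t));
}

// Builds a static table of function pointers (one per element) and
// jumps through it with the runtime index.
template <typename R, typename Tuple, typename F, std::size_t... I>
R visit_at_impl(const Tuple& t, std::size_t i, F& f, std::index_sequence<I...>) {
    using Fn = R (*)(const Tuple&, F&);
    static constexpr Fn table[] = {&call_at<I, R, Tuple, F>...};
    return table[i](t, f);
}

// The tuple only holds const references, so the original objects are
// neither copied nor moved.
template <typename R, typename... Ts, typename F>
R visit_at(const std::tuple<const Ts&...>& t, std::size_t i, F&& f) {
    assert(i < sizeof...(Ts));
    return visit_at_impl<R>(t, i, f, std::index_sequence_for<Ts...>{});
}
```

Usage: build `std::tuple<const int&, const std::string&, const double&> refs{n, s, d};` and call `visit_at<std::size_t>(refs, i, visitor)` with a generic lambda as the visitor.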
>>>
>>>
>>>
>>> Regards, Muneem
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 4:11 am Steve Weinrich via Std-Proposals, <
>>> std-proposals_at_[hidden]> wrote:
>>>
>>> Hi Muneem,
>>>
>>>
>>>
>>> Thanks. I am interested in the original problem. I am curious if a
>>> heterogeneous list is the optimal solution for the problem at hand. Over
>>> the last 50+ years of programming, I have encountered many questions like
>>> yours, which presume a particular solution. Sometimes that solution is at
>>> odds with the language at hand (or other issues). I find that
>>> understanding the original problem allows me to better understand why one
>>> is suggesting a particular language enhancement.
>>>
>>>
>>>
>>> Cheers,
>>> Steve
>>>
>>>
>>>
>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>> Behalf Of *Muneem via Std-Proposals
>>> *Sent:* Friday, April 3, 2026 5:03 PM
>>> *To:* std-proposals_at_[hidden]
>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>> proposed
>>>
>>>
>>>
>>> >context for readers: second one was a class of problems fixed by
>>> heterogeneous lists.
>>>
>>> It's the second one, and I am really really sorry for the confusion.
>>>
>>>
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 3:58 am Steve Weinrich via Std-Proposals, <
>>> std-proposals_at_[hidden]> wrote:
>>>
>>> If I may, a heterogeneous list is not a problem, it is a solution to a
>>> problem. So there are now two possibilities:
>>>
>>> 1. You are trying to create a new std container to solve a class of
>>> problems.
>>> 2. You have some other problem that you have used a heterogenous
>>> list to solve.
>>>
>>>
>>>
>>> What say you?
>>>
>>>
>>>
>>> Cheers,
>>> Steve
>>>
>>>
>>>
>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>> Behalf Of *Muneem via Std-Proposals
>>> *Sent:* Friday, April 3, 2026 4:54 PM
>>> *To:* std-proposals_at_[hidden]
>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>> proposed
>>>
>>>
>>>
>>> Sorry, I meant "heterogeneous lists are the problem"; really, really
>>> sorry for that mistake. It was an autocorrect error.
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 3:52 am Muneem, <itfllow123_at_[hidden]> wrote:
>>>
>>> Heterogeneous problems is the problem that leads to the code bloat or
>>> verbosity that I described, but it's hard to choose whether the chicken or
>>> the egg came first in this regard.
>>>
>>>
>>>
>>> Regards, Muneem
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 3:50 am Steve Weinrich via Std-Proposals, <
>>> std-proposals_at_[hidden]> wrote:
>>>
>>> Hi Muneem,
>>>
>>>
>>>
>>> If you don’t mind, I would like to limit this portion of our interaction
>>> to simply describing the problem. I want to get a 100% understanding of
>>> the problem, unfettered by any assumptions (including C++ limitations) or
>>> previous solutions. Is a heterogenous list actually the problem or is that
>>> simply a solution that you think fits the problem at hand?
>>>
>>>
>>>
>>> Cheers,
>>> Steve
>>>
>>>
>>>
>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>> Behalf Of *Muneem via Std-Proposals
>>> *Sent:* Friday, April 3, 2026 4:40 PM
>>> *To:* std-proposals_at_[hidden]
>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>> proposed
>>>
>>>
>>>
>>> Hi!
>>>
>>> Thanks for your response!!!
>>>
>>>
>>>
>>> Yes, your understanding of the problem is mostly correct, with a few
>>> details left out. The alternative to using classes would be switch
>>> statements, which, as discussed, would lead to code bloat and might
>>> backfire. The goal is to make a construct that makes indexing into
>>> heterogeneous lists easier. There is, however, one thing that I would like
>>> to clarify:
>>>
>>> The goal isn't classic polymorphism but being able to return an object of
>>> any type through a specified interface (indexing heterogeneous lists),
>>> which your example doesn't do. Basically, just as Go and many other
>>> compiled languages support heterogeneous lists, I want C++ to do so as
>>> well. Say you want to index a bunch of containers (to use the top()
>>> function): std::visit fails because it can't return whatever return type
>>> you would wish for, so you can either use a switch to write the top()
>>> expression for each container or use this long, verbose technique:
>>>
>>> #include <deque>
>>> #include <memory>
>>> #include <vector>
>>>
>>> template<typename T>
>>> struct Base{
>>>     virtual ~Base() = default;
>>>     virtual T top_wrapper() = 0;
>>> };
>>>
>>> template<typename T>
>>> struct Derived_1: Base<T>, std::vector<T>{
>>>     T top_wrapper() override{
>>>         // std::vector has no top(); back() is the closest equivalent.
>>>         return T{this->back()};//T{} is just to show that it works even if back() had some other return type
>>>     }
>>> };
>>>
>>> template<typename T>
>>> struct Derived_2: Base<T>, std::deque<T>{
>>>     T top_wrapper() override{
>>>         // std::deque has no top() either.
>>>         return T{this->back()};//T{} is just to show that it works even if back() had some other return type
>>>     }
>>> };
>>>
>>> int main(){
>>>     // Store pointers: Base is abstract, and std::vector<Base<int>> would slice anyway.
>>>     std::vector<std::unique_ptr<Base<int>>> a;
>>>     a.push_back(std::make_unique<Derived_1<int>>());
>>>     a.push_back(std::make_unique<Derived_2<int>>());
>>>     //The compiler would probably optimize this example, but not an example
>>>     //where you index the vector using real time input
>>>     return 0;
>>> }
>>>
>>> //A vector of std::variant only works if I am willing to write a helper
>>> top(std::variant<Args...> obj) function that uses std::visit to call
>>> top(), which in and of itself is not only verbose but also obscures intent
>>> and context.
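The helper described in that comment might look like the following sketch. It swaps in std::stack and std::priority_queue (an assumption) because those adaptors actually have top(); the template name `top` mirrors the comment. The lambda returns by value so that every alternative yields the same type, which std::visit requires.

```cpp
#include <cassert>
#include <queue>
#include <stack>
#include <variant>
#include <vector>

// Forward top() to whichever container the variant currently holds.
template <typename... Containers>
auto top(const std::variant<Containers...>& v) {
    // Returning by value decays const int& to int for every alternative.
    return std::visit([](const auto& c) { return c.top(); }, v);
}
```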
>>>
>>>
>>>
>>>
>>>
>>> Regards, Muneem.
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 2:43 am Steve Weinrich via Std-Proposals, <
>>> std-proposals_at_[hidden]> wrote:
>>>
>>> Hi Muneem,
>>>
>>>
>>>
>>> I would like to make sure that I understand this problem before going on.
>>>
>>>
>>>
>>> I think there are several classes that have a portion (or all) of their
>>> interface in common. Each method of the interface returns a constant value:
>>>
>>>
>>>
>>> class First
>>>
>>> {
>>>
>>> public:
>>>
>>> int funcA () const { return 1123; }
>>>
>>> int funcB () const { return 1234; }
>>>
>>> int funcC () const { return 1456; }
>>>
>>> };
>>>
>>>
>>>
>>> class Second
>>>
>>> {
>>>
>>> public:
>>>
>>> int funcA () const { return 2123; }
>>>
>>> int funcB () const { return 2234; }
>>>
>>> int funcC () const { return 2456; }
>>>
>>> };
>>>
>>>
>>>
>>> class Third
>>>
>>> {
>>>
>>> public:
>>>
>>> int funcA () const { return 3123; }
>>>
>>> int funcB () const { return 3234; }
>>>
>>> int funcC () const { return 3456; }
>>>
>>> };
>>>
>>>
>>>
>>> 1. We would like a means to be able to add more classes easily.
>>> 2. We would like a means to be able to add to the shared interface
>>> easily.
>>> 3. We would like to be able to use the shared interface in a
>>> polymorphic way (like a virtual method).
>>> 4. Performance is of the utmost importance.
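For requirements 1, 2, and 4, a function template gives compile-time polymorphism without a common base class. A minimal C++17 sketch (the name `sum_interface` is hypothetical; Third is omitted for brevity). It does not satisfy requirement 3 at run time: choosing between First and Second by a runtime value still needs virtual functions, std::variant, or a switch. In C++20 a concept could name the shared interface explicitly.

```cpp
#include <cassert>

class First {
public:
    int funcA() const { return 1123; }
    int funcB() const { return 1234; }
    int funcC() const { return 1456; }
};

class Second {
public:
    int funcA() const { return 2123; }
    int funcB() const { return 2234; }
    int funcC() const { return 2456; }
};

// Any type with the three const members works, so adding a class needs no
// base class or registration. Each call is resolved at compile time and is
// easy to inline, but the type must be known statically.
template <typename T>
int sum_interface(const T& obj) {
    return obj.funcA() + obj.funcB() + obj.funcC();
}
```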
>>>
>>>
>>>
>>> Is my understanding correct?
>>>
>>>
>>>
>>> Steve
>>>
>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>> Behalf Of *Muneem via Std-Proposals
>>> *Sent:* Friday, April 3, 2026 1:54 PM
>>> *To:* std-proposals_at_[hidden]
>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>> proposed
>>>
>>>
>>>
>>> Sorry for sending two emails at once!
>>>
>>> I just wanted to reiterate that the point of the whole proposal is to
>>> provide intent. The code that Mr. Macieira was kind enough to bring forward
>>> proves my exact point: with enough intent, the compiler can optimize
>>> anything, and these optimizations grow larger as the program grows larger.
>>> Microbenchmarks might show only a single example, but even that single
>>> example should get us thinking: why is it so slow in this one example? Does
>>> this overhead force people to write switch statements that can lead to code
>>> bloat, which can again backfire in terms of performance?
>>>
>>> Regards, Muneem.
>>>
>>>
>>>
>>> On Sat, 4 Apr 2026, 12:48 am Muneem, <itfllow123_at_[hidden]> wrote:
>>>
>>> Hi!
>>>
>>> Thanks again for your feedback, Macieira. 👍
>>>
>>> >micro benchmark is misleading
>>>
>>> 1. The reason I gave you microbenchmarks is that someone asked for them,
>>> and even I was reluctant to use them despite Bjarne Stroustrup's quote
>>> "Don't assume, measure", because in this case the goal is to make either
>>> the compiler smaller or the runtime faster, both of which are targeted by
>>> my new proposal.
>>>
>>> 2. You are right that the compiler might have folded the loop in half,
>>> but the point is that the observable behaviour is still the same. In fact,
>>> if the loop body were to index into a heterogeneous set (using the proposed
>>> construct) and do some operation, then the compiler would optimize the
>>> indexing if the index has a single source. This proves that intent can
>>> help the compiler do wonders:
>>>
>>> 1. Fold loops even when I used volatile to prevent it.
>>>
>>> 2. Avoid the indexing operations entirely (if in a loop, with minimal
>>> compile-time overhead).
>>>
>>> 3. Store the result in some memory location immediately after it takes
>>> input (if that solution is the fastest).
>>>
>>>
>>>
>>>
>>>
>>> 3. Optimize a single expression for the sake of the whole program.
>>>
>>> Currently, the optimizer might in fact be able to optimize checks in a
>>> loop, but it's not as easy or as guaranteed, because there are no semantic
>>> promises that we can make with the existing constructs to make it happen.
>>>
>>> 4. My main point isn't whether my benchmark is correct or wrong, but
>>> rather that expressing intent is better. The benchmark was merely to show
>>> that std::visit is slower (in programs compiled with g++ and Microsoft
>>> Visual Studio 2026, measured with std::chrono and the Visual Studio 2026
>>> CPU usage tools), but even if some or all compilers optimize its
>>> performance, we still pay compile-time overhead for taking std::visit and
>>> making it faster, and the optimization might backfire, since it would
>>> optimize single statements independently of what's in the rest of the
>>> program. Why? Because, unlike my proposed construct, std::visit does not
>>> carry enough context and intent to tell the compiler what's going on, so
>>> the compiler cannot generate code whose "bookkeeping" data and access code
>>> fit the entire program exactly.
>>>
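A std::chrono measurement of the switch-versus-std::visit comparison could be structured like this sketch. Everything here (types A, B, C, the helper functions, the PRNG-generated indices) is an assumption for illustration, and no timing results are claimed. A single run like this is noisy; a meaningful comparison needs repetitions, warm-up, and a look at the generated code to confirm the loop was not folded, as the disassembly later in this thread illustrates.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <variant>
#include <vector>

struct A { std::int64_t value() const { return 1; } };
struct B { std::int64_t value() const { return 2; } };
struct C { std::int64_t value() const { return 3; } };
using V = std::variant<A, B, C>;

// Indices come from a seeded PRNG at run time, so the compiler cannot
// know them in advance and fold the loop body away.
std::vector<std::size_t> make_indices(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<std::size_t> dist(0, 2);
    std::vector<std::size_t> idx(n);
    for (auto& i : idx) i = dist(gen);
    return idx;
}

// Dispatch with a switch on the index.
std::int64_t sum_switch(const std::vector<std::size_t>& idx) {
    std::int64_t total = 0;
    for (std::size_t i : idx) {
        switch (i) {
            case 0: total += A{}.value(); break;
            case 1: total += B{}.value(); break;
            default: total += C{}.value(); break;
        }
    }
    return total;
}

// Dispatch with std::visit on a pre-built table of variants.
std::int64_t sum_visit(const std::vector<std::size_t>& idx, const std::array<V, 3>& table) {
    std::int64_t total = 0;
    for (std::size_t i : idx)
        total += std::visit([](const auto& x) { return x.value(); }, table[i]);
    return total;
}

// Elapsed time of one call, in nanoseconds, using a monotonic clock.
template <typename F>
std::int64_t time_ns(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```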
>>>
>>>
>>> 5. In case someone thinks a few nanoseconds in a single example isn't a
>>> big deal, rethink it: if my construct is adopted, then yes, it would not be
>>> a big deal, because the compiler could optimize many indexing operations
>>> on a single heterogeneous set and perhaps cache the result somewhere
>>> afterwards. The issue is that this can't be done with the current
>>> techniques because of the lack of intent. Compilers are much smarter than
>>> we could ever be, because they are the work of many people's entire
>>> careers, not just one very smart guy from Intel, so we shouldn't blame or
>>> restrict compilers whose job is to be as general as possible for the sake
>>> of the whole program.
>>>
>>> 6. >I suppose it decided to unroll the loop a bit
>>>
>>> >and made two calls to sink() per loop:
>>>
>>> >template <typename T> void sink(const T &) { asm volatile("" ::: "memory"); }
>>>
>>> Even if it optimized the switch statement despite asm volatile("" :::
>>> "memory") but not std::visit, my point isn't that switch is magically
>>> faster, but rather that the compiler has more room to cheat and skip things.
>>> In fact, the standard gives it a lot of latitude as long as the observable
>>> behaviour is the same, and even more where it allows a set of possible
>>> behaviours (unspecified behaviour).
>>>
>>> 7. The microbenchmark wasn't meant to show that std::visit is inherently
>>> slower, but rather that the compiler can, and sometimes should, fall short
>>> in optimizing it in order to avoid massive compile-time overhead.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 3 Apr 2026, 8:33 pm Thiago Macieira via Std-Proposals, <
>>> std-proposals_at_[hidden]> wrote:
>>>
>>> On Thursday, 2 April 2026 19:15:42 Pacific Daylight Time Thiago Macieira via
>>> Std-Proposals wrote:
>>> > Even in this case, I have profiled the code above (after fixing it and
>>> > removing the std::cout itself) and found that overall, the switched case
>>> > ran 2x faster, at 0.113 ns per iteration, while the variant case required
>>> > 0.227 ns per iteration. Looking at the CPU performance counters, the
>>> > std::variant code has 2 branches per iteration and takes 1 cycle per
>>> > iteration, running at 5 IPC (thus, 5 instructions per iteration).
>>> > Meanwhile, the switched case has 0.5 branch per iteration and takes 0.5
>>> > cycle per iteration, running at 2 IPC. The half cycle numbers make sense
>>> > because I believe the two instructions are getting macrofused together and
>>> > execute as a single uop, which causes confusing numbers.
>>>
>>> This half a cycle and ninth of a nanosecond problem has been on my mind for a
>>> while. The execution time of anything needs to be a multiple of the cycle
>>> time, so a CPU running at 4.5 GHz like mine was shouldn't have a difference of
>>> one ninth of a nanosecond. One explanation would be that somehow the CPU was
>>> executing two iterations of the loop at the same time, pipelining.
>>>
>>> But disassembling the binary shows a simpler explanation. The switch loop was:
>>>
>>> 40149f: mov $0x3b9aca00,%eax
>>> 4014a4: nop
>>> 4014a5: data16 cs nopw 0x0(%rax,%rax,1)
>>> 4014b0: sub $0x2,%eax
>>> 4014b3: jne 4014b0
>>>
>>> [Note how there is no test for what was being indexed in the loop!]
>>>
>>> Here's what I had missed: sub $2. I'm not entirely certain what GCC was
>>> thinking here, but it's subtracting 2 instead of 1, so this looped half a
>>> billion times (0x3b9aca00 / 2). I suppose it decided to unroll the loop a bit
>>> and made two calls to sink() per loop:
>>>
>>> template <typename T> void sink(const T &) { asm volatile("" ::: "memory"); }
>>>
>>> But that expanded to nothing in the output. I could add "nop" so we'd see what
>>> happened and the CPU would be obligated to retire those instructions,
>>> increasing the instruction executed counter (I can't quickly find how many the
>>> TGL processor / WLC core can retire per cycle, but I recall it's 6, so adding
>>> 2 more instructions shouldn't affect the execution time). But I don't think I
>>> need to further benchmark this to prove my point:
>>>
>>> The microbenchmark is misleading.
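The half-cycle puzzle can be checked with simple arithmetic. A small constexpr sketch, assuming the 4.5 GHz clock mentioned above and one cycle per iteration of the two-instruction machine loop (an assumption, not a measurement): 0x3b9aca00 is one billion, `sub $0x2` halves that to 500 million machine iterations, and one cycle of about 0.222 ns spread over two logical iterations gives about 0.111 ns, close to the reported 0.113 ns.

```cpp
#include <cassert>

// Loop bound from `mov $0x3b9aca00,%eax` in the disassembly.
constexpr long long kLogicalIterations = 0x3b9aca00;
static_assert(kLogicalIterations == 1'000'000'000, "one billion iterations");

// `sub $0x2,%eax` steps the counter by 2, so the machine loop runs half as often.
constexpr long long kMachineIterations = kLogicalIterations / 2;

// Cycle time of a 4.5 GHz core, in nanoseconds (about 0.222 ns).
constexpr double kCycleNs = 1.0 / 4.5;

// Time per logical iteration, given the cycles spent per machine iteration.
constexpr double ns_per_logical_iteration(double cycles_per_machine_iteration) {
    return cycles_per_machine_iteration * kCycleNs *
           static_cast<double>(kMachineIterations) /
           static_cast<double>(kLogicalIterations);
}
```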
>>>
>>> --
>>> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>>> Principal Engineer - Intel Data Center - Platform & Sys. Eng.
>>> --
>>> Std-Proposals mailing list
>>> Std-Proposals_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>>
Received on 2026-04-04 01:43:41
