ISOCPP std-proposals List: Re: [std-proposals] Extension to runtime polymorphism proposed

From: Steve Weinrich <weinrich.steve_at_[hidden]>
Date: Fri, 3 Apr 2026 20:32:09 -0600

Let me give you an example.

A programmer (a long time ago) notices in Fortran that the ABS function has
different names for different types of arguments. There is IABS (for
integers), FABS (for floating point), etc.

They propose a new feature. Let the compiler figure out which function to
call based on the type of the argument. As they say, "The rest is history."
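
In today's C++ terms, that feature is function overloading. A minimal
sketch of the idea, assuming a made-up function name (my_abs is purely
illustrative, not a standard function), might look like this:

```cpp
#include <cassert>

// Hypothetical overload set: a single name, with the compiler selecting
// the function from the static type of the argument.
int my_abs(int x) { return x < 0 ? -x : x; }
double my_abs(double x) { return x < 0.0 ? -x : x; }
```

A call such as my_abs(-3) resolves to the int version and my_abs(-2.5)
to the double version, with no separate IABS/FABS names at the call site.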

What you need to do first (and I really mean it) is show us, using a very
simple example, the language deficiency you are trying to address. You
keep talking about aspects of what you have in mind, but they are all back
end.

Let us pretend there is no ++ operator. Everyone writes: a = a + 1;

Someone proposes a new feature to simplify that to: ++a;

They can point out its benefits. Point out the syntax. It is a clearly
describable thing.

Show us what you want! No complicated exposition. No unproven claims.
Start with some code.
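
As one possible starting point, the std::variant approach described in the
quoted messages below could be written down roughly as follows. This is
only a sketch of the existing technique, not of the proposed feature, and
the names AnyContainer, last_of and last_at are hypothetical, chosen for
illustration. It uses back() because std::vector and std::deque have no
top() member.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <variant>
#include <vector>

// Hypothetical alias: one slot that can hold either container type.
using AnyContainer = std::variant<std::vector<int>, std::deque<int>>;

// Hypothetical helper: returns the last element of whichever container is
// stored. Both alternatives provide back(), so one generic lambda covers
// them. Precondition: the stored container is not empty.
inline int last_of(const AnyContainer& c) {
    return std::visit([](const auto& seq) { return seq.back(); }, c);
}

// Index the heterogeneous list at run time, then dispatch via the helper.
// at() throws std::out_of_range for an invalid index.
inline int last_at(const std::vector<AnyContainer>& list, std::size_t i) {
    return last_of(list.at(i));
}
```

Code like this, followed by the line you wish you could write instead,
would make the deficiency concrete.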

I hope that helps!

Steve

On Fri, Apr 3, 2026, 19:47 Muneem via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> I really wish that you could advise me on how to present this problem,
> because it does seem like I may be incapable of putting it into words.
> Please do a counter-proposal; I would really appreciate it.
> Regards, Muneem.
>
> On Sat, 4 Apr 2026, 6:43 am Muneem, <itfllow123_at_[hidden]> wrote:
>
>> I am really, really sorry if my response was not satisfactory.
>> In short, all I want is to solve the problem where we can't tell the
>> compiler to "render an object of any type in place at the intermediate code
>> level". This is a problem because we have to do it ourselves,
>> unlike in languages like Go. Once this problem is solved,
>> heterogeneous lists would be possible.
>> Regards, Muneem
>>
>> On Sat, 4 Apr 2026, 6:24 am Steve Weinrich via Std-Proposals, <
>> std-proposals_at_[hidden]> wrote:
>>
>>> Hi Muneem,
>>>
>>> I had hoped by asking you to explain the problem you are trying to
>>> solve, I might be able to help you describe things in a way that would make
>>> it easier for you to describe the exact language feature you are proposing.
>>>
>>> Either I am not smart enough to understand you (a good probability) or you
>>> are incapable of describing what you seek in C++ language terms. Keep in
>>> mind that the committee deals only with the language!
>>>
>>> I wish you all the best.
>>>
>>> Cheers,
>>> Steve
>>>
>>> On Fri, Apr 3, 2026, 19:09 Muneem via Std-Proposals <
>>> std-proposals_at_[hidden]> wrote:
>>>
>>>> Thank you for your response!
>>>> What is the problem (the question you asked)?
>>>> There are many possible answers to what your question might have
>>>> meant (I hope one of these answers it):
>>>>
>>>> 1. Before C++ and its static type system existed, this problem could be
>>>> described as branching overhead and verbosity, which is why we
>>>> needed better techniques that avoided it.
>>>> This Wikipedia article describes the first time that branching was a
>>>> problem (all credit to Wikipedia:
>>>> https://en.wikipedia.org/wiki/Branch_table):
>>>> Use of branch tables and other raw data encoding was common in the
>>>> early days of computing when memory was expensive, CPUs were slower and
>>>> compact data representation and efficient choice of alternatives were
>>>> important. This is truer now than ever. In particular, the issue
>>>> in branching is pointer redirection, using any implementation such as
>>>> vtables or function pointers. Sometimes branching may in fact be faster,
>>>> but again, when constructs give the compiler no context or intent, it is
>>>> hard for the compiler to decide what to do with them. This is why C++
>>>> provides if statements, switch statements, ternary expressions, and many
>>>> more, so that it can produce the best intermediate code representation
>>>> possible. Each type of branch statement isn't just new syntax; it makes
>>>> the user write in a certain way and lets the compiler do optimizations
>>>> before the compiler backend reads the code.
>>>>
>>>> 2. The problem is the lack of a language-level
>>>> construct for type erasure that can result in an optimized intermediate
>>>> representation of the code.
>>>>
>>>> 3. The problem is that virtual functions don't do it, switch-case
>>>> statements don't do it, and manual type erasure code doesn't do it;
>>>> nothing does. The "it" is type erasure patterns. Switch-case statements
>>>> (and the others) fail completely unless you write the code for each
>>>> object again and again.
>>>>
>>>> 4. The problem is the verbosity of current branching/polymorphism
>>>> techniques for type erasure. Not only that, but you can't even overload a
>>>> polymorphic function to return a different type based on an argument
>>>> (unless the return type is a polymorphic class, known as "Return Type
>>>> Relaxation" in "The C++ Programming Language", 4th edition, by Bjarne
>>>> Stroustrup, section 20.3.6) in order to fix this problem
>>>> with the visitor pattern or some other double-dispatch pattern.
>>>>
>>>> 5. The problem is the lack of a clear expression of type erasure.
>>>>
>>>>
>>>> 2. I don't want to make a heterogeneous list in the traditional sense,
>>>> but rather a list of const references of any type, so it isn't against
>>>> C++ philosophy. It's just trying to automate the process of manual type
>>>> erasure and leave it to the compiler to produce the optimal intermediate
>>>> code representation based on the specific program and on the context and
>>>> source of every subscripting operation. That is as
>>>> C++ as one can get. It would not be C++ if I were to just ask for a
>>>> heterogeneous list that is completely up to the implementation; rather,
>>>> I want type erasure for const lvalue references at the language
>>>> level (optimized intermediate code representation). Think of it like
>>>> templates, if that makes it easier to reason about. In fact, this is a
>>>> form of "Return Type Relaxation" ("The C++ Programming Language", 4th
>>>> edition, by Bjarne Stroustrup, section 20.3.6), but instead of pointers
>>>> and references, I only want to use const references.
>>>>
>>>>
>>>>
>>>> On Sat, 4 Apr 2026, 5:22 am Steve Weinrich via Std-Proposals, <
>>>> std-proposals_at_[hidden]> wrote:
>>>>
>>>>> Hi Muneem,
>>>>>
>>>>>
>>>>>
>>>>> I am not trying to be difficult, but the “problems” that you have
>>>>> described are {potentially} the result of implementation choices and/or C++
>>>>> limitations. What I am currently interested in is the problem that was
>>>>> presented before making those choices.
>>>>>
>>>>>
>>>>>
>>>>> Let me see if I can give an example. Someone says, I have a problem.
>>>>> Every time I insert an object into std::vector, I have to re-sort the
>>>>> vector. Obviously, they are using the wrong container. *But even if
>>>>> they switch to a std::map, we don’t know if the data truly needs to be
>>>>> sorted. Or how frequently it needs to be sorted.*
>>>>>
>>>>>
>>>>>
>>>>> You say, “You want to choose between three container objects but they
>>>>> don't have different type.”
>>>>>
>>>>>
>>>>>
>>>>> That could look like:
>>>>>
>>>>>
>>>>>
>>>>> std::vector<int> alpha;
>>>>>
>>>>> std::vector<int> beta;
>>>>>
>>>>> std::vector<int> gamma;
>>>>>
>>>>>
>>>>>
>>>>> This immediately raises the question, “Why three vectors?”
>>>>>
>>>>>
>>>>>
>>>>> You say, “You want to choose between any three objects but they have
>>>>> different types.”
>>>>>
>>>>>
>>>>>
>>>>> That could look like:
>>>>>
>>>>>
>>>>>
>>>>> class Alpha
>>>>>
>>>>> {
>>>>>
>>>>> public:
>>>>>
>>>>> int funcA ();
>>>>>
>>>>>
>>>>>
>>>>> private:
>>>>>
>>>>> // Some Alpha specific data
>>>>>
>>>>> };
>>>>>
>>>>>
>>>>>
>>>>> class Beta
>>>>>
>>>>> {
>>>>>
>>>>> public:
>>>>>
>>>>> int funcA ();
>>>>>
>>>>>
>>>>>
>>>>> private:
>>>>>
>>>>> // Some Beta specific data
>>>>>
>>>>> };
>>>>>
>>>>>
>>>>>
>>>>> class Gamma
>>>>>
>>>>> {
>>>>>
>>>>> public:
>>>>>
>>>>> int funcA ();
>>>>>
>>>>>
>>>>>
>>>>> private:
>>>>>
>>>>> // Some Gamma specific data
>>>>>
>>>>> };
>>>>>
>>>>>
>>>>>
>>>>> That immediately raises the question, “Will virtual functions work?”
>>>>>
>>>>>
>>>>>
>>>>> So I ask again, what is the problem (not the solution)?
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Steve
>>>>>
>>>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>>>> Behalf Of *Muneem via Std-Proposals
>>>>> *Sent:* Friday, April 3, 2026 6:03 PM
>>>>> *To:* std-proposals_at_[hidden]
>>>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>>>> proposed
>>>>>
>>>>>
>>>>>
>>>>> An extended reason why the class of problems described in the
>>>>> previous emails exists is that the current constructs don't give enough
>>>>> context to intermediate code generation and rely solely on LLVM or other
>>>>> backends being advanced enough, which doesn't make sense for a language,
>>>>> because languages are meant to be less verbose and more explicit than a
>>>>> compiler backend's code generation tools.
>>>>>
>>>>>
>>>>>
>>>>> (Really sorry for sending two emails at once)
>>>>>
>>>>>
>>>>>
>>>>> Regards, Muneem.
>>>>>
>>>>>
>>>>>
>>>>> On Sat, 4 Apr 2026, 4:54 am Muneem, <itfllow123_at_[hidden]> wrote:
>>>>>
>>>>> The actual class of problems is code repetition and obscuring of
>>>>> intent:
>>>>>
>>>>> 1. Say you want to choose between three container objects but they
>>>>> don't have different types.
>>>>>
>>>>> 2. Say you want to choose between any three objects but they have
>>>>> different types.
>>>>>
>>>>>
>>>>>
>>>>> The current fix for both is to allocate those objects in a
>>>>> std::vector/array with a std::variant element type, or to use switch
>>>>> statements. All of these obscure intent, making it hard to optimize code
>>>>> during intermediate code generation. This is actually more common than we
>>>>> think; in fact, we can destroy 99% of germs (branch statements) with
>>>>> function objects that can be indexed.
>>>>>
>>>>>
>>>>>
>>>>> In short the class of problems is using too many branch statements for
>>>>> everything that can't be indexed by current containers.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sat, 4 Apr 2026, 4:24 am Steve Weinrich via Std-Proposals, <
>>>>> std-proposals_at_[hidden]> wrote:
>>>>>
>>>>> Hi Muneem,
>>>>>
>>>>>
>>>>>
>>>>> You mention “this class of problems.” I would like to know what that
>>>>> is. Please forget about a heterogeneous list. What is the root problem
>>>>> that the heterogeneous list solves? Please describe an actual problem, not
>>>>> a “it would be nice to solve.”
>>>>>
>>>>>
>>>>>
>>>>> To make this a little easier and to involve less people, here is my
>>>>> email: weinrich.steve_at_[hidden]
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Steve
>>>>>
>>>>>
>>>>>
>>>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>>>> Behalf Of *Muneem via Std-Proposals
>>>>> *Sent:* Friday, April 3, 2026 5:17 PM
>>>>> *To:* std-proposals_at_[hidden]
>>>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>>>> proposed
>>>>>
>>>>>
>>>>>
>>>>> Thanks for your interest ❤️❤️❤️
>>>>>
>>>>> If you believe that any other solution may be a better option for this
>>>>> class of problems, then please let me know. In fact, we could collaborate
>>>>> on the proposal. This is not at odds with C++ because the (recommended)
>>>>> semantics is a construct that captures values using const references, as
>>>>> opposed to storing them directly as a struct would, and lets the compiler
>>>>> access the value by reference, inline code for each branch, or do what it
>>>>> thinks is best. It is similar to the heterogeneous lists provided by Go,
>>>>> but it isn't the same because it captures by const reference, and
>>>>> references aren't the same as non-const pointers, in that they can be
>>>>> optimized in ways that const pointers can't be. Sorry for writing too
>>>>> much, I just got too excited by someone openly taking interest in this
>>>>> proposal ❤️❤️❤️❤️.
>>>>>
>>>>>
>>>>>
>>>>> Regards, Muneem
>>>>>
>>>>>
>>>>>
>>>>> On Sat, 4 Apr 2026, 4:11 am Steve Weinrich via Std-Proposals, <
>>>>> std-proposals_at_[hidden]> wrote:
>>>>>
>>>>> Hi Muneem,
>>>>>
>>>>>
>>>>>
>>>>> Thanks. I am interested in the original problem. I am curious if a
>>>>> heterogeneous list is the optimal solution for the problem at hand. Over
>>>>> the last 50+ years of programming, I have encountered many questions like
>>>>> yours, which presume a particular solution. Sometimes that solution is at
>>>>> odds with the language at hand (or other issues). I find that
>>>>> understanding the original problem allows me to better understand why one
>>>>> is suggesting a particular language enhancement.
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Steve
>>>>>
>>>>>
>>>>>
>>>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>>>> Behalf Of *Muneem via Std-Proposals
>>>>> *Sent:* Friday, April 3, 2026 5:03 PM
>>>>> *To:* std-proposals_at_[hidden]
>>>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>>>> proposed
>>>>>
>>>>>
>>>>>
>>>>> >context for readers:second one was a class of problems fixed by
>>>>> heterogeneous lists.
>>>>>
>>>>> It's the second one, and I am really really sorry for the confusion.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sat, 4 Apr 2026, 3:58 am Steve Weinrich via Std-Proposals, <
>>>>> std-proposals_at_[hidden]> wrote:
>>>>>
>>>>> If I may, a heterogeneous list is not a problem, it is a solution to
>>>>> a problem. So there are now two possibilities:
>>>>>
>>>>> 1. You are trying to create a new std container to solve a class
>>>>> of problems.
>>>>> 2. You have some other problem that you have used a heterogenous
>>>>> list to solve.
>>>>>
>>>>>
>>>>>
>>>>> What say you?
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Steve
>>>>>
>>>>>
>>>>>
>>>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>>>> Behalf Of *Muneem via Std-Proposals
>>>>> *Sent:* Friday, April 3, 2026 4:54 PM
>>>>> *To:* std-proposals_at_[hidden]
>>>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>>>> proposed
>>>>>
>>>>>
>>>>>
>>>>> Sorry, I meant "heterogeneous lists is the problem". I am really, really
>>>>> sorry for that mistake; it was an autocorrect mistake.
>>>>>
>>>>>
>>>>>
>>>>> On Sat, 4 Apr 2026, 3:52 am Muneem, <itfllow123_at_[hidden]> wrote:
>>>>>
>>>>> Heterogeneous problems is the problem that leads to the code bloat or
>>>>> verbosity that I described, but it's hard to choose whether the chicken
>>>>> came first or the egg in this regard.
>>>>>
>>>>>
>>>>>
>>>>> Regards, Muneem
>>>>>
>>>>>
>>>>>
>>>>> On Sat, 4 Apr 2026, 3:50 am Steve Weinrich via Std-Proposals, <
>>>>> std-proposals_at_[hidden]> wrote:
>>>>>
>>>>> Hi Muneem,
>>>>>
>>>>>
>>>>>
>>>>> If you don’t mind, I would like to limit this portion of our
>>>>> interaction to simply describing the problem. I want to get a 100%
>>>>> understanding of the problem, unfettered by any assumptions (including C++
>>>>> limitations) or previous solutions. Is a heterogenous list actually the
>>>>> problem or is that simply a solution that you think fits the problem at
>>>>> hand?
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Steve
>>>>>
>>>>>
>>>>>
>>>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>>>> Behalf Of *Muneem via Std-Proposals
>>>>> *Sent:* Friday, April 3, 2026 4:40 PM
>>>>> *To:* std-proposals_at_[hidden]
>>>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>>>> proposed
>>>>>
>>>>>
>>>>>
>>>>> Hi!
>>>>>
>>>>> Thanks for your response!!!
>>>>>
>>>>>
>>>>>
>>>>> Yes, your understanding of the problem is mostly correct, with a few
>>>>> details left out. The alternative to using classes would be switch
>>>>> statements, which, as discussed, would lead to code bloat and might
>>>>> backfire. The goal is to make a construct that makes indexing into
>>>>> heterogeneous lists easier. There is, however, one thing that I would
>>>>> like to clarify:
>>>>>
>>>>> The goal isn't classic polymorphism but to be able to return an
>>>>> object of any type through a specified interface (indexing heterogeneous
>>>>> lists), which your example doesn't do. Basically, just as Go and many
>>>>> other compiled languages support heterogeneous lists, I want C++ to do so
>>>>> as well. Say you want to index a bunch of containers (to use the top()
>>>>> function). std::visit fails because it can't return any return type that
>>>>> you would wish for, so you can either use switch-case to write the top
>>>>> expression for each container or use this long, verbose technique:
>>>>>
>>>>> #include <deque>
>>>>> #include <memory>
>>>>> #include <vector>
>>>>>
>>>>> template<typename T>
>>>>> struct Base {
>>>>>     virtual ~Base() = default;
>>>>>     virtual T top_wrapper() = 0;
>>>>> };
>>>>>
>>>>> template<typename T>
>>>>> struct Derived_1 : Base<T>, std::vector<T> {
>>>>>     T top_wrapper() override {
>>>>>         // std::vector has no top(); back() stands in for it here.
>>>>>         // T{} is just to show that it works even if top had some
>>>>>         // other return type.
>>>>>         return T{this->back()};
>>>>>     }
>>>>> };
>>>>>
>>>>> template<typename T>
>>>>> struct Derived_2 : Base<T>, std::deque<T> {
>>>>>     T top_wrapper() override {
>>>>>         // std::deque has no top() either; back() stands in for it.
>>>>>         return T{this->back()};
>>>>>     }
>>>>> };
>>>>>
>>>>> int main() {
>>>>>     // Base<int> is abstract, so the elements are held by pointer.
>>>>>     std::vector<std::unique_ptr<Base<int>>> a;
>>>>>     // The compiler would probably optimize this example, but not an
>>>>>     // example where you index the vector using real-time input.
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> // A vector of std::variant only works if I am willing to write a
>>>>> helper top(std::variant<Args...> obj) function that uses std::visit to
>>>>> call top(), which in and of itself is not only verbose but obscures
>>>>> intent and context.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Regards, Muneem.
>>>>>
>>>>>
>>>>>
>>>>> On Sat, 4 Apr 2026, 2:43 am Steve Weinrich via Std-Proposals, <
>>>>> std-proposals_at_[hidden]> wrote:
>>>>>
>>>>> Hi Muneem,
>>>>>
>>>>>
>>>>>
>>>>> I would like to make sure that I understand this problem before going
>>>>> on.
>>>>>
>>>>>
>>>>>
>>>>> I think there are several classes that have a portion (or all) of
>>>>> their interface in common. Each method of the interface returns a constant
>>>>> value:
>>>>>
>>>>>
>>>>>
>>>>> class First
>>>>>
>>>>> {
>>>>>
>>>>> public:
>>>>>
>>>>> int funcA () const { return 1123; }
>>>>>
>>>>> int funcB () const { return 1234; }
>>>>>
>>>>> int funcC () const { return 1456; }
>>>>>
>>>>> };
>>>>>
>>>>>
>>>>>
>>>>> class Second
>>>>>
>>>>> {
>>>>>
>>>>> public:
>>>>>
>>>>> int funcA () const { return 2123; }
>>>>>
>>>>> int funcB () const { return 2234; }
>>>>>
>>>>> int funcC () const { return 2456; }
>>>>>
>>>>> };
>>>>>
>>>>>
>>>>>
>>>>> class Third
>>>>>
>>>>> {
>>>>>
>>>>> public:
>>>>>
>>>>> int funcA () const { return 3123; }
>>>>>
>>>>> int funcB () const { return 3234; }
>>>>>
>>>>> int funcC () const { return 3456; }
>>>>>
>>>>> };
>>>>>
>>>>>
>>>>>
>>>>> 1. We would like a means to be able to add more classes easily.
>>>>> 2. We would like a means to be able to add to the shared interface
>>>>> easily.
>>>>> 3. We would like to be able to use the shared interface in a
>>>>> polymorphic way (like a virtual method).
>>>>> 4. Performance is of the utmost importance.
>>>>>
>>>>>
>>>>>
>>>>> Is my understanding correct?
>>>>>
>>>>>
>>>>>
>>>>> Steve
>>>>>
>>>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On
>>>>> Behalf Of *Muneem via Std-Proposals
>>>>> *Sent:* Friday, April 3, 2026 1:54 PM
>>>>> *To:* std-proposals_at_[hidden]
>>>>> *Cc:* Muneem <itfllow123_at_[hidden]>
>>>>> *Subject:* Re: [std-proposals] Fwd: Extension to runtime polymorphism
>>>>> proposed
>>>>>
>>>>>
>>>>>
>>>>> Sorry for sending two emails at once!
>>>>>
>>>>> I just wanted to restate that the point of the whole proposal
>>>>> is to provide intent. The code that Mr. Macieira was kind enough to bring
>>>>> forward proves my exact point: with enough intent, the compiler can
>>>>> optimize anything, and these optimizations grow larger as the scale of the
>>>>> program grows larger. Microbenchmarks might show a single example, but
>>>>> even that single example should get us thinking: why is it so slow for
>>>>> this one example? Does this overhead force people to write switch-case
>>>>> statements that can lead to code bloat, which can again backfire in terms
>>>>> of performance?
>>>>>
>>>>> Regards, Muneem.
>>>>>
>>>>>
>>>>>
>>>>> On Sat, 4 Apr 2026, 12:48 am Muneem, <itfllow123_at_[hidden]> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> Thanks again for your feedback, Macieira. 👍
>>>>>
>>>>> >micro benchmark is misleading
>>>>>
>>>>> 1. The reason that I gave you microbenchmarks is that some asked for
>>>>> them, and even I was too reluctant to use them, despite Bjarne
>>>>> Stroustrup's quote
>>>>>
>>>>> "Don't assume, measure", because in this case the goal is to either
>>>>> make the compiler smaller or the runtime faster, both of which are
>>>>> targeted by my new proposal.
>>>>>
>>>>> 2. You are right that the compiler might have folded the loop in
>>>>> half, but the point is that it still shows that the observable behaviour
>>>>> is the same. In fact, if the loop body were to index into a heterogeneous
>>>>> set (using the proposed construct) and do some operation, then the
>>>>> compiler would optimize the indexing if the source of the index is one.
>>>>> This proves that intent can help the compiler do wonders:
>>>>>
>>>>> 1. Fold loops even when I used volatile to avoid it.
>>>>>
>>>>> 2. Avoid the entire indexing operation (if in a loop, with the most
>>>>> minimal compile-time overhead).
>>>>>
>>>>> 3. Store the result immediately after it takes input into some memory
>>>>> location (if that solution is the fastest).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 3. Optimize a single expression for the sake of the whole program.
>>>>>
>>>>> Currently, the optimizer might in fact be able to optimize checks in a
>>>>> loop, but it's not as easy or as guaranteed, because there are no semantic
>>>>> promises that we can make with the existing constructs to make it happen.
>>>>>
>>>>> 4. My main point isn't whether my benchmark is correct or wrong, but
>>>>> rather that expressing intent is better. The benchmark was merely to show
>>>>> that std::visit is slower (according to programs compiled with g++ and
>>>>> Microsoft Visual Studio 2026, measured using std::chrono and the Visual
>>>>> Studio 2026 CPU usage measurement tools), but even if some compiler or all
>>>>> compilers optimize its performance, we still have compile-time overhead
>>>>> for taking std::visit and making it faster, and the optimization might
>>>>> backfire, since it would optimize single statements independently of
>>>>> what's in the rest of the program. Why? Because unlike my proposed
>>>>> construct, std::visit does not have enough context and intent to tell the
>>>>> compiler what's going on so that it can generate code that has the exact
>>>>> "bookkeeping" data and access code that fits the entire program.
>>>>>
>>>>>
>>>>>
>>>>> 3. In case someone thinks a few nanoseconds in a single example
>>>>> isn't a big deal, think again: if my construct is accepted, then
>>>>> yes, it would not be a big deal, because the compiler can optimize many
>>>>> indexing operations into a single heterogeneous set and maybe cache the
>>>>> result somewhere afterwards. The issue is that this can't be done with the
>>>>> current techniques because of the lack of intent. Compilers are much
>>>>> smarter than we could ever be because they are the work of many people's
>>>>> entire careers, not just one very smart guy from Intel, so
>>>>> blaming/restricting compilers whose job is to be as general for the sake
>>>>> of the whole program.
>>>>>
>>>>> 4. >I suppose it decided to unroll the loop a bit
>>>>>
>>>>> >and made two calls to sink() per loop:
>>>>>
>>>>> >template <typename T> void sink(const T &) { asm volatile("" :::
>>>>> "memory"); }
>>>>>
>>>>> Even if it optimized the switch-case statement that used asm
>>>>> volatile("" ::: "memory") but not std::visit, my point isn't that
>>>>> switch-case is magically faster, but rather that
>>>>> the compiler has more room to cheat and skip things. In fact, the standard
>>>>> allows it a lot of free room as long as the observable behaviour is the
>>>>> same, even more so by giving it free room with sets of observable
>>>>> behaviours (unspecified behaviours).
>>>>>
>>>>> 5. Microbenchmarking wasn't to show that std::visit is inherently
>>>>> slower, but rather that the compiler can and should make mistakes in
>>>>> optimizing it, in order to avoid massive compile-time overhead.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 3 Apr 2026, 8:33 pm Thiago Macieira via Std-Proposals, <
>>>>> std-proposals_at_[hidden]> wrote:
>>>>>
>>>>> On Thursday, 2 April 2026 19:15:42 Pacific Daylight Time Thiago
>>>>> Macieira via
>>>>> Std-Proposals wrote:
>>>>> > Even in this case, I have profiled the code above (after fixing it
>>>>> and
>>>>> > removing the std::cout itself) and found that overall, the switched
>>>>> case
>>>>> > ran 2x faster, at 0.113 ns per iteration, while the variant case
>>>>> required
>>>>> > 0.227 ns per iteration. Looking at the CPU performance counters, the
>>>>> > std::variant code has 2 branches per iteration and takes 1 cycle per
>>>>> > iteration, running at 5 IPC (thus, 5 instructions per iteration).
>>>>> > Meanwhile, the switched case has 0.5 branch per iteration and takes
>>>>> 0.5
>>>>> > cycle per iteration, running at 2 IPC. The half cycle numbers make
>>>>> sense
>>>>> > because I believe the two instructions are getting macrofused
>>>>> together and
>>>>> > execute as a single uop, which causes confusing numbers.
>>>>>
>>>>> This half a cycle and ninth of a nanosecond problem has been on my
>>>>> mind for a
>>>>> while. The execution time of anything needs to be a multiple of the
>>>>> cycle
>>>>> time, so a CPU running at 4.5 GHz like mine was shouldn't have a
>>>>> difference of
>>>>> one ninth of a nanosecond. One explanation would be that somehow the
>>>>> CPU was
>>>>> executing two iterations of the loop at the same time, pipelining.
>>>>>
>>>>> But disassembling the binary shows a simpler explanation. The switch
>>>>> loop was:
>>>>>
>>>>> 40149f: mov $0x3b9aca00,%eax
>>>>> 4014a4: nop
>>>>> 4014a5: data16 cs nopw 0x0(%rax,%rax,1)
>>>>> 4014b0: sub $0x2,%eax
>>>>> 4014b3: jne 4014b0
>>>>>
>>>>> [Note how there is no test for what was being indexed in the loop!]
>>>>>
>>>>> Here's what I had missed: sub $2. I'm not entirely certain what GCC
>>>>> was
>>>>> thinking here, but it's subtracting 2 instead of 1, so this looped
>>>>> half a
>>>>> billion times (0x3b9aca00 / 2). I suppose it decided to unroll the
>>>>> loop a bit
>>>>> and made two calls to sink() per loop:
>>>>>
>>>>> template <typename T> void sink(const T &) { asm volatile("" :::
>>>>> "memory"); }
>>>>>
>>>>> But that expanded to nothing in the output. I could add "nop" so we'd
>>>>> see what
>>>>> happened and the CPU would be obligated to retire those instructions,
>>>>> increasing the instruction executed counter (I can't quickly find how
>>>>> many the
>>>>> TGL processor / WLC core can retire per cycle, but I recall it's 6, so
>>>>> adding
>>>>> 2 more instructions shouldn't affect the execution time). But I don't
>>>>> think I
>>>>> need to further benchmark this to prove my point:
>>>>>
>>>>> The microbenchmark is misleading.
>>>>>
>>>>> --
>>>>> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>>>>> Principal Engineer - Intel Data Center - Platform & Sys. Eng.
>>>>> --
>>>>> Std-Proposals mailing list
>>>>> Std-Proposals_at_[hidden]
>>>>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>

Received on 2026-04-04 02:32:26