std-proposals

Re: [std-proposals] Fwd: Extension to runtime polymorphism proposed

From: Muneem <itfllow123_at_[hidden]>
Date: Fri, 3 Apr 2026 00:18:25 +0500
>why is std::variant object static
I made the std::variant static because the array I was reading from was
also a global variable. In fact, even when I make the class objects in that
std::variant return an array element instead of a constant, the rise in
latency is almost 0, because the latency of std::visit itself is around 470
ns. Again, context and intent are the golden words for any compiler.
Regards, Muneem
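
For reference, a minimal sketch of the four lookup styles compared in this thread, with the variant held as a local object rather than a static member. All names (`values`, `Holder`, `make_holder`, `by_visit`, and so on) are hypothetical and not taken from the posted benchmark. The sketch only checks that the four styles return the same value and makes no claim about their relative speed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <variant>

// Hypothetical names throughout; only the shape of the lookups follows the thread.
inline constexpr std::array<int, 3> values{1, 2, 3};

struct A { int get() const { return values[0]; } };
struct B { int get() const { return values[1]; } };
struct C { int get() const { return values[2]; } };
using Holder = std::variant<A, B, C>;

// Build the variant from a runtime index (0 -> A, 1 -> B, anything else -> C).
inline Holder make_holder(int index) {
    if (index == 0) return A{};
    if (index == 1) return B{};
    return C{};
}

// Lookup through std::visit on whichever alternative is active.
inline int by_visit(const Holder& holder) {
    return std::visit([](const auto& alt) { return alt.get(); }, holder);
}

// Nested conditional expression; the last test is index == 2.
inline int by_ternary(int index) {
    return index == 0 ? values[0]
         : index == 1 ? values[1]
         : index == 2 ? values[2]
         : -1;
}

inline int by_switch(int index) {
    switch (index) {
        case 0: return values[0];
        case 1: return values[1];
        case 2: return values[2];
        default: return -1;
    }
}

// Plain subscripting; precondition: 0 <= index < 3 (no bounds check).
inline int by_subscript(int index) {
    return values[static_cast<std::size_t>(index)];
}

// Verifies that all four styles agree for every valid index.
inline void check_all_styles_agree() {
    for (int index = 0; index < 3; ++index) {
        const Holder holder = make_holder(index);  // local, non-static variant
        const int expected = by_subscript(index);
        assert(by_visit(holder) == expected);
        assert(by_ternary(index) == expected);
        assert(by_switch(index) == expected);
    }
}
```

Calling `check_all_styles_agree()` from `main` is enough to exercise it. Note that `by_subscript` is the only style with a precondition on the index; the switch and ternary versions return -1 for out-of-range input.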

On Fri, Apr 3, 2026 at 12:14 AM Muneem <itfllow123_at_[hidden]> wrote:

> So, I am sorry for confusing you, but the thing is that currently you
> need std::visit or branching to index homogeneous values with a runtime
> index. This means that the context and intent given to the compiler are
> minimal, whereas for something like template inlining the compiler has a
> lot of context and intent: from compiler optimization flags, to
> instantiation time (which can be delayed till link time for the linker,
> which can lead to even better optimizations), to the context of the
> template definition, to the context of argument-dependent lookup even,
> which for templates is even more flexible since it can look in any
> namespace to find the best match, which again gives the compiler
> more information. The goal is to give the compiler context and intent so
> that it can find the best time to optimize (any phase of compilation or
> linkage) and knows as much as possible about how to optimize. Here are the
> benchmarks in Visual Studio 2026:
> Time (ns) for switch: 97
> Time (ns) for visit: 534
> Time (ns) for ternary: 41
> Time (ns) for subscript: 61
>
> On Thu, Apr 2, 2026 at 3:02 PM Marcin Jaczewski <
> marcinjaczewski86_at_[hidden]> wrote:
>
>> On Thu, 2 Apr 2026 at 11:45, Muneem <itfllow123_at_[hidden]> wrote:
>> >
>> > That's the point: std::variant isn't usable here, nor is any other
>> > polymorphic technique, hence we need the features that I described in the
>> > standard. The results consistently show that subscripting and ternary
>> > expressions can be easier to optimize than the others, which shows the
>> > need for a new construct to deal with homogeneous sets of values. Current
>> > branching techniques work, but they can lead to verbose code, are ad hoc,
>> > and are just not efficient when compared to subscripting; that's why we
>> > need subscripting-like capabilities for homogeneous sets of values.
>> > Context is the most important thing for a compiler because it can reorder
>> > expressions in that context in many ways (as long as the observable
>> > behaviour is the same; refer to the standard for a definition of
>> > "observable behaviour" if you would like).
>> > std::variant and polymorphism fail at fixing everything, and I think
>> > that we can fix this specific issue of homogeneous lists with this new
>> > feature. The goal is to give the compiler context and intent.
>> > Regards, Muneem.
>> >
>>
>> You only repeat buzzwords here. I am asking for real-life examples of code
>> that has this problem and how exactly your "solution" fixes it.
>> It looks like an XY problem. You even have a contradiction there:
>> "polymorphic" and "homogeneous".
>> Why do you try to use a technique that is meant for different kinds of
>> problems and then claim it does not work?
>> Did you try to use the proper tools for the problems you have?
>>
>> > On Thu, 2 Apr 2026, 1:11 pm Marcin Jaczewski,
>> > <marcinjaczewski86_at_[hidden]> wrote:
>> >>
>> >> On Thu, 2 Apr 2026 at 09:05, Muneem via Std-Proposals
>> >> <std-proposals_at_[hidden]> wrote:
>> >> >
>> >> > Hi!
>> >> > Your point is partly correct, but this issue is quite prevalent.
>> >> > Below are my benchmarks from multiple sources.
>> >> > This is the updated code:
>> >> > #include <variant>
>> >> > #include <iostream>
>> >> > #include <chrono>
>> >> > #include <ctime>
>> >> > #include <iomanip>
>> >> > #include <array>
>> >> > #include <cstdint>
>> >> > std::array<int, 3> array_1={1,2,3};
>> >> >
>> >> > struct A { int get() { return array_1[0]; } };
>> >> > struct B { int get() { return array_1[1]; } };
>> >> > struct C { int get() { return array_1[2]; } };
>> >> >
>> >> > struct data_accessed_through_visit {
>> >> > static std::variant<A, B, C> obj;
>> >> >
>> >> > inline int operator()(int) {
>> >> > return std::visit([](auto&& arg) {
>> >> > return arg.get();
>> >> > }, obj);
>> >> > }
>> >> > };
>> >> > std::variant<A, B, C> data_accessed_through_visit::obj=C{};
>> >> > int user_index = 0;
>> >>
>> >> I can stop you right now: you use a static object for the
>> >> `std::variant`, which means you test a different thing than in the other
>> >> cases.
>> >> This can affect codegen and what the optimizer can see and guarantee.
>> >>
>> >> Besides, why use `variant` here? It was not designed for this, but to
>> >> store different types in the same memory location.
>> >> None of that is used here; it only makes it harder for the compiler to
>> >> see what is going on.
>> >>
>> >> Why not use simple pointers there? And, if you need to do some
>> >> calculations, function pointers?
>> >>
>> >> >
>> >> > struct data_ternary {
>> >> > inline int operator()(int index) {
>> >> > return (index == 0) ? array_1[0] : (index == 1) ? array_1[1]
>> >> > : (index == 2) ? array_1[2] : -1;
>> >> > }
>> >> > };
>> >> >
>> >> > struct data_switched {
>> >> > inline int operator()(int index) {
>> >> > switch(index) {
>> >> > case 0: return array_1[0];
>> >> > case 1: return array_1[1];
>> >> > case 2: return array_1[2];
>> >> > default: return -1;
>> >> > }
>> >> > }
>> >> > };
>> >> >
>> >> > struct data_indexing {
>> >> > inline int operator()(int index) {
>> >> > return array_1[index];
>> >> > }
>> >> > };
>> >> >
>> >> >
>> >> >
>> >> > volatile int x = 0;
>> >> > constexpr uint64_t loop_count=10000;
>> >> > static void measure_switch() {
>> >> > data_switched obj;
>> >> > for (int i=0; i++<loop_count;) {
>> >> > x = obj(user_index);
>> >> > }
>> >> > }
>> >> >
>> >> > static void measure_visit() {
>> >> > data_accessed_through_visit obj;
>> >> > for (int i=0; i++<loop_count;) {
>> >> > x = obj(user_index);
>> >> > }
>> >> > }
>> >>
>> >> Why not use:
>> >> ```
>> >> static void measure_visit() {
>> >> return std::visit([](auto&& arg) {
>> >> for (int i=0; i++<loop_count;) {
>> >> x = arg.get();
>> >> }
>> >> }, data_accessed_through_visit::obj);
>> >> }
>> >> ```
>> >> And this is more similar to what the compiler sees in other test cases.
>> >>
>> >> >
>> >> > static void measure_ternary() {
>> >> > data_ternary obj;
>> >> > for (int i=0; i++<loop_count;) {
>> >> > x = obj(user_index);
>> >> > }
>> >> > }
>> >> > static void measure_indexing() {
>> >> > data_indexing obj;
>> >> > for (int i=0; i++<loop_count;) {
>> >> > x = obj(user_index);
>> >> > }
>> >> > }
>> >> >
>> >> > template<typename func_t>
>> >> > void call_func(func_t callable_obj, int arg){
>> >> > const auto start = std::chrono::steady_clock::now();
>> >> >
>> >> > constexpr int how_much_to_loop=1000;
>> >> > for(int i=0; i++<how_much_to_loop;){
>> >> > callable_obj();
>> >> > }
>> >> > const auto end = std::chrono::steady_clock::now();
>> >> > auto result=
>> >> > std::chrono::duration_cast<std::chrono::nanoseconds>(end -
>> >> > start).count()/how_much_to_loop;
>> >> > std::cout<<result/how_much_to_loop<<std::endl;
>> >> >
>> >> > }
>> >> >
>> >> > int main() {
>> >> > std::cout << "Enter index (0 for A, 1 for B, 2 for C): ";
>> >> > if (!(std::cin >> user_index)) return 1;
>> >> >
>> >> > // Set the variant state
>> >> > if (user_index == 0) data_accessed_through_visit::obj = A{};
>> >> > else if (user_index == 1) data_accessed_through_visit::obj = B{};
>> >> > else if (user_index == 2) data_accessed_through_visit::obj = C{};
>> >> >
>> >> > std::cout << "Time (ns) for switch: ";
>> >> > call_func(measure_switch, user_index);
>> >> >
>> >> > std::cout << "Time (ns) for visit: ";
>> >> > call_func(measure_visit, user_index);
>> >> >
>> >> > std::cout << "Time (ns) for ternary: ";
>> >> > call_func(measure_ternary, user_index);
>> >> >
>> >> > std::cout << "Time (ns) for subscript: ";
>> >> > call_func(measure_indexing, user_index);
>> >> >
>> >> > return 0;
>> >> > }
>> >> > The benchmarks consistently show that these syntax constructs do
>> >> > matter (the smaller the index range is, the more the compiler can
>> >> > flatten it and know how to branch). Notice how the ternary outperforms
>> >> > them all even though it is nested. This means that adding new syntax
>> >> > with the sole purpose of giving compilers as much information as
>> >> > possible is actually useful.
>> >>
>> >> And what do you propose here exactly? Different structures behave
>> >> differently.
>> >> How exactly will it look and work? Right now you use a lot of vague
>> >> statements that do not show anything.
>> >>
>> >> > Consider how templates and instantiation give the compiler extra
>> >> > insight. Why? Because templates are instantiated at the point of
>> >> > instantiation, which can be delayed up to link time. These are the
>> >> > benchmarks:
>> >>
>> >> What? Did you confuse it with Link Time Optimization? Instantiation
>> >> happens in the Translation Unit, not at link time, as this would be too
>> >> late. We have extern templates, but then the compiler in a TU becomes
>> >> blind to that code, as it is instantiated in another TU. LTO can help
>> >> there, but then we have exactly the same case if we use normal functions.
>> >>
>> >> > benchmarks for g++:
>> >> > Enter index (0 for A, 1 for B, 2 for C): 2
>> >> > Time (ns) for switch: 33
>> >> > Time (ns) for visit: 278
>> >> > Time (ns) for ternary: 19
>> >> > Time (ns) for subscript: 34
>> >> > PS C:\Users\drnoo\Downloads> .\a.exe
>> >> > Enter index (0 for A, 1 for B, 2 for C): 2
>> >> > Time (ns) for switch: 33
>> >> > Time (ns) for visit: 296
>> >> > Time (ns) for ternary: 20
>> >> > Time (ns) for subscript: 35
>> >> > PS C:\Users\drnoo\Downloads> .\a.exe
>> >> > Enter index (0 for A, 1 for B, 2 for C): 2
>> >> > Time (ns) for switch: 34
>> >> > Time (ns) for visit: 271
>> >> > Time (ns) for ternary: 17
>> >> > Time (ns) for subscript: 33
>> >> > PS C:\Users\drnoo\Downloads> .\a.exe
>> >> > Enter index (0 for A, 1 for B, 2 for C): 2
>> >> > Time (ns) for switch: 34
>> >> > Time (ns) for visit: 281
>> >> > Time (ns) for ternary: 19
>> >> > Time (ns) for subscript: 32
>> >> > PS C:\Users\drnoo\Downloads> .\a.exe
>> >> > Enter index (0 for A, 1 for B, 2 for C): 2
>> >> > Time (ns) for switch: 34
>> >> > Time (ns) for visit: 282
>> >> > Time (ns) for ternary: 20
>> >> > Time (ns) for subscript: 34
>> >> > I really have to go to sleep now (I am having some issues with
>> >> > Visual Studio 2026). I hope it would be acceptable for me to send the
>> >> > benchmarks for that tomorrow.
>> >> >
>> >> > regards, Muneem
>> >> >
>> >> >
>> >> > On Thu, Apr 2, 2026 at 10:54 AM Thiago Macieira via Std-Proposals
>> >> > <std-proposals_at_[hidden]> wrote:
>> >> >>
>> >> >> On Wednesday, 1 April 2026 21:56:44 Pacific Daylight Time Muneem
>> >> >> via Std-Proposals wrote:
>> >> >> > /*
>> >> >> > Time (ns) for switch: 168100
>> >> >> > Time (ns) for visit: 3664100
>> >> >> > Time (ns) for ternary: 190900
>> >> >> > It keeps on getting worse!
>> >> >> > */
>> >> >>
>> >> >> So far you've maybe shown that one implementation is generating bad
>> >> >> code. Have you tried others?
>> >> >>
>> >> >> You need to prove that this is an inherent and unavoidable problem
>> >> >> of the requirements, not that it just happened to be bad for this
>> >> >> implementation. Just quickly reading the proposed benchmark code, it
>> >> >> would seem there's no such inherent reason and you're making an
>> >> >> unfounded and probably incorrect assumption about how things
>> >> >> actually work.
>> >> >>
>> >> >> In fact, I pasted a portion of your code into godbolt just to see
>> >> >> what the variant visit code, which you claim to be unnecessarily
>> >> >> slow, would look like:
>> >> >> https://gcc.godbolt.org/z/WK5bMzcae
>> >> >>
>> >> >> The first thing to note in the GCC/libstdc++ pane is that it does
>> >> >> not use user_index. The compiler thinks it's a constant, meaning this
>> >> >> benchmark is faulty. And thus it has constant-propagated this value
>> >> >> and is *incredibly* efficient in doing nothing useful. MSVC did
>> >> >> likewise.
>> >> >>
>> >> >> Since MSVC outputs the out-of-line copy of inlined functions, we
>> >> >> can see the operator() expansion without the propagation of the
>> >> >> user_index constant. And it's no different than what a ternary or
>> >> >> switch would look like.
>> >> >>
>> >> >> In the Clang/libc++ pane, we see indirect function calls. I don't
>> >> >> know why libc++ std::variant is implemented this way, but it could be
>> >> >> why it is slow for you if you're using this implementation. If you
>> >> >> tell Clang to instead use libstdc++ (remove the last argument of the
>> >> >> command-line), the indirect function call disappears and we see an
>> >> >> unrolled loop of loading the value 10. That would mean Clang is even
>> >> >> more efficient at doing nothing.
>> >> >>
>> >> >> Conclusion: it looks like your assumption that there is a problem
>> >> >> to be solved is faulty. There is no problem.
>> >> >>
>> >> >> --
>> >> >> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>> >> >> Principal Engineer - Intel Data Center - Platform & Sys. Eng.
>> >> >> --
>> >> >> Std-Proposals mailing list
>> >> >> Std-Proposals_at_[hidden]
>> >> >> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>> >> >
>> >> > --
>> >> > Std-Proposals mailing list
>> >> > Std-Proposals_at_[hidden]
>> >> > https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>
>
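
Two measurement issues come up in the quoted messages: the compiler may treat the index as a known constant and fold the lookup away, and the posted `call_func` divides the elapsed time by `how_much_to_loop` twice while never dividing by `loop_count`. The following is a rough sketch of a harness that tries to avoid both, assuming a `volatile` read of the index on every iteration is an acceptable way to keep the value opaque. The names `table`, `runtime_index`, `sink`, `BySwitch`, `BySubscript` and `ns_per_call` are hypothetical and not part of the posted code.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical names; a sketch under the assumptions stated above.
std::array<int, 3> table{1, 2, 3};

// Each volatile read must actually be performed, so the index cannot be
// treated as a compile-time constant inside the timed loop.
volatile int runtime_index = 0;

// Each volatile write must actually be performed, so the lookup result
// cannot be discarded as dead code.
volatile int sink = 0;

struct BySwitch {
    int operator()(int index) const {
        switch (index) {
            case 0: return table[0];
            case 1: return table[1];
            case 2: return table[2];
            default: return -1;
        }
    }
};

struct BySubscript {
    // Precondition: 0 <= index < 3 (no bounds check).
    int operator()(int index) const {
        return table[static_cast<std::size_t>(index)];
    }
};

// Returns the average wall-clock nanoseconds per call of `lookup`,
// dividing the total elapsed time by the total number of calls exactly once.
template <typename Lookup>
double ns_per_call(Lookup lookup, std::uint64_t iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sink = lookup(runtime_index);
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto total_ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    return static_cast<double>(total_ns) / static_cast<double>(iterations);
}
```

A caller would set `runtime_index` from user input and then compare, for example, `ns_per_call(BySwitch{}, 10'000'000)` against `ns_per_call(BySubscript{}, 10'000'000)`. The volatile accesses add their own cost to every iteration, so the absolute numbers include that overhead; inspecting the generated assembly is still the more reliable way to see what each construct compiles to.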

Received on 2026-04-02 19:18:39