Date: Fri, 3 Apr 2026 00:20:04 +0500
std::array<int, 3> array_1 = { 1,2,3 }; is also a global variable, but
the ternary/subscript operator statements don't have a hard time
figuring it out.
regards, Muneem.
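
A minimal sketch (not code from this thread) of how the subscript and
ternary accessors on the global array could be timed with the index
hidden from the optimizer, so that the constant propagation reported in
the quoted godbolt analysis cannot occur. The names get_by_subscript,
get_by_ternary, opaque_index, sink and time_accessor_ns are hypothetical,
and the volatile read is only one assumed way of keeping the index opaque.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Same global array as in the quoted benchmark.
std::array<int, 3> array_1 = {1, 2, 3};

// Hypothetical helper names; they do not appear in the thread.
inline int get_by_subscript(std::size_t index) {
    assert(index < array_1.size());  // subscripting has no fallback value
    return array_1[index];
}

inline int get_by_ternary(std::size_t index) {
    return (index == 0) ? array_1[0]
         : (index == 1) ? array_1[1]
         : (index == 2) ? array_1[2]
         : -1;
}

// The index is re-read from a volatile object on every iteration, so the
// compiler cannot treat it as a compile-time constant.
volatile std::size_t opaque_index = 0;
// Volatile sink so the loads are not optimized away.
volatile int sink = 0;

// Returns the total elapsed time in nanoseconds for `iterations` calls.
template <typename Accessor>
std::int64_t time_accessor_ns(Accessor accessor, std::uint64_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sink = accessor(opaque_index);
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}
```

A caller would set opaque_index from user input and compare
time_accessor_ns(get_by_subscript, n) with time_accessor_ns(get_by_ternary, n).
Any numbers it produces depend on the compiler, flags and standard library.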
On Fri, Apr 3, 2026 at 12:18 AM Muneem <itfllow123_at_[hidden]> wrote:
> >why is std::variant object static
> I made the std::variant static because the array that I was getting was
> also a global variable. In fact, even when I make the class objects return an
> array variable instead of a constant in that std::variant, the rise in
> latency is almost 0 because the latency of std::visit itself is around 470
> ns. Again, context and intent are the golden words for any compiler.
> regards, Muneem
>
> On Fri, Apr 3, 2026 at 12:14 AM Muneem <itfllow123_at_[hidden]> wrote:
>
>> So, I am sorry for confusing you, but the thing is that currently you
>> need std::visit or branching to index homogeneous values with a runtime
>> index. This means that the context and intent given to the compiler is
>> minimal, whereas for something like template inlining, the compiler has a
>> lot of context and intent, from compiler optimization flags to
>> instantiation time (which can be delayed till link time for the linker,
>> which can lead to even better optimizations), to the context of defining
>> the template definition, to the context of argument dependent lookup even,
>> which for templates is even more flexible since it can look for it in any
>> namespace to find the best match, which again gives the compiler
>> more information. The goal is to give the compiler context and intent so that
>> it can find the best time to optimize (any phase in compilation or
>> linkage), and it knows as much as possible on how to optimize. Here are the
>> benchmarks in Visual Studio 2026:
>> Time (ns) for switch: 97
>> Time (ns) for visit: 534
>> Time (ns) for ternary: 41
>> Time (ns) for subscript: 61
>>
>> On Thu, Apr 2, 2026 at 3:02 PM Marcin Jaczewski <
>> marcinjaczewski86_at_[hidden]> wrote:
>>
>>> Thu, 2 Apr 2026 at 11:45, Muneem <itfllow123_at_[hidden]> wrote:
>>> >
>>> > That's the point, std::variant isn't usable here, nor is any other
>>> polymorphic technique, hence we need the features that I described in the
>>> standard. The results consistently show that subscripting and ternary
>>> statements can be easier to optimize than the others, which shows the need
>>> for a new construct to deal with homogeneous sets of values. Current
>>> branching techniques work, but can lead to verbose code, are ad hoc, and are
>>> just not efficient when compared to subscripting; that's why we need
>>> subscripting-like capabilities for homogeneous sets of values. Context is
>>> the most important thing for a compiler because it can reorder expressions in
>>> that context in many ways (as long as the observable behaviour is the same,
>>> refer to the standard for a definition of "observable behaviour" if you
>>> would like).
>>> > std::variant and polymorphism fail at fixing everything, and I think
>>> that we can fix this specific issue of homogeneous lists with this new
>>> feature. The goal is to give the compiler context and intent.
>>> > Regards, Muneem.
>>> >
>>>
>>> You repeat only buzzwords here. I asked for real-life examples of code
>>> that have this problem and how your "solution" fixes it exactly.
>>> It looks like an XY problem. Like, you have a contradiction there:
>>> "polymorphic" and "homogeneous".
>>> Why do you try to use a technique that is for different kinds of problems
>>> and claim it does not work?
>>> Did you try to use proper tools for the problems you have?
>>>
>>> > On Thu, 2 Apr 2026, 1:11 pm Marcin Jaczewski, <
>>> marcinjaczewski86_at_[hidden]> wrote:
>>> >>
>>> >> Thu, 2 Apr 2026 at 09:05, Muneem via Std-Proposals
>>> >> <std-proposals_at_[hidden]> wrote:
>>> >> >
>>> >> > hi!
>>> >> > Your point is partly correct, but this issue is quite prevalent,
>>> below are my benchmarks from multiple sources:
>>> >> > this is the updated code:
>>> >> > #include <variant>
>>> >> > #include <iostream>
>>> >> > #include <chrono>
>>> >> > #include <ctime>
>>> >> > #include <iomanip>
>>> >> > #include <array>
>>> >> > #include <cstdint>  // for uint64_t
>>> >> > std::array<int, 3> array_1={1,2,3};
>>> >> >
>>> >> > struct A { int get() { return array_1[0]; } };
>>> >> > struct B { int get() { return array_1[1]; } };
>>> >> > struct C { int get() { return array_1[2]; } };
>>> >> >
>>> >> > struct data_accessed_through_visit {
>>> >> > static std::variant<A, B, C> obj;
>>> >> >
>>> >> > inline int operator()(int) {
>>> >> > return std::visit([](auto&& arg) {
>>> >> > return arg.get();
>>> >> > }, obj);
>>> >> > }
>>> >> > };
>>> >> > std::variant<A, B, C> data_accessed_through_visit::obj=C{};
>>> >> > int user_index = 0;
>>> >>
>>> >> I can stop you right now: you use a static object for the
>>> >> `std::variant`, which means you test a different thing than in the other
>>> >> cases.
>>> >> This can affect code gen and what the optimizer can see and guarantee.
>>> >>
>>> >> Besides, why use `variant` here? It was not designed for this but to
>>> >> store different types in the same memory location.
>>> >> And none of this is used there, only making it harder for the compiler to
>>> >> see what is going on.
>>> >>
>>> >> Why not use simple pointers there? And if you need to do some
>>> >> calculations function pointers?
>>> >>
>>> >> >
>>> >> > struct data_ternary {
>>> >> > inline int operator()(int index) {
>>> >> > return (index == 0) ? array_1[0] : (index == 1) ?
>>> array_1[1] : (index == 2) ? array_1[2] : -1;
>>> >> > }
>>> >> > };
>>> >> >
>>> >> > struct data_switched {
>>> >> > inline int operator()(int index) {
>>> >> > switch(index) {
>>> >> > case 0: return array_1[0];
>>> >> > case 1: return array_1[1];
>>> >> > case 2: return array_1[2];
>>> >> > default: return -1;
>>> >> > }
>>> >> > }
>>> >> > };
>>> >> >
>>> >> > struct data_indexing {
>>> >> > inline int operator()(int index) {
>>> >> > return array_1[index];
>>> >> > }
>>> >> > };
>>> >> >
>>> >> >
>>> >> >
>>> >> > volatile int x = 0;
>>> >> > constexpr uint64_t loop_count=10000;
>>> >> > static void measure_switch() {
>>> >> > data_switched obj;
>>> >> > for (int i=0; i++<loop_count;) {
>>> >> > x = obj(user_index);
>>> >> > }
>>> >> > }
>>> >> >
>>> >> > static void measure_visit() {
>>> >> > data_accessed_through_visit obj;
>>> >> > for (int i=0; i++<loop_count;) {
>>> >> > x = obj(user_index);
>>> >> > }
>>> >> > }
>>> >>
>>> >> Why not use:
>>> >> ```
>>> >> static void measure_visit() {
>>> >> // visit the static variant once, then loop inside the visitor
>>> >> std::visit([](auto&& arg) {
>>> >> for (int i=0; i++<loop_count;) {
>>> >> x = arg.get();
>>> >> }
>>> >> }, data_accessed_through_visit::obj);
>>> >> }
>>> >> ```
>>> >> And this is more similar to what the compiler sees in other test
>>> cases.
>>> >>
>>> >> >
>>> >> > static void measure_ternary() {
>>> >> > data_ternary obj;
>>> >> > for (int i=0; i++<loop_count;) {
>>> >> > x = obj(user_index);
>>> >> > }
>>> >> > }
>>> >> > static void measure_indexing() {
>>> >> > data_indexing obj;
>>> >> > for (int i=0; i++<loop_count;) {
>>> >> > x = obj(user_index);
>>> >> > }
>>> >> > }
>>> >> >
>>> >> > template<typename func_t>
>>> >> > void call_func(func_t callable_obj, int arg){
>>> >> > const auto start = std::chrono::steady_clock::now();
>>> >> >
>>> >> > constexpr int how_much_to_loop=1000;
>>> >> > for(int i=0; i++<how_much_to_loop;){
>>> >> > callable_obj();
>>> >> > }
>>> >> > const auto end = std::chrono::steady_clock::now();
>>> >> > auto result=
>>> std::chrono::duration_cast<std::chrono::nanoseconds>(end -
>>> start).count()/how_much_to_loop;
>>> >> > std::cout<<result/how_much_to_loop<<std::endl;
>>> >> >
>>> >> > }
>>> >> >
>>> >> > int main() {
>>> >> > std::cout << "Enter index (0 for A, 1 for B, 2 for C): ";
>>> >> > if (!(std::cin >> user_index)) return 1;
>>> >> >
>>> >> > // Set the variant state
>>> >> > if (user_index == 0) data_accessed_through_visit::obj = A{};
>>> >> > else if (user_index == 1) data_accessed_through_visit::obj =
>>> B{};
>>> >> > else if (user_index == 2) data_accessed_through_visit::obj =
>>> C{};
>>> >> >
>>> >> > std::cout << "Time (ns) for switch: ";
>>> >> > call_func(measure_switch, user_index);
>>> >> >
>>> >> > std::cout << "Time (ns) for visit: ";
>>> >> > call_func(measure_visit, user_index);
>>> >> >
>>> >> > std::cout << "Time (ns) for ternary: ";
>>> >> > call_func(measure_ternary, user_index);
>>> >> >
>>> >> > std::cout << "Time (ns) for subscript: ";
>>> >> > call_func(measure_indexing, user_index);
>>> >> >
>>> >> > return 0;
>>> >> > }
>>> >> > The benchmarks consistently show that these syntax constructs do
>>> matter (the smaller the index range is, the more the compiler can flatten
>>> it and know how to branch). Notice how the ternary is outperforming them all
>>> even though it's nested. This means that adding new syntax with the sole
>>> purpose of giving compilers as much information as possible is actually
>>> useful.
>>> >>
>>> >> And what do you propose here exactly? Different structures behave
>>> differently.
>>> >> How exactly will it look and work? Right now you use a lot of vague
>>> >> statements that do not show anything.
>>> >>
>>> >> > Consider how templates and instantiation give the compiler extra
>>> insight. Why? Because templates are instantiated at the point of
>>> instantiation, which can be delayed up to link time. These are the benchmarks:
>>> >>
>>> >> What? Did you confuse it with Link Time Optimization? Instantiation happens
>>> >> in the translation unit, not at link time, as this would be too late.
>>> >> We have extern templates, but then the compiler in the TU becomes blind to that
>>> >> code as it is instantiated in another TU. LTO can help there, but
>>> >> then we have exactly the same case if we use normal functions.
>>> >>
>>> >> > benchmarks for g++:
>>> >> > Enter index (0 for A, 1 for B, 2 for C): 2
>>> >> > Time (ns) for switch: 33
>>> >> > Time (ns) for visit: 278
>>> >> > Time (ns) for ternary: 19
>>> >> > Time (ns) for subscript: 34
>>> >> > PS C:\Users\drnoo\Downloads> .\a.exe
>>> >> > Enter index (0 for A, 1 for B, 2 for C): 2
>>> >> > Time (ns) for switch: 33
>>> >> > Time (ns) for visit: 296
>>> >> > Time (ns) for ternary: 20
>>> >> > Time (ns) for subscript: 35
>>> >> > PS C:\Users\drnoo\Downloads> .\a.exe
>>> >> > Enter index (0 for A, 1 for B, 2 for C): 2
>>> >> > Time (ns) for switch: 34
>>> >> > Time (ns) for visit: 271
>>> >> > Time (ns) for ternary: 17
>>> >> > Time (ns) for subscript: 33
>>> >> > PS C:\Users\drnoo\Downloads> .\a.exe
>>> >> > Enter index (0 for A, 1 for B, 2 for C): 2
>>> >> > Time (ns) for switch: 34
>>> >> > Time (ns) for visit: 281
>>> >> > Time (ns) for ternary: 19
>>> >> > Time (ns) for subscript: 32
>>> >> > PS C:\Users\drnoo\Downloads> .\a.exe
>>> >> > Enter index (0 for A, 1 for B, 2 for C): 2
>>> >> > Time (ns) for switch: 34
>>> >> > Time (ns) for visit: 282
>>> >> > Time (ns) for ternary: 20
>>> >> > Time (ns) for subscript: 34
>>> >> > I really have to go to sleep now (I am having some issues with
>>> Visual Studio 2026). I hope it would be acceptable for me to send the
>>> benchmarks for that tomorrow.
>>> >> >
>>> >> > regards, Muneem
>>> >> >
>>> >> >
>>> >> > On Thu, Apr 2, 2026 at 10:54 AM Thiago Macieira via Std-Proposals <
>>> std-proposals_at_[hidden]> wrote:
>>> >> >>
>>> >> >> On Wednesday, 1 April 2026 21:56:44 Pacific Daylight Time Muneem
>>> via Std-
>>> >> >> Proposals wrote:
>>> >> >> > /*
>>> >> >> > Time (ns) for switch: 168100
>>> >> >> > Time (ns) for visit: 3664100
>>> >> >> > Time (ns) for ternary: 190900
>>> >> >> > It keeps on getting worse!
>>> >> >> > */
>>> >> >>
>>> >> >> So far you've maybe shown that one implementation is generating
>>> bad code. Have
>>> >> >> you tried others?
>>> >> >>
>>> >> >> You need to prove that this is an inherent and unavoidable problem
>>> of the
>>> >> >> requirements, not that it just happened to be bad for this
>>> implementation.
>>> >> >> Just quickly reading the proposed benchmark code, it would seem
>>> there's no
>>> >> >> such inherent reason and you're making an unfounded and probably
>>> incorrect
>>> >> >> assumption about how things actually work.
>>> >> >>
>>> >> >> In fact, I pasted a portion of your code into godbolt just to see
>>> what the
>>> >> >> variant visit code, which you claim to be unnecessarily slow,
>>> would look like:
>>> >> >> https://gcc.godbolt.org/z/WK5bMzcae
>>> >> >>
>>> >> >> The first thing to note in the GCC/libstdc++ pane is that it does
>>> not use
>>> >> >> user_index. The compiler thinks it's a constant, meaning this
>>> benchmark is
>>> >> >> faulty. And thus it has constant-propagated this value and is
>>> *incredibly*
>>> >> >> efficient in doing nothing useful. MSVC did likewise.
>>> >> >>
>>> >> >> Since MSVC outputs the out-of-line copy of inlined functions, we
>>> can see the
>>> >> >> operator() expansion without the propagation of the user_index
>>> constant. And
>>> >> >> it's no different than what a ternary or switch would look like.
>>> >> >>
>>> >> >> In the Clang/libc++ pane, we see indirect function calls. I don't
>>> know why
>>> >> >> libc++ std::variant is implemented this way, but it could be why
>>> it is slow
>>> >> >> for you if you're using this implementation. If you tell Clang to
>>> instead use
>>> >> >> libstdc++ (remove the last argument of the command-line), the
>>> indirect
>>> >> >> function call disappears and we see an unrolled loop of loading
>>> the value 10.
>>> >> >> That would mean Clang is even more efficient at doing nothing.
>>> >> >>
>>> >> >> Conclusion: it looks like your assumption that there is a problem
>>> to be solved
>>> >> >> is faulty. There is no problem.
>>> >> >>
>>> >> >> --
>>> >> >> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>>> >> >> Principal Engineer - Intel Data Center - Platform & Sys. Eng.
>>> >> >> --
>>> >> >> Std-Proposals mailing list
>>> >> >> Std-Proposals_at_[hidden]
>>> >> >> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>> >> >
>>> >> > --
>>> >> > Std-Proposals mailing list
>>> >> > Std-Proposals_at_[hidden]
>>> >> > https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>>
>>
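
The quoted suggestion to use simple pointers, or function pointers when
some calculation is needed, could look roughly like the sketch below. It
is an illustration under assumptions, not code posted in the thread: the
names select_element, get_a, get_b, get_c, getters and call_getter are
hypothetical, and returning nullptr or -1 for an out-of-range index is an
arbitrary choice made here to mirror the -1 default of the switch version.

```cpp
#include <array>
#include <cstddef>

// Same global array as in the quoted benchmark.
std::array<int, 3> array_1 = {1, 2, 3};

// Plain pointers: pick the element once, then read through the pointer.
// Returns nullptr when the index is out of range.
inline const int* select_element(std::size_t index) {
    return index < array_1.size() ? &array_1[index] : nullptr;
}

// Function pointers, for the case where each alternative computes something.
int get_a() { return array_1[0]; }
int get_b() { return array_1[1]; }
int get_c() { return array_1[2]; }

using getter_t = int (*)();

// A homogeneous table of getters that can be subscripted with a runtime index.
constexpr std::array<getter_t, 3> getters = {get_a, get_b, get_c};

// Returns -1 when the index is out of range.
inline int call_getter(std::size_t index) {
    return index < getters.size() ? getters[index]() : -1;
}
```

With this shape the runtime index is used as an ordinary subscript into a
homogeneous table, so no std::variant or std::visit is involved. Whether
it is faster than the other versions would have to be measured.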
Received on 2026-04-02 19:20:22
