Date: Thu, 2 Apr 2026 12:06:09 +0500
Also, I always have std::cin to make sure that the compiler never cheats!
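To illustrate the std::cin point, here is a minimal sketch (it is not the benchmark itself). The names `values`, `read_index`, `pick`, and `pick_from_text` are hypothetical and exist only for this example. The idea is that the index arrives through a stream at run time, so the compiler should have no constant to propagate into the selection.

```cpp
#include <array>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical data and helper names; assumptions made for this sketch only.
inline const std::array<int, 3> values{1, 2, 3};

// Reads the index from a stream at run time.
// Returns -1 if parsing fails or the value is outside [0, 2].
inline int read_index(std::istream& in) {
    int index = -1;
    if (!(in >> index) || index < 0 || index > 2) return -1;
    return index;
}

// Selects a value from an index that is only known at run time.
inline int pick(int index) {
    switch (index) {
        case 0: return values[0];
        case 1: return values[1];
        case 2: return values[2];
        default: return -1;
    }
}

// Convenience wrapper so the sketch can be exercised without a console.
inline int pick_from_text(const std::string& text) {
    std::istringstream in(text);
    return pick(read_index(in));
}
```

In a real program the stream would be std::cin, for example `pick(read_index(std::cin))`. Whether this alone is enough to keep an optimizer honest still depends on what else it can see, so the generated assembly is worth checking.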
On Thu, Apr 2, 2026 at 12:05 PM Muneem <itfllow123_at_[hidden]> wrote:
> Hi!
> Your point is partly correct, but this issue is quite prevalent. Below are
> my benchmarks from multiple sources.
> This is the updated code:
> #include <variant>
> #include <iostream>
> #include <chrono>
> #include <ctime>
> #include <iomanip>
> #include <array>
> #include <cstdint>
> std::array<int, 3> array_1={1,2,3};
>
> struct A { int get() { return array_1[0]; } };
> struct B { int get() { return array_1[1]; } };
> struct C { int get() { return array_1[2]; } };
>
> struct data_accessed_through_visit {
> static std::variant<A, B, C> obj;
>
> inline int operator()(int) {
> return std::visit([](auto&& arg) {
> return arg.get();
> }, obj);
> }
> };
> std::variant<A, B, C> data_accessed_through_visit::obj=C{};
> int user_index = 0;
>
> struct data_ternary {
> inline int operator()(int index) {
> return (index == 0) ? array_1[0] : (index == 1) ? array_1[1] :
> (index == 2) ? array_1[2] : -1;
> }
> };
>
> struct data_switched {
> inline int operator()(int index) {
> switch(index) {
> case 0: return array_1[0];
> case 1: return array_1[1];
> case 2: return array_1[2];
> default: return -1;
> }
> }
> };
>
> struct data_indexing {
> inline int operator()(int index) {
> return array_1[index];
> }
> };
>
>
>
> volatile int x = 0;
> constexpr uint64_t loop_count=10000;
> static void measure_switch() {
> data_switched obj;
> for (int i=0; i++<loop_count;) {
> x = obj(user_index);
> }
> }
>
> static void measure_visit() {
> data_accessed_through_visit obj;
> for (int i=0; i++<loop_count;) {
> x = obj(user_index);
> }
> }
>
> static void measure_ternary() {
> data_ternary obj;
> for (int i=0; i++<loop_count;) {
> x = obj(user_index);
> }
> }
> static void measure_indexing() {
> data_indexing obj;
> for (int i=0; i++<loop_count;) {
> x = obj(user_index);
> }
> }
>
> template<typename func_t>
> void call_func(func_t callable_obj, int arg){
> const auto start = std::chrono::steady_clock::now();
>
> constexpr int how_much_to_loop=1000;
> for(int i=0; i++<how_much_to_loop;){
> callable_obj();
> }
> const auto end = std::chrono::steady_clock::now();
> auto result= std::chrono::duration_cast<std::chrono::nanoseconds>(end
> - start).count()/how_much_to_loop;
> std::cout<<result/how_much_to_loop<<std::endl;
>
> }
>
> int main() {
> std::cout << "Enter index (0 for A, 1 for B, 2 for C): ";
> if (!(std::cin >> user_index)) return 1;
>
> // Set the variant state
> if (user_index == 0) data_accessed_through_visit::obj = A{};
> else if (user_index == 1) data_accessed_through_visit::obj = B{};
> else if (user_index == 2) data_accessed_through_visit::obj = C{};
>
> std::cout << "Time (ns) for switch: ";
> call_func(measure_switch, user_index);
>
> std::cout << "Time (ns) for visit: ";
> call_func(measure_visit, user_index);
>
> std::cout << "Time (ns) for ternary: ";
> call_func(measure_ternary, user_index);
>
> std::cout << "Time (ns) for subscript: ";
> call_func(measure_indexing, user_index);
>
> return 0;
> }
> The benchmarks consistently show that these syntax constructs do matter
> (the smaller the index range is, the more the compiler can flatten it and
> know how to branch). Notice how the ternary outperforms them all even
> though it is nested. This means that adding new syntax with the sole purpose
> of giving compilers as much information as possible is actually useful.
> Consider how templates and instantiation give the compiler extra insight.
> Why? Because templates are instantiated at the point of instantiation, which
> can be delayed up to link time. These are the benchmarks:
> Benchmarks for g++:
> Enter index (0 for A, 1 for B, 2 for C): 2
> Time (ns) for switch: 33
> Time (ns) for visit: 278
> Time (ns) for ternary: 19
> Time (ns) for subscript: 34
> PS C:\Users\drnoo\Downloads> .\a.exe
> Enter index (0 for A, 1 for B, 2 for C): 2
> Time (ns) for switch: 33
> Time (ns) for visit: 296
> Time (ns) for ternary: 20
> Time (ns) for subscript: 35
> PS C:\Users\drnoo\Downloads> .\a.exe
> Enter index (0 for A, 1 for B, 2 for C): 2
> Time (ns) for switch: 34
> Time (ns) for visit: 271
> Time (ns) for ternary: 17
> Time (ns) for subscript: 33
> PS C:\Users\drnoo\Downloads> .\a.exe
> Enter index (0 for A, 1 for B, 2 for C): 2
> Time (ns) for switch: 34
> Time (ns) for visit: 281
> Time (ns) for ternary: 19
> Time (ns) for subscript: 32
> PS C:\Users\drnoo\Downloads> .\a.exe
> Enter index (0 for A, 1 for B, 2 for C): 2
> Time (ns) for switch: 34
> Time (ns) for visit: 282
> Time (ns) for ternary: 20
> Time (ns) for subscript: 34
> I really have to go to sleep now (I am having some issues with Visual
> Studio 2026). I hope it would be acceptable for me to send the benchmarks
> for that tomorrow.
>
> regards, Muneem
>
>
> On Thu, Apr 2, 2026 at 10:54 AM Thiago Macieira via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
>
>> On Wednesday, 1 April 2026 21:56:44 Pacific Daylight Time Muneem via
>> Std-Proposals wrote:
>> > /*
>> > Time (ns) for switch: 168100
>> > Time (ns) for visit: 3664100
>> > Time (ns) for ternary: 190900
>> > It keeps on getting worse!
>> > */
>>
>> So far you've maybe shown that one implementation is generating bad code.
>> Have you tried others?
>>
>> You need to prove that this is an inherent and unavoidable problem of the
>> requirements, not that it just happened to be bad for this implementation.
>> Just quickly reading the proposed benchmark code, it would seem there's no
>> such inherent reason and you're making an unfounded and probably incorrect
>> assumption about how things actually work.
>>
>> In fact, I pasted a portion of your code into godbolt just to see what the
>> variant visit code, which you claim to be unnecessarily slow, would look
>> like:
>> https://gcc.godbolt.org/z/WK5bMzcae
>>
>> The first thing to note in the GCC/libstdc++ pane is that it does not use
>> user_index. The compiler thinks it's a constant, meaning this benchmark is
>> faulty. And thus it has constant-propagated this value and is *incredibly*
>> efficient in doing nothing useful. MSVC did likewise.
>>
>> Since MSVC outputs the out-of-line copy of inlined functions, we can see
>> the operator() expansion without the propagation of the user_index
>> constant. And it's no different than what a ternary or switch would look
>> like.
>>
>> In the Clang/libc++ pane, we see indirect function calls. I don't know why
>> libc++ std::variant is implemented this way, but it could be why it is slow
>> for you if you're using this implementation. If you tell Clang to instead
>> use libstdc++ (remove the last argument of the command-line), the indirect
>> function call disappears and we see an unrolled loop of loading the value
>> 10. That would mean Clang is even more efficient at doing nothing.
>>
>> Conclusion: it looks like your assumption that there is a problem to be
>> solved is faulty. There is no problem.
>>
>> --
>> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>> Principal Engineer - Intel Data Center - Platform & Sys. Eng.
>> --
>> Std-Proposals mailing list
>> Std-Proposals_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>
>
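On the timing side, here is a minimal sketch of a helper that divides the elapsed time by exactly the number of calls it made, so the printed figure is a mean cost per call. `mean_ns_per_call` is a hypothetical name introduced for this example and is not taken from the code quoted above. It assumes std::chrono::steady_clock is precise enough for the workload being timed.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (an assumption for this sketch): runs `body`
// `iterations` times and returns the mean wall-clock cost of one call
// in nanoseconds. Returns 0.0 when no iterations are requested.
template <typename Body>
double mean_ns_per_call(Body body, std::uint64_t iterations) {
    if (iterations == 0) return 0.0;

    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        body();
    }
    const auto stop = std::chrono::steady_clock::now();

    // Keep the elapsed time as floating-point nanoseconds so the division
    // below does not truncate small per-call costs to zero.
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

The callable should write its result to something the optimizer cannot discard (a volatile sink, for instance), otherwise the loop body may be removed and the measurement says little.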
Received on 2026-04-02 07:06:26
