Date: Thu, 2 Apr 2026 12:08:16 +0500
My last note for today: I really hope that we can continue this
discussion when I wake up. My feature won't fix everything, but it
would fix branching to get a single value from homogeneous lists at runtime.
The reason I use heterogeneous pairs is that the feature isn't
supported yet.
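
To make "branching to get a single value" concrete, here is a minimal
sketch (not the benchmark itself) of the four lookups being compared,
written so that each one returns the same value for every index. All
names (values, by_subscript, by_switch, by_ternary, by_visit,
all_strategies_agree) are hypothetical and only illustrate the idea; the
-1 fallback for out-of-range indices is an assumption carried over from
the switch default in the quoted code.

```cpp
#include <array>
#include <cassert>   // for the assert-based usage shown after the block
#include <cstddef>
#include <variant>

// Hypothetical stand-in for array_1 in the quoted benchmark.
inline const std::array<int, 3> values{1, 2, 3};

struct A { int get() const { return values[0]; } };
struct B { int get() const { return values[1]; } };
struct C { int get() const { return values[2]; } };

// Subscript: one range check and one load.
inline int by_subscript(int index) {
    return (index >= 0 && index < 3)
               ? values[static_cast<std::size_t>(index)]
               : -1;
}

// Switch: one case per valid index, -1 otherwise.
inline int by_switch(int index) {
    switch (index) {
        case 0: return values[0];
        case 1: return values[1];
        case 2: return values[2];
        default: return -1;
    }
}

// Nested conditional: each test compares against a distinct index.
inline int by_ternary(int index) {
    return (index == 0) ? values[0]
         : (index == 1) ? values[1]
         : (index == 2) ? values[2]
         : -1;
}

// Variant: select the alternative first, then dispatch with std::visit.
inline int by_visit(int index) {
    std::variant<A, B, C> v;  // default-constructed, holds A
    if (index == 1) v = B{};
    else if (index == 2) v = C{};
    else if (index != 0) return -1;
    return std::visit([](const auto& alt) { return alt.get(); }, v);
}

// True when all four strategies agree for indices -1 through 3.
inline bool all_strategies_agree() {
    for (int i = -1; i <= 3; ++i) {
        const int expected = by_subscript(i);
        if (by_switch(i) != expected || by_ternary(i) != expected ||
            by_visit(i) != expected) {
            return false;
        }
    }
    return true;
}
```

A check such as assert(all_strategies_agree()); before timing anything
confirms that the variants being measured compute the same result, so any
timing difference is not caused by one of them doing different work.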
On Thu, Apr 2, 2026 at 12:06 PM Muneem <itfllow123_at_[hidden]> wrote:
> Also, I always have std::cin to make sure that the compiler never cheats!
>
> On Thu, Apr 2, 2026 at 12:05 PM Muneem <itfllow123_at_[hidden]> wrote:
>
>> Hi!
>> Your point is partly correct, but this issue is quite prevalent. Below
>> are my branches from multiple sources.
>> This is the updated code:
>> #include <variant>
>> #include <iostream>
>> #include <chrono>
>> #include <ctime>
>> #include <iomanip>
>> #include <array>
>> #include <cstdint>
>> std::array<int, 3> array_1={1,2,3};
>>
>> struct A { int get() { return array_1[0]; } };
>> struct B { int get() { return array_1[1]; } };
>> struct C { int get() { return array_1[2]; } };
>>
>> struct data_accessed_through_visit {
>> static std::variant<A, B, C> obj;
>>
>> inline int operator()(int) {
>> return std::visit([](auto&& arg) {
>> return arg.get();
>> }, obj);
>> }
>> };
>> std::variant<A, B, C> data_accessed_through_visit::obj=C{};
>> int user_index = 0;
>>
>> struct data_ternary {
>> inline int operator()(int index) {
>> return (index == 0) ? array_1[0] : (index == 1) ? array_1[1] :
>> (index == 2) ? array_1[2] : -1;
>> }
>> };
>>
>> struct data_switched {
>> inline int operator()(int index) {
>> switch(index) {
>> case 0: return array_1[0];
>> case 1: return array_1[1];
>> case 2: return array_1[2];
>> default: return -1;
>> }
>> }
>> };
>>
>> struct data_indexing {
>> inline int operator()(int index) {
>> return array_1[index];
>> }
>> };
>>
>>
>>
>> volatile int x = 0;
>> constexpr uint64_t loop_count=10000;
>> static void measure_switch() {
>> data_switched obj;
>> for (int i=0; i++<loop_count;) {
>> x = obj(user_index);
>> }
>> }
>>
>> static void measure_visit() {
>> data_accessed_through_visit obj;
>> for (int i=0; i++<loop_count;) {
>> x = obj(user_index);
>> }
>> }
>>
>> static void measure_ternary() {
>> data_ternary obj;
>> for (int i=0; i++<loop_count;) {
>> x = obj(user_index);
>> }
>> }
>> static void measure_indexing() {
>> data_indexing obj;
>> for (int i=0; i++<loop_count;) {
>> x = obj(user_index);
>> }
>> }
>>
>> template<typename func_t>
>> void call_func(func_t callable_obj, int arg){
>> const auto start = std::chrono::steady_clock::now();
>>
>> constexpr int how_much_to_loop=1000;
>> for(int i=0; i++<how_much_to_loop;){
>> callable_obj();
>> }
>> const auto end = std::chrono::steady_clock::now();
>> auto result= std::chrono::duration_cast<std::chrono::nanoseconds>(end
>> - start).count()/how_much_to_loop;
>> std::cout<<result/how_much_to_loop<<std::endl;
>>
>> }
>>
>> int main() {
>> std::cout << "Enter index (0 for A, 1 for B, 2 for C): ";
>> if (!(std::cin >> user_index)) return 1;
>>
>> // Set the variant state
>> if (user_index == 0) data_accessed_through_visit::obj = A{};
>> else if (user_index == 1) data_accessed_through_visit::obj = B{};
>> else if (user_index == 2) data_accessed_through_visit::obj = C{};
>>
>> std::cout << "Time (ns) for switch: ";
>> call_func(measure_switch, user_index);
>>
>> std::cout << "Time (ns) for visit: ";
>> call_func(measure_visit, user_index);
>>
>> std::cout << "Time (ns) for ternary: ";
>> call_func(measure_ternary, user_index);
>>
>> std::cout << "Time (ns) for subscript: ";
>> call_func(measure_indexing, user_index);
>>
>> return 0;
>> }
>> The benchmarks consistently show that these syntax constructs do matter
>> (the smaller the index range is, the more the compiler can flatten it and
>> know how to branch). Notice how the ternary outperforms them all even
>> though it is nested. This means that adding new syntax with the sole purpose
>> of giving compilers as much information as possible is actually useful.
>> Consider how templates and instantiation give the compiler extra insight.
>> Why? Because templates are instantiated at the point of instantiation, which
>> can be delayed up to link time. These are the benchmarks:
>> benchmarks for g++:
>> Enter index (0 for A, 1 for B, 2 for C): 2
>> Time (ns) for switch: 33
>> Time (ns) for visit: 278
>> Time (ns) for ternary: 19
>> Time (ns) for subscript: 34
>> PS C:\Users\drnoo\Downloads> .\a.exe
>> Enter index (0 for A, 1 for B, 2 for C): 2
>> Time (ns) for switch: 33
>> Time (ns) for visit: 296
>> Time (ns) for ternary: 20
>> Time (ns) for subscript: 35
>> PS C:\Users\drnoo\Downloads> .\a.exe
>> Enter index (0 for A, 1 for B, 2 for C): 2
>> Time (ns) for switch: 34
>> Time (ns) for visit: 271
>> Time (ns) for ternary: 17
>> Time (ns) for subscript: 33
>> PS C:\Users\drnoo\Downloads> .\a.exe
>> Enter index (0 for A, 1 for B, 2 for C): 2
>> Time (ns) for switch: 34
>> Time (ns) for visit: 281
>> Time (ns) for ternary: 19
>> Time (ns) for subscript: 32
>> PS C:\Users\drnoo\Downloads> .\a.exe
>> Enter index (0 for A, 1 for B, 2 for C): 2
>> Time (ns) for switch: 34
>> Time (ns) for visit: 282
>> Time (ns) for ternary: 20
>> Time (ns) for subscript: 34
>> I really have to go to sleep now (I am having some issues with Visual
>> Studio 2026). I hope it would be acceptable for me to send the benchmarks
>> for that tomorrow.
>>
>> regards, Muneem
>>
>>
>> On Thu, Apr 2, 2026 at 10:54 AM Thiago Macieira via Std-Proposals <
>> std-proposals_at_[hidden]> wrote:
>>
>>> On Wednesday, 1 April 2026 21:56:44 Pacific Daylight Time Muneem via
>>> Std-Proposals wrote:
>>> > /*
>>> > Time (ns) for switch: 168100
>>> > Time (ns) for visit: 3664100
>>> > Time (ns) for ternary: 190900
>>> > It keeps on getting worse!
>>> > */
>>>
>>> So far you've maybe shown that one implementation is generating bad code.
>>> Have you tried others?
>>>
>>> You need to prove that this is an inherent and unavoidable problem of the
>>> requirements, not that it just happened to be bad for this implementation.
>>> Just quickly reading the proposed benchmark code, it would seem there's no
>>> such inherent reason and you're making an unfounded and probably incorrect
>>> assumption about how things actually work.
>>>
>>> In fact, I pasted a portion of your code into godbolt just to see what the
>>> variant visit code, which you claim to be unnecessarily slow, would look
>>> like:
>>> https://gcc.godbolt.org/z/WK5bMzcae
>>>
>>> The first thing to note in the GCC/libstdc++ pane is that it does not use
>>> user_index. The compiler thinks it's a constant, meaning this benchmark is
>>> faulty. And thus it has constant-propagated this value and is *incredibly*
>>> efficient in doing nothing useful. MSVC did likewise.
>>>
>>> Since MSVC outputs the out-of-line copy of inlined functions, we can see the
>>> operator() expansion without the propagation of the user_index constant. And
>>> it's no different than what a ternary or switch would look like.
>>>
>>> In the Clang/libc++ pane, we see indirect function calls. I don't know why
>>> libc++ std::variant is implemented this way, but it could be why it is slow
>>> for you if you're using this implementation. If you tell Clang to instead use
>>> libstdc++ (remove the last argument of the command-line), the indirect
>>> function call disappears and we see an unrolled loop of loading the value 10.
>>> That would mean Clang is even more efficient at doing nothing.
>>>
>>> Conclusion: it looks like your assumption that there is a problem to be solved
>>> is faulty. There is no problem.
>>>
>>> --
>>> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>>> Principal Engineer - Intel Data Center - Platform & Sys. Eng.
>>> --
>>> Std-Proposals mailing list
>>> Std-Proposals_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>>
>>
Received on 2026-04-02 07:08:33
