Date: Fri, 3 Apr 2026 03:18:30 +0500
I am not in any way changing the language; what I am doing is adding
constructs to increase clarity for both programmers and compilers. This
would at the very least lead to more readable code than the current state,
where code using std::visit or switch statements is ugly, and would
(probably) lead to lower compile times for compilers that try to optimize
these branches.
On Fri, Apr 3, 2026 at 3:15 AM Muneem <itfllow123_at_[hidden]> wrote:
> I completely agree with you; that's why I think a new construct would be
> better, not only for being less verbose than switch statements but also
> for providing the same intent that subscripting does. I provided
> benchmarks because that is what some people asked for. Ask me anything,
> and I would be pleased to respond!
> regards, Muneem
>
> On Fri, Apr 3, 2026 at 3:12 AM Muneem <itfllow123_at_[hidden]> wrote:
>
>> Hi, again!
>> > Interestingly, the compiler even removed all the `warper` calls and
>> only emitted calls to `std::cout`
>> In my benchmark, I used std::cin and a volatile x to make sure that no
>> compile-time indexing happens!
>> I completely agree with your point on cache locality; in fact, that's
>> the whole point of the construct: the compiler gets as much context as
>> possible so it can make the code cache-local. I know benchmarks can lie,
>> but assembly lies more, because compilers can produce nearly identical
>> assembly for two functions whose performance differs, and small
>> differences can have huge implications for cache performance. I
>> completely agree with you that cache performance matters more than
>> anything, and my example shows that the compiler has the context and
>> intent that I want: a contiguously allocated structure that I can index
>> through. Function calls are worse because they convey neither enough
>> intent nor enough context, whereas my proposal gives the compiler
>> context from many sources and the clearest intent.
>>
>> These intent-based constructs are useful precisely so that the compiler
>> can do exactly what you said: "deal with cache better". Benchmarks show
>> that even in small code, compilers fall short when it comes to
>> optimizing switch statements and std::visit. Benchmarks can lie, but in
>> my case they show that if we have proper syntactic constructs for these
>> features, then we can in fact make the code cache-friendly, because the
>> compiler has an exact map of intent and context, and it can use that to
>> improve performance.
>>
>> Thanks again for your feedback!
>>
>> regards, Muneem
>>
>> On Fri, Apr 3, 2026 at 2:57 AM Marcin Jaczewski <
>> marcinjaczewski86_at_[hidden]> wrote:
>>
>>> Benchmarks do lie...
>>>
>>> The first problem with benchmarking is that it does not test real code,
>>> and this creates biases that skew the results through simple things
>>> like the CPU cache. A benchmark can easily fill the cache completely
>>> and run at 100% speed, but in real code the same logic may have only
>>> 0.1% of the cache available to it, since other code needs that cache
>>> too. If you get a cache miss, the whole code can run 100 times slower.
>>>
>>> Only testing the final optimized assembly is a true test of
>>> performance; benchmarks are only hints for searching for the optimal
>>> solution.
>>>
>>>
>>> And for your example, why not simply get a function pointer like
>>> `&A::get` and then call that function directly?
>>> Why bother with indexes?
>>>
>>> ```
>>> #include <iostream>
>>> #include <tuple>
>>> #include <functional>
>>>
>>> struct A { int get() { return 10; } };
>>> struct B { double get() { return 1; } };
>>> struct C { float get() { return 2.1f; } };
>>>
>>> // Maps a pointer-to-member-function type to its class type.
>>> template<typename C>
>>> struct GetCallerArg;
>>>
>>> template<typename T, typename R>
>>> struct GetCallerArg<R (T::*)()>
>>> {
>>>     using type = T;
>>> };
>>>
>>> // Invokes the given member function on the matching tuple element.
>>> template<typename T, auto Callback>
>>> void warper(T& a)
>>> {
>>>     std::cout << std::invoke(
>>>         Callback,
>>>         std::get<typename GetCallerArg<decltype(Callback)>::type>(a)
>>>     ) << '\n';
>>> }
>>>
>>> template<typename T>
>>> using Access = void (*)(T&);
>>>
>>> int main()
>>> {
>>>     std::tuple<A, B, C> t{ A{}, B{}, C{} };
>>>
>>>     Access<decltype(t)> f = &warper<decltype(t), &A::get>;
>>>     f(t);
>>>
>>>     f = &warper<decltype(t), &B::get>;
>>>     f(t);
>>> }
>>> ```
>>> https://godbolt.org/z/Y5fGEMdc4
>>>
>>> Interestingly, the compiler even removed all the `warper` calls and
>>> only emitted calls to `std::cout`
>>>
>>> On Thu, Apr 2, 2026 at 11:20 PM Muneem via Std-Proposals
>>> <std-proposals_at_[hidden]> wrote:
>>> >
>>> > Benchmarks don't lie! Even if the assembly is the same size and looks
>>> similar, benchmarks show otherwise. Run benchmarks on the code that I
>>> showed using my proposal.
>>> > --
>>> > Std-Proposals mailing list
>>> > Std-Proposals_at_[hidden]
>>> > https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>>
>>
Received on 2026-04-02 22:18:47
