Re: [std-proposals] Fwd: Extension to runtime polymorphism proposed

From: Muneem <itfllow123_at_[hidden]>
Date: Fri, 3 Apr 2026 01:18:28 +0500
Hi!
Thank you for your feedback ❤️❤️❤️.
Sorry if some of the things I said were incorrect, but my
philosophical point was that compilers are layered models that need a
language to convey intent and context so they produce proper code; that is
just how it should be, and why we use compiled languages.
On the question of what the syntax would look like:
It would have the same semantics as std::array (runtime indexing, and a
template instantiation for every collection it is instantiated for), but it
is heterogeneous. (I called it half-open in the previous mails because that
is what Stroustrup calls sequential containers.) To index, you can use an
integer or any type that converts to one implicitly or explicitly (just
like any other half-open container type). To implement this container, the
language would have to provide one of these facilities at the language
level:
1. Indexing of user-defined objects, or a new construct like struct/class
that can be indexed.
2. Heterogeneous lists (unlikely to be implemented).
3. A specific kind of function which can have different return types based
on the index passed:
{Int_obj, double_obj, float_obj} return_indexed_value(any integer that
fulfills the is_integral concept index);
//The compiler deduces the body itself.
A cherry on top would be to allow any class to be used for
indexing (such classes must have a particular function for providing the
integer), so that you can give the compiler insight into the fact that you
are potentially doing multiple subscript operations in the same chain, and
it can optimize that chain. For example, with list[list[1]] the compiler
can figure out ways to make the access as cache friendly as possible.
(This feature would probably work the same way if you provide an operator
that converts the type to int, but again, a dedicated feature would make
the intent clearer.)

> On Fri, 3 Apr 2026, 12:43 am Breno Guimarães, <brenorg_at_[hidden]> wrote:
>
>> Hi Muneem,
>>
>> Can you post how the code would look with the feature you are
>> proposing?
>>
>> You don't need to try to teach everyone about how compilers work. Some of
>> the things you say are just incorrect, so this will drag on.
>>
>> You showed the benchmark showing different implementations have different
>> times. Great.
>>
>> How would the new, improved version look?
>> It doesn't need to compile.
>>
>> Thanks!
>>
>> On Thu, Apr 2, 2026 at 16:32, Muneem via Std-Proposals <
>> std-proposals_at_[hidden]> wrote:
>>
>>> >What? Did you confuse it with Link Time Optimization? Instantiation is
>>> >in the Translation Unit, not at link time, as this would be too late.
>>> >We have extern templates, but then the compiler in one TU becomes blind
>>> >to that code as it instantiates in another TU; LTO can help there, but
>>> >then we have exactly the same case if we use normal functions.
>>> It's different for templates because the linker has to plug in some
>>> type, which again gives it more flexibility since it knows the INTENT IS
>>> TO PLUG IN types. Current std::variant does rely on templates, but it's
>>> not good enough for subscripting, as my benchmarking shows consistently.
>>> How am I supposed to subscript heterogeneous lists when the speed of
>>> branching is not fast enough? Why is assembly code outperformed by C++
>>> code (for large code bases)? Because C++ code has constructs that give
>>> the compiler intent and context, which helps it do automated
>>> optimizations. There is no current tool that can be used for this until
>>> there is a construct that allows runtime indexing of heterogeneous
>>> half-open sets.
>>>
>>> One last detail:
>>> Tools do exist and I tried them, but they don't convey intent. I can
>>> make my own tools, but they won't convey intent either, hence they won't
>>> be fast enough. Why do I use the inline keyword, when the compiler
>>> should in theory inline any function? Why do I use the constexpr
>>> keyword, if the compiler can evaluate anything it can at compile time
>>> as long as the observable behaviour does not change? It's a very simple
>>> point that I am trying to build on.
>>>
>>>
>>>> On Thu, Apr 2, 2026 at 1:11 PM Marcin Jaczewski <
>>>> marcinjaczewski86_at_[hidden]> wrote:
>>>>
>>>>> On Thu, Apr 2, 2026 at 09:05 Muneem via Std-Proposals
>>>>> <std-proposals_at_[hidden]> wrote:
>>>>> >
>>>>> > Hi!
>>>>> > Your point is partly correct, but this issue is quite prevalent;
>>>>> below are my benchmarks from multiple runs.
>>>>> > This is the updated code:
>>>>> > #include <variant>
>>>>> > #include <iostream>
>>>>> > #include <chrono>
>>>>> > #include <ctime>
>>>>> > #include <iomanip>
>>>>> > #include<array>
>>>>> > std::array<int, 3> array_1={1,2,3};
>>>>> >
>>>>> > struct A { int get() { return array_1[0]; } };
>>>>> > struct B { int get() { return array_1[1]; } };
>>>>> > struct C { int get() { return array_1[2]; } };
>>>>> >
>>>>> > struct data_accessed_through_visit {
>>>>> > static std::variant<A, B, C> obj;
>>>>> >
>>>>> > inline int operator()(int) {
>>>>> > return std::visit([](auto&& arg) {
>>>>> > return arg.get();
>>>>> > }, obj);
>>>>> > }
>>>>> > };
>>>>> > std::variant<A, B, C> data_accessed_through_visit::obj=C{};
>>>>> > int user_index = 0;
>>>>>
>>>>> I can stop you right there: you use a static object for the
>>>>> `std::variant`, which means you are testing a different thing than in
>>>>> the other cases.
>>>>> This can affect code generation and what the optimizer can see and
>>>>> guarantee.
>>>>>
>>>>> Besides, why use `variant` here? It was not designed for this, but to
>>>>> store different types in the same memory location.
>>>>> None of that is used here; it only makes it harder for the compiler
>>>>> to see what is going on.
>>>>>
>>>>> Why not use simple pointers there? And if you need to do some
>>>>> calculations, function pointers?
>>>>>
>>>>> >
>>>>> > struct data_ternary {
>>>>> > inline int operator()(int index) {
>>>>> >         return (index == 0) ? array_1[0] : (index == 1) ? array_1[1]
>>>>> : (index == 2) ? array_1[2] : -1;
>>>>> > }
>>>>> > };
>>>>> >
>>>>> > struct data_switched {
>>>>> > inline int operator()(int index) {
>>>>> > switch(index) {
>>>>> > case 0: return array_1[0];
>>>>> > case 1: return array_1[1];
>>>>> > case 2: return array_1[2];
>>>>> > default: return -1;
>>>>> > }
>>>>> > }
>>>>> > };
>>>>> >
>>>>> > struct data_indexing {
>>>>> > inline int operator()(int index) {
>>>>> > return array_1[index];
>>>>> > }
>>>>> > };
>>>>> >
>>>>> >
>>>>> >
>>>>> > volatile int x = 0;
>>>>> > constexpr uint64_t loop_count=10000;
>>>>> > static void measure_switch() {
>>>>> > data_switched obj;
>>>>> > for (int i=0; i++<loop_count;) {
>>>>> > x = obj(user_index);
>>>>> > }
>>>>> > }
>>>>> >
>>>>> > static void measure_visit() {
>>>>> > data_accessed_through_visit obj;
>>>>> > for (int i=0; i++<loop_count;) {
>>>>> > x = obj(user_index);
>>>>> > }
>>>>> > }
>>>>>
>>>>> Why not use:
>>>>> ```
>>>>> static void measure_visit() {
>>>>>     std::visit([](auto&& arg) {
>>>>>         for (int i=0; i++<loop_count;) {
>>>>>             x = arg.get();
>>>>>         }
>>>>>     }, data_accessed_through_visit::obj);
>>>>> }
>>>>> ```
>>>>> And this is more similar to what the compiler sees in other test cases.
>>>>>
>>>>> >
>>>>> > static void measure_ternary() {
>>>>> > data_ternary obj;
>>>>> > for (int i=0; i++<loop_count;) {
>>>>> > x = obj(user_index);
>>>>> > }
>>>>> > }
>>>>> > static void measure_indexing() {
>>>>> > data_indexing obj;
>>>>> > for (int i=0; i++<loop_count;) {
>>>>> > x = obj(user_index);
>>>>> > }
>>>>> > }
>>>>> >
>>>>> > template<typename func_t>
>>>>> > void call_func(func_t callable_obj, int arg){
>>>>> > const auto start = std::chrono::steady_clock::now();
>>>>> >
>>>>> > constexpr int how_much_to_loop=1000;
>>>>> > for(int i=0; i++<how_much_to_loop;){
>>>>> > callable_obj();
>>>>> > }
>>>>> > const auto end = std::chrono::steady_clock::now();
>>>>> > auto result=
>>>>> std::chrono::duration_cast<std::chrono::nanoseconds>(end -
>>>>> start).count()/how_much_to_loop;
>>>>> > std::cout<<result<<std::endl;
>>>>> >
>>>>> > }
>>>>> >
>>>>> > int main() {
>>>>> > std::cout << "Enter index (0 for A, 1 for B, 2 for C): ";
>>>>> > if (!(std::cin >> user_index)) return 1;
>>>>> >
>>>>> > // Set the variant state
>>>>> > if (user_index == 0) data_accessed_through_visit::obj = A{};
>>>>> > else if (user_index == 1) data_accessed_through_visit::obj = B{};
>>>>> > else if (user_index == 2) data_accessed_through_visit::obj = C{};
>>>>> >
>>>>> > std::cout << "Time (ns) for switch: ";
>>>>> > call_func(measure_switch, user_index);
>>>>> >
>>>>> > std::cout << "Time (ns) for visit: ";
>>>>> > call_func(measure_visit, user_index);
>>>>> >
>>>>> > std::cout << "Time (ns) for ternary: ";
>>>>> > call_func(measure_ternary, user_index);
>>>>> >
>>>>> > std::cout << "Time (ns) for subscript: ";
>>>>> > call_func(measure_indexing, user_index);
>>>>> >
>>>>> > return 0;
>>>>> > }
>>>>> > The benchmarks consistently show that these syntax constructs do
>>>>> matter (the smaller the index range is, the more the compiler can
>>>>> flatten it and know how to branch); notice how the ternary outperforms
>>>>> them all even though it nests. This means that adding new syntax with
>>>>> the sole purpose of giving compilers as much information as possible
>>>>> is actually useful.
>>>>>
>>>>> And what do you propose here, exactly? Different structures behave
>>>>> differently.
>>>>> How exactly will it look and work? Right now you use a lot of vague
>>>>> statements that do not show anything.
>>>>>
>>>>> > Consider how templates and instantiation give the compiler extra
>>>>> insight. Why? Because templates are instantiated at the point of
>>>>> instantiation, which can be delayed up to link time. These are the
>>>>> benchmarks:
>>>>>
>>>>> What? Did you confuse it with Link Time Optimization? Instantiation
>>>>> happens in the Translation Unit, not at link time, as that would be
>>>>> too late. We have extern templates, but then the compiler in one TU
>>>>> becomes blind to that code as it is instantiated in another TU; LTO
>>>>> can help there, but then we have exactly the same case as if we used
>>>>> normal functions.
>>>>>
>>>>> > benchmarks for g++:
>>>>> > Enter index (0 for A, 1 for B, 2 for C): 2
>>>>> > Time (ns) for switch: 33
>>>>> > Time (ns) for visit: 278
>>>>> > Time (ns) for ternary: 19
>>>>> > Time (ns) for subscript: 34
>>>>> > PS C:\Users\drnoo\Downloads> .\a.exe
>>>>> > Enter index (0 for A, 1 for B, 2 for C): 2
>>>>> > Time (ns) for switch: 33
>>>>> > Time (ns) for visit: 296
>>>>> > Time (ns) for ternary: 20
>>>>> > Time (ns) for subscript: 35
>>>>> > PS C:\Users\drnoo\Downloads> .\a.exe
>>>>> > Enter index (0 for A, 1 for B, 2 for C): 2
>>>>> > Time (ns) for switch: 34
>>>>> > Time (ns) for visit: 271
>>>>> > Time (ns) for ternary: 17
>>>>> > Time (ns) for subscript: 33
>>>>> > PS C:\Users\drnoo\Downloads> .\a.exe
>>>>> > Enter index (0 for A, 1 for B, 2 for C): 2
>>>>> > Time (ns) for switch: 34
>>>>> > Time (ns) for visit: 281
>>>>> > Time (ns) for ternary: 19
>>>>> > Time (ns) for subscript: 32
>>>>> > PS C:\Users\drnoo\Downloads> .\a.exe
>>>>> > Enter index (0 for A, 1 for B, 2 for C): 2
>>>>> > Time (ns) for switch: 34
>>>>> > Time (ns) for visit: 282
>>>>> > Time (ns) for ternary: 20
>>>>> > Time (ns) for subscript: 34
>>>>> > I really have to go to sleep now (I am having some issues with
>>>>> Visual Studio 2026). I hope it would be acceptable for me to send the
>>>>> benchmarks for that tomorrow.
>>>>> >
>>>>> > regards, Muneem
>>>>> >
>>>>> >
>>>>> > On Thu, Apr 2, 2026 at 10:54 AM Thiago Macieira via Std-Proposals <
>>>>> std-proposals_at_[hidden]> wrote:
>>>>> >>
>>>>> >> On Wednesday, 1 April 2026 21:56:44 Pacific Daylight Time Muneem
>>>>> via Std-
>>>>> >> Proposals wrote:
>>>>> >> > /*
>>>>> >> > Time (ns) for switch: 168100
>>>>> >> > Time (ns) for visit: 3664100
>>>>> >> > Time (ns) for ternary: 190900
>>>>> >> > It keeps on getting worse!
>>>>> >> > */
>>>>> >>
>>>>> >> So far you've maybe shown that one implementation is generating bad
>>>>> code. Have
>>>>> >> you tried others?
>>>>> >>
>>>>> >> You need to prove that this is an inherent and unavoidable problem
>>>>> of the
>>>>> >> requirements, not that it just happened to be bad for this
>>>>> implementation.
>>>>> >> Just quickly reading the proposed benchmark code, it would seem
>>>>> there's no
>>>>> >> such inherent reason and you're making an unfounded and probably
>>>>> incorrect
>>>>> >> assumption about how things actually work.
>>>>> >>
>>>>> >> In fact, I pasted a portion of your code into godbolt just to see
>>>>> what the
>>>>> >> variant visit code, which you claim to be unnecessarily slow, would
>>>>> look like:
>>>>> >> https://gcc.godbolt.org/z/WK5bMzcae
>>>>> >>
>>>>> >> The first thing to note in the GCC/libstdc++ pane is that it does
>>>>> not use
>>>>> >> user_index. The compiler thinks it's a constant, meaning this
>>>>> benchmark is
>>>>> >> faulty. And thus it has constant-propagated this value and is
>>>>> *incredibly*
>>>>> >> efficient in doing nothing useful. MSVC did likewise.
>>>>> >>
>>>>> >> Since MSVC outputs the out-of-line copy of inlined functions, we
>>>>> can see the
>>>>> >> operator() expansion without the propagation of the user_index
>>>>> constant. And
>>>>> >> it's no different than what a ternary or switch would look like.
>>>>> >>
>>>>> >> In the Clang/libc++ pane, we see indirect function calls. I don't
>>>>> know why
>>>>> >> libc++ std::variant is implemented this way, but it could be why it
>>>>> is slow
>>>>> >> for you if you're using this implementation. If you tell Clang to
>>>>> instead use
>>>>> >> libstdc++ (remove the last argument of the command-line), the
>>>>> indirect
>>>>> >> function call disappears and we see an unrolled loop of loading the
>>>>> value 10.
>>>>> >> That would mean Clang is even more efficient at doing nothing.
>>>>> >>
>>>>> >> Conclusion: it looks like your assumption that there is a problem
>>>>> to be solved
>>>>> >> is faulty. There is no problem.
>>>>> >>
>>>>> >> --
>>>>> >> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>>>>> >> Principal Engineer - Intel Data Center - Platform & Sys. Eng.
>>>>> >> --
>>>>> >> Std-Proposals mailing list
>>>>> >> Std-Proposals_at_[hidden]
>>>>> >> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>>>> >
>>>>>
>>

Received on 2026-04-02 20:18:46