ISOCPP std-proposals List: Re: [std-proposals] Function overload set type information loss

From: Breno Guimarães <brenorg_at_[hidden]>
Date: Sat, 3 Aug 2024 15:33:46 -0300

Or can you do:

for (auto elem : vec)
{
    // now r is not void anymore, right? It's some type that came from the
    // template instantiation
    auto r = useType(elem);
    auto t = r.get(); // can't do this, can you?
    doSomething(t);
}
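
For comparison, here is a minimal sketch of how such a loop can be written in standard C++ today when the set of possible types is closed and visible in the translation unit. It uses std::variant and std::visit rather than the proposed useType/effdecltype mechanism, and the names Value, describe and describeAll are hypothetical. std::visit instantiates the generic callable for every alternative of the variant, which is why the full list of types has to be known at compile time.

```cpp
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical closed set of types an element may carry.
using Value = std::variant<int, double, char>;

// Stand-in for doSomething: a template that behaves differently per type.
template <class T>
std::string describe(T) {
    if constexpr (std::is_same_v<T, int>) return "int";
    else if constexpr (std::is_same_v<T, double>) return "double";
    else return "char";
}

// std::visit instantiates the lambda (and therefore describe<T>) for every
// alternative of Value, then selects one at run time from the active index.
std::vector<std::string> describeAll(const std::vector<Value>& vec) {
    std::vector<std::string> out;
    for (const auto& elem : vec)
        out.push_back(std::visit([](auto t) { return describe(t); }, elem));
    return out;
}
```

Adding a type to Value forces every visitor to be recompiled, which is the closed-world assumption the multi-translation-unit questions in this thread are probing.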

On Sat, Aug 3, 2024 at 15:22, Breno Guimarães <brenorg_at_[hidden]>
wrote:

> Oh, so the only way to use this function pointer is by doing
> consumeType(useType(elem)) ?
>
>
> On Sat, Aug 3, 2024 at 15:16, organicoman <organicoman_at_[hidden]>
> wrote:
>
>> Like
>>
>> template<class T>
>> void doSomething(T arg) {}
>>
>> ...
>>
>> for (auto elem : vec)
>> {
>> auto r = elem();
>> auto t = r.get();
>> doSomething(t);
>> }
>>
>> Since you don't know the type of "r" at compile time, you don't know the
>> type of "t" at compile time, so you don't know for which types to
>> instantiate "doSomething".
>> Look at the loop in blue in your example:
>> it takes a function pointer from the vector and then calls it.
>> That function pointer is of type void(*)(void), so the call returns void.
>> So that example is a compile error.
>> And that's what confused me. So I assumed you were talking about
>> consumeType and gave the previous explanation, which you considered a
>> diversion.
>>
>> I'm trying to understand what are the limitations the user will face when
>> using such feature.
>>
>> There are no limitations as far as I know, except one:
>> if you use a template function name other than consumeType, then you
>> have to tag it manually so the pre-pass includes it in code generation.
>> Otherwise, consumeType is the default name that the pre-pass uses to
>> concatenate the switch cases, as I showed you before.
>>
>> Keep exploring it, because there is a trick that I really want to show you,
>> but you need to understand the concept deeply or it will not make sense to
>> you.
>>
>> -------- Original message --------
>> From: Breno Guimarães <brenorg_at_[hidden]>
>> Date: 8/3/24 9:54 PM (GMT+04:00)
>> To: organicoman <organicoman_at_[hidden]>
>> Cc: Std-Proposals <std-proposals_at_[hidden]>, Tiago Freire <
>> tmiguelf_at_[hidden]>
>> Subject: Re: [std-proposals] Function overload set type information loss
>>
>> Like
>>
>> template<class T>
>> void doSomething(T arg) {}
>>
>> ...
>>
>> for (auto elem : vec)
>> {
>> auto r = elem();
>> auto t = r.get();
>> doSomething(t);
>> }
>>
>> Since you don't know the type of "r" at compile time, you don't know the
>> type of "t" at compile time, so you don't know for which types to
>> instantiate "doSomething".
>>
>> I'm trying to understand what are the limitations the user will face when
>> using such feature.
>> ...
>>
>> On Sat, Aug 3, 2024 at 14:49, Breno Guimarães <brenorg_at_[hidden]>
>> wrote:
>>
>>> And what would we get for a template taking an arbitrary type?
>>>
>>> On Sat, Aug 3, 2024 at 14:43, organicoman <organicoman_at_[hidden]>
>>> wrote:
>>>
>>>> Not clear at all. You changed my question into another question and
>>>> answered that instead.
>>>>
>>>> I asked for which type we should instantiate the template that is being
>>>> invoked with argument "t" for which you don't know the type.
>>>>
>>>> Are you talking about your example? This one:
>>>>
>>>> for (auto elem: vec)
>>>> {
>>>> auto r = elem();
>>>> auto t = r.get(); //<- is this the t you are asking about?
>>>> useType(t);
>>>> }
>>>>
>>>> If that is the case, then it means that you misunderstood the signature
>>>> of useType... please read the post carefully.
>>>>
>>>> useType takes a function pointer, not a variable of arbitrary type.
>>>>
>>>> Look what i wrote: (copy/paste)
>>>> auto useType(effdecltype(foo) f)
>>>> {
>>>> using T = __get_ortho_at__<0>(f);
>>>> f();
>>>> return T(0);
>>>>
>>>> }
>>>>
>>>> If I am mistaken, then copy/paste the code from my example and point to
>>>> where you want more explanation.
>>>>
>>>>
>>>>
>>>>
>>>> Not clear at all. You changed my question into another question and
>>>> answered that instead.
>>>>
>>>> I asked for which type we should instantiate the template that is being
>>>> invoked with argument "t" for which you don't know the type.
>>>>
>>>> On Sat, Aug 3, 2024 at 12:47, organicoman <organicoman_at_[hidden]>
>>>> wrote:
>>>>
>>>>> Oooh, I get you now.
>>>>> Let me simplify that for you with an example.
>>>>>
>>>>> auto someFunction();
>>>>>
>>>>> Can you tell what this function's return type is?
>>>>>
>>>>> No.
>>>>>
>>>>> So let me give you some help about the implementation of this function.
>>>>>
>>>>> 1- this function returns an unnamed type.
>>>>> 2- this unnamed type is an aggregate.
>>>>> 3- at an offset of 8 bytes there is a union with an unknown number of
>>>>> members of unknown types.
>>>>>
>>>>> Asked:
>>>>> Read the value of the union member.
>>>>>
>>>>> User asks:
>>>>> But what type should I cast that union to?
>>>>>
>>>>> Answer:
>>>>> That's your responsibility..... Unless....
>>>>> You use my tool.
>>>>>
>>>>> User asks:
>>>>> What is it?
>>>>>
>>>>> Answer:
>>>>> You create a template function, you tune its body to execute whatever
>>>>> you want based on whatever type you want, and my tool takes on the
>>>>> responsibility of choosing which overload to call.
>>>>>
>>>>> That, in basic language, is what is happening.
>>>>>
>>>>> You don't need to know what type is in the union.
>>>>>
>>>>> If you want to be adventurous and cast it to a specific type, there are
>>>>> two possibilities:
>>>>> * you can get what you want,
>>>>> * you can get a type mismatch => garbage.
>>>>>
>>>>> All of that is orchestrated by the pre-pass, so it is preferable to let
>>>>> the tool handle the case.
>>>>>
>>>>> What about now? Clear?
>>>>>
>>>>> -------- Original message --------
>>>>> From: Breno Guimarães <brenorg_at_[hidden]>
>>>>> Date: 8/3/24 6:56 PM (GMT+04:00)
>>>>> To: organicoman <organicoman_at_[hidden]>
>>>>> Cc: Std-Proposals <std-proposals_at_[hidden]>, Tiago Freire <
>>>>> tmiguelf_at_[hidden]>
>>>>> Subject: Re: [std-proposals] Function overload set type information
>>>>> loss
>>>>>
>>>>> You didn't see how that doesn't work yet?
>>>>> You don't know the possible types inside xunion because of the
>>>>> multiple TU thing. So you don't know how to fully implement
>>>>> operator==(xunion,int). Other translation units can't help you implement
>>>>> that either because they don't even know an operator== is needed since that
>>>>> is in a body of another cpp file they never saw.
>>>>>
>>>>> But let's see a perhaps more obvious example:
>>>>>
>>>>> for (auto elem: vec)
>>>>> {
>>>>> auto r = elem();
>>>>> auto t = r.get();
>>>>> useType(t);
>>>>> }
>>>>>
>>>>> "r" is the fancy xunion. Then you call get on it. So you need to do the
>>>>> virtual dispatch thing. But you don't know the type of "t", because
>>>>> different types will return different things when you call "get()" on them.
>>>>> So how would you know for which types to instantiate useType?
>>>>>
>>>>> On Sat, Aug 3, 2024 at 11:22, organicoman <organicoman_at_[hidden]>
>>>>> wrote:
>>>>>
>>>>>> I don't think I was clear enough.
>>>>>>
>>>>>> What if you have:
>>>>>> for (auto elem : vec)
>>>>>> {
>>>>>> auto r = useType(elem);
>>>>>> if (r == 1)
>>>>>> consumeType(r);
>>>>>> }
>>>>>>
>>>>>> The compiler on this translation unit doesn't know all possible types
>>>>>> (because the vec is global and filled in other TUs like you anticipated).
>>>>>>
>>>>>> So this translation unit could reason about the types it knows and
>>>>>> generate the code for "if (r == 1)" but what about the types it doesn't
>>>>>> know?
>>>>>> The other translation units don't know the body of the for-loop so
>>>>>> they wouldn't be able to fill in that gap either.
>>>>>>
>>>>>> In this translation unit we could generate the if for "int" but
>>>>>> another one could require for float. Another translation unit could even
>>>>>> push a std::string which would make this code a syntax error.
>>>>>>
>>>>>> That's why I asked if it was a limitation that you couldn't do
>>>>>> anything other than directly call consumeType.
>>>>>>
>>>>>> The temporary value generated by useType would also have to be
>>>>>> destroyed and that's a variation of the if problem because it typically is
>>>>>> destroyed on the caller body. So you wouldn't know which destructor to call.
>>>>>>
>>>>>> I didn't understand your question very well, but I will use your
>>>>>> example and try to detail it.
>>>>>>
>>>>>> So let's analyze this:
>>>>>>
>>>>>> if(r == 1)
>>>>>>
>>>>>> The compiler sees it as
>>>>>>
>>>>>> if(operator == (r, 1))// bool operator==(xunion,int)
>>>>>>
>>>>>> It directly replaces it with the cast I described previously,
>>>>>> and it becomes
>>>>>>
>>>>>> if(
>>>>>> *reinterpret_cast<int*>
>>>>>> (reinterpret_cast<char*>(&(r))+sizeof(void*))
>>>>>> == 1)
>>>>>>
>>>>>> And it becomes a regular comparison, calling
>>>>>>
>>>>>> bool operator==(int, int)
>>>>>>
>>>>>> Remember what I said in that post (copy/paste):
>>>>>>
>>>>>> ------start------
>>>>>> "...
>>>>>> In the second case (casting), which is the easiest, that line will be
>>>>>> replaced by
>>>>>>
>>>>>> double var2 = *reinterpret_cast<double*>
>>>>>> (reinterpret_cast<char*>(&(useType()))+sizeof(int));
>>>>>> // int since __X_index is int
>>>>>>
>>>>>> No matter what the union's active member is.
>>>>>> The user has to take responsibility for his code.
>>>>>> And it is safe to steal the guts of a temporary.
>>>>>> ..."
>>>>>> ------end--------
>>>>>>
>>>>>> The user takes responsibility for what he wants to cast to.
>>>>>>
>>>>>> You should package all operations inside a template function like
>>>>>> consumeType.
>>>>>>
>>>>>> You can think of consumeType as a std::visit
>>>>>> that was written for you by your servant, Mr. Compiler.
>>>>>>
>>>>>> Did I answer your question?
>>>>>>
>>>>>> -------- Original message --------
>>>>>> From: Breno Guimarães <brenorg_at_[hidden]>
>>>>>> Date: 8/3/24 5:43 PM (GMT+04:00)
>>>>>> To: organicoman <organicoman_at_[hidden]>
>>>>>> Cc: Std-Proposals <std-proposals_at_[hidden]>, Tiago Freire <
>>>>>> tmiguelf_at_[hidden]>
>>>>>> Subject: Re: [std-proposals] Function overload set type information
>>>>>> loss
>>>>>>
>>>>>> I don't think I was clear enough.
>>>>>>
>>>>>> What if you have:
>>>>>> for (auto elem : vec)
>>>>>> {
>>>>>> auto r = useType(elem);
>>>>>> if (r == 1)
>>>>>> consumeType(r);
>>>>>> }
>>>>>>
>>>>>> The compiler on this translation unit doesn't know all possible types
>>>>>> (because the vec is global and filled in other TUs like you anticipated).
>>>>>>
>>>>>> So this translation unit could reason about the types it knows and
>>>>>> generate the code for "if (r == 1)" but what about the types it doesn't
>>>>>> know?
>>>>>> The other translation units don't know the body of the for-loop so
>>>>>> they wouldn't be able to fill in that gap either.
>>>>>>
>>>>>> In this translation unit we could generate the if for "int" but
>>>>>> another one could require for float. Another translation unit could even
>>>>>> push a std::string which would make this code a syntax error.
>>>>>>
>>>>>> That's why I asked if it was a limitation that you couldn't do
>>>>>> anything other than directly call consumeType.
>>>>>>
>>>>>> The temporary value generated by useType would also have to be
>>>>>> destroyed and that's a variation of the if problem because it typically is
>>>>>> destroyed on the caller body. So you wouldn't know which destructor to call.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Aug 3, 2024 at 10:08, organicoman <organicoman_at_[hidden]>
>>>>>> wrote:
>>>>>>
>>>>>>> That means you can only use these pointers with consumeType?
>>>>>>>
>>>>>>> If you do a for loop on the vec you can only call
>>>>>>> consumeType(useType(F))?
>>>>>>> Because if you do other things inside the for loop in one
>>>>>>> translation unit, the compiler won't have visibility of that in the second
>>>>>>> translation unit. So it would not know to generate the code for the for
>>>>>>> loop body in the first translation unit for the types in the second
>>>>>>> translation unit.
>>>>>>>
>>>>>>> Would that be a limitation? Or is there another way around that?
>>>>>>>
>>>>>>> As far as the vector is concerned, its element type is void(*)();
>>>>>>> check step 10-2 of the algorithm.
>>>>>>>
>>>>>>> And if you recall, in the explanation of that algorithm, I said
>>>>>>> (copy/paste):
>>>>>>>
>>>>>>> -------- start---------
>>>>>>>
>>>>>>> "....
>>>>>>>
>>>>>>> If useType return value is assigned to a variable we have two cases.
>>>>>>>
>>>>>>> auto var1 = useType (F); // perfect capture
>>>>>>> double var2 = useType(F); // casting
>>>>>>>
>>>>>>> The first case, which is the most interesting and powerful, is most
>>>>>>> useful if you pass that var1 to a function template, consumeType, in our
>>>>>>> example above
>>>>>>> ...."
>>>>>>> -------end-----
>>>>>>>
>>>>>>> So it all revolves around useType and its usage scope.
>>>>>>>
>>>>>>> Also, since the vector is a local variable to that translation
>>>>>>> unit, then the scope of useType is just the instantiated foo<T> in
>>>>>>> that unit. The rest is irrelevant.
>>>>>>> You don't pay for what you don't need.
>>>>>>>
>>>>>>> Allow me to anticipate one question.
>>>>>>>
>>>>>>> What if the vector was a global variable, filled across
>>>>>>> translation units?
>>>>>>> *It doesn't matter*.
>>>>>>> The linker performs the concatenation algorithm that I described to
>>>>>>> you, so all is fine and dandy.
>>>>>>>
>>>>>>> Let's use an example:
>>>>>>> -----Source1.cpp
>>>>>>> #include "temp.h"
>>>>>>>
>>>>>>> extern vector<effdecltype(foo)> globalVector;
>>>>>>>
>>>>>>> /// no local insertion into the global vector
>>>>>>> /// use it directly in the loop
>>>>>>> for(F : globalVector)
>>>>>>> consumeType(useType(F));
>>>>>>>
>>>>>>> -----Source2.cpp
>>>>>>> #include "temp.h"
>>>>>>>
>>>>>>> extern vector<effdecltype(foo)> globalVector;
>>>>>>>
>>>>>>> /// just fill the global vector...
>>>>>>> globalVector.push_back(foo<int>);
>>>>>>> globalVector.push_back(foo<double>);
>>>>>>>
>>>>>>> *Explanation*:
>>>>>>> In the 1st translation unit, the virtual overload dispatch mechanism
>>>>>>> (read the explanation in the post about the pre-pass algorithm)
>>>>>>> sees that the compiler didn't record any instance of foo<T> in this
>>>>>>> translation unit. *If the vector were a local variable*, it would emit
>>>>>>> an error (substitution failure, cannot call the template function
>>>>>>> consumeType). But because the compiler knows the vector is a
>>>>>>> global variable (remember that the vector is the entity which
>>>>>>> triggered this mechanism), it replaces the whole overload
>>>>>>> dispatch switch with just the name of the generated function and lets
>>>>>>> the linker resolve it, like the following.
>>>>>>>
>>>>>>> ----starts like this
>>>>>>>
>>>>>>> for(F: globalVector)
>>>>>>> consumeType(useType(F));
>>>>>>>
>>>>>>> ....becomes
>>>>>>>
>>>>>>> // generated pre-pass code
>>>>>>> extern struct __X_foo_eff;
>>>>>>> __X_foo_eff useType(effdecltype(foo));
>>>>>>> void __X_consumeType(__X_foo_eff);
>>>>>>>
>>>>>>> // down in code
>>>>>>> for(F: globalVector)
>>>>>>> __X_consumeType(useType(F));
>>>>>>>
>>>>>>> The rest is known to everyone.
>>>>>>>
>>>>>>> The overall observation about this mechanism is that it doesn't
>>>>>>> leak type information to the runtime.
>>>>>>> I don't have a table such as
>>>>>>> hash(type)<->function_ptr
>>>>>>> or
>>>>>>> textMangle(type)<->function_ptr
>>>>>>> of the kind usually used by std::any or some reflection algorithms.
>>>>>>> That makes this approach safe against reverse engineering of the {hash,
>>>>>>> text}, and it doesn't break the ASLR safety mechanism by saving function
>>>>>>> pointers in a permanent global container.
>>>>>>>
>>>>>>> I hope it is clear..... At this point, if everyone has understood the
>>>>>>> fundamentals of this paper, then I guess that you can start contributing.
>>>>>>>
>>>>>>> That's all.
>>>>>>>
>>>>>>> On Fri, Aug 2, 2024 at 23:25, organicoman <organicoman_at_[hidden]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> So how would you generate the switch for each type given that some
>>>>>>>> instantiations of the template can happen only in another translation unit?
>>>>>>>>
>>>>>>>> Let me illustrate with an example:
>>>>>>>>
>>>>>>>> Let's put in header file
>>>>>>>>
>>>>>>>> --- *temp.h*
>>>>>>>>
>>>>>>>> foo<T>
>>>>>>>>
>>>>>>>> consumeType <T>
>>>>>>>>
>>>>>>>> useType(effdecltype(foo) f)
>>>>>>>>
>>>>>>>> --- source1.cpp
>>>>>>>>
>>>>>>>> #include "temp.h"
>>>>>>>>
>>>>>>>> ///// adding instances
>>>>>>>>
>>>>>>>> vec.push_back(foo<int>);
>>>>>>>>
>>>>>>>> vec.push_back (foo<double>);
>>>>>>>>
>>>>>>>> /// calling function inside loop
>>>>>>>>
>>>>>>>> consumeType(useType(F)); //<- to be replaced
>>>>>>>>
>>>>>>>> --- source2.cpp
>>>>>>>>
>>>>>>>> #include "temp.h"
>>>>>>>>
>>>>>>>> ///// adding instances
>>>>>>>>
>>>>>>>> vec.push_back(foo<float>);
>>>>>>>>
>>>>>>>> vec.push_back (foo<char>);
>>>>>>>>
>>>>>>>> /// calling function inside loop
>>>>>>>>
>>>>>>>> consumeType(useType(F)); //<- to be replaced
>>>>>>>> *Compilation pre-pass:*
>>>>>>>> Source1: pseudo code only (don't hate me)
>>>>>>>> useType:
>>>>>>>> case foo<int>:
>>>>>>>> return xunion{int}
>>>>>>>> case foo<double>:
>>>>>>>> return xunion{double};
>>>>>>>> Source2:
>>>>>>>> useType:
>>>>>>>> case foo<float>:
>>>>>>>> return xunion{float}
>>>>>>>> case foo<char>:
>>>>>>>> return xunion{char}
>>>>>>>>
>>>>>>>> *Compilation pass:*
>>>>>>>> Since useType is tagged then add a bogus jump instruction that
>>>>>>>> will never be taken in that translation unit.
>>>>>>>> Source1.o: pseudo assembly
>>>>>>>> cmp esi, foo<int>;
>>>>>>>> je LBL_xunion_int ;
>>>>>>>> cmp esi,foo<double>
>>>>>>>> je LBL_xunion_double
>>>>>>>> jmp 0xfffffff; // hook jump
>>>>>>>> source2.o:
>>>>>>>> cmp esi, foo<float>
>>>>>>>> je LBL_xunion_float
>>>>>>>> cmp esi, foo<char>
>>>>>>>> je LBL_xunion_char
>>>>>>>> jmp 0xfffffff // hook jump
>>>>>>>> *Linker: source1.o source2.o main.o*
>>>>>>>> 1- read mangled name of function
>>>>>>>> 2- is it tagged as effdecltype
>>>>>>>> no: do your usual job, then goto 6
>>>>>>>> 3- is there any previous cmp address saved?
>>>>>>>> yes: replace the 0xfffff of the hook jmp with the address of
>>>>>>>> the saved cmp.
>>>>>>>> 4- save the current first cmp address.
>>>>>>>> 5- resolve mangled name to be the current function address.
>>>>>>>> 6- move to next
>>>>>>>> 7- repeat if not finished.
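
The chaining performed by the steps above can be modelled in plain C++, without any linker support, as a linked list of per-translation-unit dispatch fragments. This is only a sketch of the idea under stated assumptions: Fragment, chainFragments, dispatch and counter are hypothetical names, each fragment stands for the compare chain of one object file, and the next pointer plays the role of the patched hook jump. No existing linker is claimed to behave this way.

```cpp
#include <string>
#include <utility>
#include <vector>

using Fn = void (*)();

// One counter per instantiation keeps the bodies of foo<T> distinct.
template <class T>
int& counter() {
    static int c = 0;
    return c;
}

template <class T>
void foo() { ++counter<T>(); }

// One fragment per object file: the cases it knows, plus the hook.
struct Fragment {
    std::vector<std::pair<Fn, std::string>> cases;
    const Fragment* next = nullptr;  // stands in for the bogus hook jump
};

// Steps 3 to 5: point the hook of each fragment at the previously saved
// chain, then treat the fragment just processed as the new entry point.
const Fragment* chainFragments(std::vector<Fragment>& objects) {
    const Fragment* entry = nullptr;
    for (Fragment& obj : objects) {
        obj.next = entry;
        entry = &obj;
    }
    return entry;
}

// Walk the chain from the entry point until some fragment matches.
std::string dispatch(const Fragment* entry, Fn f) {
    for (const Fragment* frag = entry; frag != nullptr; frag = frag->next)
        for (const auto& [fn, name] : frag->cases)
            if (fn == f) return name;
    return "unhandled";
}
```

The last fragment processed becomes the entry point, matching the description that the mangled name resolves to the last occurrence seen. A pointer that no fragment lists falls off the end of the chain, which is the case the single-translation-unit view cannot rule out.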
>>>>>>>>
>>>>>>>> *Explanation:*
>>>>>>>> Let's assume the worst-case scenario, where the two translation
>>>>>>>> units are compiled separately.
>>>>>>>> If they were compiled together, the compiler would detect that and
>>>>>>>> fuse the two generated function bodies into one, by concatenating the
>>>>>>>> switch cases.
>>>>>>>>
>>>>>>>> OK, I guess now everyone knows what happens in the compiler
>>>>>>>> pre-pass.
>>>>>>>> So at the end of it we will have, for each translation unit, an
>>>>>>>> object file with a different switch for the same function name.
>>>>>>>> Let's tackle the assembly now.
>>>>>>>> Because the compiler already knows that useType is a tagged
>>>>>>>> function, it adds to its assembly a bogus jump instruction that
>>>>>>>> provably will never be taken within that translation unit alone.
>>>>>>>> jmp 0xFFFFFFFF;
>>>>>>>> Then, if we want to use both *.o object files, things are handled
>>>>>>>> differently.
>>>>>>>> Here comes the linker's job.
>>>>>>>> On the command line, the linker parses the *.o files from left to
>>>>>>>> right, as everyone knows.
>>>>>>>> As soon as it detects the mangled name of useType, it knows that
>>>>>>>> it is tagged, so it saves the address of the first compare instruction,
>>>>>>>> cmp esi, foo<T>, and carries on with its job. When it enters the second
>>>>>>>> object file, it finds the same useType mangled name, so it knows
>>>>>>>> it is tagged. It checks whether it has a compare instruction saved from
>>>>>>>> a previous *.o object file and, if so, stamps its address into the bogus
>>>>>>>> jump instruction of the current object file. Then it saves the first cmp
>>>>>>>> instruction of the current object file and marks the mangled name as
>>>>>>>> resolved with the address of useType in this object file.... and so
>>>>>>>> on....
>>>>>>>>
>>>>>>>> current *.o
>>>>>>>> // illustration only, don't hate me
>>>>>>>> jmp 0xfffffff <-> becomes <-> jmp lea prev[cmp esi, foo<int>]
>>>>>>>>
>>>>>>>> until the last object file that contains the same mangled name. At
>>>>>>>> that point, it has created a linked list between the compare instructions.
>>>>>>>> The mangled useType will be resolved to the last occurrence of that
>>>>>>>> mangled name seen, which thus becomes the entry point of the useType
>>>>>>>> function.
>>>>>>>>
>>>>>>>> It's like you are concatenating switch statements.
>>>>>>>> One observation worth reiterating:
>>>>>>>> like I said in the last reply, we will have code bloat.
>>>>>>>> That's because the body of useType before the switch statement
>>>>>>>> will be duplicated as many times as it is generated, and only one copy
>>>>>>>> will be executed; the rest is dead code that never runs.
>>>>>>>> For this reason, I recommend making the body of functions like useType
>>>>>>>> very small and letting consumeType do the heavy lifting.
>>>>>>>> Or make the linker smart enough to fuse the whole thing into one giant
>>>>>>>> switch.
>>>>>>>> But the first recommendation is faster to implement.
>>>>>>>>
>>>>>>>> That's one simple approach; many others are far more efficient. But we
>>>>>>>> need a compiler planner to find the best decisions. They will work in
>>>>>>>> tandem.
>>>>>>>>
>>>>>>>> That's all.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Aug 2, 2024 at 19:48, organicoman <organicoman_at_[hidden]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> So it doesn't work for multiple translation units?
>>>>>>>>>
>>>>>>>>> It perfectly does.
>>>>>>>>>
>>>>>>>>> Because for each translation unit, you tag the functions and
>>>>>>>>> generate the code, stamp it, then pass it to the compilation step... and
>>>>>>>>> so on.
>>>>>>>>>
>>>>>>>>> There is no deduplication of function names, nor generation of
>>>>>>>>> extraneous code. You have exactly what you need in that unit.
>>>>>>>>>
>>>>>>>>> Just one thing.
>>>>>>>>>
>>>>>>>>> Sometimes the generated code could have the same instructions but
>>>>>>>>> under a different function name. That causes bloat, hence the need for a
>>>>>>>>> compiler planner.
>>>>>>>>>
>>>>>>>>> Look, the idea is veeeeeery basic.
>>>>>>>>>
>>>>>>>>> I consider C++ source code, after the compiler does template
>>>>>>>>> instantiation, as a database of types. Unfortunately, the current standard
>>>>>>>>> does not allow us to query anything from it.
>>>>>>>>>
>>>>>>>>> Imagine if you had the power to query how many types your program
>>>>>>>>> uses, what the type-size distribution chart is, sorting by type size, by
>>>>>>>>> layout size.... etc.
>>>>>>>>>
>>>>>>>>> This is a data mine for optimizing your code, reasoning about its
>>>>>>>>> implementation, and making it faster and more efficient.
>>>>>>>>>
>>>>>>>>> That's what I am trying to start.
>>>>>>>>>
>>>>>>>>> I can't see the full extent of my proposal yet, but for sure, with some
>>>>>>>>> help, it will be developed to its full potential.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Aug 2, 2024 at 19:16, organicoman via Std-Proposals <
>>>>>>>>> std-proposals_at_[hidden]> wrote:
>>>>>>>>>
>>>>>>>>>> Your solution requires at compile time information that is only
>>>>>>>>>> available at runtime.
>>>>>>>>>>
>>>>>>>>>> I don't understand what runtime data you are referring to.
>>>>>>>>>>
>>>>>>>>>> In the algorithm explained before, all the data that the
>>>>>>>>>> compiler's pre-pass manipulates is compile-time data.
>>>>>>>>>> At the level of a translation unit, a function with *orthogonal
>>>>>>>>>> template parameters* (foo<T>)
>>>>>>>>>> cannot exist unless you instantiate it.
>>>>>>>>>> Add to that, the function is tagged as per the algorithm, so it
>>>>>>>>>> is set up to record the types it was instantiated for.
>>>>>>>>>> But as an optimization, the compiler will not record the list of
>>>>>>>>>> *orthogonal template parameters* unless it sees that some variable
>>>>>>>>>> is using it; otherwise there is no need to generate any code.
>>>>>>>>>> The compiler confirms that from the usage of the effdecltype
>>>>>>>>>> operator inside the template argument of the vector in the
>>>>>>>>>> example.
>>>>>>>>>> Every time you instantiate foo with a type, you have to write it
>>>>>>>>>> explicitly in the source code:
>>>>>>>>>> foo<int>, foo<double>...etc
>>>>>>>>>> Next to that, the address of a function is a constexpr value; you
>>>>>>>>>> can use it as a non-type template parameter.
>>>>>>>>>> So, in summary, the list of *orthogonal template parameters* is
>>>>>>>>>> fetchable from the source code, and the address of each instance is
>>>>>>>>>> a constexpr value.
>>>>>>>>>> These are the ingredients to make the algorithm work.
>>>>>>>>>>
>>>>>>>>>> Even if the compiler could trace data across any type of data
>>>>>>>>>> container (which it won't),
>>>>>>>>>>
>>>>>>>>>> It doesn't need to, because the information needed is the *orthogonal
>>>>>>>>>> template parameters*, not the number of elements in the container.
>>>>>>>>>> You can have 100 elements of type foo<int>, but the list of *orthogonal
>>>>>>>>>> template parameters* is only { int }.
>>>>>>>>>> It's like you are counting how many instances of foo<T> you have
>>>>>>>>>> in that translation unit.
>>>>>>>>>>
>>>>>>>>>> datapoints in container could take the form of any possible
>>>>>>>>>> combination on any position and be influenced by translation units not
>>>>>>>>>> visible at compile time. But for your solution to work it requires a fixed
>>>>>>>>>> layout (which in practice doesn't happen) and perfect observability.
>>>>>>>>>> This is not a problem that needs to be solved, it's just how
>>>>>>>>>> things work.
>>>>>>>>>>
>>>>>>>>>> Let me reiterate.
>>>>>>>>>> Look at this snippet:
>>>>>>>>>>
>>>>>>>>>> vec.push_back(&foo<int>);
>>>>>>>>>> int random;
>>>>>>>>>> cin >> random;
>>>>>>>>>> while(random--) vec.push_back(&foo<char>);
>>>>>>>>>> cin >> random;
>>>>>>>>>> random = min<int>(random, vec.size());
>>>>>>>>>> while(random--) vec.pop_back();
>>>>>>>>>>
>>>>>>>>>> As you can see, I can neither tell nor track the vector size. But it
>>>>>>>>>> is irrelevant, because I have only two function pointers => { int, char }.
>>>>>>>>>> As a consequence, my switch statement will contain only two cases:
>>>>>>>>>>
>>>>>>>>>> case &foo<int>:
>>>>>>>>>> return __Xic_foo_eff_{ &foo<int>, int(0) };
>>>>>>>>>> case &foo<char>:
>>>>>>>>>> return __Xic_foo_eff_{ &foo<char>, char(0) };
>>>>>>>>>>
>>>>>>>>>> The two case labels are constexpr values known at compile time.
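
The point above can be checked in standard C++: however many elements are pushed at run time, the vector can only ever hold the addresses of instantiations that appear in the source. A minimal sketch, where counter, fill and distinctEntries are hypothetical helpers and foo is given a type-dependent body only so that its instantiations are not identical code:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Fn = void (*)();

// One counter per instantiation, so foo<int> and foo<char> differ in code.
template <class T>
int& counter() {
    static int c = 0;
    return c;
}

template <class T>
void foo() { ++counter<T>(); }

// Fill a vector with a run-time-dependent number of entries.
std::vector<Fn> fill(std::size_t extra) {
    std::vector<Fn> vec;
    vec.push_back(&foo<int>);
    while (extra--) vec.push_back(&foo<char>);
    return vec;
}

// Collect the distinct pointer values, using equality comparison only.
std::vector<Fn> distinctEntries(const std::vector<Fn>& vec) {
    std::vector<Fn> seen;
    for (Fn f : vec)
        if (std::find(seen.begin(), seen.end(), f) == seen.end())
            seen.push_back(f);
    return seen;
}
```

This holds within one translation unit. It says nothing about instantiations pushed from other translation units, which is the objection raised elsewhere in the thread.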
>>>>>>>>>>
>>>>>>>>>> You cannot resolve this problem without appending extra data to
>>>>>>>>>> containers that is used to inform about the nature of the data. And you
>>>>>>>>>> cannot solve this without a runtime check and control flow to browse for
>>>>>>>>>> the right handler.
>>>>>>>>>> This is a problem of distinguishability and computability, it is
>>>>>>>>>> not something you can solve, it's a consequence of math as described by
>>>>>>>>>> Shannon.
>>>>>>>>>>
>>>>>>>>>> That's why std::any has additional data related to runtime type
>>>>>>>>>> information, and that's why it always needs some form of selection or hash
>>>>>>>>>> table or equivalent.
>>>>>>>>>>
>>>>>>>>>> Well, I just proved to you that I don't need that data, and that
>>>>>>>>>> this approach is plausibly more efficient and safe.
>>>>>>>>>> Let's try to stretch it more, shall we?
>>>>>>>>>>
>>>>>>>>>> The way it's already done is already how this problem should be
>>>>>>>>>> solved, no new feature necessary. What you are proposing requires the
>>>>>>>>>> suspension of the laws of physics.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ------------------------------
>>>>>>>>>> *From:* organicoman <organicoman_at_[hidden]>
>>>>>>>>>> *Sent:* Friday, August 2, 2024 9:44:09 PM
>>>>>>>>>> *To:* Tiago Freire <tmiguelf_at_[hidden]>;
>>>>>>>>>> std-proposals_at_[hidden] <std-proposals_at_[hidden]>
>>>>>>>>>> *Subject:* Re: [std-proposals] Function overload set type
>>>>>>>>>> information loss
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And also unimplementable...
>>>>>>>>>> But you don't have to take my word for it, you can try it
>>>>>>>>>> yourself.
>>>>>>>>>>
>>>>>>>>>> But ... that's the whole purpose of this thread:
>>>>>>>>>> putting down an idea in the form of a paper or an algorithm, and
>>>>>>>>>> tackling any contradiction it raises until we hit the wall or we make it
>>>>>>>>>> through.
>>>>>>>>>> And we have just started agreeing about the core idea, which is
>>>>>>>>>> adding an operator that does overload dispatch and code
>>>>>>>>>> generation...etc
>>>>>>>>>>
>>>>>>>>>> If it is not implementable, then point out where, and I will work on
>>>>>>>>>> it...
>>>>>>>>>>
>>>>>>>>>> ------------------------------
>>>>>>>>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]>
>>>>>>>>>> on behalf of organicoman via Std-Proposals <
>>>>>>>>>> std-proposals_at_[hidden]>
>>>>>>>>>> *Sent:* Friday, August 2, 2024 9:15:36 PM
>>>>>>>>>> *To:* organicoman via Std-Proposals <
>>>>>>>>>> std-proposals_at_[hidden]>
>>>>>>>>>> *Cc:* organicoman <organicoman_at_[hidden]>
>>>>>>>>>> *Subject:* Re: [std-proposals] Function overload set type
>>>>>>>>>> information loss
>>>>>>>>>>
>>>>>>>>>> Hello Gašper,
>>>>>>>>>> Since participants are starting to get what I am talking about,
>>>>>>>>>> allow me to answer your pending question.
>>>>>>>>>> This is what you said in a previous post.
>>>>>>>>>>
>>>>>>>>>> There is another fundamental misunderstanding here.
>>>>>>>>>>
>>>>>>>>>> The feature calls to "record all the orthogonal template
>>>>>>>>>> parameters".
>>>>>>>>>>
>>>>>>>>>> While that's not what that word means, I think regardless it's
>>>>>>>>>> asking for global enumeration. C++ has a separate linking model. All the
>>>>>>>>>> types/functions are not known within a single unit. The xunion is
>>>>>>>>>> impossible to compute.
>>>>>>>>>>
>>>>>>>>>> This is a sister problem to what Jason is highlighting.
>>>>>>>>>>
>>>>>>>>>> When I was describing my algorithm, I left a big X mark, hoping
>>>>>>>>>> that someone would ask me about it, if people were attentive.
>>>>>>>>>> Line:
>>>>>>>>>> 4-b-
>>>>>>>>>> switch
>>>>>>>>>> *.....
>>>>>>>>>> * inside function body: goto X
>>>>>>>>>>
>>>>>>>>>> Then I left a hint at the bottom of the post in the form of a code
>>>>>>>>>> comment
>>>>>>>>>>
>>>>>>>>>> double var2 = *reinterpret_cast<double*>
>>>>>>>>>> (reinterpret_cast<char*>(&(useType()))+sizeof(int));
>>>>>>>>>> // int since __X_index is int.
>>>>>>>>>> // ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>
>>>>>>>>>> The __X_index is not of type int but of type void*; the unnamed
>>>>>>>>>> struct that the compiler generates becomes:
>>>>>>>>>> struct __Xidfc_foo_eff
>>>>>>>>>> {
>>>>>>>>>> void* __X_index;
>>>>>>>>>> union
>>>>>>>>>> { int __Xi; double __Xd ; float __Xf; char __Xc; };
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>> Since the compiler tagged useType function, as described by the
>>>>>>>>>> algorithm, then after finishing recording the orthogonal template
>>>>>>>>>> parameters, it replaces the body with the following.
>>>>>>>>>>
>>>>>>>>>> auto useType(effdecltype(foo) f)
>>>>>>>>>> {
>>>>>>>>>> f();
>>>>>>>>>> switch(f)
>>>>>>>>>> {
>>>>>>>>>> case &foo<int>: //<- the pointer value
>>>>>>>>>> return __Xidfc_foo_eff{(void*) &foo<int>, int(0)};
>>>>>>>>>>
>>>>>>>>>> case &foo<double>:
>>>>>>>>>> return __Xidfc_foo_eff{(void*) &foo<double>, double(0)};
>>>>>>>>>>
>>>>>>>>>> case &foo<float>:
>>>>>>>>>> return __Xidfc_foo_eff{(void*) &foo<float>, float(0)};
>>>>>>>>>>
>>>>>>>>>> case &foo<char>:
>>>>>>>>>> return __Xidfc_foo_eff{(void*) &foo<char>, char(0)};
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Pointers are comparable for equality.
>>>>>>>>>>
>>>>>>>>>> As you can see, you can capture the xunion.... right? There is
>>>>>>>>>> no fundamental misunderstanding ...
>>>>>>>>>> Plus, everything is done by the compiler: there is no global map,
>>>>>>>>>> no find algorithm, no type hashes, no type erasure (heap allocation), no
>>>>>>>>>> manual intervention. All is automated at compile time.
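
A hand-written approximation of the generated code described above, in standard C++. It is only a sketch: Tagged, useTypeByHand, consumeByHand and counter are hypothetical names, the list of instantiations is written out manually rather than recorded by a pre-pass, the tag is kept as a function pointer instead of void*, and an if/else chain replaces the switch because a switch condition cannot be a function pointer in current C++.

```cpp
#include <string>
#include <type_traits>

using Fn = void (*)();

// One counter per instantiation keeps the bodies of foo<T> distinct.
template <class T>
int& counter() {
    static int c = 0;
    return c;
}

template <class T>
void foo() { ++counter<T>(); }

// Hand-written counterpart of the generated struct: a tag plus a union.
struct Tagged {
    Fn tag;  // address of the instantiation that produced the value
    union {
        int i;
        double d;
    };
};

// Counterpart of the rewritten useType, with the cases listed by hand.
Tagged useTypeByHand(Fn f) {
    f();
    Tagged r{};
    r.tag = f;
    if (f == &foo<int>) r.i = 0;
    else if (f == &foo<double>) r.d = 0.0;
    else r.tag = nullptr;  // instantiation not listed in this translation unit
    return r;
}

template <class T>
std::string consumeType(T) {
    if constexpr (std::is_same_v<T, int>) return "int";
    else return "double";
}

// Counterpart of the generated dispatch: pick the overload from the tag.
std::string consumeByHand(const Tagged& r) {
    if (r.tag == &foo<int>) return consumeType(r.i);
    if (r.tag == &foo<double>) return consumeType(r.d);
    return "unknown";
}
```

The fall-through branch makes the open question visible: a pointer to an instantiation that this translation unit never listed has no case to land in.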
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Std-Proposals mailing list
>>>>>>>>>> Std-Proposals_at_[hidden]
>>>>>>>>>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>>>>>>>>>
>>>>>>>>>>

Received on 2024-08-03 18:34:02