Date: Sat, 03 Aug 2024 17:25:33 +0400
I also don't think that's how linkers work :)

Neither does the compiler I'm describing.

Yes, a lot of things get easier if you can do things in the linker. But what about dynamic libraries? You can't just concatenate code at dlopen() :)

Yes you can, but with a little bit of gymnastics. If you re-read the proposal, you will see that it is a static analysis of the code, not a dynamic one, though it can be adapted. Plus, seriously, I think it is time that C++ had its say about dynamic linking.

On Sat, Aug 3, 2024 at 12:01 PM Breno Guimarães via Std-Proposals <std-proposals_at_[hidden]> wrote:

That means you can only use these pointers with consumeType?
If you do a for loop on the vec you can only call consumeType(useType(F))?
Because if you do other things inside the for loop in one translation unit, the compiler won't have visibility of that in the second translation unit. So it would not know that it needs to generate code for the for-loop body in the first translation unit for the types instantiated in the second translation unit.
Would that be a limitation? Or is there another way around that?
On Fri, Aug 2, 2024, 23:25, organicoman <organicoman_at_[hidden]> wrote:

So how would you generate the switch for each type, given that some instantiations of the template can happen only in another translation unit?

Let me illustrate with an example. Let's put the following in a header file:

--- temp.h
foo<T>
consumeType<T>
useType(effdecltype(foo) f)

--- source1.cpp
#include "temp.h"
// adding instances
vec.push_back(foo<int>);
vec.push_back(foo<double>);
// calling the function inside a loop
consumeType(useType(F)); // <- to be replaced

--- source2.cpp
#include "temp.h"
// adding instances
vec.push_back(foo<float>);
vec.push_back(foo<char>);
// calling the function inside a loop
consumeType(useType(F)); // <- to be replaced
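To make that skeleton concrete, here is one way to read it as plain C++ that compiles today. This is only my sketch: since effdecltype does not exist yet, a raw function pointer type stands in for effdecltype(foo), and the names foo_ptr, vec and the register* functions are placeholders of mine; useType itself is left out, because generating it is the whole point of the proposal.

// temp.h -- sketch only; void(*)() stands in for effdecltype(foo)
#pragma once
#include <vector>

template <typename T>
void foo() { /* ... */ }

using foo_ptr = void (*)();          // placeholder for effdecltype(foo)

inline std::vector<foo_ptr> vec;     // the shared container of instances

template <typename T>
void consumeType(T) { /* does the heavy lifting on the recovered type */ }

// useType(foo_ptr f) is the function the compiler is supposed to generate.

// source1.cpp
#include "temp.h"
void register1()
{
    vec.push_back(&foo<int>);
    vec.push_back(&foo<double>);
    // for (auto F : vec) consumeType(useType(F)); // <- to be replaced
}

// source2.cpp
#include "temp.h"
void register2()
{
    vec.push_back(&foo<float>);
    vec.push_back(&foo<char>);
}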
Compilation pre-pass:

Source1 (pseudo code only, don't hate me):
useType:
  case foo<int>:    return xunion{int}
  case foo<double>: return xunion{double}

Source2:
useType:
  case foo<float>: return xunion{float}
  case foo<char>:  return xunion{char}

Compilation pass:

Since useType is tagged, add a bogus jump instruction that will never be taken in that translation unit.

source1.o (pseudo assembly):
cmp esi, foo<int>
je  LBL_xunion_int
cmp esi, foo<double>
je  LBL_xunion_double
jmp 0xffffffff        ; hook jump

source2.o:
cmp esi, foo<float>
je  LBL_xunion_float
cmp esi, foo<char>
je  LBL_xunion_char
jmp 0xffffffff        ; hook jump

Linker: source1.o source2.o main.o
1- Read the mangled name of the function.
2- Is it tagged as effdecltype? No: do your usual job, then go to 6.
3- Is there any previous cmp address saved? Yes: replace the 0xffffffff of the hook jmp with the address of the saved cmp.
4- Save the address of the current first cmp.
5- Resolve the mangled name to the current function's address.
6- Move to the next symbol.
7- Repeat if not finished.

Explanation:

Let's assume the worst-case scenario, where the two translation units are compiled separately. If they were compiled together, the compiler would detect this and fuse the two generated functions into one by concatenating the switch cases.

OK, I guess everyone now knows what happens in the compiler pre-pass. At the end of it we will have, for each translation unit, an object file with a different switch for the same function name.

Let's tackle the assembly now. Because the compiler already knows that useType is a tagged function, it adds to its assembly a bogus jump instruction that is proven never to be taken, for that translation unit only:

jmp 0xffffffff

Then, if we want to use both *.o object files, things are handled differently. Here comes the linker's job. On the command line, the linker parses the *.o files from left to right, as everyone knows. As soon as it detects the mangled name of useType, it knows that it is tagged; it saves the address of the first compare instruction (cmp esi, foo<T>) and then carries on with its job. When it enters the second object file it finds the same useType mangled name, so it knows it is tagged; it checks whether it has a compare instruction saved from a previous *.o file, stamps that address into the bogus jump instruction of the current object file, saves the first cmp instruction of the current object file, then marks the mangled name as resolved with the address of useType in this object file... and so on:

current *.o:
jmp 0xffffffff  <->  becomes  <->  jmp lea prev[cmp esi, foo<int>]   ; illustration only, don't hate me

This continues until the last object file that contains the same mangled name; at that point we have created a linked list between the compare instructions. The mangled useType is resolved to the last occurrence of that mangled name, which thus becomes the entry point of the useType function. It's like concatenating switch statements.

One observation worth reiterating: like I said in my last reply, we will have code bloat. That's because the body of useType before the switch statement is duplicated as many times as it is generated, and all copies except one are dead code that is never executed. From this, I recommend keeping the body of functions like useType very small and letting consumeType do the heavy lifting. Or make the linker smart enough to fuse the whole thing into one giant switch. But the first recommendation is faster to implement.

That's one simple approach; many others are far more efficient. But we need a compiler planner to pick the best decisions. They will work in tandem.

That's all.
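If it helps, here is a rough source-level analogy of that control flow, written as ordinary C++: each translation unit contributes its own local "switch" and registers it in a chain, and a miss falls through to the next link, which is the effect the patched hook jump is meant to achieve at link time. The names (XUnion, chain, try_source1) and the run-time registration are purely my illustration, not the proposed mechanism, which does the chaining in the linker.

#include <vector>

template <typename T> void foo() {}

struct XUnion { const void* index; /* payload union omitted */ };

using TryDispatch = bool (*)(void (*)(), XUnion&);

inline std::vector<TryDispatch>& chain()      // one link per translation unit
{
    static std::vector<TryDispatch> c;
    return c;
}

// What source1.cpp would contribute: its local switch plus a "miss" exit.
static bool try_source1(void (*f)(), XUnion& out)
{
    if (f == &foo<int>)    { out = { (const void*)&foo<int> };    return true; }
    if (f == &foo<double>) { out = { (const void*)&foo<double> }; return true; }
    return false;                              // miss -> fall through to the next link
}
static bool registered1 = (chain().push_back(&try_source1), true);

// The entry point walks the chain; this loop plays the role of the
// linker-patched hook jumps that concatenate the per-TU switches.
inline XUnion useType(void (*f)())
{
    XUnion out{};
    for (TryDispatch t : chain())
        if (t(f, out))
            break;
    return out;
}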
On Fri, Aug 2, 2024, 19:48, organicoman <organicoman_at_[hidden]> wrote:

So it doesn't work for multiple translation units?

It perfectly does. For each translation unit, you tag the functions, generate the code, stamp it, then pass it to the compilation step... and so on. There is no duplication of function names, nor generation of extraneous code. You have exactly what you need in that unit.

Just one thing: sometimes the generated code can have the same instructions but under different function names, which causes bloating, hence the need for a compiler planner.

Look, the idea is veeeeeery basic. I consider C++ source code, after the compiler does template instantiation, to be a type database. Unfortunately, the current standard does not allow us to query anything from it. Imagine if you had the power to query how many types your program uses, or to see the type-size distribution chart, sorted by type size or by layout size... etc. That is a data mine for optimizing your code, reasoning about its implementation, and making it faster and more efficient. That's what I am trying to start. I don't yet see the full extent of my proposal, but for sure, with some help, it can be developed to its full potential.
On Fri, Aug 2, 2024, 19:16, organicoman via Std-Proposals <std-proposals_at_[hidden]> wrote:

Your solution requires at compile time information that is only available at runtime.

I don't understand what runtime data you are referring to. In the algorithm explained before, all the data that the compiler's pre-pass manipulates is compile-time data. At the level of a translation unit, a function with orthogonal template parameters (foo<T>) cannot exist unless you instantiate it.
Add to that, the function is tagged as per the algorithm, so it is set up to record the types it was instantiated for. But as an optimization, the compiler will not record the list of orthogonal template parameters unless it sees that some variable is using it; otherwise there is no need to generate any code. In the example, the compiler confirms that through the usage of the effdecltype operator inside the template argument of the vector.

Every time you instantiate foo with a type, you have to write it explicitly in the source code: foo<int>, foo<double>... etc. Next to that, the address of a function is a constexpr value, so you can use it as a non-type template parameter. So, as a summary: the list of orthogonal template parameters is fetchable from the source code, and the address of each instance is a constexpr value. These are the ingredients that make the algorithm work.
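To back that last point with something you can compile today (a tiny sketch of mine, the names are arbitrary): the address of a template instantiation is a constant expression, it can be used as a non-type template parameter, and distinct instantiations give distinct template arguments.

#include <type_traits>

template <typename T> void foo() {}

// The address of an instantiation is a constant expression...
constexpr auto p = &foo<int>;

// ...so it can be used as a non-type template parameter (C++17 'auto' NTTP).
template <auto Fn> struct instance_tag {};

// Different instantiations produce different template arguments,
// hence different types.
static_assert(!std::is_same_v<instance_tag<&foo<int>>,
                              instance_tag<&foo<double>>>);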
Even if the compiler could trace data across any type of data container (which it won't),

It doesn't need to, because the information needed is the set of orthogonal template parameters, not the number of elements in the container. You can have 100 elements of type foo<int>, but the list of orthogonal template parameters is only { int }. It's like counting how many instances of foo<T> you have in that translation unit.

Datapoints in a container could take the form of any possible combination, in any position, and be influenced by translation units not visible at compile time. But for your solution to work, it requires a fixed layout (which in practice doesn't happen) and perfect observability.
This is not a problem that needs to be solved, it's just how things work.

Let me reiterate. Look at this snippet:

vec.push_back(&foo<int>);
int random;
cin >> random;
while (random--) vec.push_back(&foo<char>);
cin >> random;
random = min(random, (int)vec.size());
while (random--) vec.pop_back();

As you can see, I cannot tell nor track the vector size. But it is irrelevant, because I have only two function pointers => { int, char }, and consequently my switch statement will contain only two cases:

case &foo<int>:
    return __Xic_foo_eff_{ &foo<int>, int(0) };
case &foo<char>:
    return __Xic_foo_eff_{ &foo<char>, char(0) };

The two case labels are constexpr values known at compile time.
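Written out as compilable C++, a hand-written stand-in for what the compiler would synthesize in this translation unit (names mine, and the union payload reduced to a comment), the dispatcher depends only on the two instantiations and never on how many elements the vector ends up holding:

template <typename T> void foo() {}

// Stand-in for the compiler-generated struct; the real one would also
// carry a union { int; char; } payload next to the index.
struct __Xic_foo_eff_ { const void* __X_index; };

inline __Xic_foo_eff_ useType(void (*f)())
{
    if (f == &foo<int>)  return { (const void*)&foo<int> };
    if (f == &foo<char>) return { (const void*)&foo<char> };
    return { nullptr };  // unreachable if only foo<int>/foo<char> were pushed
}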
You cannot resolve this problem without appending extra data to containers that is used to inform about the nature of the data. And you cannot solve this without a runtime check and control flow to browse for the right handler.
This is a problem of distinguishability and computability, it is not something you can solve, it's a consequence of math as described by Shannon.
That's why std::any has additional data related to runtime type information, and that's why it always needs some form of selection or hash table or equivalent.

Well, I just proved to you that I don't need that data, and that this approach is plausibly more efficient and safe. Let's try to stretch it further, shall we?
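For contrast, this is the std::any status quo being referred to: the object carries run-time type information, and every retrieval goes through a run-time check. A minimal, standard-library-only illustration:

#include <any>
#include <iostream>

int main()
{
    std::any a = 42;                      // stores the value plus its type info
    if (a.type() == typeid(int))          // runtime selection on that type info
        std::cout << std::any_cast<int>(a) << '\n';
}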
The way it's done today is already how this problem should be solved; no new feature is necessary. What you are proposing requires the suspension of the laws of physics.
From: organicoman <organicoman_at_[hidden]r>
Sent: Friday, August 2, 2024 9:44:09 PM
To: Tiago Freire <tmiguelf_at_[hidden]>; std-proposals_at_[hidden] <std-proposals_at_[hidden]>
Subject: Re: [std-proposals] Function overload set type information loss
And also unimplementable...
But you don't have to take my word for it, you can try it yourself.
But ... that's the whole purpose of this thread.
Putting down an idea in the form of a paper or an algorithm, and tackling any contradiction it raises until we hit a wall or make it through.
And we have just started agreeing on the core idea, which is adding an operator that does overload dispatch and code generation... etc.
If it is not implementable, then point out where, and I will work on it...
From: Std-Proposals <std-proposals-bounces_at_[hidden]> on behalf of organicoman via Std-Proposals <std-proposals_at_[hidden]>
Sent: Friday, August 2, 2024 9:15:36 PM
To: organicoman via Std-Proposals <std-proposals_at_[hidden]>
Cc: organicoman <organicoman_at_[hidden]>
Subject: Re: [std-proposals] Function overload set type information loss
Hello Gašper,
Since participants are starting to get what I am talking about, allow me to answer your pending question. This is what you said in a previous post:
There is another fundamental misunderstanding here.
The feature calls to "record all the orthogonal template parameters".
While that's not what that word means, I think regardless it's asking for global enumeration. C++ has a separate linking model. All the types/functions are not known within a single unit. The xunion is impossible to compute.
This is a sister problem to what Jason is highlighting.
When I was describing my algorithm, I left a big X mark, hoping that someone would ask me about it, if people were attentive.
The line:

4-b- switch
     * .....
     * inside function body: goto X
Then I left a hint at the bottom of the post in the form of a code comment:

double var2 = *reinterpret_cast<double*>(
                  reinterpret_cast<char*>(&(useType())) + sizeof(int));
              // int, since __X_index is int.
The __X_index is not of type int but of type void*; the unnamed struct that the compiler generates becomes:

struct __Xidfc_foo_eff
{
    void* __X_index;
    union { int __Xi; double __Xd; float __Xf; char __Xc; };
};
Since the compiler tagged the useType function, as described by the algorithm, then after it finishes recording the orthogonal template parameters, it replaces the body with the following:
auto useType(effdecltype(foo) f)
{
    f();
    switch (f)
    {
    case &foo<int>: // <- the pointer value
        return __Xidfc_foo_eff{ (void*)&foo<int>, int(0) };
    case &foo<double>:
        return __Xidfc_foo_eff{ (void*)&foo<double>, double(0) };
    case &foo<float>:
        return __Xidfc_foo_eff{ (void*)&foo<float>, float(0) };
    case &foo<char>:
        return __Xidfc_foo_eff{ (void*)&foo<char>, char(0) };
    }
}
Pointers are comparable for equality.
As you can see, you can capture the xunion.... right? There is no fundamental misunderstanding ...
Plus, everything is done by the compiler: there is no global map, no find algorithm, no type hashes, no type erasure (no heap allocation), no manual intervention. All of it is automated at compile time.
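To close the loop, here is a self-contained usage sketch of mine showing how a caller (consumeType, say) could pick the right union member out of the generated struct. Since effdecltype does not exist, useType is hand-written here over a plain function pointer, and the cast of a function pointer to void* is only conditionally-supported, though fine on mainstream platforms. Note also that the payload union sits after the void* index, which is why the sizeof(int) in the earlier comment was only a hint.

#include <iostream>

template <typename T> void foo() {}

// The generated struct from above, restated so this sketch stands alone.
struct __Xidfc_foo_eff
{
    void* __X_index;
    union { int __Xi; double __Xd; float __Xf; char __Xc; };
};

// Placeholder consumer; the real consumeType does the heavy lifting.
template <typename T> void consumeType(T v) { std::cout << v << '\n'; }

// Hand-written stand-in for the generated useType.
__Xidfc_foo_eff useType(void (*f)())
{
    __Xidfc_foo_eff r{};
    r.__X_index = (void*)f;               // conditionally-supported cast
    if (f == &foo<int>)    r.__Xi = 0;
    if (f == &foo<double>) r.__Xd = 0.0;
    return r;
}

int main()
{
    __Xidfc_foo_eff r = useType(&foo<double>);
    // The caller recovers the payload by checking __X_index and reading
    // the matching union member.
    if      (r.__X_index == (void*)&foo<int>)    consumeType(r.__Xi);
    else if (r.__X_index == (void*)&foo<double>) consumeType(r.__Xd);
}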
-- Std-Proposals mailing list Std-Proposals_at_[hidden] https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
Received on 2024-08-03 13:25:50