C++ Logo

std-proposals

Advanced search

Re: [std-proposals] Function overload set type information loss

From: Gašper Ažman <gasper.azman_at_[hidden]>
Date: Sat, 3 Aug 2024 12:52:17 +0100
I also don't think that's how linkers work :). Yes, a lot of things get
easier if you can do things in the linker. But what about dynamic
libraries? You can't just concatenate code at dlopen() :)

On Sat, Aug 3, 2024 at 12:01 PM Breno Guimarães via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> That means you can only use these pointers with consumeType?
>
> If you do a for loop on the vec you can only call consumeType(useType(F))?
> Because if you do other things inside the for loop in one translation
> unit, the compiler won't have visibility of that in the second translation
> unit. So it would not know to generate the code for the for loop body in
> the first translation unit for the types in the second translation unit.
>
> Would that be a limitation? Or is there another way around that?
>
> Em sex., 2 de ago. de 2024 23:25, organicoman <organicoman_at_[hidden]>
> escreveu:
>
>>
>> So how would you generate the switch for each type given that some
>> instantiations of the template can happen only in another translation unit?
>>
>> Let me illustrate with an example:
>>
>> Let's put in header file
>>
>> --- *temp.h*
>>
>> foo<T>
>>
>> consumeType <T>
>>
>> useType(effdecltype(foo) f)
>>
>> --- source1.cpp
>>
>> #include "*temp.h*"
>>
>> ///// adding instances
>>
>> vec.push_back(foo<int>);
>>
>> vec.push_back (foo<double>);
>>
>> /// calling function inside loop
>>
>> consumeType(useType(F)); //<- to be replaced
>>
>> --- source2.cpp
>>
>> #include "*temp.h*"
>>
>> ///// adding instances
>>
>> vec.push_back(foo<float>);
>>
>> vec.push_back (foo<char>);
>>
>> /// calling function inside loop
>>
>> consumeType(useType(F)); //<- to be replaced
>> *Compilation pre-pass:*
>> Source1: pseudo code only (don't hate me)
>> useType:
>> case foo<int>:
>> return xunion{int}
>> case foo<double>:
>> return xunion{double};
>> Source2:
>> useType:
>> case foo<float>:
>> return xunion{float}
>> case foo<char>:
>> return xunion{char}
>>
>> *Compilation pass:*
>> Since useType is tagged then add a bogus jump instruction that will
>> never be taken in that translation unit.
>> Source1.o: pseudo assembly
>> cmp esi, foo<int>;
>> je LBL_xunion_int ;
>> cmp esi,foo<double>
>> je LBL_xunion_double
>> jmp 0xfffffff; // hook jump
>> source2.o:
>> cmp esi, foo<float>
>> je LBL_xunion_float
>> cmp esi, foo<char>
>> je LBL_xunion<char>
>> jmp 0xfffffff // hook jump
>> *Linker: source1.o source2.o main.o*
>> 1- read mangled name of function
>> 2- is it tagged as effdecltype
>> no: do your usual job, then goto 6
>> 3- is there any previous cmp address saved?
>> yes: replace the 0xfffff of the hook jmp with the address of the
>> saved cmp.
>> 4- save the current first cmp address.
>> 5- resolve mangled name to be the current function address.
>> 6- move to next
>> 7- repeat if not finished.
>>
>> *Explanation:*
>> Let's assume the worst case scenario, where the two translation unit are
>> compiled separately.
>> Because if they were together, the compiler detects and fuse the two
>> generated function assembly into one, by concatenating the switch cases.
>>
>> Ok, I guess now, everyone knows what happens in the compiler pre-pass.
>> So at the end of it, we will have, for each translation unit an object
>> file with different switch for the same function name.
>> Let's tackle the assembly now.
>> Because the compiler already knows, that useType is a tagged function,
>> it will add to its assembly a bogus jump instruction, that is proven it
>> will never be taken for that translation unit only.
>> jmp 0xFFFFFFFF;
>> Then if we want to use both *.o object files. Things are taken
>> differently.
>> Here comes the linker job.
>> On the command line, the linker parses the *.o from left to right, like
>> every one knows.
>> As soon as it detects the mangled name of useType, it knows that it is
>> tagged, it saves the address of the first compare instruction cmp esi,
>> foo<T> then carry on its job, when it enters the second object file, it
>> finds the same useType mangled name, thus it knows it is tagged, it
>> checks if it has any compare instruction saved from previous *.o object
>> file then stamps its address into the bogus jump instruction of the current
>> object file then saves the first cmp of the current object file
>> instruction, then marks the mangled name as resolved with address of
>> useType in this object file.... and so on....
>>
>> current *.o
>> jmp 0xfffffff <-> becomes <-> jmp lea prev[cmp esi, foo<int>] //
>> illustration only don't hate me
>>
>> until the last object file that contains the same mangled name, at that
>> point, it created a linked list between compare instructions.
>> The mangled useType will be resolved to the last same mangled name seen,
>> thus it becomes the entry point of the useType function.
>>
>> It's like you are concatenating switch statement.
>> One observation worth reiterating,
>> Like i said in last reply, we will have code bloat.
>> That's because the body of useType before the switch statement will be
>> deduplicated as many as it is generated, and all, except one, will be
>> execute, the rest is dead meat never executed.
>> From this, i recommand to make the body of functions like useType very
>> small, let consumeType do the heavy lifting.
>> Or make the linker smart to fuse the whole thing into one giant switch.
>> But the first recommendation is faster to implement.
>>
>> That's one simple approach, many other are way efficient. But we need a
>> compiler planner to detect best decisions. They will work in tendem.
>>
>> That's all.
>>
>>
>>
>>
>>
>> Em sex., 2 de ago. de 2024 19:48, organicoman <organicoman_at_[hidden]>
>> escreveu:
>>
>>>
>>>
>>> So it doesn't work for multiple translation units?
>>>
>>> It perfectly does.
>>>
>>> Because for each translation unit, you tag the functions and generate
>>> the code, stamp it, then pass it the compilation step...and so on,
>>>
>>> There is no deduplication of function names, nor generation of
>>> extraneous code. You have exactly what you need in that unit.
>>>
>>> Just one thing.
>>>
>>> Sometimes the generated code could have the same instructions but under
>>> different function name, that causes bloating, thus a need to a compiler
>>> planner.
>>>
>>> Look the idea is veeeeeery basic.
>>>
>>> I consider a C++ source code, after the compiler does template
>>> instantiation, as a type's database. Unfortunately the current standards
>>> does not allow us to query anything from it.
>>>
>>> Imagine if you have the power to query how many types your program uses,
>>> what is the type-size distribution chart, sorting by type size, by layout
>>> size.... etc
>>>
>>> This is a data mine to optimize your code, reason about its
>>> implementation and make it more fast and efficient.
>>>
>>> That's what i am trying to start.
>>>
>>> I don't see the extent of my proposal but for sure, with some help, it
>>> will be developed to full potential.
>>>
>>>
>>>
>>>
>>>
>>> Em sex., 2 de ago. de 2024 19:16, organicoman via Std-Proposals <
>>> std-proposals_at_[hidden]> escreveu:
>>>
>>>> Your solution requires at compile time information that is only
>>>> available at runtime.
>>>>
>>>> I don't understand what runtime data you are referring to.
>>>>
>>>> In the algorithm explained before. All the data, that the compiler's
>>>> pre-pass is manipulating, are compile time data.
>>>> At the level of a translation unit. A function with *orthogonal
>>>> template parameters* (foo<T>)
>>>> cannot exist unless you instantiate it.
>>>> Add to that, the function is tagged as per the algorithm, so it is
>>>> prone to record the types it was instantiated for.
>>>> But as an optimization, the compiler will not record the list of *orthogonal
>>>> **template parameters* unless it sees that some variable is using it,
>>>> otherwise no need to generate any code.
>>>> The compiler confirmed that by the usage of the effdecltype operator
>>>> inside the template argument of the vector in the example.
>>>> Every time you instantiate foo with a type, you have to write it
>>>> explicitly in the source code.
>>>> foo<int>, foo<double>...etc
>>>> Next to that, the address of a function is a constexpr value, you can
>>>> use it as a non-type template parameter.
>>>> So as a summary, the list of *orthogonal template parameters* is
>>>> fetchable from the source code, and the address of the instances is
>>>> constexpr value.
>>>> These are the ingredients to make the algorithm work.
>>>>
>>>> Even if the compiler could trace data across any type of data container
>>>> (which it won't),
>>>>
>>>> It doesn't need to, because the information needed is the *orthogonal
>>>> template parameters* not the number of elements in the container.
>>>> You can have 100 element of type foo<int>, but the list of *orthogonal
>>>> template parameters* is only { int }
>>>> It's like you are counting how many instances of foo<T> you have in
>>>> that translation unit.
>>>>
>>>> datapoints in container could take the form of any possible
>>>> combination on any position and be influenced to translation units not
>>>> visible at compile time. But for your solution to work it requires a fix
>>>> layout (which in practice doesn't happen) and perfect observability.
>>>> This is not a problem that needs to be solved, it's just how things
>>>> work.
>>>>
>>>> Let me reiterate.
>>>> Look at this snippet:
>>>>
>>>> vec.push_back(&foo<int>);
>>>> int random;
>>>> cin >> random;
>>>> while(random - -) vec.push_back(&foo<char>);
>>>> cin >> random;
>>>> random = min(random, vec.size());
>>>> while(random - -) vec.pop_back();
>>>>
>>>> As you can see, i cannot tell nor track the vector size. But it is
>>>> irrelevant, because i have only two function pointer => { int, char }
>>>> And by consequence my switch statment will contain only two cases:
>>>>
>>>> case &foo<int>:
>>>> return __Xic_foo_eff_{ &foo<int>, int(0) };
>>>> case &foo<char>:
>>>> return __Xic_foo_eff_{ &foo<char>, char(0) };
>>>>
>>>> The two case lables are constexpr values known at compile time.
>>>>
>>>> You cannot resolve this problem without appending extra data to
>>>> containers that is used to inform about the nature of the data. And you
>>>> cannot solve this without a runtime check and control flow to browse for
>>>> the right handler.
>>>> This is a problem of distinguishability and computability, it is not
>>>> something you can solve, it's a consequence of math as described by Shannon.
>>>>
>>>> That's why std:any has additional data related to runtime type
>>>> information, and that's why it always needs some form of selection or hash
>>>> table or equivalent.
>>>>
>>>> Well, i just proved to you that i don't need that data, and that this
>>>> approach is plausibly more effecient and safe.
>>>> Let's try to stretch it more, shall we?
>>>>
>>>> The way it's already done is already how this problem should be solved,
>>>> no new feature necessary. What you are proposing requires the suspension of
>>>> the laws of physics.
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* organicoman <organicoman_at_[hidden]>
>>>> *Sent:* Friday, August 2, 2024 9:44:09 PM
>>>> *To:* Tiago Freire <tmiguelf_at_[hidden]>;
>>>> std-proposals_at_[hidden] <std-proposals_at_[hidden]>
>>>> *Subject:* Re: [std-proposals] Function overload set type information
>>>> loss
>>>>
>>>>
>>>>
>>>>
>>>> And also unimplementable...
>>>> But you don't have to take my word for it, you can try it yourself.
>>>>
>>>> But ... that's the whole purpose of this thread.
>>>> Putting down an idea in form of a paper, or an algorithm and tackle any
>>>> contradiction it raise untill we hit the wall or we make it through.
>>>> And we have just started agreeing about the core idea. Which is
>>>> Adding an operator that does overload dispatch and code generation...etc
>>>>
>>>> If it is not implementable, then point where, i will work on it...
>>>>
>>>> ------------------------------
>>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> on
>>>> behalf of organicoman via Std-Proposals <std-proposals_at_[hidden]
>>>> >
>>>> *Sent:* Friday, August 2, 2024 9:15:36 PM
>>>> *To:* organicoman via Std-Proposals <std-proposals_at_[hidden]>
>>>> *Cc:* organicoman <organicoman_at_[hidden]>
>>>> *Subject:* Re: [std-proposals] Function overload set type information
>>>> loss
>>>>
>>>> Hello Gašper,
>>>> Since participants are starting to get what I am talking about, then
>>>> allow me to answer your pending question.
>>>> This is what you've said in previous post.
>>>>
>>>> There is another fundamental misunderstanding here.
>>>>
>>>> The feature calls to "record all the orthogonal template parameters".
>>>>
>>>> While that's not what that word means, I think regardless it's asking
>>>> for global enumeration. C++ has a separate linking model. All the
>>>> types/functions are not known within a single unit. The xunion is
>>>> impossible to compute.
>>>>
>>>> This is a sister problem to what Jason is highlighting.
>>>>
>>>> When i was describing my algorithm, i left a big X mark, hoping that
>>>> someone will ask me, if people were attentive.
>>>> Line:
>>>> 4-b-
>>>> switch
>>>> *.....
>>>> * inside function body: goto X
>>>>
>>>> Then I left a hint at the bottom of the post in a form of code comment
>>>>
>>>> double var2 = *reinterpret_cast<double*>
>>>> ((reinterpret_cast<char*>(&(useType()))+sizeof(int)); // int since
>>>> __X_index is int. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> ^^^^^
>>>>
>>>> The __X_index is not of type int but of type void*, the unnamed struct
>>>> that the compiler generate becomes:
>>>> struct __Xidfc_foo_eff
>>>> {
>>>> void* __X_index;
>>>> union
>>>> { int __Xi; double __Xd ; float __Xf; char __Xc };
>>>> };
>>>>
>>>> Since the compiler tagged useType function, as described by the
>>>> algorithm, then after finishing recording the orthogonal template
>>>> parameters, it replaces the body with the following.
>>>>
>>>> auto useType(effdecltype(foo) f)
>>>> {
>>>> f();
>>>> switch(f)
>>>> {
>>>> case &f<int>: //<- the pointer value
>>>> return __Xidfc_foo_eff{(void*) &f<int>, int(0)};
>>>>
>>>> case &f<double>:
>>>> return __Xidfc_foo_eff{(void*) &f<double>, double(0)};
>>>>
>>>> case &f<float>:
>>>> return __Xidfc_foo_eff{(void*) &f<float>, float(0)};
>>>>
>>>> case &f<char>:
>>>> return __Xidfc_foo_eff{(void*) &f<char>, char(0)};
>>>> }
>>>> }
>>>>
>>>> Pointers are comparable for equality.
>>>>
>>>> As you can see, you can capture the xunion.... right? There is no
>>>> fundamental misunderstanding ...
>>>> Plus everything is done by the compiler, there is no global map, no
>>>> find algorithm, no type hashs, no type erasure (heap allocation), no manual
>>>> intervention. All is automated at compile-time.
>>>>
>>>>
>>>> --
>>>> Std-Proposals mailing list
>>>> Std-Proposals_at_[hidden]
>>>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>>>
>>> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>

Received on 2024-08-03 11:52:31