Re: [std-proposals] Function overload set type information loss

From: Breno Guimarães <brenorg_at_[hidden]>
Date: Sat, 3 Aug 2024 16:28:37 -0300
I read it, and it says that useType returns T because it has the line "return
T(0);".

You didn't mention that all basic operators are generated for xunion, so I
couldn't guess that.
In the previous reply you seemed to imply that it converts the xunion to int in
order to compare. But the correct thing would be to do virtual dispatch,
because there could be custom operators for the types xunion represents.
But then again, you also said virtual dispatch would only work for consumeType
and other tagged functions. So that means operator== is implicitly tagged, I
guess. But the other translation units don't know you're using
operator==(xunion, int), so they would have to generate it by default. But then
again, not all types can be compared to int. So that would just crash, I
guess?
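
For comparison, here is a minimal sketch of that last concern using today's std::variant and std::visit as a stand-in for xunion. It is not the proposed mechanism: Value, equals_int and check_equals_int are hypothetical names, and the set of alternatives is assumed to be closed and visible in one translation unit, which is exactly what the proposal cannot assume. What it shows is that an operator==(xunion, int) has to decide what happens for alternatives that cannot be compared to int:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for the thread's "xunion": a closed set of alternatives.
using Value = std::variant<int, double, std::string>;

// Compare the active alternative with an int. Alternatives with no meaningful
// comparison against int (std::string here) need an explicit policy.
bool equals_int(const Value& v, int rhs) {
    return std::visit(
        [rhs](const auto& held) -> bool {
            using T = std::decay_t<decltype(held)>;
            if constexpr (std::is_arithmetic_v<T>) {
                return held == rhs;
            } else {
                return false;  // assumed policy: non-comparable types never compare equal
            }
        },
        v);
}

// Small usage check; call it from main() to exercise the sketch.
void check_equals_int() {
    assert(equals_int(Value{1}, 1));
    assert(!equals_int(Value{std::string("1")}, 1));
}
```

With a closed variant the compiler can check every alternative in one place. The open question in this thread is who writes the std::string branch when that alternative only appears in another translation unit.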

On Sat, Aug 3, 2024 at 16:20, organicoman <organicoman_at_[hidden]>
wrote:

> It's just confusing to read because useType returns T but it's actually
> xunion.
>
> Can you point to where you read that useType returns T?
>
> auto useType(effdecltype(foo) f)
>
> Where auto is deduced to be the unnamed structure.
>
> So, when someone asks what are the limitations it will make it much
> clearer if you answer "yes you can only use the return of useType to call
> consumeType or some other function that is tagged".
>
> So you can't write:
> auto r = useType(elem);
> if (r == 1)
>
> You can do that. no problem at all.
>
> Because in the pre-pass the compiler is tracking where the return value of
> useType is used.
>
> So either it casts it to a concrete type, if it is assigned to a variable
>
> double var = r
>
> , also like this
>
> if (r == 1)
>
> Or it can trigger the virtual overload dispatch mechanism if used with
> consumeType.
>
> Right? Because the same way xunion doesn't have the member function "get", it
> also doesn't have an operator==, right?
>
> The type int also has operator==, but the compiler provides it for
> you.
>
> The same goes for the unnamed type in question. Since it is a
> compiler-generated structure, it has full support for all operators by default.
>
> So it would be compilation error, correct?
>
> Nope.
>
> Let's do something.
>
> Read the post that contains the compiler pre-pass algorithm, and try to
> summarize for me what you've understood.
>
>
>
> -------- Original message --------
> From: Breno Guimarães <brenorg_at_[hidden]>
> Date: 8/3/24 10:58 PM (GMT+04:00)
> To: organicoman <organicoman_at_[hidden]>
> Cc: Std-Proposals <std-proposals_at_[hidden]>, Tiago Freire <
> tmiguelf_at_[hidden]>
> Subject: Re: [std-proposals] Function overload set type information loss
>
> It's just confusing to read because useType returns T but it's actually
> xunion.
>
> So, when someone asks what are the limitations it will make it much
> clearer if you answer "yes you can only use the return of useType to call
> consumeType or some other function that is tagged".
>
> So you can't write:
> auto r = useType(elem);
> if (r == 1)
>
> Right? Because the same way xunion doesn't have the member function "get", it
> also doesn't have an operator==, right?
> So it would be compilation error, correct?
>
> On Sat, Aug 3, 2024 at 15:48, organicoman <organicoman_at_[hidden]>
> wrote:
>
>> useType returns the unnamed type xunion that I explained before.
>> And that unnamed type has no .get() member function.
>> So either you pass it to consumeType or something similarly tagged, or
>> you assign it to a variable with a concrete type like int, double, string,
>> struct S, etc.
>>
>> -------- Original message --------
>> From: Breno Guimarães <brenorg_at_[hidden]>
>> Date: 8/3/24 10:34 PM (GMT+04:00)
>> To: organicoman <organicoman_at_[hidden]>
>> Cc: Std-Proposals <std-proposals_at_[hidden]>, Tiago Freire <
>> tmiguelf_at_[hidden]>
>> Subject: Re: [std-proposals] Function overload set type information loss
>>
>> Or can you do:
>>
>> for (auto elem : vec)
>> {
>> auto r = useType(elem); // now r is not void anymore, right? It's some
>> // type that came from the template instantiation
>> auto t = r.get(); // can't do this, can you?
>> doSomething(t);
>> }
>>
>> On Sat, Aug 3, 2024 at 15:22, Breno Guimarães <brenorg_at_[hidden]>
>> wrote:
>>
>>> Oh, so the only way to use this function pointer is by doing
>>> consumeType(useType(elem)) ?
>>>
>>>
>>> On Sat, Aug 3, 2024 at 15:16, organicoman <organicoman_at_[hidden]>
>>> wrote:
>>>
>>>> Like
>>>>
>>>> template<class T>
>>>> void doSomething(T arg) {}
>>>>
>>>> ...
>>>>
>>>> for (auto elem : vec)
>>>> {
>>>> auto r = elem();
>>>> auto t = r.get();
>>>> doSomething(t);
>>>> }
>>>>
>>>> Since you don't know the type of "r" at compile time, you don't know
>>>> the type of "t" at compile time, so you don't know for which types to
>>>> instantiate "doSomething".
>>>> Look at the loop in your example above:
>>>> it takes a function pointer from the vector and then calls it.
>>>> That function pointer is of type void(*)(void).
>>>> So that example is a compile error.
>>>> And that's what confused me. So I assumed you were talking about
>>>> consumeType and gave the previous explanation, which you considered
>>>> a diversion.
>>>>
>>>> I'm trying to understand what limitations the user will face
>>>> when using such a feature.
>>>>
>>>> There are no limitations as far as I know, except this:
>>>> if you use a template function name other than consumeType, then you
>>>> have to tag it manually so that the pre-pass includes it in code generation.
>>>> Otherwise, consumeType is the default name that the pre-pass uses
>>>> to concatenate the switch cases, as I showed you before.
>>>>
>>>> Keep exploring it, because there is a trick that I really want to show
>>>> you, but you need to understand the concept deeply or it will not make
>>>> sense to you.
>>>>
>>>> -------- Original message --------
>>>> From: Breno Guimarães <brenorg_at_[hidden]>
>>>> Date: 8/3/24 9:54 PM (GMT+04:00)
>>>> To: organicoman <organicoman_at_[hidden]>
>>>> Cc: Std-Proposals <std-proposals_at_[hidden]>, Tiago Freire <
>>>> tmiguelf_at_[hidden]>
>>>> Subject: Re: [std-proposals] Function overload set type information
>>>> loss
>>>>
>>>> Like
>>>>
>>>> template<class T>
>>>> void doSomething(T arg) {}
>>>>
>>>> ...
>>>>
>>>> for (auto elem : vec)
>>>> {
>>>> auto r = elem();
>>>> auto t = r.get();
>>>> doSomething(t);
>>>> }
>>>>
>>>> Since you don't know the type of "r" at compile time, you don't know
>>>> the type of "t" at compile time, so you don't know for which types to
>>>> instantiate "doSomething".
>>>>
>>>> I'm trying to understand what limitations the user will face
>>>> when using such a feature.
>>>> ...
>>>>
>>>> On Sat, Aug 3, 2024 at 14:49, Breno Guimarães <brenorg_at_[hidden]>
>>>> wrote:
>>>>
>>>>> And what would we get for a template taking arbitrary type?
>>>>>
>>>>> On Sat, Aug 3, 2024 at 14:43, organicoman <organicoman_at_[hidden]>
>>>>> wrote:
>>>>>
>>>>>> Not clear at all. You changed my question into another question and
>>>>>> answered that instead.
>>>>>>
>>>>>> I asked for which type we should instantiate the template that is
>>>>>> being invoked with argument "t" for which you don't know the type.
>>>>>>
>>>>>> Are you talking about your example? this one:
>>>>>>
>>>>>> for (auto elem: vec)
>>>>>> {
>>>>>> auto r = elem();
>>>>>> auto t = r.get(); // <- is this the t you are asking about?
>>>>>> useType(t);
>>>>>> }
>>>>>>
>>>>>> If it is the case, then it means that you misunderstood the signature
>>>>>> of useType....please read the post carefully.
>>>>>>
>>>>>> useType takes a function pointer, not a variable of arbitrary type.
>>>>>>
>>>>>> Look at what I wrote (copy/paste):
>>>>>> auto useType(effdecltype(foo) f)
>>>>>> {
>>>>>> using T = __get_ortho_at__<0>(f);
>>>>>> f();
>>>>>> return T(0);
>>>>>> }
>>>>>>
>>>>>> If I am mistaken, then copy/paste the code from my example and point out
>>>>>> where you want more explanation.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Aug 3, 2024 at 12:47, organicoman <organicoman_at_[hidden]>
>>>>>> wrote:
>>>>>>
>>>>>>> Oooh i get you now.
>>>>>>> Let me simplify that for you by an example.
>>>>>>>
>>>>>>> *auto* someFunction();
>>>>>>>
>>>>>>> Can you tell what this function's return type is?
>>>>>>>
>>>>>>> No.
>>>>>>>
>>>>>>> So let me help you with the implementation of this function.
>>>>>>>
>>>>>>> 1- this function returns an unnamed type.
>>>>>>> 2- this unnamed type is an aggregate.
>>>>>>> 3- at an offset of 8 bytes there is a union with an unknown number of
>>>>>>> members of unknown types.
>>>>>>>
>>>>>>> Task:
>>>>>>> Read the value of the union member.
>>>>>>>
>>>>>>> User asks:
>>>>>>> But to what type should I cast that union?
>>>>>>>
>>>>>>> Answer:
>>>>>>> That's your responsibility... unless...
>>>>>>> you use my tool.
>>>>>>>
>>>>>>> User asks:
>>>>>>> What is it?
>>>>>>>
>>>>>>> Answer:
>>>>>>> You create a template function, you tune its body to execute
>>>>>>> whatever you want based on whatever type you want, and my tool takes
>>>>>>> responsibility for choosing which overload to call.
>>>>>>>
>>>>>>> That, in basic language, is what is happening.
>>>>>>>
>>>>>>> You don't need to know which type is in the union.
>>>>>>>
>>>>>>> If you want to be adventurous and cast it to a specific type, there are
>>>>>>> two possible outcomes:
>>>>>>> * you get what you want,
>>>>>>> * you get a type mismatch => garbage.
>>>>>>>
>>>>>>> All of that is orchestrated by the pre-pass, so it is preferable to
>>>>>>> let the tool handle the case.
>>>>>>>
>>>>>>> What about now? Clear?
>>>>>>>
>>>>>>> -------- Original message --------
>>>>>>> From: Breno Guimarães <brenorg_at_[hidden]>
>>>>>>> Date: 8/3/24 6:56 PM (GMT+04:00)
>>>>>>> To: organicoman <organicoman_at_[hidden]>
>>>>>>> Cc: Std-Proposals <std-proposals_at_[hidden]>, Tiago Freire <
>>>>>>> tmiguelf_at_[hidden]>
>>>>>>> Subject: Re: [std-proposals] Function overload set type information
>>>>>>> loss
>>>>>>>
>>>>>>> You didn't see how that doesn't work yet?
>>>>>>> You don't know the possible types inside xunion because of the
>>>>>>> multiple TU thing. So you don't know how to fully implement
>>>>>>> operator==(xunion,int). Other translation units can't help you implement
>>>>>>> that either because they don't even know an operator== is needed since that
>>>>>>> is in a body of another cpp file they never saw.
>>>>>>>
>>>>>>> But let's see a perhaps more obvious example:
>>>>>>>
>>>>>>> for (auto elem: vec)
>>>>>>> {
>>>>>>> auto r = elem();
>>>>>>> auto t = r.get();
>>>>>>> useType(t);
>>>>>>> }
>>>>>>>
>>>>>>> "r" is the fancy xunion. Then you call get on it, so you need to do
>>>>>>> the virtual dispatch thing. But you don't know the type of "t", because
>>>>>>> different types will return different things when you call "get()" on them.
>>>>>>> So how would you know for which types to instantiate useType?
>>>>>>>
>>>>>>> On Sat, Aug 3, 2024 at 11:22, organicoman <organicoman_at_[hidden]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I don't think I was clear enough.
>>>>>>>>
>>>>>>>> What if you have:
>>>>>>>> for (auto elem : vec)
>>>>>>>> {
>>>>>>>> auto r = useType(elem);
>>>>>>>> if (r == 1)
>>>>>>>> consumeType(r);
>>>>>>>> }
>>>>>>>>
>>>>>>>> The compiler on this translation unit doesn't know all possible
>>>>>>>> types (because the vec is global and filled in other TUs, like you
>>>>>>>> anticipated).
>>>>>>>>
>>>>>>>> So this translation unit could reason about the types it knows and
>>>>>>>> generate the code for "if (r == 1)", but what about the types it doesn't
>>>>>>>> know?
>>>>>>>> The other translation units don't know the body of the for-loop, so
>>>>>>>> they wouldn't be able to fill in that gap either.
>>>>>>>>
>>>>>>>> In this translation unit we could generate the if for "int", but
>>>>>>>> another one could require it for float. Another translation unit could even
>>>>>>>> push a std::string, which would make this code a compile error.
>>>>>>>>
>>>>>>>> That's why I asked if it was a limitation that you couldn't do
>>>>>>>> anything other than directly call consumeType.
>>>>>>>>
>>>>>>>> The temporary value generated by useType would also have to be
>>>>>>>> destroyed and that's a variation of the if problem because it typically is
>>>>>>>> destroyed on the caller body. So you wouldn't know which destructor to call.
>>>>>>>>
>>>>>>>> I didn't understand your question very well, but I will use your
>>>>>>>> example and try to detail it.
>>>>>>>>
>>>>>>>> So let's analyze this:
>>>>>>>>
>>>>>>>> if(r == 1)
>>>>>>>>
>>>>>>>> The compiler sees it as
>>>>>>>>
>>>>>>>> if(operator==(r, 1)) // bool operator==(xunion, int)
>>>>>>>>
>>>>>>>> It directly replaces it with the cast I described previously,
>>>>>>>> and it becomes
>>>>>>>>
>>>>>>>> if(
>>>>>>>> *reinterpret_cast<int*>
>>>>>>>> (reinterpret_cast<char*>(&r) + sizeof(void*))
>>>>>>>> == 1)
>>>>>>>>
>>>>>>>> So it becomes a regular comparison, calling
>>>>>>>>
>>>>>>>> bool operator==(int, int)
>>>>>>>>
>>>>>>>> Remember what I said in that post (copy/paste):
>>>>>>>>
>>>>>>>> ------start------
>>>>>>>> "...
>>>>>>>> In the second case (casting), which is the easiest, that line will
>>>>>>>> be replaced by
>>>>>>>>
>>>>>>>> double var2 = *reinterpret_cast<double*>
>>>>>>>> (reinterpret_cast<char*>(&(useType())) + sizeof(int)); // int since
>>>>>>>> // __X_index is int
>>>>>>>>
>>>>>>>> No matter what the union's active member is.
>>>>>>>> The user has to take responsibility for his code.
>>>>>>>> And it is safe to steal the guts of a temporary.
>>>>>>>> ..."
>>>>>>>> ------end--------
>>>>>>>>
>>>>>>>> The user takes responsibility for what he wants to cast to.
>>>>>>>>
>>>>>>>> You should package all operations inside a template function like
>>>>>>>> consumeType.
>>>>>>>>
>>>>>>>> You can think of consumeType as std::visit,
>>>>>>>> written for you by your servant, Mr. Compiler.
>>>>>>>>
>>>>>>>> Did I answer your question?
>>>>>>>>
>>>>>>>> -------- Original message --------
>>>>>>>> From: Breno Guimarães <brenorg_at_[hidden]>
>>>>>>>> Date: 8/3/24 5:43 PM (GMT+04:00)
>>>>>>>> To: organicoman <organicoman_at_[hidden]>
>>>>>>>> Cc: Std-Proposals <std-proposals_at_[hidden]>, Tiago Freire <
>>>>>>>> tmiguelf_at_[hidden]>
>>>>>>>> Subject: Re: [std-proposals] Function overload set type information
>>>>>>>> loss
>>>>>>>>
>>>>>>>> I don't think I was clear enough.
>>>>>>>>
>>>>>>>> What if you have:
>>>>>>>> for (auto elem : vec)
>>>>>>>> {
>>>>>>>> auto r = useType(elem);
>>>>>>>> if (r == 1)
>>>>>>>> consumeType(r);
>>>>>>>> }
>>>>>>>>
>>>>>>>> The compiler on this translation unit doesn't know all possible
>>>>>>>> types (because the vec is global and filled in other TUs, like you
>>>>>>>> anticipated).
>>>>>>>>
>>>>>>>> So this translation unit could reason about the types it knows and
>>>>>>>> generate the code for "if (r == 1)", but what about the types it doesn't
>>>>>>>> know?
>>>>>>>> The other translation units don't know the body of the for-loop, so
>>>>>>>> they wouldn't be able to fill in that gap either.
>>>>>>>>
>>>>>>>> In this translation unit we could generate the if for "int", but
>>>>>>>> another one could require it for float. Another translation unit could even
>>>>>>>> push a std::string, which would make this code a compile error.
>>>>>>>>
>>>>>>>> That's why I asked if it was a limitation that you couldn't do
>>>>>>>> anything other than directly call consumeType.
>>>>>>>>
>>>>>>>> The temporary value generated by useType would also have to be
>>>>>>>> destroyed and that's a variation of the if problem because it typically is
>>>>>>>> destroyed on the caller body. So you wouldn't know which destructor to call.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Aug 3, 2024 at 10:08, organicoman <organicoman_at_[hidden]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> That means you can only use these pointers with consumeType?
>>>>>>>>>
>>>>>>>>> If you do a for loop on the vec you can only call
>>>>>>>>> consumeType(useType(F))?
>>>>>>>>> Because if you do other things inside the for loop in one
>>>>>>>>> translation unit, the compiler won't have visibility of that in the second
>>>>>>>>> translation unit. So it would not know to generate the code for the for
>>>>>>>>> loop body in the first translation unit for the types in the second
>>>>>>>>> translation unit.
>>>>>>>>>
>>>>>>>>> Would that be a limitation? Or is there another way around that?
>>>>>>>>>
>>>>>>>>> As far as the vector is concerned, its element type is void(*)();
>>>>>>>>> check step 10-2 of the algorithm.
>>>>>>>>>
>>>>>>>>> And if you recall in the explanation of that algorithm, i said:
>>>>>>>>> (copy/paste)
>>>>>>>>>
>>>>>>>>> -------- start---------
>>>>>>>>>
>>>>>>>>> "....
>>>>>>>>>
>>>>>>>>> If useType return value is assigned to a variable we have two
>>>>>>>>> cases.
>>>>>>>>>
>>>>>>>>> auto var1 = useType (F); // perfect capture
>>>>>>>>> double var2 = useType(F); // casting
>>>>>>>>>
>>>>>>>>> The first case, which is the most interesting and powerful, is
>>>>>>>>> most useful if you pass that var1 to a function template, consumeType, in
>>>>>>>>> our example above
>>>>>>>>> ...."
>>>>>>>>> -------end-----
>>>>>>>>>
>>>>>>>>> So everything revolves around useType and its usage scope.
>>>>>>>>>
>>>>>>>>> Also, since the vector is a local variable in that translation
>>>>>>>>> unit, the scope of useType is just the foo<T> instantiated
>>>>>>>>> in that unit. The rest is irrelevant.
>>>>>>>>> You don't pay for what you don't need.
>>>>>>>>>
>>>>>>>>> Allow me to anticipate one question.
>>>>>>>>>
>>>>>>>>> What if the vector was a global variable, filled across
>>>>>>>>> translation units?
>>>>>>>>> *It doesn't matter*.
>>>>>>>>> The linker performs the concatenation algorithm that I described
>>>>>>>>> to you, so all is fine and dandy.
>>>>>>>>>
>>>>>>>>> Let's use an example:
>>>>>>>>> -----Source1.cpp
>>>>>>>>> #include "temp.h"
>>>>>>>>>
>>>>>>>>> extern vector<effdecltype(foo)> globalVector;
>>>>>>>>>
>>>>>>>>> /// no local insertion into the global vector
>>>>>>>>> /// use it directly in the loop
>>>>>>>>> for(F : globalVector)
>>>>>>>>> consumeType(useType(F));
>>>>>>>>>
>>>>>>>>> -----Source2.cpp
>>>>>>>>> #include "temp.h"
>>>>>>>>>
>>>>>>>>> extern vector<effdecltype(foo)> globalVector;
>>>>>>>>>
>>>>>>>>> /// just fill the global vector...
>>>>>>>>> globalVector.push_back(foo<int>);
>>>>>>>>> globalVector.push_back(foo<double>);
>>>>>>>>>
>>>>>>>>> *Explanation*:
>>>>>>>>> In the 1st translation unit, the virtual overload dispatch
>>>>>>>>> mechanism (read the explanation in the post about the pre-pass
>>>>>>>>> algorithm) sees that the compiler didn't record any instance of
>>>>>>>>> foo<T> in this translation unit. *If the vector were a local
>>>>>>>>> variable*, it would emit an error (substitution failure: cannot
>>>>>>>>> call the template function consumeType). But because the compiler
>>>>>>>>> knows it is a global variable (remember that the vector is the entity
>>>>>>>>> which triggered this mechanism), it replaces the whole
>>>>>>>>> overload dispatch switch with just the name of the generated function and
>>>>>>>>> lets the linker resolve it, like the following.
>>>>>>>>>
>>>>>>>>> ----starts like this
>>>>>>>>>
>>>>>>>>> for(F: globalVector)
>>>>>>>>> consumeType(useType(F));
>>>>>>>>>
>>>>>>>>> ....becomes
>>>>>>>>>
>>>>>>>>> // generated pre-pass code
>>>>>>>>> struct __X_foo_eff; // forward declaration
>>>>>>>>> __X_foo_eff useType(effdecltype(foo));
>>>>>>>>> void __X_consumeType(__X_foo_eff);
>>>>>>>>>
>>>>>>>>> // down in code
>>>>>>>>> for(F: globalVector)
>>>>>>>>> __X_consumeType(useType(F));
>>>>>>>>>
>>>>>>>>> The rest is known to everyone.
>>>>>>>>>
>>>>>>>>> The overall observation about this mechanism is that it doesn't
>>>>>>>>> leak type information to the runtime.
>>>>>>>>> I don't have a table such as
>>>>>>>>> hash(type)<->function_ptr
>>>>>>>>> or
>>>>>>>>> textMangle(type)<->function_ptr
>>>>>>>>> as usually used by std::any or some reflection algorithms.
>>>>>>>>> This makes the approach safe against reverse engineering of the
>>>>>>>>> {hash, text}, and it doesn't break the ASLR safety mechanism by saving
>>>>>>>>> function pointers in a permanent global container.
>>>>>>>>>
>>>>>>>>> I hope it is clear. At this point, if everyone has understood the
>>>>>>>>> fundamentals of this paper, then I guess you can start contributing.
>>>>>>>>>
>>>>>>>>> That's all.
>>>>>>>>>
>>>>>>>>> On Fri, Aug 2, 2024 at 23:25, organicoman <
>>>>>>>>> organicoman_at_[hidden]> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So how would you generate the switch for each type given that
>>>>>>>>>> some instantiations of the template can happen only in another translation
>>>>>>>>>> unit?
>>>>>>>>>>
>>>>>>>>>> Let me illustrate with an example:
>>>>>>>>>>
>>>>>>>>>> Let's put in header file
>>>>>>>>>>
>>>>>>>>>> --- *temp.h*
>>>>>>>>>>
>>>>>>>>>> foo<T>
>>>>>>>>>>
>>>>>>>>>> consumeType <T>
>>>>>>>>>>
>>>>>>>>>> useType(effdecltype(foo) f)
>>>>>>>>>>
>>>>>>>>>> --- source1.cpp
>>>>>>>>>>
>>>>>>>>>> #include "temp.h"
>>>>>>>>>>
>>>>>>>>>> ///// adding instances
>>>>>>>>>>
>>>>>>>>>> vec.push_back(foo<int>);
>>>>>>>>>>
>>>>>>>>>> vec.push_back (foo<double>);
>>>>>>>>>>
>>>>>>>>>> /// calling function inside loop
>>>>>>>>>>
>>>>>>>>>> consumeType(useType(F)); //<- to be replaced
>>>>>>>>>>
>>>>>>>>>> --- source2.cpp
>>>>>>>>>>
>>>>>>>>>> #include "temp.h"
>>>>>>>>>>
>>>>>>>>>> ///// adding instances
>>>>>>>>>>
>>>>>>>>>> vec.push_back(foo<float>);
>>>>>>>>>>
>>>>>>>>>> vec.push_back (foo<char>);
>>>>>>>>>>
>>>>>>>>>> /// calling function inside loop
>>>>>>>>>>
>>>>>>>>>> consumeType(useType(F)); //<- to be replaced
>>>>>>>>>> *Compilation pre-pass:*
>>>>>>>>>> Source1: pseudo code only (don't hate me)
>>>>>>>>>> useType:
>>>>>>>>>> case foo<int>:
>>>>>>>>>> return xunion{int}
>>>>>>>>>> case foo<double>:
>>>>>>>>>> return xunion{double};
>>>>>>>>>> Source2:
>>>>>>>>>> useType:
>>>>>>>>>> case foo<float>:
>>>>>>>>>> return xunion{float}
>>>>>>>>>> case foo<char>:
>>>>>>>>>> return xunion{char}
>>>>>>>>>>
>>>>>>>>>> *Compilation pass:*
>>>>>>>>>> Since useType is tagged, add a bogus jump instruction that
>>>>>>>>>> will never be taken in that translation unit.
>>>>>>>>>> Source1.o: pseudo assembly
>>>>>>>>>> cmp esi, foo<int>
>>>>>>>>>> je LBL_xunion_int
>>>>>>>>>> cmp esi, foo<double>
>>>>>>>>>> je LBL_xunion_double
>>>>>>>>>> jmp 0xFFFFFFFF // hook jump
>>>>>>>>>> Source2.o:
>>>>>>>>>> cmp esi, foo<float>
>>>>>>>>>> je LBL_xunion_float
>>>>>>>>>> cmp esi, foo<char>
>>>>>>>>>> je LBL_xunion_char
>>>>>>>>>> jmp 0xFFFFFFFF // hook jump
>>>>>>>>>> *Linker: source1.o source2.o main.o*
>>>>>>>>>> 1- read mangled name of function
>>>>>>>>>> 2- is it tagged as effdecltype
>>>>>>>>>> no: do your usual job, then goto 6
>>>>>>>>>> 3- is there any previous cmp address saved?
>>>>>>>>>> yes: replace the 0xFFFFFFFF of the hook jmp with the address of
>>>>>>>>>> the saved cmp.
>>>>>>>>>> 4- save the current first cmp address.
>>>>>>>>>> 5- resolve mangled name to be the current function address.
>>>>>>>>>> 6- move to next
>>>>>>>>>> 7- repeat if not finished.
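
The seven steps above can be modelled in ordinary C++ as a linked chain of per-object-file case tables. This is only a toy simulation under stated assumptions: Fragment, link_fragments, dispatch and check_chain are hypothetical names, strings stand in for function addresses and labels, and no existing linker is claimed to offer such a hook.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One compare/jump chain as emitted for a single object file (toy model).
struct Fragment {
    std::vector<std::pair<std::string, std::string>> cases;  // key -> label
    const Fragment* hook = nullptr;  // the "bogus jump"; null until patched
};

// Steps 3 to 5: patch each hook to the previously seen fragment and resolve
// the symbol to the last fragment seen. The vector must not be resized
// afterwards, because the hooks point into it.
const Fragment* link_fragments(std::vector<Fragment>& objects) {
    const Fragment* previous = nullptr;
    for (Fragment& current : objects) {
        current.hook = previous;
        previous = &current;
    }
    return previous;  // entry point of the combined function
}

// Walk the chain the way the patched jumps would: try this fragment's
// compares, then follow the hook into the previous object file.
std::string dispatch(const Fragment* entry, const std::string& key) {
    for (const Fragment* f = entry; f != nullptr; f = f->hook) {
        for (const auto& c : f->cases) {
            if (c.first == key) return c.second;
        }
    }
    return "unresolved";
}

// Small usage check mirroring source1.o and source2.o from the example.
void check_chain() {
    Fragment source1;
    source1.cases.emplace_back("foo<int>", "LBL_xunion_int");
    source1.cases.emplace_back("foo<double>", "LBL_xunion_double");
    Fragment source2;
    source2.cases.emplace_back("foo<float>", "LBL_xunion_float");
    source2.cases.emplace_back("foo<char>", "LBL_xunion_char");
    std::vector<Fragment> objects{source1, source2};
    const Fragment* entry = link_fragments(objects);
    assert(entry == &objects[1]);
    assert(dispatch(entry, "foo<int>") == "LBL_xunion_int");
}
```

In this model a lookup for a type registered in the first object file costs a scan of every later fragment first, which is one way to see why fusing everything into a single switch is mentioned as the more efficient alternative.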
>>>>>>>>>>
>>>>>>>>>> *Explanation:*
>>>>>>>>>> Let's assume the worst-case scenario, where the two translation
>>>>>>>>>> units are compiled separately.
>>>>>>>>>> If they were compiled together, the compiler would detect this and fuse
>>>>>>>>>> the two generated functions into one by concatenating the switch cases.
>>>>>>>>>> OK, I guess everyone now knows what happens in the compiler
>>>>>>>>>> pre-pass.
>>>>>>>>>> At the end of it we will have, for each translation unit, an
>>>>>>>>>> object file with a different switch for the same function name.
>>>>>>>>>> Let's tackle the assembly now.
>>>>>>>>>> Because the compiler already knows that useType is a tagged
>>>>>>>>>> function, it adds to its assembly a bogus jump instruction that
>>>>>>>>>> provably will never be taken within that translation unit alone:
>>>>>>>>>> jmp 0xFFFFFFFF;
>>>>>>>>>> If we then want to use both *.o object files, things are handled
>>>>>>>>>> differently.
>>>>>>>>>> Here comes the linker's job.
>>>>>>>>>> The linker processes the *.o files on the command line from left to
>>>>>>>>>> right, as everyone knows.
>>>>>>>>>> As soon as it detects the mangled name of useType, it knows that
>>>>>>>>>> it is tagged, so it saves the address of the first compare instruction
>>>>>>>>>> (cmp esi, foo<T>) and carries on with its job. When it enters the second
>>>>>>>>>> object file, it finds the same mangled useType name, so it
>>>>>>>>>> knows it is tagged. It checks whether it has a compare instruction saved
>>>>>>>>>> from a previous *.o object file and, if so, stamps its address into the
>>>>>>>>>> bogus jump instruction of the current object file. It then saves the
>>>>>>>>>> first cmp instruction of the current object file and marks the mangled
>>>>>>>>>> name as resolved to the address of useType in this object file, and so
>>>>>>>>>> on.
>>>>>>>>>>
>>>>>>>>>> current *.o
>>>>>>>>>> jmp 0xFFFFFFFF <-> becomes <-> jmp lea prev[cmp esi, foo<int>] //
>>>>>>>>>> illustration only, don't hate me
>>>>>>>>>>
>>>>>>>>>> This continues until the last object file that contains the same
>>>>>>>>>> mangled name; at that point, it has created a linked list between the
>>>>>>>>>> compare instructions.
>>>>>>>>>> The mangled useType will be resolved to the last occurrence of that
>>>>>>>>>> mangled name seen, which thus becomes the entry point of the useType
>>>>>>>>>> function.
>>>>>>>>>>
>>>>>>>>>> It's like concatenating switch statements.
>>>>>>>>>> One observation is worth reiterating:
>>>>>>>>>> as I said in the last reply, we will have code bloat.
>>>>>>>>>> That's because the body of useType before the switch statement
>>>>>>>>>> will be duplicated once per object file that generates it, and only one
>>>>>>>>>> copy will be executed; the rest is dead code that never runs.
>>>>>>>>>> For this reason, I recommend making the body of functions like useType
>>>>>>>>>> very small and letting consumeType do the heavy lifting,
>>>>>>>>>> or making the linker smart enough to fuse the whole thing into one giant
>>>>>>>>>> switch.
>>>>>>>>>> But the first recommendation is faster to implement.
>>>>>>>>>>
>>>>>>>>>> That's one simple approach; many others are far more efficient. But we
>>>>>>>>>> need a compiler planner to detect the best decisions. They will work in
>>>>>>>>>> tandem.
>>>>>>>>>> That's all.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 2, 2024 at 19:48, organicoman <
>>>>>>>>>> organicoman_at_[hidden]> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So it doesn't work for multiple translation units?
>>>>>>>>>>>
>>>>>>>>>>> It perfectly does.
>>>>>>>>>>>
>>>>>>>>>>> Because for each translation unit, you tag the functions,
>>>>>>>>>>> generate the code, stamp it, then pass it to the compilation step, and so on.
>>>>>>>>>>>
>>>>>>>>>>> There is no deduplication of function names, nor generation of
>>>>>>>>>>> extraneous code. You have exactly what you need in that unit.
>>>>>>>>>>>
>>>>>>>>>>> Just one thing.
>>>>>>>>>>>
>>>>>>>>>>> Sometimes the generated code could have the same instructions
>>>>>>>>>>> under different function names, which causes bloat, hence the need for a
>>>>>>>>>>> compiler planner.
>>>>>>>>>>>
>>>>>>>>>>> Look, the idea is very basic.
>>>>>>>>>>>
>>>>>>>>>>> I consider C++ source code, after the compiler does template
>>>>>>>>>>> instantiation, as a database of types. Unfortunately, the current standard
>>>>>>>>>>> does not allow us to query anything from it.
>>>>>>>>>>>
>>>>>>>>>>> Imagine if you had the power to query how many types your
>>>>>>>>>>> program uses, what the type-size distribution looks like, to sort by type
>>>>>>>>>>> size, by layout size, etc.
>>>>>>>>>>>
>>>>>>>>>>> This is a data mine for optimizing your code, reasoning about its
>>>>>>>>>>> implementation, and making it faster and more efficient.
>>>>>>>>>>>
>>>>>>>>>>> That's what I am trying to start.
>>>>>>>>>>>
>>>>>>>>>>> I don't see the full extent of my proposal yet, but for sure, with some
>>>>>>>>>>> help, it will be developed to its full potential.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 2, 2024 at 19:16, organicoman via Std-Proposals <
>>>>>>>>>>> std-proposals_at_[hidden]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Your solution requires at compile time information that is only
>>>>>>>>>>>> available at runtime.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't understand what runtime data you are referring to.
>>>>>>>>>>>>
>>>>>>>>>>>> In the algorithm explained before, all the data that the
>>>>>>>>>>>> compiler's pre-pass manipulates is compile-time data.
>>>>>>>>>>>> At the level of a translation unit, a function with *orthogonal
>>>>>>>>>>>> template parameters* (foo<T>)
>>>>>>>>>>>> cannot exist unless you instantiate it.
>>>>>>>>>>>> In addition, the function is tagged as per the algorithm, so it
>>>>>>>>>>>> is set up to record the types it was instantiated for.
>>>>>>>>>>>> But as an optimization, the compiler will not record the list
>>>>>>>>>>>> of *orthogonal template parameters* unless it sees that some
>>>>>>>>>>>> variable is using it; otherwise there is no need to generate any code.
>>>>>>>>>>>> The compiler confirms that through the use of the effdecltype
>>>>>>>>>>>> operator inside the template argument of the vector in the
>>>>>>>>>>>> example.
>>>>>>>>>>>> Every time you instantiate foo with a type, you have to write
>>>>>>>>>>>> it explicitly in the source code:
>>>>>>>>>>>> foo<int>, foo<double>, etc.
>>>>>>>>>>>> Next to that, the address of a function is a constexpr value;
>>>>>>>>>>>> you can use it as a non-type template parameter.
>>>>>>>>>>>> So, in summary, the list of *orthogonal template parameters*
>>>>>>>>>>>> is fetchable from the source code, and the address of each instance is
>>>>>>>>>>>> a constexpr value.
>>>>>>>>>>>> These are the ingredients that make the algorithm work.
>>>>>>>>>>>>
>>>>>>>>>>>> Even if the compiler could trace data across any type of data
>>>>>>>>>>>> container (which it won't),
>>>>>>>>>>>>
>>>>>>>>>>>> It doesn't need to, because the information needed is the *orthogonal
>>>>>>>>>>>> template parameters*, not the number of elements in the
>>>>>>>>>>>> container.
>>>>>>>>>>>> You can have 100 elements of type foo<int>, but the list of *orthogonal
>>>>>>>>>>>> template parameters* is only { int }.
>>>>>>>>>>>> It's like counting how many distinct instantiations of foo<T> you
>>>>>>>>>>>> have in that translation unit.
>>>>>>>>>>>>
>>>>>>>>>>>> Data points in a container could take the form of any possible
>>>>>>>>>>>> combination in any position and be influenced by translation units not
>>>>>>>>>>>> visible at compile time. But for your solution to work, it requires a fixed
>>>>>>>>>>>> layout (which in practice doesn't happen) and perfect observability.
>>>>>>>>>>>> This is not a problem that needs to be solved, it's just how
>>>>>>>>>>>> things work.
>>>>>>>>>>>>
>>>>>>>>>>>> Let me reiterate.
>>>>>>>>>>>> Look at this snippet:
>>>>>>>>>>>>
>>>>>>>>>>>> vec.push_back(&foo<int>);
>>>>>>>>>>>> int random;
>>>>>>>>>>>> cin >> random;
>>>>>>>>>>>> while(random - -) vec.push_back(&foo<char>);
>>>>>>>>>>>> cin >> random;
>>>>>>>>>>>> random = min(random, vec.size());
>>>>>>>>>>>> while(random - -) vec.pop_back();
>>>>>>>>>>>>
>>>>>>>>>>>> As you can see, i cannot tell nor track the vector size. But it
>>>>>>>>>>>> is irrelevant, because i have only two function pointer => { int, char }
>>>>>>>>>>>> And by consequence my switch statment will contain only two
>>>>>>>>>>>> cases:
>>>>>>>>>>>>
>>>>>>>>>>>> case &foo<int>:
>>>>>>>>>>>> return __Xic_foo_eff_{ &foo<int>, int(0) };
>>>>>>>>>>>> case &foo<char>:
>>>>>>>>>>>> return __Xic_foo_eff_{ &foo<char>, char(0) };
>>>>>>>>>>>>
>>>>>>>>>>>> The two case labels are constexpr values known at compile time.
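The two-case dispatch described above can be approximated in today's C++ as a rough sketch. Current C++ does not accept a pointer as a switch operand, so the sketch swaps in an explicit comparison chain. FooPtr, FooEff and classify are hypothetical names standing in for the compiler-generated __Xic_foo_eff_ record, and foo<T> is a placeholder with an empty body.

```cpp
// Placeholder for the foo<T> discussed in the thread (the real body is unknown).
template <typename T>
void foo() {}

using FooPtr = void (*)();

// Hypothetical stand-in for the generated __Xic_foo_eff_ record:
// a tag (the function address) plus storage for one value per recorded type.
struct FooEff {
    FooPtr index;  // which instantiation was passed in
    union {
        int  i;
        char c;
    };
};

// Assumption: only foo<int> and foo<char> were recorded in this translation
// unit. Any other pointer is reported with a null tag.
inline FooEff classify(FooPtr f) {
    FooEff r{};
    r.index = f;
    if (f == &foo<int>) {
        r.i = 0;
    } else if (f == &foo<char>) {
        r.c = 0;
    } else {
        r.index = nullptr;  // instantiation not known to this translation unit
    }
    return r;
}
```

Note that the comparison chain is itself a runtime check and branch, and the fallback arm is where a pointer created in another translation unit would end up.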
>>>>>>>>>>>>
>>>>>>>>>>>> You cannot resolve this problem without appending extra data to
>>>>>>>>>>>> containers that is used to inform about the nature of the data. And you
>>>>>>>>>>>> cannot solve this without a runtime check and control flow to browse for
>>>>>>>>>>>> the right handler.
>>>>>>>>>>>> This is a problem of distinguishability and computability, it
>>>>>>>>>>>> is not something you can solve, it's a consequence of math as described by
>>>>>>>>>>>> Shannon.
>>>>>>>>>>>>
>>>>>>>>>>>> That's why std::any has additional data related to runtime type
>>>>>>>>>>>> information, and that's why it always needs some form of selection or hash
>>>>>>>>>>>> table or equivalent.
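For comparison, here is a minimal sketch of the runtime type information that std::any carries and of the selection step needed to get a value back out. holds_int and int_or are hypothetical helper names; the std::any calls themselves are standard C++17.

```cpp
#include <any>
#include <typeinfo>

// Hypothetical helper: checks the runtime type tag stored alongside the value.
inline bool holds_int(const std::any& a) {
    return a.type() == typeid(int);
}

// Hypothetical helper: the pointer form of std::any_cast returns nullptr on a
// type mismatch, so the caller has to branch on the stored type at run time.
inline int int_or(const std::any& a, int fallback) {
    if (const int* p = std::any_cast<int>(&a)) {
        return *p;
    }
    return fallback;
}
```

For example, int_or(std::any(7), -1) yields 7, while int_or(std::any('x'), -1) falls back to -1 because the stored type is char.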
>>>>>>>>>>>>
>>>>>>>>>>>> Well, I just proved to you that I don't need that data, and
>>>>>>>>>>>> that this approach is plausibly more efficient and safe.
>>>>>>>>>>>> Let's try to stretch it more, shall we?
>>>>>>>>>>>>
>>>>>>>>>>>> The way it's already done is already how this problem should be
>>>>>>>>>>>> solved, no new feature necessary. What you are proposing requires the
>>>>>>>>>>>> suspension of the laws of physics.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>> *From:* organicoman <organicoman_at_[hidden]>
>>>>>>>>>>>> *Sent:* Friday, August 2, 2024 9:44:09 PM
>>>>>>>>>>>> *To:* Tiago Freire <tmiguelf_at_[hidden]>;
>>>>>>>>>>>> std-proposals_at_[hidden] <std-proposals_at_[hidden]>
>>>>>>>>>>>> *Subject:* Re: [std-proposals] Function overload set type
>>>>>>>>>>>> information loss
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> And also unimplementable...
>>>>>>>>>>>> But you don't have to take my word for it, you can try it
>>>>>>>>>>>> yourself.
>>>>>>>>>>>>
>>>>>>>>>>>> But ... that's the whole purpose of this thread.
>>>>>>>>>>>> Putting down an idea in the form of a paper or an algorithm, and
>>>>>>>>>>>> tackling any contradiction it raises until we hit the wall or we make it
>>>>>>>>>>>> through.
>>>>>>>>>>>> And we have just started agreeing about the core idea, which is
>>>>>>>>>>>> adding an operator that does overload dispatch and code
>>>>>>>>>>>> generation, etc.
>>>>>>>>>>>>
>>>>>>>>>>>> If it is not implementable, then point out where, and I will work on
>>>>>>>>>>>> it...
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]>
>>>>>>>>>>>> on behalf of organicoman via Std-Proposals <
>>>>>>>>>>>> std-proposals_at_[hidden]>
>>>>>>>>>>>> *Sent:* Friday, August 2, 2024 9:15:36 PM
>>>>>>>>>>>> *To:* organicoman via Std-Proposals <
>>>>>>>>>>>> std-proposals_at_[hidden]>
>>>>>>>>>>>> *Cc:* organicoman <organicoman_at_[hidden]>
>>>>>>>>>>>> *Subject:* Re: [std-proposals] Function overload set type
>>>>>>>>>>>> information loss
>>>>>>>>>>>>
>>>>>>>>>>>> Hello Gašper,
>>>>>>>>>>>> Since participants are starting to get what I am talking about,
>>>>>>>>>>>> then allow me to answer your pending question.
>>>>>>>>>>>> This is what you said in a previous post.
>>>>>>>>>>>>
>>>>>>>>>>>> There is another fundamental misunderstanding here.
>>>>>>>>>>>>
>>>>>>>>>>>> The feature calls to "record all the orthogonal template
>>>>>>>>>>>> parameters".
>>>>>>>>>>>>
>>>>>>>>>>>> While that's not what that word means, I think regardless it's
>>>>>>>>>>>> asking for global enumeration. C++ has a separate linking model. All the
>>>>>>>>>>>> types/functions are not known within a single unit. The xunion is
>>>>>>>>>>>> impossible to compute.
>>>>>>>>>>>>
>>>>>>>>>>>> This is a sister problem to what Jason is highlighting.
>>>>>>>>>>>>
>>>>>>>>>>>> When I was describing my algorithm, I left a big X mark,
>>>>>>>>>>>> hoping that someone would ask me about it, if people were attentive.
>>>>>>>>>>>> Line:
>>>>>>>>>>>> 4-b-
>>>>>>>>>>>> switch
>>>>>>>>>>>> *.....
>>>>>>>>>>>> * inside function body: goto X
>>>>>>>>>>>>
>>>>>>>>>>>> Then I left a hint at the bottom of the post in the form of a code
>>>>>>>>>>>> comment:
>>>>>>>>>>>>
>>>>>>>>>>>> // int since __X_index is int.
>>>>>>>>>>>> double var2 = *reinterpret_cast<double*>(
>>>>>>>>>>>>     reinterpret_cast<char*>(&(useType())) + sizeof(int));
>>>>>>>>>>>>                                             ^^^^^^^^^^^
>>>>>>>>>>>>
>>>>>>>>>>>> The __X_index is not of type int but of type void*, so the
>>>>>>>>>>>> unnamed struct that the compiler generates becomes:
>>>>>>>>>>>> struct __Xidfc_foo_eff
>>>>>>>>>>>> {
>>>>>>>>>>>> void* __X_index;
>>>>>>>>>>>> union
>>>>>>>>>>>> { int __Xi; double __Xd; float __Xf; char __Xc; };
>>>>>>>>>>>> };
>>>>>>>>>>>>
>>>>>>>>>>>> Since the compiler tagged the useType function, as described by
>>>>>>>>>>>> the algorithm, then after it finishes recording the orthogonal template
>>>>>>>>>>>> parameters, it replaces the body with the following:
>>>>>>>>>>>>
>>>>>>>>>>>> auto useType(effdecltype(foo) f)
>>>>>>>>>>>> {
>>>>>>>>>>>>     f();
>>>>>>>>>>>>     switch(f)
>>>>>>>>>>>>     {
>>>>>>>>>>>>     case &foo<int>: //<- the pointer value
>>>>>>>>>>>>         return __Xidfc_foo_eff{(void*) &foo<int>, int(0)};
>>>>>>>>>>>>
>>>>>>>>>>>>     case &foo<double>:
>>>>>>>>>>>>         return __Xidfc_foo_eff{(void*) &foo<double>, double(0)};
>>>>>>>>>>>>
>>>>>>>>>>>>     case &foo<float>:
>>>>>>>>>>>>         return __Xidfc_foo_eff{(void*) &foo<float>, float(0)};
>>>>>>>>>>>>
>>>>>>>>>>>>     case &foo<char>:
>>>>>>>>>>>>         return __Xidfc_foo_eff{(void*) &foo<char>, char(0)};
>>>>>>>>>>>>     }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> Pointers are comparable for equality.
>>>>>>>>>>>>
>>>>>>>>>>>> As you can see, you can capture the xunion.... right? There is
>>>>>>>>>>>> no fundamental misunderstanding ...
>>>>>>>>>>>> Plus, everything is done by the compiler: there is no global
>>>>>>>>>>>> map, no find algorithm, no type hashes, no type erasure (heap allocation),
>>>>>>>>>>>> no manual intervention. Everything is automated at compile time.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Std-Proposals mailing list
>>>>>>>>>>>> Std-Proposals_at_[hidden]
>>>>>>>>>>>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>>>>>>>>>>>
>>>>>>>>>>>>

Received on 2024-08-03 19:28:54