ISOCPP std-proposals List: Re: [std-proposals] Function overload set type information loss

From: Breno Guimarães <brenorg_at_[hidden]>
Date: Sat, 3 Aug 2024 11:56:30 -0300

You didn't see how that doesn't work yet?
You don't know the possible types inside xunion because of the multiple TU
thing. So you don't know how to fully implement operator==(xunion,int).
Other translation units can't help you implement that either because they
don't even know an operator== is needed since that is in a body of another
cpp file they never saw.

But let's see a perhaps more obvious example:

for (auto elem: vec)
{
auto r = elem();
auto t = r.get();
useType(t);
}

"r" is the fancy xunion. The you call get on it. So you need to do the
virtual dispatch thing. But you don't know the type of "t", because
different types will return different things when you call "get()" on them.
So how would you know for which types to instantiate useType?

Em sáb., 3 de ago. de 2024 11:22, organicoman <organicoman_at_[hidden]>
escreveu:

> I don't think I was clear enough.
>
> What if you have:
> for (auto elem : vec)
> {
> auto r = useType(elem);
> if (r == 1)
> consumeType(r);
> }
>
> The compiler on this translation unit doesn't know all possible types
> (because the vec is global and filled in other TUs like you antecipated).
>
> So this translation unit could reason about the types it knows and
> generate the code for "if (r == 1)" but what about the types it doesn't
> know?
> The other translation units don't know the body of the for-loop so they
> wouldn't be able to fill in that gap either.
>
> In this translation unit we could generate the if for "int" but another
> one could require for float. Another translation unit could even push a
> std::string which would make this code a syntax error.
>
> That's why I asked if it was a limitation that you couldn't do anything
> other than directly call consumeType.
>
> The temporary value generated by useType would also have to be destroyed
> and that's a variation of the if problem because it typically is destroyed
> on the caller body. So you wouldn't know which destructor to call.
>
> I didn't understand your question very well, but i will use your example
> and try to detail it.
>
> So let analyze this:
>
> if(r == 1)
>
> For the compiler, it sees it as
>
> if(operator == (r, 1))// bool operator==(xunion,int)
>
> It directly replace it by the cast i described previously.
> And it will become
>
> if(
> *reinterpret_cast<int*>
> ((reinterpret_cast<char*>(&(r))+sizeof(void*))
> == 1)
>
> And it becomes a regular comparison, calling
>
> bool operator(int, int)
>
> Remember what i said in that post:(copy/past)
>
> ------start------
> "...
> In the second case ( casting) which is the easiest, that line will be
> replaced by
>
> double var2 = *reinterpret_cast<double*>
> ((reinterpret_cast<char*>(&(useType()))+sizeof(int)); // int since
> __X_index is int
>
> No mater what is the union's active member.
> The user have to take responsibility of his code.
> And it is safe to still the guts of a temporary.
> ..."
> ------end--------
>
> The user takes responsibility to what he wants to cast to.
>
> You should package all operation inside a template function like
> consumeType.
>
> You can think of consumeType as std::visit
> which was written for you by your servant Mr.compiler.
>
> Did I answer your question?
>
> -------- Original message --------
> From: Breno Guimarães <brenorg_at_[hidden]>
> Date: 8/3/24 5:43 PM (GMT+04:00)
> To: organicoman <organicoman_at_[hidden]>
> Cc: Std-Proposals <std-proposals_at_[hidden]>, Tiago Freire <
> tmiguelf_at_[hidden]>
> Subject: Re: [std-proposals] Function overload set type information loss
>
> I don't think I was clear enough.
>
> What if you have:
> for (auto elem : vec)
> {
> auto r = useType(elem);
> if (r == 1)
> consumeType(r);
> }
>
> The compiler on this translation unit doesn't know all possible types
> (because the vec is global and filled in other TUs like you antecipated).
>
> So this translation unit could reason about the types it knows and
> generate the code for "if (r == 1)" but what about the types it doesn't
> know?
> The other translation units don't know the body of the for-loop so they
> wouldn't be able to fill in that gap either.
>
> In this translation unit we could generate the if for "int" but another
> one could require for float. Another translation unit could even push a
> std::string which would make this code a syntax error.
>
> That's why I asked if it was a limitation that you couldn't do anything
> other than directly call consumeType.
>
> The temporary value generated by useType would also have to be destroyed
> and that's a variation of the if problem because it typically is destroyed
> on the caller body. So you wouldn't know which destructor to call.
>
>
>
>
> Em sáb., 3 de ago. de 2024 10:08, organicoman <organicoman_at_[hidden]>
> escreveu:
>
>> That means you can only use these pointers with consumeType?
>>
>> If you do a for loop on the vec you can only call consumeType(useType(F))?
>> Because if you do other things inside the for loop in one translation
>> unit, the compiler won't have visibility of that in the second translation
>> unit. So it would not know to generate the code for the for loop body in
>> the first translation unit for the types in the second translation unit.
>>
>> Would that be a limitation? Or is there another way around that?
>>
>> As far as the vector is concerned, its elements type is void(*)(), check
>> the algorithm step 10-2
>>
>> And if you recall in the explanation of that algorithm, i said:
>> (copy/paste)
>>
>> -------- start---------
>>
>> "....
>>
>> If useType return value is assigned to a variable we have two cases.
>>
>> auto var1 = useType (F); // perfect capture
>> double var2 = useType(F); // casting
>>
>> The first case, which is the most interesting and powerful, is most
>> useful if you pass that var1 to a function template, consumeType, in our
>> example above
>> ...."
>> -------end-----
>>
>> So, all revolves around useType, and its usage scope.
>>
>> Also, since the vector is a local variable to that translation unit,
>> then the scope of useType is just the instantiated foo<T> in that unit.
>> The rest is irrelevant.
>> You don't pay for what you don't need.
>>
>> Allow me to anticipate one question.
>>
>> What if the vector was a global variable, filled accross translation
>> unit?
>> *It doesn't matter*.
>> The linker is doing the concatenation algorithm that i described to you,
>> so all is fine and dandy.
>>
>> Let's use an example:
>> -----Source1.cpp
>> #include "*temp.h*"
>>
>> extern vector<effdecltype(foo)> globalVector;
>>
>> /// no local insertion into the global vector
>> /// use it directly in the loop
>> for(F : globalVector)
>> consumeType(useType(F));
>>
>> -----Source2.cpp
>> #include "*temp.h*"
>>
>> extern vector<effdecltype(foo)> globalVector;
>>
>> /// just fill the global vector...
>> globalVec.push_back(foo<int>);
>> globalVec.push_back(foo<double>);
>>
>> *Explanation*:
>> In the 1st translation unit, the virtual overload dispatch mechanism (read
>> explanation in the post about the pre-pass algorithm)
>> sees that, in this translation unit, the compiler didn't record any
>> instance of foo<T>, so it emits an error (substitution failure cannot
>> call a template function consumeType), *that if the vector was a local
>> variable*, but because the compiler knows it is a global variable (remember
>> that the vector is the entity which triggered this mechanism), then it
>> replaces the whole overload dispatch switch with just the name of the
>> generated function and let the linker resolve it. Like the following.
>>
>> ----starts like this
>>
>> for(F: globalVector)
>> consumeType(useType(F));
>>
>> ....becomes
>>
>> // generared pre-pass code
>> extern struct __X_foo_eff;
>> __X_foo_eff useType(effdecltype(foo));
>> void __X_consumeType(__X_foo_eff);
>>
>> // down in code
>> for(F: globalVector)
>> __X_consumeType(useType(F));
>>
>> The rest is known to everyone.
>>
>> The overall observation about this mechanism, is that, it doesn't leak
>> type information to the runtime.
>> I don't have a table let say
>> hash(type)<->function_ptr
>> Or
>> textMangle(type)<->function_ptr
>> Usually used by std::any or, some reflection algorithms.
>> Which make this approach safe against reverse engineering the {hash,
>> text}, and doesn't break the ASLR safety mechanism by saving function
>> pointer in permanent global container.
>>
>> I hope it is clear..... at this point, if everyone understood the
>> fundamental of this paper then I guess that you can start contributing.
>>
>> That's all.
>>
>> Em sex., 2 de ago. de 2024 23:25, organicoman <organicoman_at_[hidden]>
>> escreveu:
>>
>>>
>>> So how would you generate the switch for each type given that some
>>> instantiations of the template can happen only in another translation unit?
>>>
>>> Let me illustrate with an example:
>>>
>>> Let's put in header file
>>>
>>> --- *temp.h*
>>>
>>> foo<T>
>>>
>>> consumeType <T>
>>>
>>> useType(effdecltype(foo) f)
>>>
>>> --- source1.cpp
>>>
>>> #include "*temp.h*"
>>>
>>> ///// adding instances
>>>
>>> vec.push_back(foo<int>);
>>>
>>> vec.push_back (foo<double>);
>>>
>>> /// calling function inside loop
>>>
>>> consumeType(useType(F)); //<- to be replaced
>>>
>>> --- source2.cpp
>>>
>>> #include "*temp.h*"
>>>
>>> ///// adding instances
>>>
>>> vec.push_back(foo<float>);
>>>
>>> vec.push_back (foo<char>);
>>>
>>> /// calling function inside loop
>>>
>>> consumeType(useType(F)); //<- to be replaced
>>> *Compilation pre-pass:*
>>> Source1: pseudo code only (don't hate me)
>>> useType:
>>> case foo<int>:
>>> return xunion{int}
>>> case foo<double>:
>>> return xunion{double};
>>> Source2:
>>> useType:
>>> case foo<float>:
>>> return xunion{float}
>>> case foo<char>:
>>> return xunion{char}
>>>
>>> *Compilation pass:*
>>> Since useType is tagged then add a bogus jump instruction that will
>>> never be taken in that translation unit.
>>> Source1.o: pseudo assembly
>>> cmp esi, foo<int>;
>>> je LBL_xunion_int ;
>>> cmp esi,foo<double>
>>> je LBL_xunion_double
>>> jmp 0xfffffff; // hook jump
>>> source2.o:
>>> cmp esi, foo<float>
>>> je LBL_xunion_float
>>> cmp esi, foo<char>
>>> je LBL_xunion<char>
>>> jmp 0xfffffff // hook jump
>>> *Linker: source1.o source2.o main.o*
>>> 1- read mangled name of function
>>> 2- is it tagged as effdecltype
>>> no: do your usual job, then goto 6
>>> 3- is there any previous cmp address saved?
>>> yes: replace the 0xfffff of the hook jmp with the address of the
>>> saved cmp.
>>> 4- save the current first cmp address.
>>> 5- resolve mangled name to be the current function address.
>>> 6- move to next
>>> 7- repeat if not finished.
>>>
>>> *Explanation:*
>>> Let's assume the worst case scenario, where the two translation unit are
>>> compiled separately.
>>> Because if they were together, the compiler detects and fuse the two
>>> generated function assembly into one, by concatenating the switch cases.
>>>
>>> Ok, I guess now, everyone knows what happens in the compiler pre-pass.
>>> So at the end of it, we will have, for each translation unit an object
>>> file with different switch for the same function name.
>>> Let's tackle the assembly now.
>>> Because the compiler already knows, that useType is a tagged function,
>>> it will add to its assembly a bogus jump instruction, that is proven it
>>> will never be taken for that translation unit only.
>>> jmp 0xFFFFFFFF;
>>> Then if we want to use both *.o object files. Things are taken
>>> differently.
>>> Here comes the linker job.
>>> On the command line, the linker parses the *.o from left to right, like
>>> every one knows.
>>> As soon as it detects the mangled name of useType, it knows that it is
>>> tagged, it saves the address of the first compare instruction cmp esi,
>>> foo<T> then carry on its job, when it enters the second object file,
>>> it finds the same useType mangled name, thus it knows it is tagged, it
>>> checks if it has any compare instruction saved from previous *.o object
>>> file then stamps its address into the bogus jump instruction of the current
>>> object file then saves the first cmp of the current object file
>>> instruction, then marks the mangled name as resolved with address of
>>> useType in this object file.... and so on....
>>>
>>> current *.o
>>> jmp 0xfffffff <-> becomes <-> jmp lea prev[cmp esi, foo<int>] //
>>> illustration only don't hate me
>>>
>>> until the last object file that contains the same mangled name, at that
>>> point, it created a linked list between compare instructions.
>>> The mangled useType will be resolved to the last same mangled name
>>> seen, thus it becomes the entry point of the useType function.
>>>
>>> It's like you are concatenating switch statement.
>>> One observation worth reiterating,
>>> Like i said in last reply, we will have code bloat.
>>> That's because the body of useType before the switch statement will be
>>> deduplicated as many as it is generated, and all, except one, will be
>>> execute, the rest is dead meat never executed.
>>> From this, i recommand to make the body of functions like useType very
>>> small, let consumeType do the heavy lifting.
>>> Or make the linker smart to fuse the whole thing into one giant switch.
>>> But the first recommendation is faster to implement.
>>>
>>> That's one simple approach, many other are way efficient. But we need a
>>> compiler planner to detect best decisions. They will work in tendem.
>>>
>>> That's all.
>>>
>>>
>>>
>>>
>>>
>>> Em sex., 2 de ago. de 2024 19:48, organicoman <organicoman_at_[hidden]>
>>> escreveu:
>>>
>>>>
>>>>
>>>> So it doesn't work for multiple translation units?
>>>>
>>>> It perfectly does.
>>>>
>>>> Because for each translation unit, you tag the functions and generate
>>>> the code, stamp it, then pass it the compilation step...and so on,
>>>>
>>>> There is no deduplication of function names, nor generation of
>>>> extraneous code. You have exactly what you need in that unit.
>>>>
>>>> Just one thing.
>>>>
>>>> Sometimes the generated code could have the same instructions but under
>>>> different function name, that causes bloating, thus a need to a compiler
>>>> planner.
>>>>
>>>> Look the idea is veeeeeery basic.
>>>>
>>>> I consider a C++ source code, after the compiler does template
>>>> instantiation, as a type's database. Unfortunately the current standards
>>>> does not allow us to query anything from it.
>>>>
>>>> Imagine if you have the power to query how many types your program
>>>> uses, what is the type-size distribution chart, sorting by type size, by
>>>> layout size.... etc
>>>>
>>>> This is a data mine to optimize your code, reason about its
>>>> implementation and make it more fast and efficient.
>>>>
>>>> That's what i am trying to start.
>>>>
>>>> I don't see the extent of my proposal but for sure, with some help, it
>>>> will be developed to full potential.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Em sex., 2 de ago. de 2024 19:16, organicoman via Std-Proposals <
>>>> std-proposals_at_[hidden]> escreveu:
>>>>
>>>>> Your solution requires at compile time information that is only
>>>>> available at runtime.
>>>>>
>>>>> I don't understand what runtime data you are referring to.
>>>>>
>>>>> In the algorithm explained before. All the data, that the compiler's
>>>>> pre-pass is manipulating, are compile time data.
>>>>> At the level of a translation unit. A function with *orthogonal
>>>>> template parameters* (foo<T>)
>>>>> cannot exist unless you instantiate it.
>>>>> Add to that, the function is tagged as per the algorithm, so it is
>>>>> prone to record the types it was instantiated for.
>>>>> But as an optimization, the compiler will not record the list of *orthogonal
>>>>> **template parameters* unless it sees that some variable is using it,
>>>>> otherwise no need to generate any code.
>>>>> The compiler confirmed that by the usage of the effdecltype operator
>>>>> inside the template argument of the vector in the example.
>>>>> Every time you instantiate foo with a type, you have to write it
>>>>> explicitly in the source code.
>>>>> foo<int>, foo<double>...etc
>>>>> Next to that, the address of a function is a constexpr value, you can
>>>>> use it as a non-type template parameter.
>>>>> So as a summary, the list of *orthogonal template parameters* is
>>>>> fetchable from the source code, and the address of the instances is
>>>>> constexpr value.
>>>>> These are the ingredients to make the algorithm work.
>>>>>
>>>>> Even if the compiler could trace data across any type of data
>>>>> container (which it won't),
>>>>>
>>>>> It doesn't need to, because the information needed is the *orthogonal
>>>>> template parameters* not the number of elements in the container.
>>>>> You can have 100 element of type foo<int>, but the list of *orthogonal
>>>>> template parameters* is only { int }
>>>>> It's like you are counting how many instances of foo<T> you have in
>>>>> that translation unit.
>>>>>
>>>>> datapoints in container could take the form of any possible
>>>>> combination on any position and be influenced to translation units not
>>>>> visible at compile time. But for your solution to work it requires a fix
>>>>> layout (which in practice doesn't happen) and perfect observability.
>>>>> This is not a problem that needs to be solved, it's just how things
>>>>> work.
>>>>>
>>>>> Let me reiterate.
>>>>> Look at this snippet:
>>>>>
>>>>> vec.push_back(&foo<int>);
>>>>> int random;
>>>>> cin >> random;
>>>>> while(random - -) vec.push_back(&foo<char>);
>>>>> cin >> random;
>>>>> random = min(random, vec.size());
>>>>> while(random - -) vec.pop_back();
>>>>>
>>>>> As you can see, i cannot tell nor track the vector size. But it is
>>>>> irrelevant, because i have only two function pointer => { int, char }
>>>>> And by consequence my switch statment will contain only two cases:
>>>>>
>>>>> case &foo<int>:
>>>>> return __Xic_foo_eff_{ &foo<int>, int(0) };
>>>>> case &foo<char>:
>>>>> return __Xic_foo_eff_{ &foo<char>, char(0) };
>>>>>
>>>>> The two case lables are constexpr values known at compile time.
>>>>>
>>>>> You cannot resolve this problem without appending extra data to
>>>>> containers that is used to inform about the nature of the data. And you
>>>>> cannot solve this without a runtime check and control flow to browse for
>>>>> the right handler.
>>>>> This is a problem of distinguishability and computability, it is not
>>>>> something you can solve, it's a consequence of math as described by Shannon.
>>>>>
>>>>> That's why std:any has additional data related to runtime type
>>>>> information, and that's why it always needs some form of selection or hash
>>>>> table or equivalent.
>>>>>
>>>>> Well, i just proved to you that i don't need that data, and that this
>>>>> approach is plausibly more effecient and safe.
>>>>> Let's try to stretch it more, shall we?
>>>>>
>>>>> The way it's already done is already how this problem should be
>>>>> solved, no new feature necessary. What you are proposing requires the
>>>>> suspension of the laws of physics.
>>>>>
>>>>>
>>>>> ------------------------------
>>>>> *From:* organicoman <organicoman_at_[hidden]>
>>>>> *Sent:* Friday, August 2, 2024 9:44:09 PM
>>>>> *To:* Tiago Freire <tmiguelf_at_[hidden]>;
>>>>> std-proposals_at_[hidden] <std-proposals_at_[hidden]>
>>>>> *Subject:* Re: [std-proposals] Function overload set type information
>>>>> loss
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> And also unimplementable...
>>>>> But you don't have to take my word for it, you can try it yourself.
>>>>>
>>>>> But ... that's the whole purpose of this thread.
>>>>> Putting down an idea in form of a paper, or an algorithm and tackle
>>>>> any contradiction it raise untill we hit the wall or we make it through.
>>>>> And we have just started agreeing about the core idea. Which is
>>>>> Adding an operator that does overload dispatch and code
>>>>> generation...etc
>>>>>
>>>>> If it is not implementable, then point where, i will work on it...
>>>>>
>>>>> ------------------------------
>>>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> on
>>>>> behalf of organicoman via Std-Proposals <
>>>>> std-proposals_at_[hidden]>
>>>>> *Sent:* Friday, August 2, 2024 9:15:36 PM
>>>>> *To:* organicoman via Std-Proposals <std-proposals_at_[hidden]>
>>>>> *Cc:* organicoman <organicoman_at_[hidden]>
>>>>> *Subject:* Re: [std-proposals] Function overload set type information
>>>>> loss
>>>>>
>>>>> Hello Gašper,
>>>>> Since participants are starting to get what I am talking about, then
>>>>> allow me to answer your pending question.
>>>>> This is what you've said in previous post.
>>>>>
>>>>> There is another fundamental misunderstanding here.
>>>>>
>>>>> The feature calls to "record all the orthogonal template parameters".
>>>>>
>>>>> While that's not what that word means, I think regardless it's asking
>>>>> for global enumeration. C++ has a separate linking model. All the
>>>>> types/functions are not known within a single unit. The xunion is
>>>>> impossible to compute.
>>>>>
>>>>> This is a sister problem to what Jason is highlighting.
>>>>>
>>>>> When i was describing my algorithm, i left a big X mark, hoping that
>>>>> someone will ask me, if people were attentive.
>>>>> Line:
>>>>> 4-b-
>>>>> switch
>>>>> *.....
>>>>> * inside function body: goto X
>>>>>
>>>>> Then I left a hint at the bottom of the post in a form of code comment
>>>>>
>>>>> double var2 = *reinterpret_cast<double*>
>>>>> ((reinterpret_cast<char*>(&(useType()))+sizeof(int)); // int since
>>>>> __X_index is int.
>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> ^^^^^
>>>>>
>>>>> The __X_index is not of type int but of type void*, the unnamed
>>>>> struct that the compiler generate becomes:
>>>>> struct __Xidfc_foo_eff
>>>>> {
>>>>> void* __X_index;
>>>>> union
>>>>> { int __Xi; double __Xd ; float __Xf; char __Xc };
>>>>> };
>>>>>
>>>>> Since the compiler tagged useType function, as described by the
>>>>> algorithm, then after finishing recording the orthogonal template
>>>>> parameters, it replaces the body with the following.
>>>>>
>>>>> auto useType(effdecltype(foo) f)
>>>>> {
>>>>> f();
>>>>> switch(f)
>>>>> {
>>>>> case &f<int>: //<- the pointer value
>>>>> return __Xidfc_foo_eff{(void*) &f<int>, int(0)};
>>>>>
>>>>> case &f<double>:
>>>>> return __Xidfc_foo_eff{(void*) &f<double>, double(0)};
>>>>>
>>>>> case &f<float>:
>>>>> return __Xidfc_foo_eff{(void*) &f<float>, float(0)};
>>>>>
>>>>> case &f<char>:
>>>>> return __Xidfc_foo_eff{(void*) &f<char>, char(0)};
>>>>> }
>>>>> }
>>>>>
>>>>> Pointers are comparable for equality.
>>>>>
>>>>> As you can see, you can capture the xunion.... right? There is no
>>>>> fundamental misunderstanding ...
>>>>> Plus everything is done by the compiler, there is no global map, no
>>>>> find algorithm, no type hashs, no type erasure (heap allocation), no manual
>>>>> intervention. All is automated at compile-time.
>>>>>
>>>>>
>>>>> --
>>>>> Std-Proposals mailing list
>>>>> Std-Proposals_at_[hidden]
>>>>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>>>>
>>>>>

Received on 2024-08-03 14:56:46