Date: Sat, 30 Aug 2025 16:54:56 -0700
Hello,
Calling a C++ API implemented in a shared object (.so/.dll) from a
foreign language should be convenient rather than impossible. Foreign
languages and their libraries are often implemented in C, so providing
a mechanism for C to use the C++ function calling convention without
requiring a translation layer around everything would reduce friction.
I would like to propose that we standardize the relevant parts of the
Itanium C++ ABI, namely section 5.1 External Names (a.k.a. Mangling)
and certain other parts. This should allow a script to call C++
constructors/destructors on a buffer, use C++ operators and pass data
by reference to C++. Providing C++ reflection data in a format
accessible to a foreign language is also discussed below as a second
part of this proposal. Once again, please forgive my contrarian
tendencies.
https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling
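
As a rough illustration of what section 5.1 specifies, the sketch
below builds the symbol name for a constructor the way a binding
generator might. The helper name mangle_constructor and the small type
table are hypothetical; the sketch assumes a class in the global
namespace with builtin parameter types only, and ignores namespaces,
templates, qualifiers and substitutions.

```python
# Hypothetical helper, not part of any real tool. Only a handful of the
# builtin type codes from the Itanium mangling table are listed here.
BUILTIN_CODES = {
    "void": "v", "bool": "b", "char": "c", "int": "i",
    "unsigned int": "j", "long": "l", "float": "f", "double": "d",
}

def mangle_constructor(class_name, param_types=()):
    """Return the complete-object constructor (C1) symbol for class_name."""
    # An empty parameter list is encoded as a single 'v' (void).
    params = "".join(BUILTIN_CODES[t] for t in param_types) or "v"
    # _Z prefix, N...E nested name, <length><identifier>, C1 = constructor.
    return f"_ZN{len(class_name)}{class_name}C1E{params}"

print(mangle_constructor("OperatorTest"))
print(mangle_constructor("OperatorTest", ["int"]))
```

The two names printed are the same symbols that appear in the
constructor example later in this message.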
This proposal is the result of a recent experiment in which I used
libclang to reimplement a C++ object model in Python by parsing C++
headers and then generating a script that could call the C++ symbols
in a .so directly. The main challenge was that I had to hijack
Python's support for calling C functions and rely on
implementation-defined knowledge of the C++ ABI to do so. The only
real problem I found was that it is impossible to return an object
with a non-trivial destructor by value from a C++ function to a
foreign language without that language knowing the C++ calling
convention.
The motivation for this experiment was that the state of the art for
binding C++ to Python is not too great. If you are willing to write a
lot of C wrapper code using the Python C API then you will have a good
solution. Otherwise, if you want to use C++-based script binding
tools, the ones I tried have performance issues. The bindings were
costing me 50% of my compile time and 50% of my .so size. I hate to
name names, because they seem well written, are more mature than my
efforts and represent a lot of effort, but that was pybind11 and
nanobind. Possibly things will improve if someone uses the new C++
reflection code to generate bindings without using templates.
Overall, having a tool to write out my C++ script bindings in 0.5
seconds using C++ as an interface definition language was a great
experience by comparison. The point of using a scripting language is
to be able to iterate quickly, and this helped enable that.
Here is an example of an overloaded C++ constructor being called
directly from Python. The C++ symbols have been loaded into the Python
module's global namespace and so they appear to be called directly.
The self arg is a "this" pointer to a buffer.
    def __init__(self,*_Args,**_Kwargs):
        match _Len(_Args):
            case 0:
                return _ZN12OperatorTestC1Ev(_Ctypes.byref(self))
            case 1:
                return _ZN12OperatorTestC1Ei(_Ctypes.byref(self),_Args[0])
My biggest complaint was that I needed any of this technology at all.
Arguably, the C/C++ compiler could emit a form of pre-compiled header
that describes the part of the C/C++ API found in a C/C++ header. Then
every scripting engine could just load that instead of needing the
usual script binding boilerplate. The C++ symbols already have type
information encoded in them, so it seems strange to be manually
configuring marshaling code for them in another language. (Please
forgive me if I am rubbing you the wrong way for the second time in
this email.)
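
To make the point about type information concrete, the sketch below
recovers the class name and parameter types from a constructor symbol.
The function describe_constructor is hypothetical and handles only the
simplest shape of name (_ZN<length><class>C1E followed by builtin type
codes); a real demangler has to cover far more of the grammar.

```python
import re

# A few builtin type codes from the Itanium mangling table; the full
# table is much larger.
BUILTIN_NAMES = {
    "v": "void", "b": "bool", "c": "char", "i": "int",
    "j": "unsigned int", "l": "long", "f": "float", "d": "double",
}

def describe_constructor(symbol):
    """Decode a symbol of the form _ZN<len><class>C1E<builtin codes>."""
    match = re.fullmatch(r"_ZN(\d+)(.+)", symbol)
    if match is None:
        raise ValueError("not a nested-name symbol")
    length = int(match.group(1))
    rest = match.group(2)
    # The decimal length says how many characters the identifier takes.
    class_name, tail = rest[:length], rest[length:]
    if not tail.startswith("C1E"):
        raise ValueError("not a complete-object constructor")
    codes = tail[3:]
    # A lone 'v' means the parameter list is empty.
    params = [] if codes == "v" else [BUILTIN_NAMES[c] for c in codes]
    return class_name, params

print(describe_constructor("_ZN12OperatorTestC1Ei"))
```

A binding generator could use output like this to pick argument
conversions automatically, although the symbol alone says nothing
about object size or layout.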
For those who are really curious, the script for parsing a C++ API and
generating a direct call wrapper is here:
https://github.com/whatchamacallem/hatchlingplatform/blob/main/entanglement_example/src/entanglement.py
Nota Bene: There was one bug I can't fix with the current design of
C++ and Python. Returning a class by value will result in it being
destructed without being copied first.
I am happy to pull together the parts of the Itanium ABI that would
need to be standardized into a proposal if anyone is interested. This
is step 1: "float the idea." The first part would be to allow C to
identify and call C++ function pointers (in this case directly out of
a .so, although that detail has not been standardized) with code
written only in C that was not compiled with the types involved. The
manner in which it is done could still be implementation defined as
long as there was agreement between the two languages as to how to
operate the additional machinery C++ needs for the particular
platform's ABI. The second, optional part would be to standardize a
set of requirements for reflection data for a subset of C++ that can
be read from a shared object directly or stored alongside one.
Regards,
Adrian
Received on 2025-08-30 23:55:11