Date: Sun, 18 Feb 2024 19:06:17 -0800
On Sun, Feb 18, 2024 at 6:29 PM Chuanqi Xu <chuanqi.xcq_at_[hidden]>
wrote:
> > As for interface only modules, I think they will be necessary to
> > support. My preference here is to stick an attribute on the module
> > declaration that tells build systems that there may not be a linker input
> > with external definitions, and so they need to ensure at least one object
> > file with linkonce_odr definitions exists. When the compiler builds an
> > object file for a module with this attribute, it emits everything as
> > linkonce_odr. This allows us to keep the assumption that we don't need to
> > generate these object files in the general case, but still allow for
> > multiple object files to exist for interface only modules without trying to
> > communicate that in a side band.
>
> Then we downgrade (or change) named modules to something pretty similar
> to header modules. That does not sound like a good idea, since it breaks
> the ability of named modules to avoid duplicated compilation in the middle
> and back end. Also, it is a drastic change to the ABI...
>
I agree that it adds a bunch of costs, but I think we are going to end up
with it regardless. It's definitely a change to the ABI, but it's not an
incompatible one: if you do have a strong definition anywhere, then that
takes over. Requiring the attribute would mean it only happens when someone
specifically asks for it.

I'm happy to see how far we can go without it, but I won't be at all
surprised when someone ships a module, tells people to just include it as
part of their project, and it works fine until some third party tries to
use two different libraries that did this.
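
To make the linkage behavior above concrete, here is a toy model of how a
linker might pick among several definitions of one symbol. It is only a
sketch under simplifying assumptions (no COMDAT groups, no archive member
selection, the first linkonce_odr copy is kept). The names `Linkage`, `Def`,
`resolve`, `check_scenarios`, the symbol label "foo.init", and the object
file names are all made up for illustration; none of them are LLVM or linker
APIs.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy model; not how any real linker is implemented.
enum class Linkage { External, LinkOnceODR };

struct Def {
    std::string symbol;  // illustrative label, e.g. a module initializer
    std::string object;  // object file that provides the definition
    Linkage linkage;
};

// For each symbol, returns the object file whose definition is kept.
// Throws if two external (strong) definitions of one symbol are seen.
std::map<std::string, std::string> resolve(const std::vector<Def>& defs) {
    std::map<std::string, Def> chosen;
    for (const Def& d : defs) {
        auto it = chosen.find(d.symbol);
        if (it == chosen.end()) {
            chosen.emplace(d.symbol, d);  // first definition seen so far
        } else if (d.linkage == Linkage::External) {
            if (it->second.linkage == Linkage::External)
                throw std::runtime_error("multiple definition of " + d.symbol);
            it->second = d;  // a strong definition takes over
        }
        // Otherwise d is a duplicate linkonce_odr copy and is discarded.
    }
    std::map<std::string, std::string> kept;
    for (const auto& [symbol, def] : chosen) kept[symbol] = def.object;
    return kept;
}

// Exercises the three situations discussed in this thread.
void check_scenarios() {
    // Two consumers each emit a linkonce_odr copy; the library has a strong one.
    const std::vector<Def> mixed = {
        {"foo.init", "consumer_a.o", Linkage::LinkOnceODR},
        {"foo.init", "consumer_b.o", Linkage::LinkOnceODR},
        {"foo.init", "libfoo.o", Linkage::External},
    };
    assert(resolve(mixed).at("foo.init") == "libfoo.o");

    // Interface only module: only linkonce_odr copies exist, one is kept.
    const std::vector<Def> weak_only = {
        {"foo.init", "consumer_a.o", Linkage::LinkOnceODR},
        {"foo.init", "consumer_b.o", Linkage::LinkOnceODR},
    };
    assert(resolve(weak_only).at("foo.init") == "consumer_a.o");

    // Two consumers both emit external definitions: the link fails.
    const std::vector<Def> clash = {
        {"foo.init", "consumer_a.o", Linkage::External},
        {"foo.init", "consumer_b.o", Linkage::External},
    };
    bool failed = false;
    try {
        resolve(clash);
    } catch (const std::runtime_error&) {
        failed = true;
    }
    assert(failed);
}
```

The last scenario in `check_scenarios` is the "two different libraries that
did this" case: each one brings its own external definition, and the toy
`resolve` rejects the link in the same way a real linker reports a multiple
definition error.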

> > In previous discussions of this issue over the years I've always
> > asserted that the distributed library needs to have any module interface
> > object files, but that it would also be nice to have a linkonce_odr ABI to
> > support interface only libraries if possible. If you look at the code Clang
> > generates today, even an empty module, it generates an external definition
> > of the module initialization function. If multiple consumers of a given
> > library decide they need to generate their own, then you will get a
> > multiple definition error from the linker.
>
> In the general case, I feel everyone here agrees that the interface object
> files should be part of the distributed library (.a, .so). And for the std
> module, we (especially build system vendors) need to review how we should
> support std modules. If we like the status quo, then it is the
> responsibility of the build system to make sure the multiple definitions
> you described don't happen. If we want the std modules to keep the
> common behavior, we should ask the standard library vendors to change
> the distributed library.
>
The thing is, it's nearly impossible for a build system to do this. As soon
as you're mixing libraries compiled at different times and potentially with
different build systems, you don't know if some other library already has a
copy of the .o for a module interface you depend on. We should just tell
stdlib vendors to include the module interface object files as part of the
stdlib library. If some specific ABI issue comes up, we can deal with that,
but for libc++ and libstdc++ I don't think there are any unless
std::ios_base::Init somehow has them.

- Michael Spencer

> Thanks,
> Chuanqi
>
> ------------------------------------------------------------------
> From: SG15 <sg15_at_[hidden]>
> Send Time:2024 Feb. 19 (Mon.) 09:32
> To:SG15<sg15_at_[hidden]>
> Cc:Michael Spencer<bigcheesegs_at_[hidden]>
> Subject:Re: [SG15] Packaging: Where should "library interface object
> files" live?
>
> I'm going to use the LLVM linkage type names in this as the names of these
> things differ between ELF, MachO, and COFF; and LLVM has a well defined
> mapping: https://llvm.org/docs/LangRef.html#linkage-types
>
> In previous discussions of this issue over the years I've always asserted
> that the distributed library needs to have any module interface object
> files, but that it would also be nice to have a linkonce_odr ABI to support
> interface only libraries if possible. If you look at the code Clang
> generates today, even an empty module, it generates an external definition
> of the module initialization function. If multiple consumers of a given
> library decide they need to generate their own, then you will get a
> multiple definition error from the linker.
>
> libc++ already deals with differing ABI issues today, and actually goes
> further than any other library I'm aware of to make that work. libc++ can
> continue to do this with exactly the same mechanism they use now
> (__abi_tag__ and being very careful). Modules don't change this, and given
> libc++'s current implementation strategy of `using` declarations, the .o
> file they generate for the std module will only contain a module init
> function.
>
> The benefit of using external definitions is that nobody else ever needs
> to generate them, the compiler can always assume they will be present.
> There is also some debug info that can be contained in the object file
> instead of duplicated.
>
> For other libraries, modules don't change ABI concerns either. If you
> include code as part of your module interface it has exactly the same ABI
> concerns as with headers with regard to how the BMI is built. The only new
> thing is that now the library author has some say over how the BMI is
> built; however, this is not absolute control, and so you need to be
> prepared to deal with arbitrary differences anyway, just as with headers. A
> library author should document what differences they support.
>
> As for interface only modules, I think they will be necessary to support.
> My preference here is to stick an attribute on the module declaration that
> tells build systems that there may not be a linker input with external
> definitions, and so they need to ensure at least one object file with
> linkonce_odr definitions exists. When the compiler builds an object file
> for a module with this attribute, it emits everything as linkonce_odr. This
> allows us to keep the assumption that we don't need to generate these
> object files in the general case, but still allow for multiple object files
> to exist for interface only modules without trying to communicate that in a
> side band.
>
> - Michael Spencer
>
> On Tue, Feb 13, 2024 at 8:11 AM Jan Kokemüller via SG15 <
> sg15_at_[hidden]> wrote:
> Hi,
>
> let's say I'm packaging a modularized C++ library "foo" that consists of a
> module implementation unit "foo.cpp" and an importable module unit
> "foo.cppm". Where should the "library interface object files" live? In the
> "libfoo.{a,so}" I ship, or can I punt this task to the consumer, who will
> compile the importable module unit "foo.cppm" anyway (to get the BMI's)?
>
> With "library interface object files" I mean the object files that are
> generated by compiling the importable module unit "foo.cppm". I'm using the
> terminology from Daniela Engert's talk here:
> <https://youtu.be/nP8QcvPpGeM?t=333>
>
> At least with Clang, the "library interface object files" will at least
> contain the symbol for the "module initializer function" as laid out by the
> proposed updates to the Itanium ABI:
> <https://github.com/itanium-cxx-abi/cxx-abi/pull/144/files#diff-b803017e5afd1b6dfe35e5e0e719d895559129c35b93f056074a72928269ae23R5022-R5048>
>
> So far I had assumed from following discussions and from my own experiments
> with CMake >= 3.28 and reading Conan's plan
> (<https://blog.conan.io/2023/10/17/modules-the-packaging-story.html>) that
> the "library interface object files" (that contain e.g. the symbols for the
> module initializer functions) will live in the library artifact
> "libfoo.{a,so}". That way, as a consumer of that library, I can describe
> that library in my CMake build system by creating an "imported" CMake
> target, without having to build anything else except for the BMI's of the
> importable module units.
>
> Furthermore, I had also assumed that the P2577R2 style metadata file that
> describes the modules of a library is placed next to a library artifact
> that contains the "library interface object files" (including the module
> initializer symbols). That is also the reason I thought there always
> _exists_ a library artifact for the metadata file to be placed next to, as
> the library artifact will always at least contain the module initializer
> symbol.
>
> In contrast, an alternative style of packaging a modularized library is
> possible, where the library artifact does _not_ contain the "library
> interface object files", instead requiring the consumer to build them in
> addition to the BMI's. In CMake terms, users then could _not_ create an
> "imported" library target, instead having to add a "proper" library target
> to their build that "owns" the "library interface object files".
>
> In my mind, this alternative style creates a number of headaches for the
> build and packaging ecosystems as they have to cope with those additional
> libraries required for holding the "library interface object files". It
> would certainly be simpler for consumers if those symbols were "owned" by
> the library artifact itself.
>
> I stumbled across this issue as I was trying to consume the experimental
> libc++ "std" module. libc++ chose the second approach, i.e. the module
> initializer symbols are not packaged up in any library artifact provided by
> libc++. In the resulting discussion on the libc++ bug tracker
> (<https://github.com/llvm/llvm-project/issues/80639>) people have
> encouraged me to approach SG15.
>
> What do you think about this issue? I'm curious about use cases for the
> alternative packaging approach. Certainly there would need to be another
> key like "library-contains-interface-object-files" in the metadata file so
> that users know what kind of packaging approach was used. But I hope there
> can be convergence on one approach so that kind of complexity could be
> avoided.
>
> -Jan
> _______________________________________________
> SG15 mailing list
> SG15_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg15
>
>
Received on 2024-02-19 03:06:33