Date: Mon, 19 Feb 2024 11:22:21 +0800
> I agree that it adds a bunch of costs
It is not only about the cost to implementations but also about the usability of named modules.
For example, with:
```
export module a;
export int a() { ... }
```
The function `a()` will be compiled (and optimized) exactly once. This is a pretty good property. But with the proposed method:
```
export module a [[inline_entities]];
export int a() { ... }
```
Now the function `a()` will be compiled in every TU that uses it (directly or indirectly). It feels a lot like headers, except that it can't export macros.
> The thing is it's nearly impossible for a build system to do this. As soon as you're mixing libraries compiled at different times and potentially with different build systems, you don't know if some other library already has a copy of the .o for a module interface you depend on.
It depends on the scope of the discussion. If we're only talking about the std modules, it should be possible, since the scope is highly limited.
> We should just tell stdlib vendors to include the module interface object files as part of the stdlib library. If some specific ABI issue comes up, we can deal with that, but for libc++ and libstdc++ I don't think there are any unless std::ios_base::Init somehow has them.
Agreed.
Thanks,
Chuanqi
------------------------------------------------------------------
From:Michael Spencer <bigcheesegs_at_[hidden]>
Send Time:2024 Feb. 19 (Mon.) 11:06
To:Chuanqi<chuanqi.xcq_at_[hidden]>
Cc:SG15<sg15_at_[hidden]>
Subject:Re: [SG15] Packaging: Where should "library interface object files" live?
On Sun, Feb 18, 2024 at 6:29 PM Chuanqi Xu <chuanqi.xcq_at_[hidden]> wrote:
>> As for interface only modules, I think they will be necessary to support. My preference here is to stick an attribute on the module declaration that tells build systems that there may not be a linker input with external definitions, and so they need to ensure at least one object file with linkonce_odr definitions exists. When the compiler builds an object file for a module with this attribute, it emits everything as linkonce_odr. This allows us to keep the assumption that we don't need to generate these object files in the general case, but still allow for multiple object files to exist for interface only modules without trying to communicate that in a side band.
> Then we would downgrade (or change) named modules into something very similar to header modules. That doesn't sound like a good idea, since it breaks the ability of named modules to avoid duplicated compilation in the middle end and back end. It is also a drastic change to the ABI.
I agree that it adds a bunch of costs, but I think we are going to end up with it regardless. It's definitely a change to the ABI, but it's not incompatible. If you do have a strong definition anywhere then that takes over. Requiring the attribute would mean it only happens when someone specifically asks for it.
I'm happy to see how far we can go without it, but I won't be at all surprised when someone ships a module, tells people to just include it as part of their project, and it works fine until some third party tries to use two different libraries that did this.
>> In previous discussions of this issue over the years I've always asserted that the distributed library needs to have any module interface object files, but that it would also be nice to have a linkonce_odr ABI to support interface only libraries if possible. If you look at the code Clang generates today, even an empty module, it generates an external definition of the module initialization function. If multiple consumers of a given library decide they need to generate their own, then you will get a multiple definition error from the linker.
> In the general case, I feel everyone here agrees that the interface object files should be part of the distributed library (.a, .so). As for the std module, we (especially build system vendors) need to review how we should support std modules. If we like the status quo, then it is the responsibility of the build system to make sure the multiple definitions you described don't happen. If we want the std modules to keep the common behavior, we should ask the standard library vendors to change the distributed library.
The thing is it's nearly impossible for a build system to do this. As soon as you're mixing libraries compiled at different times and potentially with different build systems, you don't know if some other library already has a copy of the .o for a module interface you depend on. We should just tell stdlib vendors to include the module interface object files as part of the stdlib library. If some specific ABI issue comes up, we can deal with that, but for libc++ and libstdc++ I don't think there are any unless std::ios_base::Init somehow has them.
- Michael Spencer
Thanks,
Chuanqi
------------------------------------------------------------------
From: SG15 <sg15_at_lists.isocpp.org>
Send Time:2024 Feb. 19 (Mon.) 09:32
To:SG15<sg15_at_[hidden]>
Cc:Michael Spencer<bigcheesegs_at_[hidden]>
Subject:Re: [SG15] Packaging: Where should "library interface object files" live?
I'm going to use the LLVM linkage type names here, since the names of these things differ between ELF, MachO, and COFF, and LLVM has a well-defined mapping: https://llvm.org/docs/LangRef.html#linkage-types
In previous discussions of this issue over the years I've always asserted that the distributed library needs to have any module interface object files, but that it would also be nice to have a linkonce_odr ABI to support interface only libraries if possible. If you look at the code Clang generates today, even for an empty module it generates an external definition of the module initialization function. If multiple consumers of a given library decide they need to generate their own, then you will get a multiple definition error from the linker.
libc++ already deals with differing ABI issues today, and actually goes further than any other library I'm aware of to make that work. libc++ can continue to do this with exactly the same mechanism they use now (__abi_tag__ and being very careful). Modules don't change this, and given libc++'s current implementation strategy of `using` declarations, the .o file they generate for the std module will only contain a module init function.
The benefit of using external definitions is that nobody else ever needs to generate them; the compiler can always assume they will be present. Some debug info can also live in that object file instead of being duplicated.
For other libraries, modules don't change ABI concerns either. If you include code as part of your module interface it has exactly the same ABI concerns as with headers with regard to how the BMI is built. The only new thing is that now the library author has some say over how the BMI is built; however, this is not absolute control, and so you need to be prepared to deal with arbitrary differences anyway, just as with headers. A library author should document what differences they support.
As for interface only modules, I think they will be necessary to support. My preference here is to stick an attribute on the module declaration that tells build systems that there may not be a linker input with external definitions, and so they need to ensure at least one object file with linkonce_odr definitions exists. When the compiler builds an object file for a module with this attribute, it emits everything as linkonce_odr. This allows us to keep the assumption that we don't need to generate these object files in the general case, but still allow for multiple object files to exist for interface only modules without trying to communicate that in a side band.
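To make the linkage argument concrete, here is a toy model of how a linker might resolve one symbol, such as a module initializer, under the two LLVM linkage types discussed above. It is only a sketch under simplifying assumptions: real linkers also deal with archive member selection, COMDAT groups, and weak symbols, and the symbol name "init(foo)" is a hypothetical placeholder, not the real mangled name.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model only: two of LLVM's linkage types, heavily simplified.
enum class Linkage { External, LinkOnceODR };

struct SymbolDef {
    std::string name;    // hypothetical placeholder symbol name, e.g. "init(foo)"
    Linkage linkage;     // how this object file defines the symbol
    std::string object;  // object file or archive supplying the definition
};

// Returns, for each symbol, the object whose definition the link keeps.
// Throws std::runtime_error when two external definitions collide.
std::map<std::string, std::string> resolve(const std::vector<SymbolDef>& defs) {
    struct Kept { std::string object; Linkage linkage; };
    std::map<std::string, Kept> kept;
    for (const auto& d : defs) {
        assert(!d.name.empty() && "every definition needs a symbol name");
        auto it = kept.find(d.name);
        if (it == kept.end()) {
            kept.emplace(d.name, Kept{d.object, d.linkage});
        } else if (d.linkage == Linkage::External) {
            if (it->second.linkage == Linkage::External)
                throw std::runtime_error("multiple definition of " + d.name);
            it->second = Kept{d.object, d.linkage};  // strong definition takes over
        }
        // Otherwise this is another linkonce_odr copy; the ODR lets us drop it.
    }
    std::map<std::string, std::string> result;
    for (const auto& [name, k] : kept) result.emplace(name, k.object);
    return result;
}
```

In this model, the proposed attribute would switch an interface-only module's definitions from External to LinkOnceODR, so several consumers can each build their own copy without a link error, while a library that does ship an external definition still wins.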
- Michael Spencer
On Tue, Feb 13, 2024 at 8:11 AM Jan Kokemüller via SG15 <sg15_at_[hidden]> wrote:
Hi,
let's say I'm packaging a modularized C++ library "foo" that consists of a
module implementation unit "foo.cpp" and an importable module unit "foo.cppm".
Where should the "library interface object files" live? In the "libfoo.{a,so}"
I ship, or can I punt this task to the consumer, who will compile the
importable module unit "foo.cppm" anyway (to get the BMIs)?
With "library interface object files" I mean the object files that are
generated by compiling the importable module unit "foo.cppm". I'm using the
terminology from Daniela Engert's talk here:
<https://youtu.be/nP8QcvPpGeM?t=333>
At least with Clang, the "library interface object files" will contain at least
the symbol for the "module initializer function" as laid out by the proposed
updates to the Itanium ABI:
<https://github.com/itanium-cxx-abi/cxx-abi/pull/144/files#diff-b803017e5afd1b6dfe35e5e0e719d895559129c35b93f056074a72928269ae23R5022-R5048>
So far I had assumed from following discussions and from my own experiments
with CMake >= 3.28 and reading Conan's plan
(<https://blog.conan.io/2023/10/17/modules-the-packaging-story.html>) that the
"library interface object files" (that contain e.g. the symbols for the module
initializer functions) will live in the library artifact "libfoo.{a,so}". That
way, as a consumer of that library, I can describe that library in my CMake
build system by creating an "imported" CMake target, without having to build
anything except the BMIs of the importable module units.
Furthermore, I had also assumed that the P2577R2 style metadata file that
describes the modules of a library is placed next to a library artifact that
contains the "library interface object files" (including the module initializer
symbols). That is also the reason I thought there always _exists_ a library
artifact for the metadata file to be placed next to, as the library artifact
will always at least contain the module initializer symbol.
In contrast, an alternative style of packaging a modularized library is
possible, where the library artifact does _not_ contain the "library interface
object files", instead requiring the consumer to build them in addition to the
BMIs. In CMake terms, users then could _not_ create an "imported" library
target, instead having to add a "proper" library target to their build that
"owns" the "library interface object files".
In my mind, this alternative style creates a number of headaches for the build
and packaging ecosystems as they have to cope with those additional libraries
required for holding the "library interface object files". It would certainly
be simpler for consumers if those symbols were "owned" by the library artifact
itself.
I stumbled across this issue as I was trying to consume the experimental libc++
"std" module. libc++ chose the second approach, i.e. the module initializer
symbols are not packaged up in any library artifact provided by libc++. In the
resulting discussion on the libc++ bug tracker
(<https://github.com/llvm/llvm-project/issues/80639>) people have encouraged me
to approach SG15.
What do you think about this issue? I'm curious about use cases for the
alternative packaging approach. Certainly there would need to be another key
like "library-contains-interface-object-files" in the metadata file so that
users know what kind of packaging approach was used. But I hope there can be
convergence on one approach so that kind of complexity could be avoided.
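A minimal sketch of what such a key would change on the consumer side, assuming a simplified metadata record. The struct, its field names, and the step names are hypothetical; they are not taken from P2577R2 or from CMake, and the boolean merely stands in for the "library-contains-interface-object-files" key.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of one library's module metadata.
struct LibraryModuleMetadata {
    std::string artifact;                         // e.g. "libfoo.a"
    std::vector<std::string> interface_units;     // e.g. {"foo.cppm"}
    bool contains_interface_object_files = true;  // stand-in for the proposed key
};

struct BuildStep {
    std::string kind;   // "bmi", "object", or "link" (hypothetical step names)
    std::string input;  // source file or artifact the step consumes
};

// Lists the steps a consumer's build would need for this library.
std::vector<BuildStep> consumer_steps(const LibraryModuleMetadata& md) {
    assert(!md.interface_units.empty() && "a modularized library exports at least one interface unit");
    std::vector<BuildStep> steps;
    for (const auto& unit : md.interface_units) {
        steps.push_back({"bmi", unit});         // always needed in order to import
        if (!md.contains_interface_object_files)
            steps.push_back({"object", unit});  // the consumer now owns these symbols
    }
    steps.push_back({"link", md.artifact});
    return steps;
}
```

When the key is true, the steps match the "imported" target case (build BMIs, link the shipped artifact). When it is false, the consumer also compiles the interface units to object files, which is the "proper" library target case; two independent consumers taking that path for the same library would each produce their own copy of those objects.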
-Jan
_______________________________________________
SG15 mailing list
SG15_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/sg15
Received on 2024-02-19 03:22:26