Date: Fri, 11 Feb 2022 17:12:15 -0500
Hello,
The conversations we've been having over the past few weeks have been
challenging, but I think it has been very valuable to give the ideas
time to mature.
After the latest round of conversations, something crossed my mind
that I hadn't connected before, so here's a summary of it. I want to
get a general feeling from you all before I sit down to write a
proper paper.
When I started the original proposal, my main constraint was that I
couldn't find a way to build on top of the build systems and package
managers for distributing a C++ module library because there's not
enough convergence there.
What only now occurred to me is that there is one place where we do
have convergence, even if just a very thin one, mostly surrounded by
implementation-defined semantics, and that is "the link line".
What I mean by this is that pretty much any
build-system+package-manager combo currently needs some solution for
assembling the link line when consuming a library that was shipped as
a prebuilt artifact.
So, even if we don't have convergence on how that "link line" is
assembled, we do have convergence on the fact that it needs to be
assembled.
The breakthrough I had was realizing that it would be fine to build
the convention for module libraries entirely on top of those
implementation-defined bits.
Ok, enough preamble, here's the actual idea:
From the publishing side
===================
When shipping a C++ library with modules, you will ship a metadata
file alongside the library artifact (e.g.: .a, .so, .dll, etc.), named
like the artifact but with the implementation-defined library
extension replaced by ".modules".
That file will contain the metadata necessary to find which BMIs were
provided off-the-shelf (with information for matching compatibility
requirements), as well as all the information necessary to produce
your own BMI when needed -- including the path to the interface unit,
module dependencies, include directories, and compile definitions.
That metadata will cover all modules provided by that library.
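To make that concrete, here is a purely illustrative sketch of what a
libfoobar.modules file could look like. Every field name below is
invented for this example; the actual format is an open question (see
"Other considerations" below):

  {
    "modules": [
      {
        "name": "foobar",
        "interface-unit": "../share/foobar/foobar.cppm",
        "requires": ["barbaz.core"],
        "include-directories": ["../include"],
        "definitions": ["FOOBAR_NO_DEPRECATED"],
        "bmis": [
          { "path": "foobar.gcm",
            "compiler": "gcc",
            "compiler-version": "12.1" }
        ]
      }
    ]
  }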
From the consuming side
===================
The package manager and the build system will use their
implementation-defined methods to assemble the link line.
Given that link line, they can use the same implementation-defined
method used to find the library artifacts (e.g.: .a, .so) and look for
a file alongside each artifact with the implementation-defined
extension replaced by ".modules".
On a POSIX system, using pkg-config, it would look something like this:
Step 1, find the link line:
$ pkg-config --libs --static foobar
-L/usr/lib -lfoobar -lbarbaz -lbazqux -lquxqux
Step 2, find the library files:
/usr/lib/libfoobar.a
/usr/lib/libbarbaz.so
/usr/lib/libbazqux.a
/usr/lib/libquxqux.so
Step 3, locate the .modules files alongside those libraries, if they
exist (non-module libraries wouldn't provide them):
/usr/lib/libfoobar.modules
/usr/lib/libbarbaz.modules
Step 4, read the metadata files and assemble the entire module graph
for the modules external to the build system.
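As a minimal sketch of steps 2 and 3 in POSIX shell, assuming the
single -L/usr/lib search path from the example above (a real
implementation would honor every -L flag plus the toolchain's default
search paths):

  searchpath=/usr/lib
  for flag in $(pkg-config --libs --static foobar); do
      case $flag in
      -l*)
          name=${flag#-l}
          # The .modules file sits alongside the artifact, with the
          # library extension replaced.
          for lib in "$searchpath/lib$name.a" "$searchpath/lib$name.so"; do
              meta="${lib%.*}.modules"
              [ -e "$lib" ] && [ -e "$meta" ] && echo "$meta"
          done
          ;;
      esac
  done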
I am not familiar with Windows development, but I am moderately
confident an analogous set of steps can be found.
Interesting side effects
=================
1. It strengthens the relationship between the library artifact and
the parsing of the module interface unit: since the metadata file is
made available alongside the library file, we can be confident that
the parsing you do is consistent with that library artifact.
2. Because it's a very thin extension on top of a lot of
implementation-defined bits, it requires little to no change on the
package-management side. Even something like pkg-config would remain
fully usable as-is.
3. In the case where libraries contain lots of modules, it could
actually represent a performance gain when compared to finding each
module independently, since you would read the information about all
the modules in the library in one go.
4. Since discovery is limited to the things on the link line, its
cost is bounded by the length of that line rather than by the number
of libraries available on the system.
Important caveats
==============
1. Libraries that want to avoid shipping object code (i.e.: the
module equivalent of header-only libraries) will still need to ship an
archive in order to be discoverable.
2. If you have a shared object that encapsulates the link of other
libraries, your modules will need to encapsulate the modules provided
by those libraries. E.g.: if libbarbaz.so.15 appears as a DT_NEEDED
entry of your shared object, and you don't want folks to have
-lbarbaz on their link line, you need to make sure your module
interface units don't import the modules from barbaz (see the example
below).
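On an ELF platform, a quick way to check which libraries a shared
object encapsulates is to inspect its DT_NEEDED entries (the output
below is abridged and illustrative):

  $ readelf -d libfoobar.so | grep NEEDED
   0x0000000000000001 (NEEDED)  Shared library: [libbarbaz.so.15]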
Other considerations
================
Could we just put it inside the library archive? Yes. But that
wouldn't necessarily work for shared objects, and furthermore it would
cause the build system to do lots of sparse reads on a potentially
very large file. Therefore, I contend that having it as a separate
file is preferable.
Can it be relocatable? It likely makes sense to specify that the
metadata file can locate files relative to itself, e.g.: either using
something like $ORIGIN or simply treating non-absolute paths as
relative to the metadata file. But as with my other proposal, package
managers may need to replace additional variables at install time.
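For instance, under the "relative to the metadata file"
interpretation, and reusing the invented field names from the sketch
above, an entry in /usr/lib/libfoobar.modules such as

  "interface-unit": "../share/foobar/foobar.cppm"

would resolve to /usr/share/foobar/foobar.cppm.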
Could we just put everything in a zip file? Probably, but again, this
comes at the cost of doing lots of sparse reads on potentially very
large files, so it's likely better to have the library installed
unpacked.
What is the format of that metadata file? It likely needs to be
something similar to P1689R4, although it needs to account for either
not having a BMI or even having multiple BMIs.
Shouldn't the file be C++-specific? Probably not. In fact, IIRC, the
Kitware proposal went to some lengths to make sure we weren't
unnecessarily diverging from Fortran modules, so maybe this file could
host information for both C++ and Fortran.
So... what do y'all think?
daniel