ISOCPP SG15 List: Re: Defining Roles of Tools in Dependency Management

From: Bret Brown <mail_at_[hidden]>
Date: Sat, 23 Mar 2024 16:38:19 -0400

Rui and everyone else:

Jake has his facts about the status quo correct. My goal is to iterate on
the status quo, providing a smaller and more portable surface area for
various tools to support. I'm not alone. There are quite a few others
iterating on this goal, including maintainers of CMake, Meson, Conan,
vcpkg, and other tools.

What do I mean by that? There is a specification for a JSON file format
called CPS, intended to eventually replace both pkg-config's *.pc files and
CMake's *.cmake files. Bill Hoffman and I presented our plan to pursue that
technology this past fall at CppCon [link
<https://www.youtube.com/watch?v=IwuBZpLUq8Q>]. Daniel Ruoso and I have
been talking about packaging and how to converge on it for a few years now,
both in SG-15 and in various talks at C++Now and CppCon, and we feel this
is the best available path forward.

CPS already has a specification that is being iterated on [link
<https://cps-org.github.io/cps/>]. That specification has details about
what to name CPS files (${name}.cps) and how to find them (predefined
search paths that can be overridden with environment variables). We have
also started a tool called cps-config [link
<https://github.com/cps-org/cps-config>] that will implement relevant
portions of CPS and is intended to eventually replace pkg-config itself.
And work in CMake is beginning to add
support as well. Ideally as time goes on, use of non-portable mechanisms
like pkg-config files and CMake-language export files will decrease.
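
To make the lookup idea concrete, here is a minimal sketch of finding a
${name}.cps file. It is an illustration, not the behavior the specification
mandates: the CPS_PATH variable name, the default directories, and the
function names are all assumptions made for this example, and the sketch
simply lets environment entries take precedence over the defaults.

```python
import os
from pathlib import Path

# Hypothetical defaults; the real search order is defined by the CPS spec.
DEFAULT_SEARCH_DIRS = ["/usr/local/share/cps", "/usr/share/cps"]


def cps_search_dirs(environ=None, defaults=DEFAULT_SEARCH_DIRS):
    """List directories to search, environment entries first.

    CPS_PATH is an assumed variable name used only for this sketch.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("CPS_PATH", "")
    dirs = [Path(p) for p in override.split(os.pathsep) if p]
    dirs.extend(Path(d) for d in defaults)
    return dirs


def find_cps_file(name, environ=None, defaults=DEFAULT_SEARCH_DIRS):
    """Return the first existing <name>.cps file, or None if none is found."""
    for directory in cps_search_dirs(environ, defaults):
        candidate = directory / f"{name}.cps"
        if candidate.is_file():
            return candidate
    return None
```

A package manager that keeps metadata in its own tree would only need to
export one variable for a tool written this way to find it.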

As to this specific thread, I added a discussion to the Tokyo SG-15 meeting
about dependency management because in the process of defining CPS and
cps-config, it is becoming important to be clear about what the CPS JSON
metadata files are for and what they are not for. That will give us better
clarity about which information is essential to support and which
information is important but is not a core goal for CPS. If someone wants
to put GitHub download URLs in the CPS spec, is that out of scope? I think
yes. What about build parallelism settings? I think that's out of scope
too. What about distinguishing between debug and release builds? Well, that
seems essential to understand dependencies properly, so it would need to be
in scope.

My current thinking, which is still evolving and open to suggestion, is
that we should define three primary actions that are taken by build systems
and packaging systems, including systems that do at least some of both
building and packaging.

   - Providing dependencies
   - Resolving dependencies
   - Building (using dependencies)

I'll break these down a little now for illustration. But first, it's worth
noting that configuration of the build is something that will always
require some authored configuration choices. That is, a combination of
project configuration (like CMakeLists.txt or msbuild configuration) and
workflow configuration (e.g., instrument with thread sanitizer) will inform
the behavior of each of the actions. But we do want to clarify and separate
these actions from one another so that engineers can use and maintain
projects that are correct, simple, and portable.

*Providing dependencies* is the process of creating a build environment.
Dependency providers think in terms of tarballs, zip files, moving files
around on disk, and metadata like version numbers. If you ask a dependency
provider about googletest, it will think of a libgtest-dev package and what
URLs or hosting services might provide the versions of googletest that are
available.

*Resolving dependencies* is the process of discovering provided
dependencies and creating a coherent model of them. This model will
logically be a directed graph containing projects that rely on one another.
If you ask a dependency resolver about googletest, it will think of a
logical library named "gtest", perhaps. This is what build systems like
Meson and CMake model. A specific build system might use nouns like
"subproject" or "dependency", but it's still the same idea. Providing a
more portable way to describe needs at this level is the primary concern of
CPS files.
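
As a rough illustration of the directed-graph model, the sketch below
orders a handful of logical components so that each one appears after
everything it requires. The component names and the "requires" mapping are
hypothetical, and the standard-library topological sorter stands in for
whatever a real resolver does internally.

```python
from graphlib import TopologicalSorter

# Hypothetical resolved model: each logical component maps to the
# components it requires.
requires = {
    "my_tests": {"gtest_main", "mylib"},
    "gtest_main": {"gtest"},
    "gtest": set(),
    "mylib": set(),
}


def resolution_order(graph):
    """Order components so each comes after everything it requires.

    Raises graphlib.CycleError if the requirements are circular.
    """
    return list(TopologicalSorter(graph).static_order())
```

Note that nothing here knows about tarballs, URLs, or compiler flags. The
resolver's model is only about which logical things depend on which.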

*Building* is querying a dependency solution (or dependency model or
dependency graph, perhaps?) to create a build graph of input files, build
commands, intermediate files, and output files. If you ask a build system
about googletest, it will think of a libgtest.a and the headers that come
with it, and it will think of specific flags needed in specific commands to
use googletest.
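
A small sketch of that last step, under stated assumptions: the dictionary
layout, field names, and file paths below are invented for illustration and
are not the CPS schema. It walks a resolved model and derives include flags
and a static-link order in which dependents precede their requirements.

```python
# Hypothetical per-component data a resolver could hand to a build system.
components = {
    "gtest": {
        "includes": ["/opt/gtest/include"],
        "location": "/opt/gtest/lib/libgtest.a",
        "requires": [],
    },
    "gtest_main": {
        "includes": [],
        "location": "/opt/gtest/lib/libgtest_main.a",
        "requires": ["gtest"],
    },
}


def link_order(root, components):
    """Return component names with dependents before their requirements."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in components[name]["requires"]:
            visit(dep)
        order.append(name)  # post-order: requirements are appended first

    visit(root)
    return order[::-1]  # reversed, so dependents come first


def compile_flags(root, components):
    """Collect de-duplicated -I flags for a component and its requirements."""
    flags = []
    for name in link_order(root, components):
        for inc in components[name]["includes"]:
            flag = f"-I{inc}"
            if flag not in flags:
                flags.append(flag)
    return flags


def link_args(root, components):
    """Return library paths in an order suitable for static linking."""
    return [components[name]["location"] for name in link_order(root, components)]
```

The point of the sketch is the direction of information flow: the build
step only queries the resolved model and never fetches or relocates files.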

Note that all three actions can include some verification of the work of
other tools. Build systems currently complain (indirectly through
compilers) if required header files were never installed. Linkers complain
if some libraries were not suitable for a position independent executable.
But also note that it's not generally appropriate for tools to step outside
of their roles. For instance, I wouldn't expect my compiler to go download
a missing file to repair my build.

Hopefully you can see the kind of distinction of responsibilities and
metadata I need to start making. I'm communicating to you all to ensure we
end up with a plan that we can all support.

Bret

On Sat, Mar 23, 2024 at 3:28 PM Jake Arkinstall via SG15 <
sg15_at_[hidden]> wrote:

> My take as a nix user.
>
> On Thu, 21 Mar 2024, 02:08 Ruki Wang via SG15, <sg15_at_[hidden]>
> wrote:
>
>> Hi, all
>>
>> As far as I know, there are currently a number of ways to find libraries
>> from the system and package manager:
>>
>> 1. cmake find_package and its .cmake import file
>>
>
> IIRC, find_package traverses the CMAKE_MODULES_PATH environment variable.
>
> 2. pkgconfig and .pc files
>>
>
> IIRC, pkgconfig traverses the PKG_CONFIG_PATH environment variable.
>
> 3. directly from the system library path, e.g. /usr/lib ...
>>
>
> More generally, traversing LD_LIBRARY_PATH (otherwise nothing would work
> on nixos, as it has no /usr/lib - everything is done through, you named it,
> environment variables)
>
> If we just specify a uniform package metadata format to get this library
>> information. Then where should this metadata file be stored?
>>
>
> Wherever the package manager wants to store it. Some may want to use a
> system directory, some may want to use a directory within a managed root
> path (e.g. bazel), some may want to micromanage each dependency and pass a
> flood of information to the build system (e.g. nix).
>
> The one thing I wouldn't be happy with is if it is a requirement to
> provide this metadata file inside of the link directory, for instance. It
> should be an option, but I'd want such information to be stored entirely
> separate from the library's shared object directory. This allows existing
> shared object directories to be left untouched, which makes their hashes
> unchanged. If we were to expect a new file to be added to any library we
> wanted to import, I would have to rebuild my entire system as most of
> nixpkgs would be invalidated (the output directory hashes would change).
>
> It could be possible to avoid a full rebuild by wrapping the existing
> library, copying/symlinking files over, then adding a new file on top, but
> that would be quite painful. It would be far less cumbersome to add (into
> nixpkgs) an additional, separate output for libraries.
>
> How should the build system find it?
>>
>
> Environment variables are the only way I can think of that would work for
> me. And there's strong precedent for using them for this purpose.
>
> I think this would suffice for the Unix family of systems. I don't know
> about Windows - I know that environment handling is different enough in
> Windows for nix to be a no-go there. That's all I know, though.

Received on 2024-03-23 20:38:33