sg15

Proposal for module metadata format to be used by the std library and others

From: Daniel Ruoso <daniel_at_[hidden]>
Date: Tue, 12 Dec 2023 16:56:32 -0500
As discussed in the meeting on 2023-12-12, I'm putting together a proposal
for the metadata format to be used both when discovering the std modules as
well as for pre-built libraries in general.

There was general agreement that while the std library always needs special
cases, we will need some metadata format to describe the std modules, and
it would be unfortunate if the format used for the std library was
different than that used in other scenarios.

There was some desire to have this be a starting point for a wider metadata
format to cover other aspects beyond modules, so this proposal will be
split in two. First covering modules, and later offering an option on how
that information could be embedded into a wider format.

Before jumping into the format itself, here's the list of requirements that
guided the design:

 * Module discovery should be subordinate to ABI decisions having been made
already. The outcome is that we don't expect the module metadata to be used
to discover which ABI settings are available for a given standard library
or how to choose between them. The way that this manifests is that,
following the lead from P2577R2, we expect the libraries to ship those
metadata files on a one-to-one mapping with the library binary used as
input to the linker.

 * The path to the module source files should be discovered via the
metadata file. The outcome is that we don't expect a mechanism to search
for those module files in a directory, nor a specific standard on how they
should be named, but rather the build system will be given that information
directly. This also allows the standard library to split its
implementation into interface partitions for ease of maintainability.

 * The compiler should offer a mechanism to introspect the path to the
metadata file of the standard library, given the settings with which the
compiler will be used. That means build systems don't need to perform the
lookup of the metadata.

 * The format should be extensible to cover vendor-specific settings.

 * The format should be versioned to allow future backward-compatible
changes.

 * The format should allow indicating when the intention is to build the
std module, since the compiler should reject a module accidentally named in
a way that conflicts with the std module.

Requirements that are likely to become important in the future, but that
are not otherwise included in this proposal:

 * Pre-built libraries, including the std library, may want to advertise
pre-built BMIs to be reused by the build system. This requires more
convergence on the mechanisms to identify BMI compatibility, which have
been discussed in P2581R2, but not yet supported by implementors.

 * Future integration with more general package management facilities. My
expectation is that this is a step in that direction, where it may be
possible either to embed the data described here in such a facility or to
reference the path to this metadata from it. In particular, this proposal
is not trying to specify requirements that would allow discovering when
using this metadata would be appropriate or not.

Some a-priori decisions that are assumed in this format:

 * JSON: while there are many competing alternatives for the serialization
format for this metadata, various other parts of the tooling ecosystem
(such as compile_commands.json and the dependency scanning output) are
already using this format, therefore I chose to just stick to it, rather
than consider introducing a new serialization format.

 * Relative file paths: Any non-absolute path described in this file will
be presumed to have the directory where the metadata file was found as the
base for the lookup.
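
As a rough illustration of the relative-path rule, here is a minimal Python
sketch. The function name and the metadata file name used in the usage note
are hypothetical; the proposal does not name either.

```python
from pathlib import Path


def resolve_source_path(metadata_file: str, source_path: str) -> Path:
    """Resolve a path found in the metadata against the metadata file's directory."""
    candidate = Path(source_path)
    if candidate.is_absolute():
        # Absolute paths are used exactly as written.
        return candidate
    # Non-absolute paths use the directory containing the metadata file as base.
    return Path(metadata_file).parent / candidate
```

For example, with a metadata file at lib/libexample.modules.json (a made-up
name), the entry "modules/std.cppm" would resolve to lib/modules/std.cppm.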

Since the goal of this proposal is to evaluate specific usage, I will
prioritize describing the file with examples, rather than writing a JSON
Schema for it. The final design should still be encoded that way, but I
feel the format of JSON Schema would make this conversation harder to
maintain.

# Envelope

The first thing I want to address is the envelope that will contain the
module-specific metadata. For that there are two options:

## Module-specific file

If we decide to go with a file that refers only to module metadata, the
envelope could look like:

{
   "version": 1,
   "revision": 0,
   "modules": []
}

## Wider "library manifest" file

If we decide we should go with a wider format for future extensibility, the
envelope could look like:

{
   "version": 1,
   "revision": 0,
   "c++": {
      "modules": []
   }
}
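
To show that a consumer can treat both envelopes uniformly, here is a small
Python sketch that extracts the modules array from either shape. The function
name is hypothetical, and the sketch assumes nothing about the meaning of
"version" and "revision" beyond what is shown above.

```python
import json


def extract_modules(document: dict) -> list:
    """Return the modules array from either proposed envelope."""
    if "modules" in document:
        # Module-specific file: the array sits at the top level.
        return document["modules"]
    # Wider "library manifest" file: the array is nested under "c++".
    return document.get("c++", {}).get("modules", [])


# The two envelopes from the text, parsed from their JSON form.
module_only = json.loads('{"version": 1, "revision": 0, "modules": []}')
manifest = json.loads('{"version": 1, "revision": 0, "c++": {"modules": []}}')
```

Either way the build system ends up with the same list of module objects, so
the choice of envelope mostly affects where future non-module data would live.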

# The modules value

In both possible envelopes, module metadata is represented as an array of
objects describing all the importable module units provided by this library
that may be reachable by a consumer of this library.

While we currently don't expect the std module to depend on any module not
provided by the std library itself at this point, there are already
situations where the std library has external dependencies (e.g.: tbb for
libstdc++). This is an area that needs further exploration in the future,
and it may be the case that the compiler may need to report several module
metadata files, rather than just one.

## Describing a module

The following keys and values are expected in the object in the modules
array:

 * logical-name (mandatory): The name of the module being provided; the
same semantics as in P1689 apply.

 * is-interface (optional, default to true): This describes whether this
contributes to the external interface of the module; the same semantics as
in P1689 apply.

 * source-path (mandatory): The path to the source code of the importable
unit. If expressed as a relative path, lookup is done from the directory
where the module metadata was found.

 * is-std-library (optional, default to false): Indicates that the module
is allowed to use names that are reserved to the standard library.

 * local-arguments (optional): an object describing arguments that should
be applied when translating this particular importable unit, but that
don't need to be used in the compilation of the translation unit importing
this module:

 ** include-directories (optional): an array of paths that need to be
appended to the compilation include search path, same semantics as
appending -I in gcc and clang.

 ** system-include-directories (optional): an array of paths that need to
be appended to the compilation include path as system locations, same
semantics as appending -isystem in gcc and clang.

 ** definitions: an array of objects to be appended in order.

 *** name (mandatory): the name of the definition to be used

 *** value (optional): the value to be set. If missing, it is the
equivalent of -DFOO in gcc and clang.

 *** undef (optional, defaults to false, incompatible with value): The
equivalent of -UFOO in gcc and clang.

 *** vendor (optional): extension point for vendor-specific configurations.
This is an object where the key is the name of the vendor, and the value is
implementation-defined.

 * vendor (optional): extension point for vendor-specific information. This
is an object where the key is the name of the vendor and the value is
implementation-defined.
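
To make the local-arguments semantics concrete, here is a minimal Python
sketch that maps such an object to the gcc and clang flags named above (-I,
-isystem, -D, -U). The function name is hypothetical, the sketch ignores the
vendor extension points, and it leaves relative paths unresolved.

```python
def local_arguments_to_flags(local_arguments: dict) -> list:
    """Translate a local-arguments object into gcc/clang style flags."""
    flags = []
    for directory in local_arguments.get("include-directories", []):
        flags.append("-I" + directory)
    for directory in local_arguments.get("system-include-directories", []):
        flags.extend(["-isystem", directory])
    # Definitions are appended in the order in which they are listed.
    for definition in local_arguments.get("definitions", []):
        name = definition["name"]
        if definition.get("undef", False):
            if "value" in definition:
                raise ValueError(f"'undef' is incompatible with 'value' for {name}")
            flags.append("-U" + name)
        elif "value" in definition:
            flags.append(f"-D{name}={definition['value']}")
        else:
            # A missing value is the equivalent of -DFOO.
            flags.append("-D" + name)
    return flags
```

A build system would add these flags only when translating the importable
unit itself, not when compiling the translation units that import it.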

# Example

Here's how I would expect this to look for a standard library such as
libc++ (assuming the module-specific file for now):

{
  "version": 1,
  "revision": 1,
  "modules": [
    {
      "logical-name": "std",
      "source-path": "modules/std.cppm",
      "is-std-library": true
    },
    {
      "logical-name": "std.compat",
      "source-path": "modules/std.compat.cppm",
      "is-std-library": true
    },
    {
      "logical-name": "std:someinterfacepartition",
      "source-path": "modules/std-someinterfacepartition.cppm",
      "is-std-library": true
    }
  ]
}

Note that this specifically doesn't use any of the local arguments, because
I don't really think that's going to be needed for the standard library
case. The only special case is the is-std-library key, to allow the
build system to know this is not an accidental collision with the reserved
names. We may decide not to settle the local-arguments part of the proposal
now for that reason.
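
As an illustration of how a build system might use that key, here is a
minimal Python sketch that reports modules whose name looks reserved but
that are not flagged. It uses the is-std-library spelling from the key list
and assumes that a name is reserved when the first dot-separated component
of the module name is "std" followed by zero or more digits; the function
names are hypothetical, and an actual compiler may apply a wider rule.

```python
import re

# Assumption: "std", "std23", ... as the first component marks a reserved name.
RESERVED_FIRST_COMPONENT = re.compile(r"std[0-9]*")


def uses_reserved_name(logical_name: str) -> bool:
    """Check whether a logical name falls under the assumed reserved pattern."""
    # Drop a partition suffix such as ":someinterfacepartition".
    module_name = logical_name.split(":", 1)[0]
    first_component = module_name.split(".", 1)[0]
    return RESERVED_FIRST_COMPONENT.fullmatch(first_component) is not None


def find_unflagged_std_modules(modules: list) -> list:
    """List logical names that look reserved but lack is-std-library."""
    return [
        module["logical-name"]
        for module in modules
        if uses_reserved_name(module["logical-name"])
        and not module.get("is-std-library", False)
    ]
```

An empty result means every module with a reserved-looking name was declared
on purpose; anything else points at a likely accidental collision.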

Daniel

Received on 2023-12-12 21:56:45