Re: Proposal for module metadata format to be used by the std library and others

From: Olga Arkhipova <olgaark_at_[hidden]>
Date: Wed, 13 Dec 2023 19:19:46 +0000
>> I think it will be necessary to support paths that are relative to some other parameterized location to accommodate dependencies on header files or modules provided by other projects/packages.

If a library can reference other libraries, this is not needed. All other dependencies (including module dependencies) will be resolved through the referenced library's manifest.

>> Do implementation module units need to be mentioned at all? I'm not opposed to allowing for them, but if I'm following correctly, since the metadata file is consumed to satisfy module imports, they don't contribute to anything that is relevant to import. I would expect is-interface to always be true.
Currently, is-interface=false means that this is an internal partition, which has to be built to a BMI and passed as a module reference on the command line to be able to use the primary module in the code.

But it does not look to me like we need the "is-interface" metadata, since the build will scan the sources and determine exactly what kind of unit each source is.
Actually, I think we should not have "is-interface" as module source metadata at all, to avoid any conflicts with the scan data.
This kind of data (together with dependencies) would be useful for a BMI, but not for the source.

Olga

From: SG15 <sg15-bounces_at_[hidden]> On Behalf Of Tom Honermann via SG15
Sent: Wednesday, December 13, 2023 10:46 AM
To: sg15_at_[hidden]
Cc: Tom Honermann <tom_at_[hidden]>
Subject: Re: [SG15] Proposal for module metadata format to be used by the std library and others

On 12/12/23 4:56 PM, Daniel Ruoso via SG15 wrote:
As discussed in the meeting on 2023-12-12, I'm putting together a proposal for the metadata format to be used both when discovering the std modules and for pre-built libraries in general.

There was general agreement that while the std library always needs special cases, we will need some metadata format to describe the std modules, and it would be unfortunate if the format used for the std library was different than that used in other scenarios.

There was some desire to have this be a starting point for a wider metadata format to cover other aspects beyond modules, so this proposal will be split in two: first covering modules, and later offering an option for how that information could be embedded into a wider format.

Before jumping into the format itself, here's the list of requirements that guided the design:

 * Module discovery should be subordinate to ABI decisions having been made already. The outcome is that we don't expect the module metadata to be used to discover which ABI settings are available for a given standard library or how to choose between them. The way that this manifests is that, following the lead from P2577R2, we expect the libraries to ship those metadata files on a one-to-one mapping with the library binary used as input to the linker.
I think someone mentioned it yesterday, but this will presumably have to account for multilib libraries in some way.

 * The path to the module source files should be discovered via the metadata file. The outcome is that we don't expect a mechanism to search for those module files in a directory, nor a specific standard on how they should be named; rather, the build system will be given that information directly. This also leaves the standard library free to split its implementation into interface partitions for ease of maintenance.

 * The compiler should offer a mechanism to introspect the path to the metadata file of the standard library, for the settings with which the compiler will be used. That means build systems don't need to perform the lookup of the metadata.

 * The format should be extensible to cover vendor-specific settings.

 * The format should be versioned to allow future backward-compatible changes.

 * The format should allow indicating when the intention is to build the std module, since the compiler should reject a module accidentally named in a way that conflicts with the std module.

Requirements that are likely to become important in the future, but that are not otherwise included in this proposal:

 * Pre-built libraries, including the std library, may want to advertise pre-built BMIs to be reused by the build system. This requires more convergence on the mechanisms to identify BMI compatibility, which have been discussed in P2581R2 but are not yet supported by implementors.

 * Future integration with more general package management facilities. My expectation is that this is a step in that direction, where it may be possible either to embed the data described here in such a format or to reference the path to this metadata from it. In particular, this proposal is not trying to specify requirements that would allow discovering when using this would be appropriate or not.

Some a-priori decisions that are assumed in this format:

 * JSON: while there are many competing alternatives for the serialization format for this metadata, various other parts of the tooling ecosystem (such as compile_commands.json and the dependency scanning output) are already using this format. I therefore chose to stick with it rather than introduce a new serialization format.

 * Relative file paths: Any non-absolute path described in this file will be presumed to have the directory where the metadata file was found as the base for the lookup.
I think it will be necessary to support paths that are relative to some other parameterized location to accommodate dependencies on header files or modules provided by other projects/packages. This would help to reduce the otherwise common practice of a build system having to concatenate include paths for essentially every project it knows about when building any package it knows about.
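
As an illustration of the relative-path rule as proposed (not of the parameterized-location extension suggested in the reply), here is a minimal Python sketch. The function name resolve_source_path and the install location /usr/lib/libc++.modules.json are assumptions made up for this example; POSIX-style paths are used so the result does not depend on the host platform.

```python
from pathlib import PurePosixPath


def resolve_source_path(metadata_file: str, source_path: str) -> PurePosixPath:
    """Resolve a path from the metadata against the metadata file's directory.

    Absolute paths are returned unchanged. Any non-absolute path uses the
    directory where the metadata file was found as its base.
    """
    candidate = PurePosixPath(source_path)
    if candidate.is_absolute():
        return candidate
    return PurePosixPath(metadata_file).parent / candidate


# Hypothetical install location, used only for illustration.
manifest = "/usr/lib/libc++.modules.json"
print(resolve_source_path(manifest, "modules/std.cppm"))
# prints /usr/lib/modules/std.cppm
print(resolve_source_path(manifest, "/opt/vendor/std.cppm"))
# prints /opt/vendor/std.cppm
```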

Since the goal of this proposal is to evaluate specific usage, I will prioritize describing the file with examples, rather than writing a JSON Schema for it. The final design should still be encoded that way, but I feel the JSON Schema format would make this conversation harder to maintain.
I agree. JSON schema is great ... for later :)

# Envelope

The first thing I want to address is the envelope that will contain the module-specific metadata. For that there are two options:

## Module-specific file

If we decide to go with a file that refers only to module metadata, the envelope could look like:

{
   "version": 1,
   "revision": 0,
   "modules": []
}

## Wider "library manifest" file

If we decide we should go with a wider format for future extensibility, the envelope could look like:

{
   "version": 1,
   "revision": 0,
   "c++": {
      "modules": []
   }
}
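
A consumer could accept either envelope with very little code. The sketch below is only an illustration under that assumption; extract_modules is a hypothetical helper name, not part of the proposal, and raising an error when no modules array is found is a choice made for this example.

```python
import json


def extract_modules(document: dict) -> list:
    """Return the module descriptions from either candidate envelope."""
    if "modules" in document:
        # Module-specific file: the array sits at the top level.
        return document["modules"]
    if "modules" in document.get("c++", {}):
        # Wider library manifest: the array is nested under "c++".
        return document["c++"]["modules"]
    raise ValueError("no modules array found in metadata document")


module_specific = json.loads('{"version": 1, "revision": 0, "modules": []}')
library_manifest = json.loads(
    '{"version": 1, "revision": 0, "c++": {"modules": []}}'
)
print(extract_modules(module_specific))   # prints []
print(extract_modules(library_manifest))  # prints []
```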

# The modules value

In both possible envelopes, module metadata is represented as an array of objects describing all the importable module units provided by this library that may be reachable by a consumer of this library.

While we currently don't expect the std module to depend on any module not provided by the std library itself, there are already situations where the std library has external dependencies (e.g.: tbb for libstdc++). This is an area that needs further exploration in the future, and the compiler may need to report several module metadata files, rather than just one.

## Describing a module

The following keys and values are expected in the object in the modules array:

 * logical-name (mandatory): The name of the module being provided; the same semantics as in P1689 apply.

 * is-interface (optional, defaults to true): Whether this unit contributes to the external interface of the module; the same semantics as in P1689 apply.
Do implementation module units need to be mentioned at all? I'm not opposed to allowing for them, but if I'm following correctly, since the metadata file is consumed to satisfy module imports, they don't contribute to anything that is relevant to import. I would expect is-interface to always be true.

 * source-path (mandatory): The path to the source code of the importable unit. If expressed as a relative path, lookup is done from the directory where the module metadata was found.

 * is-std-library (optional, default to false): Indicates that the module is allowed to use names that are reserved to the standard library.

 * local-arguments (optional): an object describing arguments that should be applied when translating this particular importable unit, but that don't need to be applied when compiling the translation unit importing this module:

Since local-arguments (which I presume to mean compiler command line options) are necessarily implementation specific, I think this should either be generalized or named such that it reflects an implementation dependency. Perhaps:

"local-arguments": [
  { "gcc-compatible": [ "-fconstexpr-depth=512" ] },
  { "cl-compatible": [ "/constexpr:depth512" ] },
  { "circle": [ "--ftemplate-depth=768" ] }
]

Tom.

 ** include-directories (optional): an array of paths that need to be appended to the compilation include search path, same semantics as appending -I in gcc and clang.

 ** system-include-directories (optional): an array of paths that need to be appended to the compilation include path as system locations, same semantics as appending -isystem in gcc and clang.

 ** definitions: an array of objects to be appended in order.

 *** name (mandatory): the name of the definition to be used

 *** value (optional): the value to be set. If missing, it is the equivalent of -DFOO in gcc and clang.

 *** undef (optional, defaults to false, incompatible with value): The equivalent of -UFOO in gcc and clang.

 *** vendor (optional): extension point for vendor-specific configurations. This is an object where the key is the name of the vendor, and the value is implementation-defined.

 * vendor (optional): extension point for vendor-specific information. This is an object where the key is the name of the vendor and the value is implementation-defined.
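
To make the gcc/clang equivalences in the local-arguments description concrete, here is a rough Python sketch that turns such an object into command-line flags. The function name local_arguments_to_flags is hypothetical, the vendor keys are ignored, and paths are passed through without resolving them against the metadata file's directory; treat it as one possible reading of the keys above, not as a reference implementation.

```python
def local_arguments_to_flags(local_arguments: dict) -> list:
    """Translate a local-arguments object into gcc/clang-style flags."""
    flags = []
    for path in local_arguments.get("include-directories", []):
        flags.append("-I" + path)
    for path in local_arguments.get("system-include-directories", []):
        flags += ["-isystem", path]
    # Definitions are applied in the order in which they are listed.
    for definition in local_arguments.get("definitions", []):
        name = definition["name"]
        if definition.get("undef", False):
            if "value" in definition:
                raise ValueError("undef is incompatible with value: " + name)
            flags.append("-U" + name)
        elif "value" in definition:
            flags.append("-D{}={}".format(name, definition["value"]))
        else:
            flags.append("-D" + name)
    return flags


# Made-up input, used only for illustration.
example = {
    "include-directories": ["include"],
    "system-include-directories": ["/opt/sys/include"],
    "definitions": [
        {"name": "FOO"},
        {"name": "BAR", "value": "2"},
        {"name": "BAZ", "undef": True},
    ],
}
print(local_arguments_to_flags(example))
# prints ['-Iinclude', '-isystem', '/opt/sys/include', '-DFOO', '-DBAR=2', '-UBAZ']
```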

# Example

Here's what I would expect that to look like for a standard library such as libc++ (assuming the module-specific file for now):

{
  "version": 1,
  "revision": 1,
  "modules": [
    {
      "logical-name": "std",
      "source-path": "modules/std.cppm",
      "is-std-library": true
    },
    {
      "logical-name": "std.compat",
      "source-path": "modules/std.compat.cppm",
      "is-std-library": true
    },
    {
      "logical-name": "std:someinterfacepartition",
      "source-path": "modules/std-someinterfacepartition.cppm",
      "is-std-library": true
    }
  ]
}

Note that this specifically doesn't use any of the local arguments, because I don't really think that's going to be needed for the standard library case. The only special case is the is-std-library key, which lets the build system know that this is not an accidental collision with the reserved names. We may decide not to settle the local-arguments part of the proposal now for that reason.
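
A build system could use the key along the following lines. This Python sketch is an assumption-laden illustration: the helper names are hypothetical, and the pattern (a first name component consisting of "std" followed by zero or more digits) is my reading of the reserved module names, so it should be checked against the standard before being relied on.

```python
import re

# Assumption: a module name is reserved when its first dot-separated
# component is "std" followed by zero or more digits.
_RESERVED_FIRST_COMPONENT = re.compile(r"std\d*")


def uses_reserved_std_name(logical_name: str) -> bool:
    """Check whether a logical name falls in the assumed reserved set."""
    module_name = logical_name.split(":", 1)[0]  # drop any partition suffix
    first_component = module_name.split(".", 1)[0]
    return _RESERVED_FIRST_COMPONENT.fullmatch(first_component) is not None


def check_module(entry: dict) -> None:
    """Reject a reserved name unless the entry sets is-std-library."""
    is_std = entry.get("is-std-library", False)  # documented default is false
    if uses_reserved_std_name(entry["logical-name"]) and not is_std:
        raise ValueError(
            "module name collides with reserved names: " + entry["logical-name"]
        )


check_module({"logical-name": "std", "source-path": "modules/std.cppm",
              "is-std-library": True})  # accepted
check_module({"logical-name": "mylib.core", "source-path": "core.cppm"})  # accepted
```

An entry named "std.compat" without is-std-library set would raise ValueError under this sketch.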

Daniel




Received on 2023-12-13 19:19:50