Re: Proposal for module metadata format to be used by the std library and others

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 13 Dec 2023 13:45:45 -0500
On 12/12/23 4:56 PM, Daniel Ruoso via SG15 wrote:
> As discussed in the meeting on 2023-12-12, I'm putting together a
> proposal for the metadata format to be used both when discovering the
> std modules as well as for pre-built libraries in general.
>
> There was general agreement that while the std library always needs
> special cases, we will need some metadata format to describe the std
> modules, and it would be unfortunate if the format used for the std
> library was different than that used in other scenarios.
>
> There was some desire to have this be a starting point for a wider
> metadata format to cover other aspects beyond modules, so this
> proposal will be split in two. First covering modules, and later
> offering an option on how that information could be embedded into a
> wider format.
>
> Before jumping into the format itself, here's the list of requirements
> that guided the design:
>
> * Module discovery should be subordinate to ABI decisions having been
> made already. The outcome is that we don't expect the module metadata
> to be used to discover which ABI settings are available for a given
> standard library or how to choose between them. The way that this
> manifests is that, following the lead from P2577R2, we expect the
> libraries to ship those metadata files on a one-to-one mapping with
> the library binary used as input to the linker.
I think someone mentioned it yesterday, but this will presumably have to
account for multilib libraries in some way.
>
> * The path to the module source files should be discovered via the
> metadata file. The outcome is that we don't expect a mechanism to
> search for those module files in a directory, nor a specific standard
> on how they should be named, but rather the build system will be given
> that information directly. This also allows the choice of the standard
> library splitting its implementation into interface partitions for
> ease of maintainability.
>
> * The compiler should offer a mechanism to introspect the path to the
> metadata file of the standard library for the settings with which the
> compiler will be used. That means build systems don't need to perform
> the lookup of the metadata.
>
> * The format should be extensible to cover vendor-specific settings.
>
> * The format should be versioned to allow future backward-compatible
> changes.
>
> * The format should allow indicating when the intention is to build
> the std module, since the compiler should reject a module accidentally
> named in a way that conflicts with the std module.
>
> Requirements that are likely to become important in the future, but
> that are not otherwise included in this proposal:
>
> * Pre-built libraries, including the std library, may want to
> advertise pre-built BMIs to be reused by the build system. This
> requires more convergence on the mechanisms to identify BMI
> compatibility, which have been discussed in P2581R2, but not yet
> supported by implementors.
>
> * Future integration with more general package management facilities.
> Although my expectation is that this is a step in that direction,
> where it may be possible to either make the data described here
> embedded there, or have the path to this metadata referenced instead.
> Particularly this proposal is not trying to specify requirements that
> would allow discovering when using this would be appropriate or not.
>
> Some a-priori decisions that are assumed in this format:
>
> * JSON: while there are many competing alternatives for the
> serialization format for this metadata, various other parts of the
> tooling ecosystem (such as compile_commands.json and the dependency
> scanning output) are already using this format, therefore I chose to
> just stick to it, rather than consider introducing a new serialization
> format.
>
> * Relative file paths: Any non-absolute path described in this file
> will be presumed to have the directory where the metadata file was
> found as the base for the lookup.
I think it will be necessary to support paths that are relative to some
other parameterized location to accommodate dependencies on header files
or modules provided by other projects/packages. This would help to
reduce the otherwise common practice of a build system having to
concatenate include paths for essentially every project it knows about
when building any package it knows about.
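
For reference, the base-directory rule quoted above can be sketched in a few lines of Python. This is only an illustration of the rule as written, not of the parameterized locations I am asking about; the function name and the file names in the usage note are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_source_path(metadata_file: str, source_path: str) -> str:
    """Resolve a path found in the metadata against the metadata file's directory.

    Absolute paths are returned unchanged; relative paths use the directory
    containing the metadata file as their base, as the proposal describes.
    """
    candidate = PurePosixPath(source_path)
    if candidate.is_absolute():
        return str(candidate)
    return str(PurePosixPath(metadata_file).parent / candidate)
```

With a metadata file at /usr/lib/libexample.modules.json (a made-up location), the relative path modules/std.cppm would resolve to /usr/lib/modules/std.cppm.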
>
> Since the goal of this proposal is to evaluate specific usage, I
> will prioritize describing the file with examples, rather than writing
> a JSON schema for it. The final design should still be encoded that
> way, but I feel the format of JSON schema would make this conversation
> harder to maintain.
I agree. JSON schema is great ... for later :)
>
> # Envelope
> The first thing I want to address is the envelope that will contain
> the module-specific metadata. For that there are two options:
>
> ## Module-specific file
>
> If we decide to go with a file that refers only to module metadata,
> the envelope could look like:
>
> {
> "version": 1,
> "revision": 0,
> "modules": []
> }
>
> ## Wider "library manifest" file
>
> If we decide we should go with a wider format for future
> extensibility, the envelope could look like:
>
> {
> "version": 1,
> "revision": 0,
> "c++": {
> "modules": []
> }
> }
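
Either envelope is easy for a consumer to handle. As a rough sketch (Python, standard library only), a reader could accept both shapes as below; the function name is hypothetical, and the fallback from the top-level key to the nested "c++" key is my assumption, not something the proposal specifies.

```python
import json


def modules_from_envelope(text: str) -> list:
    """Return the modules array from either proposed envelope.

    Assumption: a top-level "modules" key means the module-specific file;
    otherwise the array is looked up under the "c++" key of the wider
    "library manifest" form. A missing array is treated as empty.
    """
    document = json.loads(text)
    if "modules" in document:
        return document["modules"]
    return document.get("c++", {}).get("modules", [])
```

A real consumer would presumably also inspect "version" and "revision" before trusting the rest of the file.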
>
> # The modules value
>
> In both possible envelopes, module metadata is represented as an array
> of objects describing all the importable module units provided by this
> library that may be reachable by a consumer of this library.
>
> While we currently don't expect the std module to depend on any module
> not provided by the std library itself at this point, there are
> already situations where the std library has external dependencies
> (e.g.: tbb for libstdc++). This is an area that needs further
> exploration in the future, and it may be the case that the compiler
> may need to report several module metadata files, rather than just one.
>
> ## Describing a module
>
> The following keys and values are expected in the object in the
> modules array:
>
> * logical-name (mandatory): The name of the module being provided,
> with the same semantics as in P1689.
>
> * is-interface (optional, defaults to true): Whether this unit
> contributes to the external interface of the module, with the same
> semantics as in P1689.
Do implementation module units need to be mentioned at all? I'm not
opposed to allowing for them, but if I'm following correctly, since the
metadata file is consumed to satisfy module imports, they don't
contribute to anything that is relevant to import. I would expect
is-interface to always be true.
>
> * source-path (mandatory): The path to the source code of the
> importable unit. If expressed as a relative path, lookup is done from
> the directory where the module metadata was found.
>
> * is-std-library (optional, default to false): Indicates that the
> module is allowed to use names that are reserved to the standard library.
>
> * local-arguments (optional), an object describing arguments that
> should be applied for translating this particular importable unit, but
> that doesn't need to be in the compilation of the translation unit
> importing this module:

Since local-arguments (which I presume to mean compiler command line
options) are necessarily implementation specific, I think this should
either be generalized or named such that it reflects an implementation
dependency. Perhaps:

    "local-arguments": [
       { "gcc-compatible": [ "-fconstexpr-depth=512" ] },
       { "cl-compatible": [ "/constexpr:depth512" ] },
       { "circle": [ "--ftemplate-depth=768" ] }
    ]
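
A build system would then pick out only the entries for the compiler family it is driving. A minimal sketch in Python, assuming the array-of-objects shape above; the function name is hypothetical.

```python
def arguments_for(local_arguments: list, family: str) -> list:
    """Collect, in order, the arguments listed for one compiler family.

    Each entry of local_arguments is an object mapping a family name
    (for example "gcc-compatible") to a list of command line arguments.
    Entries for other families are ignored.
    """
    selected = []
    for entry in local_arguments:
        selected.extend(entry.get(family, []))
    return selected
```

An unknown family simply yields an empty list, so a compiler that is not mentioned gets no extra arguments.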

Tom.

>
> ** include-directories (optional): an array of paths that need to be
> appended to the compilation include search path, same semantics as
> appending -I in gcc and clang.
>
> ** system-include-directories (optional): an array of paths that need
> to be appended to the compilation include path as system locations,
> same semantics as appending -isystem in gcc and clang.
>
> ** definitions: an array of objects to be appended in order.
>
> *** name (mandatory): the name of the definition to be used
>
> *** value (optional): the value to be set. If missing, it is the
> equivalent of -DFOO in gcc and clang.
>
> *** undef (optional, defaults to false, incompatible with value): The
> equivalent of -UFOO in gcc and clang.
>
> *** vendor (optional): extension point for vendor-specific
> configurations. This is an object where the key is the name of the
> vendor, and the value is implementation-defined.
>
> * vendor (optional): extension point for vendor-specific information.
> This is an object where the key is the name of the vendor and the
> value is implementation-defined.
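
The mapping described in the quoted list can be sketched for a gcc- or clang-style command line as follows. This is Python and only an illustration: the function name is hypothetical, relative paths are passed through without resolution, and the vendor extension points are ignored.

```python
def gcc_style_flags(local_arguments: dict) -> list:
    """Translate a local-arguments object into gcc/clang-style flags."""
    flags = []
    for path in local_arguments.get("include-directories", []):
        flags.append("-I" + path)
    for path in local_arguments.get("system-include-directories", []):
        flags.extend(["-isystem", path])
    # Definitions are appended in the order in which they are listed.
    for definition in local_arguments.get("definitions", []):
        name = definition["name"]  # mandatory key
        if definition.get("undef", False):
            if "value" in definition:
                raise ValueError(name + ": undef is incompatible with value")
            flags.append("-U" + name)
        elif "value" in definition:
            flags.append("-D" + name + "=" + str(definition["value"]))
        else:
            flags.append("-D" + name)
    return flags
```

For example, a definitions array holding FOO with no value, BAR with value "2", and BAZ with undef set would produce -DFOO -DBAR=2 -UBAZ, in that order.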
>
> # Example
>
> Here's what I would expect that to look like for a standard library
> (assuming the modules file for now), such as libc++:
>
> {
> "version": 1,
> "revision": 1,
> "modules": [
> {
> "logical-name": "std",
> "source-path": "modules/std.cppm",
> "is-std-library": true
> },
> {
> "logical-name": "std.compat",
> "source-path": "modules/std.compat.cppm",
> "is-std-library": true
> },
> {
> "logical-name": "std:someinterfacepartition",
> "source-path": "modules/std-someinterfacepartition.cppm",
> "is-std-library": true
> }
> ]
> }
>
> Note that this specifically doesn't use any of the local arguments,
> because I don't really think that's going to be needed for the
> standard library case. The only special case is the
> is-std-library key, to allow the build system to know this is not
> an accidental collision with the reserved names. We may decide not to
> settle the local-arguments part of the proposal now for that reason.
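
The check that this key enables might look like the following minimal Python sketch. The function name is hypothetical, and the notion of a reserved name is deliberately simplified to "std", "std.<something>", and partitions of those; the actual rule in the standard is broader.

```python
def check_modules(modules: list) -> list:
    """Return a list of human-readable problems found in a modules array."""
    problems = []
    for index, module in enumerate(modules):
        for key in ("logical-name", "source-path"):
            if key not in module:
                problems.append(f"modules[{index}]: missing mandatory key '{key}'")
        name = module.get("logical-name", "")
        # Simplifying assumption: only the primary module name is inspected,
        # and only "std" and "std.*" are treated as reserved.
        primary = name.split(":", 1)[0]
        reserved = primary == "std" or primary.startswith("std.")
        if reserved and not module.get("is-std-library", False):
            problems.append(
                f"modules[{index}]: '{name}' uses a reserved name without is-std-library"
            )
    return problems
```

An empty result means every entry has its mandatory keys and every reserved name is explicitly marked as belonging to the standard library.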
>
> Daniel

Received on 2023-12-13 18:45:48