[Tooling] Dependency format with module details implementation

From: Ben Boeckel <ben.boeckel_at_[hidden]>
Date: Tue, 12 Mar 2019 19:15:20 -0400
Hi,

I've now implemented the format that was discussed the past two weeks in
GCC and CMake (the implementation forgoes all `bikeshed-` prefixes). The
new Docker image is here:

    https://hub.docker.com/r/benboeckel/cxx-modules-sandbox (tag 20190312.1)

Source code references are hosted here:

    https://gitlab.kitware.com/ben.boeckel/cxx-modules-sandbox/tree/docker-20190312.1

What follows is the JSON Schema, notes on interpretation of different
fields, notes on GCC's implementation, and finally notes on CMake's
usage of the format. I'll be working on a patch for Clang to do the same
this week. Changes since the previous schema:

  - `readable` field for object-based filepaths has been added.
  - `logical` is now a `#filepath` rather than a string. This was
    necessary because, when I used the format for CMake's Fortran
    implementation, there was no logical name; the module is identified
    by just the filepath.

What the format is not:

  - Intended to communicate information outside of a build (portability
    is not a consideration). These files should never leave a build
    tree.

What I'm looking for:

  - Are there any properties which should be added? Note that none seem
    necessary given the working implementation I have.
  - Are any fields redundant (except the informational `readable`
    property)?
  - Are there any platforms for which the `#filepath` specification is
    especially onerous or unfriendly (given that JSON tells us strings
    must be Unicode)?
  - Suggestions for a name for the format. I'm currently using "trtbd"
    for "Technical Report to-be-determined".

What I'm not looking for (right now):

  - Bikeshedding. Feel free to suggest names for fields or the format
    itself, but please refrain from commenting on name suggestions
    themselves (in support for or against). I'll collect up all
    suggestions and we can discuss them in the future once we have the
    semantics nailed down.

==================== 8< ====================
{
  "$schema": "",
  "$id": "http://example.com/root.json",
  "type": "object",
  "title": "SG15 TR depformat",
  "definitions": {
    "filepath": {
      "$id": "#filepath",
      "type": [
        "object",
        "string"
      ],
      "description": "A filepath. Strings must be valid UTF-8. All other encodings should use raw data objects.",
      "minLength": 1,
      "required": [
        "bikeshed-format",
        "bikeshed-data"
      ],
      "properties": {
        "bikeshed-format": {
          "$id": "#format",
          "enum": ["bikeshed-raw8", "bikeshed-raw16"],
          "description": "Interpretation of the raw data bytes"
        },
        "bikeshed-data": {
          "$id": "#data",
          "type": "array",
          "description": "Raw filepath bytes",
          "minItems": 1,
          "items": {
            "type": "integer",
            "minimum": 1
          }
        },
        "bikeshed-readable": {
          "$id": "#readable",
          "type": "string",
          "description": "Readable version of the filename (purely for human consumption)",
          "minLength": 1
        }
      }
    },
    "depinfo": {
      "$id": "#depinfo",
      "type": "object",
      "description": "Dependency information for a source file",
      "required": [
        "bikeshed-input"
      ],
      "properties": {
        "bikeshed-input": {
          "$ref": "#/definitions/filepath"
        },
        "bikeshed-outputs": {
          "$id": "#outputs",
          "type": "array",
          "description": "Files output by this execution",
          "uniqueItems": true,
          "items": {
            "$ref": "#/definitions/filepath"
          }
        },
        "bikeshed-depends": {
          "$id": "#depends",
          "type": "array",
          "description": "Paths read during this execution",
          "uniqueItems": true,
          "items": {
            "$ref": "#/definitions/filepath"
          }
        },
        "bikeshed-future-compile": {
          "$ref": "#/definitions/future-depinfo"
        },
        "bikeshed-future-link": {
          "$ref": "#/definitions/future-depinfo"
        }
      }
    },
    "future-depinfo": {
      "$id": "#future-depinfo",
      "type": "object",
      "properties": {
        "bikeshed-outputs": {
          "$id": "#outputs",
          "type": "array",
          "description": "Files output by a future rule for this source using the same flags",
          "uniqueItems": true,
          "items": {
            "$ref": "#/definitions/filepath"
          }
        },
        "bikeshed-provides": {
          "$id": "#provides",
          "type": "array",
          "description": "Modules provided by a future compile rule for this source using the same flags",
          "uniqueItems": true,
          "items": {
            "$ref": "#/definitions/module-desc"
          }
        },
        "bikeshed-requires": {
          "$id": "#requires",
          "type": "array",
          "description": "Modules required by a future compile rule for this source using the same flags",
          "uniqueItems": true,
          "items": {
            "$ref": "#/definitions/module-desc"
          }
        }
      }
    },
    "module-desc": {
      "$id": "#module-desc",
      "type": "object",
      "required": [
        "bikeshed-logical"
      ],
      "properties": {
        "bikeshed-filepath": {
          "$ref": "#/definitions/filepath"
        },
        "bikeshed-logical": {
          "$ref": "#/definitions/filepath"
        }
      }
    }
  },
  "required": [
    "version",
    "bikeshed-sources"
  ],
  "properties": {
    "version": {
      "$id": "#version",
      "type": "integer",
      "description": "The version of the output specification"
    },
    "revision": {
      "$id": "#revision",
      "type": "integer",
      "description": "The revision of the output specification",
      "default": 0
    },
    "bikeshed-sources": {
      "$id": "#sources",
      "type": "array",
      "title": "sources",
      "minItems": 1,
      "items": {
        "$ref": "#/definitions/depinfo"
      }
    }
  }
}
==================== >8 ====================
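
For readers without a JSON Schema validator at hand, the required parts
of the schema above can be approximated by a few hand-written checks.
This is only a rough sketch, not part of the GCC or CMake
implementations: the function names `is_filepath` and `check_document`
are hypothetical, and the property names are assumed to be the
unprefixed ones (`sources`, `input`, `format`, `data`), since the
implementation forgoes the `bikeshed-` prefixes.

```python
def is_filepath(value):
    """Check a value against the `#filepath` definition (string or object)."""
    if isinstance(value, str):
        return len(value) >= 1  # "minLength": 1
    if isinstance(value, dict):
        data = value.get("data")
        return (
            value.get("format") in ("raw8", "raw16")
            and isinstance(data, list)
            and len(data) >= 1  # "minItems": 1
            # "minimum": 1; `type(...) is int` rejects JSON booleans.
            and all(type(item) is int and item >= 1 for item in data)
        )
    return False


def check_document(doc):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if type(doc.get("version")) is not int:
        problems.append("missing or non-integer 'version'")
    sources = doc.get("sources")
    if not isinstance(sources, list) or not sources:
        problems.append("'sources' must be a non-empty array")
        return problems
    for index, source in enumerate(sources):
        if not isinstance(source, dict) or not is_filepath(source.get("input")):
            problems.append(f"sources[{index}] lacks a valid 'input'")
    return problems
```

This covers only the `required` and minimum-size constraints; it does
not check `uniqueItems` or the contents of the `future-*` objects.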

Notes on the format itself:

  - Property names can be bikeshedded later. I've marked them all with a
    `bikeshed-` prefix.
  - Across all `bikeshed-sources`, `bikeshed-input` values must be
    unique.
  - All fields with a `_` prefix must be ignored by implementations.
  - `${version}.${revision}` follows Semantic Versioning logic.
  - Notes on `#filepath`:
    * Relative paths are to be interpreted as being relative to the
      working directory of the producer's command. [ Note: -- Since
      build systems generate the command and the build tool's input
      file, they should know this when translating from this file back
      into the build tool's mechanisms. ]
    * as a string:
      - valid utf-8 string.
      - may contain URL-encoded escape sequences (`%xx`) for embedded
        non-utf-8 bytes. All such bytes are embedded as a single byte in
        the resulting filepath. [ Note: -- The goal here is to make it
        possible for a filepath which is mostly utf-8 to still use a
        string. On Windows, filepaths with surrogate halves are out of
        luck since they'd need `raw16` which is not supported here. ]
      - has a unique, unambiguous decoding to the system platform's
        encoding for the actual filepath to use. [ Note: -- For example,
        on Windows, the almost-utf-16 encoding expected by
        `CreateFileW`; z/OS would use EBCDIC; macOS would expect a
        normalized Unicode sequence. ]
    * as an object:
      - format of `raw8` indicates that `data` contains 8-bit integers.
      - format of `raw16` indicates that `data` contains 16-bit integers.
      - the `readable` field is informational and not to be interpreted.
      - after an array of the integer values is decoded, the path may be
        passed to platform APIs as-is.
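
The `#filepath` rules above can be sketched as a small decoder. This is
an illustration under several assumptions, not what GCC or CMake do: the
name `decode_filepath` is hypothetical, the platform is assumed to take
paths as byte strings (POSIX-like), a `%` that is not followed by two hex
digits is left untouched (the notes do not say how a literal `%` is
written), and `raw16` values are packed in native byte order (the notes
do not specify one).

```python
import re
import struct

# `%xx` escape with two hex digits, matched on the UTF-8 encoded string.
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def decode_filepath(value):
    """Decode a `#filepath` value (string or object form) into bytes."""
    if isinstance(value, str):
        raw = value.encode("utf-8")
        # Each escape becomes exactly one byte in the resulting path.
        return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    fmt = value["format"]
    data = value["data"]  # `readable` is informational and ignored here
    if fmt == "raw8":
        return bytes(data)  # ValueError if an item is outside 0..255
    if fmt == "raw16":
        # Assumption: native byte order, 16 bits per item.
        return struct.pack(f"={len(data)}H", *data)
    raise ValueError(f"unknown filepath format: {fmt!r}")
```

For example, `decode_filepath("a%FFb.cpp")` yields `b"a\xffb.cpp"`, a
path that is mostly UTF-8 with one embedded non-UTF-8 byte.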

Notes on GCC's implementation:

  - There are 3 new flags:
    * `-fdep-file=` specifies where to write this information
    * `-fdep-format=` specifies the format of the information
    * `-fdep-output=` specifies the output the compilation rule will
      create (analogous to `-MT`)
  - Setting `-fdep-format=` forces the `-MF` output to be silent about
    module information. This is because `ninja` doesn't know how to
    interpret the additional information specified in the file (it uses
    much more Makefile syntax than `ninja` parses).
  - Future work (in order of decreasing priority):
    * Add `-fdep-scan` which suppresses preprocessor output completely.
      This mode would also allow for smarter preprocessing logic that
      avoids expanding any macros which cannot specify additional
      dependency-relevant information.
    * Get `gfortran` to do this as well.
    * Hook up `-gsplit-dwarf` and similar flags to add to this
      information.

Notes on CMake's usage:

  - Only supports a single `sources` entry right now. Probably not that
    hard to extend, but given that there's no multi-file scanner, it
    would be untested anyways.
  - The first output listed in each `/sources/0/future-compile/outputs`
    is the "primary" output of the compilation rule (this will typically
    be the object file for the TU) and is used to hook up things
    internally. Ideally CMake would pass more information to its
    internal collator, but that requires some refactoring that I've put
    off for now. This is also used as the base for the `.modmap` file
    used for that information.
  - There is sidecar information passed along for some things the
    compiler is not going to understand but is necessary for the
    collator to do its work:
    * output path for module files
    * format to use for generated modmap files
    * source and build directory information for the containing target
      (used for generating ninja-compatible paths)
    * some extra bits for Fortran that are irrelevant to C++, but still
      present in principle

Example command line for scanning:

    /home/boeckb/misc/root/gcc-modules/bin/c++ \
      -std=gnu++2a \
      -E ../simple/import.cpp \
      -MT simple/CMakeFiles/simple.dir/import.cpp-pp.cpp \
      -MD \
      -MF simple/CMakeFiles/simple.dir/import.cpp.o.pp.d \
      -fmodules-ts \
      -fdep-file=simple/CMakeFiles/simple.dir/import.cpp.o.ddi \
      -fdep-output=simple/CMakeFiles/simple.dir/import.cpp.o \
      -fdep-format=trtbd \
      -o simple/CMakeFiles/simple.dir/import.cpp-pp.cpp

The fact that the `-pp.cpp` file is the output in the generated
`build.ninja` causes the `-o` and `-MT` bits to show up. Better would be
to use `-MT simple/CMakeFiles/simple.dir/import.cpp.o.ddi -fdep-scan`
and not have a `-o` flag at all. It'd be great if `ninja` would have
`deps = trtbd` support (it would likely have to ignore all `future-*`
fields since paths almost certainly need to be munged to be properly
interpreted by `ninja`).
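
To make the `deps = trtbd` idea concrete, here is a rough sketch of what
such a consumer might do: take only `outputs` and `depends` for each
source, ignore the `future-*` fields, and produce the equivalent
Makefile-style rule. The function names are hypothetical, only
string-form filepaths are handled, and escaping is limited to
backslash-escaping spaces; none of this reflects actual `ninja`
behavior.

```python
def escape(path):
    """Escape a string-form filepath for a Makefile-style depfile."""
    if not isinstance(path, str):
        # Assumption: object-form (raw8/raw16) paths are out of scope here.
        raise TypeError("only string filepaths are handled in this sketch")
    return path.replace(" ", "\\ ")


def to_depfile(doc):
    """Render `outputs: depends` rules, ignoring all `future-*` fields."""
    lines = []
    for source in doc["sources"]:
        outputs = source.get("outputs", [])
        if not outputs:
            continue  # nothing was written, so there is no rule to emit
        targets = " ".join(escape(p) for p in outputs)
        prereqs = " ".join(escape(p) for p in source.get("depends", []))
        lines.append(f"{targets}: {prereqs}".rstrip())
    return "".join(line + "\n" for line in lines)
```

The `future-compile` and `future-link` objects are skipped on purpose:
they describe rules that have not run yet, and their paths would need
translating before a build tool could use them.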

Here is the `trtbd` output for this command (reformatted using `jq`; GCC
doesn't output JSON that looks this good ;) ):

==================== 8< ====================
{
  "sources": [
    {
      "input": "../simple/import.cpp",
      "outputs": [
        "simple/CMakeFiles/simple.dir/import.cpp-pp.cpp"
      ],
      "future-compile": {
        "outputs": [
          "simple/CMakeFiles/simple.dir/import.cpp.o",
          "I.gcm"
        ],
        "provides": [
          {
            "filepath": "I.gcm",
            "logical": "I"
          }
        ],
        "requires": [
          {
            "logical": "M"
          }
        ]
      },
      "depends": [
        "../simple/import.cpp",
        "/usr/include/stdc-predef.h"
      ]
    }
  ],
  "version": 0,
  "revision": 0
}
==================== >8 ====================
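
As an illustration of how output like the above could be consumed, the
following sketch collates several such documents into ordering
dependencies between compile rules. It is not CMake's collator: the
names `compile_rules` and `collate` are hypothetical, logical names are
assumed to be plain strings, and the first entry of
`future-compile/outputs` is treated as the primary output, as described
in the CMake notes.

```python
def compile_rules(documents):
    """Yield (primary output, future-compile object) for every source."""
    for doc in documents:
        for source in doc["sources"]:
            future = source.get("future-compile", {})
            outputs = future.get("outputs", [])
            if outputs:
                yield outputs[0], future  # first output is the primary one


def collate(documents):
    """Map each primary output to the primary outputs it must wait for."""
    # First pass: which rule provides each logical module name?
    providers = {}
    for primary, future in compile_rules(documents):
        for module in future.get("provides", []):
            providers[module["logical"]] = primary
    # Second pass: resolve each rule's requirements against the providers.
    edges, unresolved = {}, {}
    for primary, future in compile_rules(documents):
        needed = [module["logical"] for module in future.get("requires", [])]
        edges[primary] = sorted(providers[n] for n in needed if n in providers)
        missing = [n for n in needed if n not in providers]
        if missing:
            unresolved[primary] = missing
    return edges, unresolved
```

Given the document above together with one from a source providing `M`,
`edges` would list that source's object file as a prerequisite of
`import.cpp.o`; with the document above alone, `M` ends up in
`unresolved`.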

Example command line for compiling:

    /home/boeckb/misc/root/gcc-modules/bin/c++ \
      -std=gnu++2a \
      -MD \
      -MT simple/CMakeFiles/simple.dir/import.cpp.o \
      -MF simple/CMakeFiles/simple.dir/import.cpp.o.d \
      -fmodules-ts \
      -fmodule-mapper=simple/CMakeFiles/simple.dir/import.cpp.o.modmap \
      -fdep-format=trtbd \
      -o simple/CMakeFiles/simple.dir/import.cpp.o \
      -c ../simple/import.cpp

The `-fdep-format=trtbd` is here purely to suppress the module
details from the `-MF` output so that `ninja` is happy with it. The
information goes nowhere with these flags. Ideally, a better flag would
exist to do that (is `-MS` available?).

Thanks,

--Ben

Received on 2019-03-13 00:15:34