Date: Tue, 12 Mar 2019 19:15:20 -0400
Hi,
I've now implemented the format that was discussed over the past two weeks in
GCC and CMake (the implementation forgoes all `bikeshed-` prefixes). New
Docker image is here:
https://hub.docker.com/r/benboeckel/cxx-modules-sandbox (tag 20190312.1)
Source code references are hosted here:
https://gitlab.kitware.com/ben.boeckel/cxx-modules-sandbox/tree/docker-20190312.1
What follows is the JSON Schema, notes on interpretation of different
fields, notes on GCC's implementation, and finally notes on CMake's
usage of the format. I'll be working on a patch for Clang to do the same
this week. Changes since the previous schema:
- `readable` field for object-based filepaths has been added.
- `logical` is now a `#filepath` rather than a string. This was
necessary because when I used the format for CMake's Fortran
implementation, there is no logical name and it is just the
filepath.
What the format is not:
- Intended to communicate information outside of a build (portability
is not a consideration). These files should never leave a build
tree.
What I'm looking for:
- Are there any properties which should be added? Note that none seem
necessary given the working implementation I have.
- Are any fields redundant (except the informational `readable`
property)?
- Are there any platforms for which the `#filepath` specification is
especially onerous or unfriendly (given that JSON tells us strings
must be Unicode)?
- Suggestions for a name for the format. I'm currently using "trtbd"
for "Technical Report to-be-determined".
What I'm not looking for (right now):
- Bikeshedding. Feel free to suggest names for fields or the format
itself, but please refrain from commenting on name suggestions
themselves (in support for or against). I'll collect up all
suggestions and we can discuss them in the future once we have the
semantics nailed down.
==================== 8< ====================
{
"$schema": "",
"$id": "http://example.com/root.json",
"type": "object",
"title": "SG15 TR depformat",
"definitions": {
"filepath": {
"$id": "#filepath",
"type": [
"object",
"string"
],
"description": "A filepath. Strings must be valid UTF-8. All other encodings should use raw data objects.",
"minLength": 1,
"required": [
"bikeshed-format",
"bikeshed-data"
],
"properties": {
"bikeshed-format": {
"$id": "#format",
"enum": ["bikeshed-raw8", "bikeshed-raw16"],
"description": "Interpretation of the raw data bytes"
},
"bikeshed-data": {
"$id": "#data",
"type": "array",
"description": "Raw filepath bytes",
"minItems": 1,
"items": {
"type": "integer",
"minimum": 1
}
},
"bikeshed-readable": {
"$id": "#readable",
"type": "string",
"description": "Readable version of the filename (purely for human consumption)",
"minLength": 1
}
}
},
"depinfo": {
"$id": "#depinfo",
"type": "object",
"description": "Dependency information for a source file",
"required": [
"bikeshed-input"
],
"properties": {
"bikeshed-input": {
"$ref": "#/definitions/filepath"
},
"bikeshed-outputs": {
"$id": "#outputs",
"type": "array",
"description": "Files output by this execution",
"uniqueItems": true,
"items": {
"$ref": "#/definitions/filepath"
}
},
"bikeshed-depends": {
"$id": "#depends",
"type": "array",
"description": "Paths read during this execution",
"uniqueItems": true,
"items": {
"$ref": "#/definitions/filepath"
}
},
"bikeshed-future-compile": {
"$ref": "#/definitions/future-depinfo"
},
"bikeshed-future-link": {
"$ref": "#/definitions/future-depinfo"
}
}
},
"future-depinfo": {
"$id": "#future-depinfo",
"type": "object",
"properties": {
"bikeshed-outputs": {
"$id": "#outputs",
"type": "array",
"description": "Files output by a future rule for this source using the same flags",
"uniqueItems": true,
"items": {
"$ref": "#/definitions/filepath"
}
},
"bikeshed-provides": {
"$id": "#provides",
"type": "array",
"description": "Modules provided by a future compile rule for this source using the same flags",
"uniqueItems": true,
"items": {
"$ref": "#/definitions/module-desc"
}
},
"bikeshed-requires": {
"$id": "#requires",
"type": "array",
"description": "Modules required by a future compile rule for this source using the same flags",
"uniqueItems": true,
"items": {
"$ref": "#/definitions/module-desc"
}
}
}
},
"module-desc": {
"$id": "#module-desc",
"type": "object",
"required": [
"bikeshed-logical"
],
"properties": {
"bikeshed-filepath": {
"$ref": "#/definitions/filepath"
},
"bikeshed-logical": {
"$ref": "#/definitions/filepath"
}
}
}
},
"required": [
"version",
"bikeshed-sources"
],
"properties": {
"version": {
"$id": "#version",
"type": "integer",
"description": "The version of the output specification"
},
"revision": {
"$id": "#revision",
"type": "integer",
"description": "The revision of the output specification",
"default": 0
},
"bikeshed-sources": {
"$id": "#sources",
"type": "array",
"title": "sources",
"minItems": 1,
"items": {
"$ref": "#/definitions/depinfo"
}
}
}
}
==================== >8 ====================
Notes on the format itself:
- Property names can be bikeshedded later. I've marked them all with a
`bikeshed-` prefix.
- Across all `bikeshed-sources`, `bikeshed-input` values must be
unique.
- All fields with `_` as a prefix must be ignored by
implementations.
- `${version}.${revision}` follows Semantic Versioning logic.
- Notes on `#filepath`:
* Relative paths are to be interpreted as being relative to the
working directory of the producer's command. [ Note: -- Since
build systems generate the command and the build tool's input
file, they should know this when translating from this file back
into the build tool's mechanisms. ]
* as a string:
- valid utf-8 string.
- may contain URL-encoded escape sequences (`%xx`) for embedded
non-utf-8 bytes. All such bytes are embedded as a single byte in
the resulting filepath. [ Note: -- The goal here is to make it
possible for a filepath which is mostly utf-8 to still use a
string. On Windows, filepaths with surrogate halves are out of
luck since they'd need `raw16` which is not supported here. ]
- has a unique, unambiguous, decoding to the system platform's
encoding for the actual filepath to use. [ Note: -- For example,
on Windows, the almost-utf-16 encoding expected by
`CreateFileW`; z/OS would use EBCDIC; macOS would expect a
normalized Unicode sequence. ]
* as an object:
- format of `raw8` indicates that `data` contains 8-bit integers.
- format of `raw16` indicates that `data` contains 16-bit integers.
- the `readable` field is informational and not to be interpreted.
- after an array of the integer values is decoded, the path may be
passed to platform APIs as-is.
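As a rough illustration of the `#filepath` rules above, here is a minimal Python sketch of how a consumer might turn either form into raw path data. The function name `decode_filepath` is hypothetical, the unprefixed field names (`format`, `data`) follow the implementation rather than the schema, and two things are assumptions the notes do not settle: that a literal `%` in a string would be written as `%25`, and that `raw16` units are packed little-endian.

```python
from urllib.parse import unquote_to_bytes


def decode_filepath(value):
    """Return the raw bytes for a `#filepath` value (string or object form)."""
    if isinstance(value, str):
        # String form: UTF-8 text in which each `%xx` escape stands for
        # exactly one raw byte. Non-escaped characters are encoded as UTF-8.
        # Assumption: a literal `%` is itself written as `%25`.
        return unquote_to_bytes(value)

    fmt = value["format"]
    data = value["data"]
    if fmt == "raw8":
        # Each integer is one 8-bit unit; bytes() rejects values above 255.
        return bytes(data)
    if fmt == "raw16":
        # Each integer is one 16-bit unit. Assumption: pack little-endian,
        # as a Windows consumer would before calling a wide-character API.
        return b"".join(unit.to_bytes(2, "little") for unit in data)
    raise ValueError(f"unknown filepath format: {fmt!r}")


if __name__ == "__main__":
    print(decode_filepath("../simple/import.cpp"))
    print(decode_filepath({"format": "raw8", "data": [47, 116, 109, 112]}))
```

The `readable` field is deliberately never consulted, since it is informational only.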
Notes on GCC's implementation:
- There are 3 new flags:
* `-fdep-file=` specifies where to write this information
* `-fdep-format=` specifies the format of the information
* `-fdep-output=` specifies the output the compilation rule will
create (analogous to `-MT`)
- Setting `-fdep-format=` forces the `-MF` output to be silent about
module information. This is because `ninja` doesn't know how to
interpret the additional module information in that file (it uses
much more Makefile syntax than `ninja` understands).
- Future work (in order of decreasing priority):
* Add `-fdep-scan` which suppresses preprocessor output completely.
* This mode also allows for smarter preprocessing logic to avoid
expanding any macros which cannot specify additional
dependency-relevant information.
* Get `gfortran` to do this as well.
* Hook up `-gsplit-dwarf` and similar flags to add to this
information.
Notes on CMake's usage:
- Only supports a single `sources` entry right now. Probably not that
hard to extend, but given that there's no multi-file scanner, it
would be untested anyways.
- The first output listed in each `/sources/0/future-compile/outputs`
is the "primary" output of the compilation rule (this will typically
be the object file for the TU) and is used to hook up things
internally. Ideally CMake would pass more information to its
internal collator, but that requires some refactoring that I've put
off for now. This is also used as the base for the `.modmap` file
used for that information.
- There is sidecar information passed along for some things the
compiler is not going to understand but is necessary for the
collator to do its work:
* output path for module files
* format to use for generated modmap files
* source and build directory information for the containing target
(used for generating ninja-compatible paths)
* some extra bits for Fortran that are irrelevant to C++, but still
present in principle
Example command line for scanning:
/home/boeckb/misc/root/gcc-modules/bin/c++ \
-std=gnu++2a \
-E ../simple/import.cpp \
-MT simple/CMakeFiles/simple.dir/import.cpp-pp.cpp \
-MD \
-MF simple/CMakeFiles/simple.dir/import.cpp.o.pp.d \
-fmodules-ts \
-fdep-file=simple/CMakeFiles/simple.dir/import.cpp.o.ddi \
-fdep-output=simple/CMakeFiles/simple.dir/import.cpp.o \
-fdep-format=trtbd \
-o simple/CMakeFiles/simple.dir/import.cpp-pp.cpp
The fact that the `-pp.cpp` file is the output in the generated
`build.ninja` causes the `-o` and `-MT` bits to show up. Better would be
to use `-MT simple/CMakeFiles/simple.dir/import.cpp.o.ddi -fdep-scan`
and not have a `-o` flag at all. It'd be great if `ninja` would have
`deps = trtbd` support (it would likely have to ignore all `future-*`
fields since paths almost certainly need to be munged to be properly
interpreted by `ninja`).
Here is the `trtbd` output for this command (reformatted using `jq`; GCC
doesn't output JSON that looks this good ;) ):
==================== 8< ====================
{
"sources": [
{
"input": "../simple/import.cpp",
"outputs": [
"simple/CMakeFiles/simple.dir/import.cpp-pp.cpp"
],
"future-compile": {
"outputs": [
"simple/CMakeFiles/simple.dir/import.cpp.o",
"I.gcm"
],
"provides": [
{
"filepath": "I.gcm",
"logical": "I"
}
],
"requires": [
{
"logical": "M"
}
]
},
"depends": [
"../simple/import.cpp",
"/usr/include/stdc-predef.h"
]
}
],
"version": 0,
"revision": 0
}
==================== >8 ====================
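To make the shape of this output concrete, here is a small Python sketch of how a consumer (for instance a collator-like tool) might read one such document. It is not CMake's actual collator; the function name `summarize` and the constant `SUPPORTED_VERSION` are hypothetical, and it assumes every filepath uses the string form. It checks the `version`, enforces that `input` values are unique across `sources`, and only reads known keys, so any `_`-prefixed fields are ignored.

```python
import json

SUPPORTED_VERSION = 0  # assumption: the only major version this sketch knows


def summarize(doc):
    """Return (providers, requirements) for one parsed trtbd document.

    providers:    logical module name -> module file path (or None)
    requirements: input path -> list of required logical module names
    """
    if doc["version"] != SUPPORTED_VERSION:
        raise ValueError(f"unsupported version: {doc['version']}")

    providers = {}
    requirements = {}
    for source in doc["sources"]:
        src = source["input"]
        if src in requirements:
            raise ValueError(f"duplicate input: {src}")
        # `future-compile` is optional; a plain source has no module info.
        future = source.get("future-compile", {})
        for module in future.get("provides", []):
            # `filepath` is optional in a module description.
            providers[module["logical"]] = module.get("filepath")
        requirements[src] = [m["logical"] for m in future.get("requires", [])]
    return providers, requirements


EXAMPLE = json.loads("""
{
  "sources": [
    {
      "input": "../simple/import.cpp",
      "future-compile": {
        "outputs": ["simple/CMakeFiles/simple.dir/import.cpp.o", "I.gcm"],
        "provides": [{"filepath": "I.gcm", "logical": "I"}],
        "requires": [{"logical": "M"}]
      }
    }
  ],
  "version": 0,
  "revision": 0
}
""")

if __name__ == "__main__":
    print(summarize(EXAMPLE))
```

With the example above, the source provides module `I` (in `I.gcm`) and requires module `M`, which is the information needed to order compilations and write a module map.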
Example command line for compiling:
/home/boeckb/misc/root/gcc-modules/bin/c++ \
-std=gnu++2a \
-MD \
-MT simple/CMakeFiles/simple.dir/import.cpp.o \
-MF simple/CMakeFiles/simple.dir/import.cpp.o.d \
-fmodules-ts \
-fmodule-mapper=simple/CMakeFiles/simple.dir/import.cpp.o.modmap \
-fdep-format=trtbd \
-o simple/CMakeFiles/simple.dir/import.cpp.o \
-c ../simple/import.cpp
The `-fdep-format=trtbd` is here purely to suppress the module
details from the `-MF` output so that `ninja` is happy with it. The
information goes nowhere with these flags. Ideally, a better flag would
exist to do that (is `-MS` available?).
Thanks,
--Ben
Received on 2019-03-13 00:15:34