Re: [Tooling] [isocpp-modules] Dependency information for module-aware build tools

From: Ben Craig <ben.craig_at_[hidden]>
Date: Tue, 5 Mar 2019 14:26:27 +0000
Here are some of my stock JSON questions. I think these all have pretty easy answers, but I find it worthwhile to ask them just to be sure.

Are compilers allowed to add extra fields to this JSON structure?
What should clients of this JSON structure do when they encounter a field that they don't recognize?
Do you foresee files of this format being authored and saved by people, or will it always be a process-to-process interchange format?
If a person will be involved in authoring files of this format, how should they write comments?
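
To make the first two questions concrete, here is a minimal sketch of one possible client policy: keep the fields named in the quoted proposal and set aside everything else rather than failing. The proposal itself does not specify this behavior. The function name `split_fields` and the `x-vendor-note` key are hypothetical, invented only for illustration.

```python
import json

# Top-level keys named in the quoted proposal.
KNOWN_KEYS = {
    "outputs", "provides", "logical-provides",
    "requires", "depends", "version", "revision",
}

def split_fields(text):
    """Parse a dependency file; return (recognized, unrecognized) fields."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("top-level value must be a JSON object")
    known = {k: v for k, v in doc.items() if k in KNOWN_KEYS}
    extra = {k: v for k, v in doc.items() if k not in KNOWN_KEYS}
    return known, extra

# "x-vendor-note" is a made-up compiler extension field.
sample = '{"requires": ["M"], "version": 0, "revision": 1, "x-vendor-note": "hi"}'
known, extra = split_fields(sample)
print(sorted(known))  # recognized keys
print(extra)          # fields this client chose to ignore
```

Note that `json.loads` rejects `//` comments, so under this sketch the annotated example in the quoted message would have to be emitted without them.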

> -----Original Message-----
> From: Modules <modules-bounces_at_[hidden]> On Behalf Of Ben
> Boeckel via Modules
> Sent: Monday, March 4, 2019 4:58 PM
> To: modules_at_[hidden]; tooling_at_[hidden]
> Cc: Ben Boeckel <ben.boeckel_at_[hidden]>; brad.king_at_[hidden]
> Subject: [EXTERNAL] [isocpp-modules] Dependency information for module-
> aware build tools
>
> Hi,
>
> For CMake support for C++ modules, I've patched GCC so it outputs
> dependency information in a JSON format. Before going too far down this
> road, I'd like to get feedback on the format. This is for the purposes of being
> able to implement D1483R1[1] without requiring build tools to implement a
> C++ parser and instead have the compiler do the "scan" step described
> there.
>
> {
>   "outputs": [                      // Files to be output for this
>     "source.o"                      // compilation[2].
>   ],
>   "provides": [                     // BMI files provided by this
>     "I.gcm"                         // compilation.
>   ],
>   "logical-provides": {             // Mapping of module names provided
>     "I": "I.gcm"                    // to provided BMI files.
>   },
>   "requires": [                     // Module names required by this
>     "M"                             // compilation.
>   ],
>   "depends": [                      // Preprocessor dependency files
>     "../path/to/source.cpp",        // which affect this scan (so it can
>     "/usr/include/stdc-predef.h"    // be rerun if necessary).
>   ],
>   "version": 0,                     // The file format version.
>   "revision": 1                     // The file format revision.
> }
>
> This example output is for a file with the contents:
>
>     export module I;
>     import M;
>
>     export int i() {
>         return m();
>     }
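
As a rough illustration of how a build tool might consume several such scan results, the sketch below maps each required module name to its provider through `logical-provides` and derives a compile order. It assumes every module name is a plain string and that each scan result has already been parsed into a dictionary. The file `m.cpp` and its BMI `M.gcm` are hypothetical; the quoted message only shows the scan output for the file providing `I`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def plan(scans):
    """scans maps a source path to its parsed dependency object."""
    provider = {}  # module name -> source that provides it
    bmi = {}       # module name -> BMI path
    for src, dep in scans.items():
        for name, path in dep.get("logical-provides", {}).items():
            provider[name] = src
            bmi[name] = path

    graph = {}  # source -> set of sources that must be compiled first
    needs = {}  # source -> BMI files it consumes
    for src, dep in scans.items():
        required = dep.get("requires", [])
        missing = [m for m in required if m not in provider]
        if missing:
            raise LookupError(f"{src}: no provider for {missing}")
        graph[src] = {provider[m] for m in required}
        needs[src] = [bmi[m] for m in required]

    # static_order() yields prerequisites before their dependents.
    return list(TopologicalSorter(graph).static_order()), needs

scans = {
    "source.cpp": {"logical-provides": {"I": "I.gcm"}, "requires": ["M"]},
    "m.cpp": {"logical-provides": {"M": "M.gcm"}, "requires": []},  # hypothetical
}
order, needs = plan(scans)
print(order)                # m.cpp is compiled before source.cpp
print(needs["source.cpp"])  # BMI files source.cpp consumes
```

A cyclic import would surface here as `graphlib.CycleError`; how a real build tool should report that is outside what the proposal covers.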
>
> My existing patch to GCC is currently missing `revision` and uses `version` ==
> 1 (but my CMake patches also don't check the field right now). I'd like to get a
> Clang patch written up in the next few weeks.
>
> Points to note:
>
> - All top-level types are as shown above and the key names are never
>   localized:
>     * `outputs`: array
>     * `provides`: array
>     * `logical-provides`: object
>     * `requires`: array
>     * `depends`: array
>     * `version`: int
>     * `revision`: int
> - Values are strings if the name or path is valid UTF-8. The keys of
>   `logical-provides` must be strings, therefore `requires` must also
>   be only strings since these are used as lookup keys to find the
>   on-disk file representing the listed `provide` (shouldn't be an
>   issue since these are module names).
> - In the case of invalid UTF-8, an object is used with the following
>   layout (all data here is literal and not localized):
>
>       {
>         "format": "...",    // The format of the data.
>         "data": [...]       // Array of integers interpreted as
>       }                     // the appropriate integer size.
>
> - Relative paths are relative to the working directory of the
>   compiler. Build tools may need to rewrite paths for the build tool
>   to actually understand them.[3]
> - `version` is bumped if there is any semantic data added (e.g., more
>   information which is required to get a correct build), types change,
>   etc.
> - `revision` is bumped if an additional helpful, but not semantically
>   important, field is added to the format.
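
One way a consumer could act on the `version`/`revision` split described above is sketched here. The constants and the function name `check_format` are hypothetical, and treating a missing `revision` as 0 is an assumption made for this sketch (the message notes that the current GCC patch does not emit the field yet).

```python
SUPPORTED_VERSION = 0  # the only semantic version this reader understands
KNOWN_REVISION = 1     # newest revision whose optional fields it knows

def check_format(dep):
    """Return True if the file is usable but newer than this reader.

    Raises ValueError on a version mismatch, since a version bump
    signals data that is required for a correct build.
    """
    version = dep.get("version")
    revision = dep.get("revision", 0)  # assumption: absent means 0
    if not isinstance(version, int) or not isinstance(revision, int):
        raise ValueError("version and revision must be integers")
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported format version {version}")
    # A higher revision only adds optional fields, so it is accepted.
    return revision > KNOWN_REVISION

print(check_format({"version": 0, "revision": 1}))
print(check_format({"version": 0, "revision": 2}))
```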
>
> Defined formats (I'm fine with bikeshedding these names once the overall
> format has been hammered out):
>
> - "raw8": interpret `data` as an array of uint8_t values to be passed
>   to platform-specific filesystem APIs as an 8-bit encoding
> - "raw16": interpret `data` as an array of uint16_t values to be passed
>   to platform-specific filesystem APIs as a 16-bit encoding
> - "raw32": interpret `data` as an array of uint32_t values to be passed
>   to platform-specific filesystem APIs as a 32-bit encoding
>
> This basically means "check if it is UTF-8, if it is, escape `\` and `"` and output
> that, otherwise indicate the byte size of the data and write it as an integer
> array".
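
The rule in the paragraph above can be sketched for the 8-bit case as follows. This is only an illustration under the assumption that paths arrive as raw bytes; the helper names `encode_path` and `decode_path` are hypothetical, and string escaping is left to `json.dumps` rather than done by hand.

```python
import json

def encode_path(raw: bytes):
    """Return a JSON-ready value for a path given as raw bytes."""
    try:
        # Strict decoding fails on any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return {"format": "raw8", "data": list(raw)}

def decode_path(value) -> bytes:
    """Inverse of encode_path for strings and "raw8" objects."""
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, dict) and value.get("format") == "raw8":
        return bytes(value["data"])
    raise ValueError("unsupported path representation")

# Valid UTF-8 containing `\` and `"`: emitted as an escaped JSON string.
print(json.dumps(encode_path(b'dir\\a"b.cpp')))
# 0xE9 is not followed by continuation bytes, so this is not valid UTF-8.
print(json.dumps(encode_path(b"caf\xe9.cpp")))
```

The 16-bit and 32-bit formats would need the same treatment over wider code units, which this sketch does not attempt.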
>
> In the future, we can add additional formats as a revision bump.
>
> So,
>
> - Is anything missing from this format?
> - Is there any issue with getting this information from compilers?
> - Are any of the constraints too onerous on compilers?
> - Are there any constraints which should be added to make it even
>   easier for build tools to parse/interpret this format?
> - For non-UTF-8 data, do we want to default to `raw8` format without
>   one specified? Or should it always be required?
> - Are non-UTF-8 module names valid? Does anyone know what SG16 is
>   saying about Unicode identifiers (which I presume would affect
>   module names as well)?
>
> Thanks,
>
> --Ben
>
> [1] https://mathstuf.fedorapeople.org/fortran-modules/fortran-modules.html
> [2] Note that some flags to GCC can cause it to output multiple files for a
> compilation step (such as -gsplit-dwarf). It is my hope that such flags can be
> wired up to this facility in the future as I don't think it is done right now.
> [3] Additionally, CMake takes the `.gcm` files and places them elsewhere in the
> tree via GCC's module map files. Object files are also placed via `-o` which is
> otherwise occupied during the scan step at the moment, so their paths may
> also need to be reinterpreted at the build tool level.
> _______________________________________________
> Modules mailing list
> Modules_at_[hidden]
> Subscription: http://lists.isocpp.org/mailman/listinfo.cgi/modules
> Link to this post: http://lists.isocpp.org/modules/2019/03/0123.php

Received on 2019-03-05 15:26:36