sg15: Re: [SG15] Scandeps format post-P1689R3 discussion

From: Ben Boeckel <ben.boeckel_at_[hidden]>
Date: Thu, 13 May 2021 08:29:37 -0400

[ Isabella seems to be having trouble posting to the list. Replying back
to the list with her responses untrimmed for full context. ]

On Thu, May 13, 2021 at 03:05:22 -0700, Isabella Muerte wrote:
> On Tue, May 11, 2021, at 12:15, Ben Boeckel via SG15 wrote:
> > Hi,
> >
> > I have been discussing real-world usage of the r3 format with some
> > implementers. In these discussions the following things have been
> > considered:
> >
> > # Dropping the `inputs`, `outputs`, and `depends` arrays
> >
> > These fields are intended to capture the dependency information of the
> > scanning step itself. The intended replacement is the normal dependency
> > mechanisms offered by compilers (be it `/showIncludes`, `-M`, or other
> > possible mechanisms). These are really only of use to the build tool
> > itself to know when to recompile the file, so putting it into the file
> > format itself is unnecessary.
>
> I'm very much against relying on 'existing' mechanisms as none of them
> are uniform between implementations, nor even between locale
> implementations. I would be fine with having multiple formats, or even
> spinning these arrays out into their own file, but a uniform format
> for these is desperately needed, especially in multi-language
> repositories where C++ code might be wrapped for use in something like
> Node, Ruby, Python, etc. C++ needs to be usable outside of a purely
> C++ environment, and making other languages struggle when they can
> just call `cargo build` or even just `rustc <entrypoint-file>` instead
> is going to be a detriment.

I believe that was part of the original thought of putting it into the
file. However, with tools other than the build tool not caring about it,
shoving it off into another file seems sensible. I don't know if the
format could be specified though. Ninja developers have rejected other
depfile formats for a long time and just want to read Makefile snippets
(with the obvious `/showIncludes` concession). Maybe this would be
usable enough?

There was a mention of adding a reference to this file in the dependency
format, but we'd need another schema for what this file contains. Build
tools are going to have to translate anyways from this new format to
whatever build tools expect today (basically Makefile snippets) until
build tools which do support this new format actually exist. Hence the
balance between "just use what works today" and "wait for the ecosystem
to agree and consume $new_format".
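
As a rough sketch of the translation step described above, the following
turns one record into a Makefile-style snippet. The `outputs` and
`depends` key names come from this thread; the flat record layout, the
function name, and the spaces-only escaping are assumptions made for
illustration, not anything the format specifies.

```python
def to_makefile_snippet(record):
    """Render one record as a Makefile-style depfile rule.

    Assumes a hypothetical flat record with `outputs` and `depends`
    lists of paths. Only spaces are escaped here; real depfile
    escaping has more cases.
    """
    def escape(path):
        return path.replace(" ", "\\ ")

    targets = " ".join(escape(p) for p in record["outputs"])
    deps = [escape(p) for p in record.get("depends", [])]
    if not deps:
        return f"{targets}:\n"
    # One dependency per line, joined with backslash continuations.
    return f"{targets}: \\\n  " + " \\\n  ".join(deps) + "\n"


if __name__ == "__main__":
    example = {"outputs": ["foo.o"], "depends": ["foo.cpp", "my dir/foo.h"]}
    print(to_makefile_snippet(example), end="")
```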

> > In addition, there are two other reasons that have come up to at least
> > consider changing these keys:
> >
> > - the `depends` array may be large: removing it means that tools which
> > are just looking for `future-compile` information do not need to
> > parse and throw away these fields
> > - it does not match the semantics in batch scanning: what is really
> > wanted here is a top-level `inputs`, `outputs`, and `depends` set of
> > keys because the batch scanning step itself has just one set of
> > this information.
> >
> > Since compilers, build tools, etc. already have mechanisms for
> > communicating this information around, reusing those rather than
> > injecting them into the format is likely a better solution.
> >
> > # Moving `future-compile` keys up one level
> >
> > Now that the other keys here are gone with the prior point
> > (`future-link` was dropped in r1 due to its dubious usefulness[1]),
> > `future-compile`'s `outputs`, `provides`, and `requires` keys can
> > be moved up one level.
> >
> > # Clarification of header unit representations
> >
> > C++ has a difference between header units and module units. However,
> > this is a C++-ism and the format would be useful to use in other
> > languages as well (namely Fortran) where such a dichotomy does not
> > exist. As such, formalizing the use of `<>` and `""` wrapping in the
> > `logical-name` key should be noted for use for C++. Due to the multiple
> > ways of spelling various includes (e.g., `"boost/version.hpp"`,
> > `<boost/version.hpp>` or even `<Boost/Version.hpp>` on case-insensitive
> > filesystems), the `source-path` should be used for linking such uses
> > together instead.
> >
> > To do this, the `logical-name` should be the in-source name (for use in
> > module mapper or other such mechanisms for finding the BMI for a given
> > module by its name), and `source-path` must be provided for header units.
> > I don't think specifying `<>` and `""` in the format itself is the
> > greatest, so an additional bool key named `unique-on-source-path` (open
> > to bikeshedding) would indicate that the `source-path` should be used
> > for linking between `provides` and `requires` lookups.
> >
> > Thoughts?
>
> I'm not too keen on the use of `<>` and `""` in the format itself
> either. It would be fine if we kept "boost/version.hpp" so as not to have
> to escape quotes. I'd be ok with the use of some optional boolean key
> to indicate that it was written as `<boost/version.hpp>`, but we
> shouldn't have to worry about properly escaping or roundtripping
> includes between JSON and... literally anything else.

The `logical-name` is just the name the compiler will use to ask "where
is the BMI for this module". Build tools generally won't care except
that it matches what some other scan "provides" for that file. If the
compiler is going to resolve the include and *then* ask for the module
name, then `logical-name` can be the full path. If it's going to use the
literal given to `#include` (as GCC seems to do?), then `logical-name`
should be that. The `source-path` is used for verification that things
actually are the same when `-I` flags end up giving different in-source
names for the same module.
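
A minimal sketch of how a build tool might link `requires` to `provides`
under the proposed `unique-on-source-path` flag. The key names
(`logical-name`, `source-path`, `unique-on-source-path`, `provides`,
`requires`, `outputs`) are taken from this thread; the rule layout, the
use of the first `outputs` entry as a rule identifier, and the function
names are assumptions for illustration only.

```python
def link_key(entry):
    """Pick the key used to match a `requires` entry to a `provides` entry."""
    if entry.get("unique-on-source-path", False):
        # Header units: different spellings may name the same file,
        # so match on the resolved path instead of the in-source name.
        return ("source-path", entry["source-path"])
    return ("logical-name", entry["logical-name"])


def link(rules):
    """Return (consumer, provider) pairs; provider is None if unresolved."""
    providers = {}
    for rule in rules:
        for provided in rule.get("provides", []):
            providers[link_key(provided)] = rule["outputs"][0]
    edges = []
    for rule in rules:
        for required in rule.get("requires", []):
            edges.append((rule["outputs"][0], providers.get(link_key(required))))
    return edges
```

With this keying, a scan that provides `<boost/version.hpp>` and a scan
that requires `"boost/version.hpp"` are linked as long as both report
the same `source-path` and set the flag.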

--Ben

Received on 2021-05-13 07:29:43