sg15: Re: [SG15] Scandeps format post-P1689R3 discussion

From: Isabella Muerte <imuerte_at_[hidden]>
Date: Fri, 21 May 2021 10:17:45 -0700

> Which is why I suggested "angled" and "quoted".

I prefer these names, though I'd prefer we stick to kebab-casing if we need to have more than one word attached :P
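
For illustration only, here is a minimal sketch of how a kebab-cased `lookup-method` key could sit next to `logical-name` without any escaped quotes or angle brackets in the JSON. Nothing here is settled by the thread: the value set mixes Gaby's "angled"/"quoted" with Ben's `by-name`, and the `describe_require` helper and sample names are hypothetical.

```python
import json

# Hypothetical value set: "angled" and "quoted" are the names suggested in
# this thread; "by-name" is the spelling floated for named modules.
LOOKUP_METHODS = ("by-name", "angled", "quoted")


def describe_require(spelling: str) -> dict:
    """Map an import spelling to a hypothetical scan-format entry.

    The delimiters are stripped so the JSON never has to carry `<>` or
    escaped `""`; the lookup method records how the name was written.
    """
    if len(spelling) >= 2 and spelling[0] == "<" and spelling[-1] == ">":
        return {"logical-name": spelling[1:-1], "lookup-method": "angled"}
    if len(spelling) >= 2 and spelling[0] == '"' and spelling[-1] == '"':
        return {"logical-name": spelling[1:-1], "lookup-method": "quoted"}
    return {"logical-name": spelling, "lookup-method": "by-name"}


entries = [
    describe_require(s)
    for s in ("<boost/version.hpp>", '"config.h"', "my.module")
]
print(json.dumps(entries, indent=2))
```

With this shape a consumer can tell header units from named modules by reading one key, and the serialized form needs no backslash escapes.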

On Fri, May 21, 2021, at 10:12, Gabriel Dos Reis via SG15 wrote:
> > Would a `lookup-method` with a limited set of values make
> > sense? `by-name`, `by-local-path` (""), `by-path` (<>)?
>
> I know the build system works primarily on filepaths, but from the source and compiler point-of-view, they may not be "by path". Which is why I suggested "angled" and "quoted".
>
> -- Gaby
>
> -----Original Message-----
> From: SG15 <sg15-bounces_at_[hidden]> On Behalf Of Ben Boeckel via SG15
> Sent: Friday, May 21, 2021 4:02 AM
> To: Olga Arkhipova via SG15 <sg15_at_[hidden]>
> Cc: Ben Boeckel <ben.boeckel_at_[hidden]>
> Subject: Re: [SG15] Scandeps format post-P1689R3 discussion
>
> On Fri, May 21, 2021 at 01:08:11 +0000, Olga Arkhipova via SG15 wrote:
> > I agree that figuring out all dependencies for various sources is not
> > a trivial task, and it would be great to have some sort of common
> > format which all tools would use.
> > But I believe this should be a different proposal/discussion, as it is
> > a generic problem and not specific to module dependency scanning.
>
> Agreed about it being a different proposal.
>
> > The "scan" json is used to build the dependency graph between the
> > sources. The info about source dependencies (includes, etc) is not
> > used there and can take significant time to read. The outputs info is
> > not needed to create the dependency graph (between the sources)
> > either. The BMI locations will be needed to construct the command
> > lines for the actual (not "scan") build, but this info is already
> > available for each source as a part of the command line, i.e. known to
> > the build system. Also, changing outputs locations would not change
> > the "logical" module dependencies in the source, so by not having
> > outputs in the "scan" json, we can avoid unnecessary scanning when
> > output locations change for one reason or another.
>
> The primary output is (generally) knowable to the build tool (`/Fo:dir`
> is a bit hokey, but the build tool *decided* to write that flag), but
> the discovered outputs are not. These can include .dwo, .dSYM, .pdb, or
> other such files.
>
> CMake certainly has zero idea about which BMI files come from where
> though, but it is something generally decidable by the compiler via
> module mapping mechanisms.
>
> > We are very open to format/field name changes, but we'd like to keep
> > "scan" json as small as possible to minimize its reading/writing time
> > in the perf critical IDE scenarios.
> >
> > We'd also like to ensure that we can distinguish named modules and
> > header units without any additional parsing/guessing, as their
> > handling is significantly different, at least for MSVC.
>
> The `unique-on-source-path` is meant to help there. Distinguishing
> between `<>` and `""` is a bit tougher without overflowing into "what
> the heck is this field for" in other languages.
>
> > I'd propose to have different fields for named modules and header
> > units to avoid any confusion there. If we want to use the same "scan"
> > format shared between Fortran and C++, it should be a union of
> > language capabilities, not an intersection.
> >
> > Fortran (or other languages) can simply use only a subset of the
> > properties that make sense to them (same for C++).
> >
> > With Ben's proposal to remove `inputs`, `outputs`, and `depends`
> > arrays from the "scan" json (thanks Ben!) the formats are pretty close
> > from the info perspective. Currently proposed CMake format with the
> > removals looks like the following (please correct me if I am wrong
> > here):
>
> There was further discussion and we had come up with these changes as
> well:
>
> - `primary-output` -> optional in the format, but if the build tool
> mentions it, it should be put here. This is analogous to the `-MT`
> flag and is likely used to fit the output into existing
> infrastructure.
> - `work-directory` -> optional in the format, but should be
> controllable by the caller (a build tool that doesn't understand its
> work directory structure is very confused). This is probably needed
> for Intellisense, but is not strictly required by build tools
> themselves. If needed, the field can be added by the build tool.
> Note that this is because `getcwd()` is problematic: it doesn't
> understand `$PWD` with symlinks and will resolve them, which can
> lead to mismatches if the build tool is using a `$PWD` name for a
> directory rather than the `realpath()`.[1]
>
> > We were discussing internally and were going to add an analog of
> > "logical-name": "<header>" for header units which can potentially aid
> > in some scenarios, so no disagreement for having it (or probably two
> > different properties for "" and <>).
> >
> > I argue that obj and bmi output locations are already known to the
> > build system as part of the source command line and are irrelevant to
> > the "scan" compiler invocation which actually produces the json file
> > (as the only output of that invocation).
>
> Discovered outputs are very important for Fortran where the compiler is
> the only thing that gets to determine the output names. The build tool
> can only control the *directory* they get written to, but the filenames
> themselves are under compiler control.
>
> > So if we agree to not have obj and bmi output locations in the "scan"
> > json or at least make them optional, we should be able to reconcile
> > the formats.
>
> We can make the field optional, but scanners should have ways of writing
> them out if asked (this includes things like split debug files or other
> such things that the build tool does *not* generally know about).
>
> > I'm not too keen on the use of `<>` and `""` in the format itself
> > either. It would be fine if we kept "boost/version.hpp" as not having
> > to escape quotes. I'd be ok with the use of some optional boolean key
> > to indicate that it was written as `<boost/version.hpp>`, but we
> > shouldn't have to worry about properly escaping or roundtripping
> > includes between JSON and... literally anything else.
>
> Would a `lookup-method` with a limited set of values make
> sense? `by-name`, `by-local-path` (""), `by-path` (<>)?
>
> --Ben
>
> [1] As a note, CMake does this in order to support HPC login nodes where
> `/home/user` is a symlink to `/pool/$randomid/home/user` which differs
> based on the login node in use. In this case, the *symlink* name is the
> stable one and is used instead of `realpath()`.
> _______________________________________________
> SG15 mailing list
> SG15_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg15
>
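
As a rough illustration of the `getcwd()` versus `$PWD` mismatch described in the quoted `work-directory` note and footnote [1], the sketch below builds a throwaway symlinked directory and compares the two views. It assumes a POSIX system where symlinks can be created; the `logical_cwd` and `demo` helpers and the directory names are made up for this example and are not taken from CMake or any scanner.

```python
import os
import shutil
import tempfile


def logical_cwd() -> str:
    """Return $PWD if it names the current directory, else os.getcwd().

    os.getcwd() reports the physical path with symlinks resolved; $PWD
    keeps the name the caller used to get there.
    """
    physical = os.getcwd()
    pwd = os.environ.get("PWD")
    if pwd and os.path.isabs(pwd):
        try:
            if os.path.samefile(pwd, physical):
                return pwd
        except OSError:
            pass  # stale or unreadable $PWD: fall back to the physical path
    return physical


def demo():
    """Enter a directory through a symlink and return (physical, logical)."""
    base = os.path.realpath(tempfile.mkdtemp())
    real = os.path.join(base, "pool-1234-home")  # stands in for /pool/$randomid/home/user
    link = os.path.join(base, "home")            # stands in for /home/user
    os.mkdir(real)
    os.symlink(real, link)
    old_cwd, old_pwd = os.getcwd(), os.environ.get("PWD")
    try:
        os.chdir(link)
        os.environ["PWD"] = link  # what a shell's `cd` would record
        return os.getcwd(), logical_cwd()
    finally:
        os.chdir(old_cwd)
        if old_pwd is None:
            os.environ.pop("PWD", None)
        else:
            os.environ["PWD"] = old_pwd
        shutil.rmtree(base)


physical, logical = demo()
print("getcwd():", physical)
print("$PWD    :", logical)
```

The physical path ends in the per-node directory while the logical one keeps the stable symlink name, which is why a caller-supplied `work-directory` can be more useful than whatever the scanner discovers on its own.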

Received on 2021-05-21 12:18:10