sg15: Re: [Tooling] Modules feedback

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 13 Feb 2019 12:26:05 -0500

On 2/11/19 3:30 PM, Matthew Woehlke wrote:
> On 11/02/2019 14.05, Ben Boeckel wrote:
>> On Mon, Feb 11, 2019 at 11:56:23 -0500, Matthew Woehlke wrote:
>>> I think there might be some communication confusion here. "Determining
>>> dependencies" can mean two things: determining the *identity* of
>>> dependencies, or determining the *location* of dependencies. Determining
>>> the *identity* is, indeed, as easy for modules as for includes. It's the
>>> *location* that's a problem. For includes, the compiler already has a
>>> set of include paths which it can scan (relatively quick; this is just
>>> directory listing) to determine location.
>>>
>>> For modules... I'm not sure how this is going to work. This actually
>>> relates to the open question of how we "ship" modules. But ignore that
>>> for a second.
>> In CMake, the identity is output by the `scan` step. `collate` puts this
>> together with output paths to create full paths to locations of modules.
> For project-internal stuff, yes. What about externals? Do those need to
> be scanned also? If so, how?
>
> I think/hope the answer is going to be "no", but while I think we
> generally have consensus how we would *like* that to work, I don't know
> that we have anything that *does* work yet.
>
> I'm not saying that needs to be a "blocker", just that it should be kept
> in mind.
>
>>> Compared to includes, we have more trouble with finding the location of
>>> modules provided by the project. To do that, we have to scan every
>>> source file in the project (or at least, in every library of the project
>>> that the current TU uses, including the TU's own component). This most
>>> assuredly *is* slower. (Unless we use module maps...)
>> The `collate` step handles modules generated by other source files and
>> outputting the information required by the build tool to get the
>> dependency graph correct.
> Right. That's the "scan[ning] every source file in [...] every library
> of the project that the current TU uses". The good news is we don't need
> to do that *per TU* (we can do it "once"¹ and reuse the result), but as
> you originally noted, it's a non-trivial difference compared to what we
> do now.
>
> Mostly I was picking at the claim that "It is no slower to determine the
> dependencies of a modular C++ file than a normal C++ file today".
> Strictly speaking, this is true iff you want the dependency *identities*
> but don't care about their *locations*. For a single TU, getting the
> *locations* is decidedly slower. In the context of a complete build,
> however, it's harder to say, since the extra work can be done in
> parallel and the cost is effectively spread out across multiple TU's.
>
> (¹ Modulo need to recompute when things change, but at worst, once per
> build.)
>
>>> What about modules *external* to the project? If they are shipped
>>> already built, then presumably things aren't much different than
>>> includes; we expect them to just be there already.
>> I think there is consensus that compiled modules do *not* get shipped.
>> Some other format (possibly IPR[1]) would be provided.
> Right... at which point we have a caching problem, but (despite being
> one of the two hard problems in programming) I think we can deal with
> that :-).
>
> Hmm... but, is it then the worst case that the build tool will have to
> find the module interface files for all modules that are needed but do
> *not* come from local sources, and generate build rules for those? (But
> at least that's back to being a directory-search operation and not a
> file-parse operation. At least I *assume*/hope that's the intent of IPR
> or whatever!) Or do we expect this to be the compiler's job?
>
> (In any case, we're straying from IS territory, though it's probably
> still useful to have an idea of the answers for such questions...)
>
>>> Where I could see this getting *really* ugly is if a) we have no module
>>> maps, and b) we have no solution for shipping already-built modules. If,
>>> in order to use the module from some external library, I first need to
>>> *build* that module for my own project... how do I do that? Scan every
>>> source file *on the entire system*? That's not just slower, it's not
>>> even feasible. Fortunately, I think we all want to not have to go there...
>> A module is what you need to consume the interface. You don't need the
>> source itself for that. The IPR would just say what symbols are provided
>> via that module (analogous to headers saying what symbols are available)
>> and the linker would do the same job it has always done and hook them up
>> when linking the providing library.
> Right... the point was more that we *need* such a thing :-).
>
> More specifically, we need *something* which is a) portable, and b)
> ensures that we can get BMI's from external packages. The ways I see to
> do that are:
>
> - Per (my) above, ship sources and *scan* sources to figure out how to
> build modules. I think we all *really don't* want this.
I *really do not* want to require scanning.
>
> - Ship module maps to cut out the scanning step of the above. These
> would be *build* artifacts, i.e. build/packaging produces them.
Yes, ok.
>
> - Per (your) above, ship something that has a 1:1 mapping to modules
> that is a portable representation from which BMI's can be generated. (Or
> which the compiler / tooling can consume directly. Some tooling
> applications may be able to use these directly and never need a BMI.) I
> think most (all?) of us prefer this.

I don't think this is feasible.

Different tools have different needs. Any portable representation will
make trade-offs regarding what information is retained from the source
code. IPR, for example, doesn't record source location column numbers
for AST nodes (just line numbers), nor does it record macro expansion
information. For some tools, e.g., Coverity, that information is
crucial; our analysis would be compromised if we were limited to what
IPR exposes.
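
To make the column-number concern concrete, here is a minimal sketch of
what is lost when a representation keeps only line numbers. The types
`FullLoc` and `LineOnlyLoc` and the helpers `locate` and `drop_column`
are hypothetical names invented for this illustration; they are not
IPR's API, nor that of any real tool.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical location records (illustration only, not IPR types).
struct FullLoc     { unsigned line; unsigned column; };
struct LineOnlyLoc { unsigned line; };

inline bool operator==(FullLoc a, FullLoc b) {
    return a.line == b.line && a.column == b.column;
}
inline bool operator==(LineOnlyLoc a, LineOnlyLoc b) {
    return a.line == b.line;
}

// Compute the 1-based line and column of the character at `offset`.
inline FullLoc locate(const std::string& text, std::size_t offset) {
    FullLoc loc{1, 1};
    for (std::size_t i = 0; i < offset && i < text.size(); ++i) {
        if (text[i] == '\n') { ++loc.line; loc.column = 1; }
        else                 { ++loc.column; }
    }
    return loc;
}

// Model a representation that retains only the line number.
inline LineOnlyLoc drop_column(FullLoc loc) { return LineOnlyLoc{loc.line}; }

// Two calls on one source line: distinct with columns, identical without.
inline void check_same_line_calls() {
    const std::string src = "f(x); f(y);\n";
    const FullLoc first  = locate(src, 0);  // the 'f' of f(x)
    const FullLoc second = locate(src, 6);  // the 'f' of f(y)
    assert(!(first == second));
    assert(drop_column(first) == drop_column(second));
}
```

Calling `check_same_line_calls()` exercises the case where a tool that
must report on one specific call can no longer tell the two apart once
the column is dropped.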

Additionally, implementation-defined behavior means that different
implementations might produce representations of the source code that,
while in a portable format, have different contents. For example, the
point of instantiation of templates is implementation-defined in some
cases, and that can produce observable differences in behavior.
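
The point-of-instantiation case is hard to demonstrate portably, so the
sketch below swaps in a simpler instance of the same concern: whether
plain `char` is signed is implementation-defined, and it can select a
different template specialization from identical source. The names
`CharFlavor` and `recorded_specialization` are hypothetical, standing in
for whatever an exporter would write into a "portable" file.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Primary template: used when plain `char` is unsigned.
template <bool CharIsSigned>
struct CharFlavor {
    static std::string name() { return "unsigned"; }
};

// Specialization: used when plain `char` is signed.
template <>
struct CharFlavor<true> {
    static std::string name() { return "signed"; }
};

// Stand-in for "the specialization an exporter would record here".
// The answer depends on the implementation that processed the source.
inline std::string recorded_specialization() {
    return CharFlavor<std::is_signed<char>::value>::name();
}

// The recorded name tracks the producing implementation's choice.
inline void check_recorded_specialization() {
    const std::string got = recorded_specialization();
    assert((got == "signed") == std::is_signed<char>::value);
    assert((got == "unsigned") == !std::is_signed<char>::value);
}
```

Two implementations that disagree on the signedness of `char` would
record different specializations for the same source text, so a
consumer of the file sees the producer's choice rather than its own.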

Tom.

>
> For the last, we *may* still want/need some sort of "module map" to map
> names to module representation artifacts, but as per the middle option,
> this would be generated, not user-maintained.
>

Received on 2019-02-13 18:26:12