sg15: Re: [Tooling] Modules feedback

From: Matthew Woehlke <mwoehlke.floss_at_[hidden]>
Date: Mon, 11 Feb 2019 15:30:16 -0500

On 11/02/2019 14.05, Ben Boeckel wrote:
> On Mon, Feb 11, 2019 at 11:56:23 -0500, Matthew Woehlke wrote:
>> I think there might be some communication confusion here. "Determining
>> dependencies" can mean two things: determining the *identity* of
>> dependencies, or determining the *location* of dependencies. Determining
>> the *identity* is, indeed, as easy for modules as for includes. It's the
>> *location* that's a problem. For includes, the compiler already has a
>> set of include paths which it can scan (relatively quick; this is just
>> directory listing) to determine location.
>>
>> For modules... I'm not sure how this is going to work. This actually
>> relates to the open question of how we "ship" modules. But ignore that
>> for a second.
>
> In CMake, the identity is output by the `scan` step. `collate` puts this
> together with output paths to create full paths to locations of modules.

For project-internal stuff, yes. What about externals? Do those need to
be scanned also? If so, how?

I think/hope the answer is going to be "no", but while I think we
generally have consensus on how we would *like* that to work, I don't
know that we have anything that *does* work yet.

I'm not saying that needs to be a "blocker", just that it should be kept
in mind.

>> Compared to includes, we have more trouble with finding the location of
>> modules provided by the project. To do that, we have to scan every
>> source file in the project (or at least, in every library of the project
>> that the current TU uses, including the TU's own component). This most
>> assuredly *is* slower. (Unless we use module maps...)
>
> The `collate` step handles modules generated by other source files and
> outputting the information required by the build tool to get the
> dependency graph correct.

Right. That's the "scan[ning] every source file in [...] every library
of the project that the current TU uses". The good news is we don't need
to do that *per TU* (we can do it "once"¹ and reuse the result), but as
you originally noted, it's a non-trivial difference compared to what we
do now.
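
To make the identity/location split concrete, here is a rough sketch in
Python. It is not how CMake implements `scan` and `collate`; the function
names, the regexes, and the in-memory "sources" are all assumptions made
for illustration, and a real scanner has to preprocess and tokenize
rather than pattern-match. The point is only that `scan` looks at one TU
and yields names, while `collate` needs the scans of *all* TUs before it
can turn a name into a location.

```python
import re

# Simplified patterns (assumption): real scanning needs the preprocessor.
EXPORT_RE = re.compile(r"^\s*export\s+module\s+([\w.:]+)\s*;", re.MULTILINE)
IMPORT_RE = re.compile(r"^\s*(?:export\s+)?import\s+([\w.:]+)\s*;", re.MULTILINE)


def scan(text):
    """Identity only: which modules one TU provides and requires, by name."""
    return {
        "provides": EXPORT_RE.findall(text),
        "requires": IMPORT_RE.findall(text),
    }


def collate(scans):
    """Location: combine every per-TU scan into import name -> providing source."""
    providers = {}
    for path, info in scans.items():
        for name in info["provides"]:
            providers[name] = path
    # A name with no provider in the project maps to None (an "external").
    return {
        path: {name: providers.get(name) for name in info["requires"]}
        for path, info in scans.items()
    }


# Hypothetical three-file project, kept in memory so the sketch is runnable.
sources = {
    "a.cpp": "export module a;\nimport b;\n",
    "b.cpp": "export module b;\n",
    "main.cpp": "import a;\nimport ext;\nint main() {}\n",
}
scans = {path: scan(text) for path, text in sources.items()}
deps = collate(scans)
```

Here `deps["main.cpp"]` resolves `a` to `a.cpp` but leaves `ext` as
`None`, which is exactly the "what about externals?" case from earlier.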

Mostly I was picking at the claim that "It is no slower to determine the
dependencies of a modular C++ file than a normal C++ file today".
Strictly speaking, this is true iff you want the dependency *identities*
but don't care about their *locations*. For a single TU, getting the
*locations* is decidedly slower. In the context of a complete build,
however, it's harder to say, since the extra work can be done in
parallel and the cost is effectively spread out across multiple TUs.

(¹ Modulo the need to recompute when things change, but at worst, once
per build.)
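
The "once, modulo changes" part of that footnote could look something
like the following. This is only a sketch under assumptions of my own: a
content hash as the change detector (a build tool would more likely use
timestamps or its own dependency tracking), a hypothetical
`rescan_changed` helper, and a toy stand-in for the real scanner.

```python
import hashlib


def rescan_changed(sources, cache, scan):
    """Re-run `scan` only for sources whose content hash changed.

    `cache` maps path -> (digest, scan_result) and is updated in place.
    Returns the list of paths that were actually re-scanned.
    """
    rescanned = []
    for path, text in sources.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        cached = cache.get(path)
        if cached is None or cached[0] != digest:
            cache[path] = (digest, scan(text))
            rescanned.append(path)
    # Forget sources that have been removed from the project.
    for path in list(cache):
        if path not in sources:
            del cache[path]
    return rescanned


def toy_scan(text):
    """Stand-in for a real scanner (assumption): just counts imports."""
    return text.count("import ")


cache = {}
first = rescan_changed({"a.cpp": "import b;", "b.cpp": ""}, cache, toy_scan)
second = rescan_changed({"a.cpp": "import b;", "b.cpp": ""}, cache, toy_scan)
third = rescan_changed({"a.cpp": "import c;", "b.cpp": ""}, cache, toy_scan)
```

The first call scans everything, the second scans nothing, and the third
re-scans only the file that changed.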

>> What about modules *external* to the project? If they are shipped
>> already built, then presumably things aren't much different than
>> includes; we expect them to just be there already.
>
> I think there is consensus that compiled modules do *not* get shipped.
> Some other format (possibly IPR[1]) would be provided.

Right... at which point we have a caching problem, but (despite being
one of the two hard problems in programming) I think we can deal with
that :-).

Hmm... but, is it then the worst case that the build tool will have to
find the module interface files for all modules that are needed but do
*not* come from local sources, and generate build rules for those? (But
at least that's back to being a directory-search operation and not a
file-parse operation. At least I *assume*/hope that's the intent of IPR
or whatever!) Or do we expect this to be the compiler's job?

(In any case, we're straying from IS territory, though it's probably
still useful to have an idea of the answers for such questions...)

>> Where I could see this getting *really* ugly is if a) we have no module
>> maps, and b) we have no solution for shipping already-built modules. If,
>> in order to use the module from some external library, I first need to
>> *build* that module for my own project... how do I do that? Scan every
>> source file *on the entire system*? That's not just slower, it's not
>> even feasible. Fortunately, I think we all want to not have to go there...
>
> A module is what you need to consume the interface. You don't need the
> source itself for that. The IPR would just say what symbols are provided
> via that module (analogous to headers saying what symbols are available)
> and the linker would do the same job it has always done and hook them up
> when linking the providing library.

Right... the point was more that we *need* such a thing :-).

More specifically, we need *something* which is a) portable, and b)
ensures that we can get BMIs from external packages. The ways I see to
do that are:

- Per (my) above, ship sources and *scan* sources to figure out how to
build modules. I think we all *really don't* want this.

- Ship module maps to cut out the scanning step of the above. These
would be *build* artifacts, i.e. build/packaging produces them.

- Per (your) above, ship something that has a 1:1 mapping to modules
that is a portable representation from which BMIs can be generated. (Or
which the compiler / tooling can consume directly. Some tooling
applications may be able to use these directly and never need a BMI.) I
think most (all?) of us prefer this.

For the last, we *may* still want/need some sort of "module map" to map
names to module representation artifacts, but as per the middle option,
this would be generated, not user-maintained.
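
As a strawman for what such a generated map might amount to, here is a
small Python sketch. Everything concrete in it is an assumption for the
sake of discussion: the `modules.json` file name, the JSON layout, the
`.ipr` extension, and the helper names. No such format has been agreed
on. The only property it is meant to show is that the consumer does a
directory search plus a table lookup, and never has to parse sources.

```python
import json
import os
import tempfile

MAP_NAME = "modules.json"  # hypothetical file name, not an agreed format


def write_module_map(package_dir, modules):
    """Packaging step: record module name -> artifact path (relative)."""
    path = os.path.join(package_dir, MAP_NAME)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(modules, f, indent=2, sort_keys=True)
    return path


def locate_module(name, search_dirs):
    """Consumer step: find a module's artifact by name, without parsing sources."""
    for directory in search_dirs:
        map_path = os.path.join(directory, MAP_NAME)
        if not os.path.isfile(map_path):
            continue
        with open(map_path, encoding="utf-8") as f:
            modules = json.load(f)
        if name in modules:
            return os.path.join(directory, modules[name])
    return None


# Demo with a throwaway "install prefix" for a hypothetical package.
with tempfile.TemporaryDirectory() as prefix:
    write_module_map(prefix, {"foo.core": "interfaces/foo.core.ipr"})
    found = locate_module("foo.core", [prefix])
    missing = locate_module("bar", [prefix])
    expected = os.path.join(prefix, "interfaces/foo.core.ipr")
```

Because the map is written by the packaging step, nobody maintains it by
hand, which is the property I care about in both the middle and the last
option.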

-- 
Matthew

Received on 2019-02-11 21:30:19