sg15: Re: [Tooling] Modules feedback

From: Scott Wardle <swardle_at_[hidden]>
Date: Sat, 9 Feb 2019 00:01:07 -0800

Nice work JF, I was looking for a paper like this one. Modules are moving faster than before. This summary is very useful.

I have just started trying to catch up a little, and I made this table that I thought other people might like:

Number of papers on modules over time.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/
2012 - 2 papers/revisions
2014 - 2 papers/revisions
2015 - 3 papers/revisions
2016 - 12 papers/revisions
2017 - 31 papers/revisions
2018 - 44 papers/revisions
2019 - 6 papers/revisions (so far)

> On Feb 8, 2019, at 9:33 PM, JF Bastien <cxx_at_[hidden]> wrote:
>
> On Fri, Feb 8, 2019 at 1:59 PM Ben Boeckel <ben.boeckel_at_[hidden]> wrote:
> On Fri, Feb 08, 2019 at 13:34:43 -0800, JF Bastien wrote:
> > Let us know if you have any feedback!
>
> §2
> > It is no slower to determine the dependencies of a modular C++ file
> > than a normal C++ file today, and current proposals aimed at
> > restricting the preamble wouldn’t make a scanner any faster.
>
> Header dependencies are subtly, but crucially, different than module
> dependencies. One can determine header dependencies *while compiling*
> the file (though that is not necessarily how it is implemented) and can be
> done at any time once the source file is up-to-date. The first run does
> not need this information because we already know we need to run the
> output; storing discovered dependency information is a nice side effect.
>
> Module dependencies must be present *during compile*, so must be
> determined at build time (since source files may not be available before
> the build has started) *before* compilation starts.
>
> Maybe it would help anchor our discussion if we had a graph of "build activity over time" where we quantify parallelism versus sequentialism, and what each operation is doing? Put another way, a step which blocks many others is fine as long as it's pretty fast. How fast would dependency scanning have to be on a project of LLVM's size to make you comfortable?
>
> It might also be useful to separate concerns between clean build and incremental build. The costs won't be the same in both, so we should probably discuss them independently (or rather, in the same paper but not the same paragraph).

I think you are totally right; that is what I could use here. I would love a diagram that shows which processes or stages are needed for the use cases you are thinking of: clean build vs. incremental build, or maybe some others.
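
As a starting point for such a diagram, here is a minimal sketch in Python of how a build could be split into stages once module dependencies have been scanned. The module names and the helper build_waves are hypothetical; the only assumption is that a unit cannot be compiled until everything it imports has been compiled.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def build_waves(imports):
    """Group translation units into waves; every unit in a wave only imports
    units from earlier waves, so a wave can be compiled in parallel."""
    sorter = TopologicalSorter(imports)  # maps unit -> units it imports
    sorter.prepare()  # raises CycleError on circular imports
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical scan result: which modules each unit imports.
SCANNED = {
    "core": [],
    "rendering": ["core"],
    "audio": ["core"],
    "gameplay": ["rendering", "audio"],
}
```

The number of waves gives a rough lower bound on the sequential part of a clean build, and the width of each wave shows how much parallelism is available at that point.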

Here is what I was thinking of:
- Clean builds vs incremental builds
- Linux, where processes are cheap, vs. Windows, where fewer processes and more threads are preferred
- Multi-computer builds: what data is pushed over the network, and what data is pulled over the network?
- Object/Module BMI/Binary/Module Map caching vs. no caching

Maybe we could even make this more concrete and talk about the command lines for some of these use cases.
At least we should talk about:
- Do modules change anything with:
        - Include paths (-I<dir> vs. -isystem)
        - Object/library paths (-L<dir>)
- Is there overlap with:
        - Module BMI paths (-fmodules-cache-path=<directory> vs. -fprebuilt-module-path=<directory>)
        - Artifact hashing (?? How do dependencies work with this? See what the process writes out and assume a dependency? Maybe I don’t understand this.)
        - Module map paths/files (-fmodule-map-file=)

Note that the use case I am trying to understand is that EA uses include paths as a layer-enforcement mechanism, i.e., lower-layer rendering can’t include high-level gameplay, but gameplay can include rendering. We currently have a different set of include paths for each library. A game is built out of about 300 to 400 of these libraries. Since we know which libraries each library uses, we can use this to work out which include paths are necessary. These include-path dependencies are different from a library’s linkage dependencies. You might use a header from a library, but if you only use inline functions you don’t need to link to it, and therefore you don’t need to build it first. This can be a good speedup when building DLLs.
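
To make the layering idea concrete, here is a minimal sketch in Python of how per-library include paths could be derived from declared library dependencies. The library names, the libs/<name>/include layout, and the helpers include_flags and can_include are all hypothetical and only illustrate the mechanism; this is not EA's actual build system, and it only considers direct dependencies.

```python
# Hypothetical library graph: each library lists the libraries whose headers
# it is allowed to use. Names and paths are made up for illustration.
USES = {
    "core": [],
    "rendering": ["core"],
    "gameplay": ["rendering", "core"],
}


def include_flags(library, uses=USES):
    """Return -I flags for a library: its own headers plus its declared uses."""
    allowed = [library] + list(uses[library])
    return [f"-Ilibs/{name}/include" for name in allowed]


def can_include(library, other, uses=USES):
    """A header from `other` is only reachable if its path is on the command line."""
    return f"-Ilibs/{other}/include" in include_flags(library, uses)
```

With this graph, gameplay gets the rendering include path but rendering never gets the gameplay one, so an upward include fails at compile time simply because the header cannot be found.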

What I am worried about with the EA include-path layering enforcement is:
- We are very close to running out of command-line length (on Windows), as we will have hundreds of include paths. (A high-level, application-level module will need just about every library, after all.) With modules, I am not sure what the equivalent of include paths is, but it would seem like we need 2x the command line for module paths, if not more.
- We have had this system for a long time, so we probably have duplicate include file names. If we reduced the number of include paths, we might hit these problems.
- We have hundreds of include paths. I worry this is not very efficient. If the OS has a good directory cache, maybe this is good enough; otherwise it could be very slow. I am not sure if other companies do this type of thing.
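
Two of these worries can be measured before changing anything. The following is a minimal sketch in Python, with hypothetical helper names, that estimates the length of a command line and finds header names provided by more than one include directory. The 32767 figure is the documented maximum command-line length for the Windows CreateProcess API; other launchers (for example cmd.exe) have lower limits, so treat it as an upper bound rather than a safe target.

```python
import os
from collections import defaultdict

# Documented maximum length of the command-line string accepted by the
# Windows CreateProcess API (including the terminating null).
WINDOWS_CREATEPROCESS_LIMIT = 32767


def command_line_length(args):
    """Length of the arguments joined by single spaces (ignores quoting)."""
    return sum(len(a) for a in args) + max(len(args) - 1, 0)


def duplicate_headers(include_dirs, extensions=(".h", ".hpp")):
    """Map each header path (relative to its include dir) to every include
    dir providing it, keeping only the paths found in more than one dir."""
    providers = defaultdict(list)
    for inc in include_dirs:
        for root, _dirs, files in os.walk(inc):
            for name in files:
                if name.endswith(extensions):
                    rel = os.path.relpath(os.path.join(root, name), inc)
                    providers[rel.replace(os.sep, "/")].append(inc)
    return {rel: dirs for rel, dirs in providers.items() if len(dirs) > 1}
```

Any entry reported by duplicate_headers is a header whose meaning depends on include-path order, which is the kind of collision that could surface if the number of include paths were reduced or merged.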

>
> §2.2
> > <clang-scan-deps>
>
> Assuming there is consensus on D1483§7, I think this tool can be worked
> to satisfy it. There may be issues around having the tool emulate other
> compilers, but we have experience with that as well[1]. I think we may
> even be able to drop some features from the list (e.g., pcm generation
> at scan time and "installed listeners") in your paper.
>
> We're not saying that this specific implementation should be the only one (or that it shouldn't!). I'd certainly be interested in seeing other projects implement similar tools.
>
>
> §2.3
> > <mapping files>
>
> The approach we describe doesn't require mapping files at all. Other
> tools may find them useful however. I'm thinking mainly static analysis
> tools (versus those that piggyback on the compiler like IWYU and
> `clang-tidy`) since they can't just say "run us with your build tool".
>
> That can certainly be an optional thing.
>
>
> §2.4
> > We believe modules should be built (not shipped or distributed) as
> > part of a build, and potentially shared in the same environment for
> > other compiler invocations that end up using non-conflicting setups.
>
> Linux distros aren't going to like this… Nor are ports-based systems
> (like Homebrew). Go, Rust, and other languages can get away with it
> because given source code, how to build it is dictated by convention and
> available tooling. C++ is a wild west of solutions for the "source ->
> binary" transformation and given a set of sources, there's no "good way"
> to just know how to compile it today.
>
> That said, I don't think it's something the language standard can
> dictate, but compilers can work together to provide something shippable
> beside compiled libraries.
>
> I don't think we're talking about the same thing: our paper talks about shipping something the compiler created between source code and native binaries, and we don't think that's necessary.
>
> Linux, Homebrew, and other platforms (such as, say, the one I support) currently ship headers with native binaries. We believe that modules allow them to continue doing so, both for their own code which yields said native binaries, as well as for developers on these platforms who link to the native binaries (by referring to the headers) using modules for their own code.
>
> Agreed it's a bit of a wild west, but again modules aren't The C++ Savior, and they don't need to solve this particular problem in our opinion. There's plenty of people who are looking at a variety of solutions, including shipping LLVM IR, using WebAssembly, or putting your compiler's artifacts in the blockchain. I wouldn't want C++ modules to come in and remove all the fun innovation.
>
> I agree that we could standardize some C++ module format that every toolchain agrees to, and somehow fix the problem with multiple configurations (the one we describe with -D, -Werror, optimization levels, etc). That would indeed be an easy distribution format. It would, however, severely restrict what implementations can do. I don't think it's a valuable tradeoff to make at this point in time.
>
> I'll draw a parallel with LLVM IR: it *can* be stabilized in some way, but that has a bunch of issues (some solvable!). Were LLVM IR actually stabilized we'd lose plenty of flexibility as a compiler. It's not a silly idea, it's been tried plenty of times, but so far LLVM has seen many valid reasons to change its IR over time. We can figure things out like auto-upgrading from older versions to newer ones, how to handle semantic changes, how to encode information which isn't fully relevant (such as debug info and various metadata), but... Well my experience tells me we won't get it right, and it'll take quite a while to get something somewhat sensible.
>
>
> Thanks,
>
> --Ben
>
> [1] https://github.com/CastXML/CastXML
> _______________________________________________
> Tooling mailing list
> Tooling_at_[hidden]
> http://www.open-std.org/mailman/listinfo/tooling

Received on 2019-02-09 09:01:13