sg15: Re: [Tooling] Modules feedback

From: Scott Wardle <swardle_at_[hidden]>
Date: Sat, 9 Feb 2019 16:53:02 -0800

> On Feb 9, 2019, at 2:12 PM, Ben Boeckel <ben.boeckel_at_[hidden]> wrote:
>
> On Sat, Feb 09, 2019 at 00:01:07 -0800, Scott Wardle wrote:
>> I think you are totally right; that is what I could use here. I would
>> love a diagram that shows what processes or stages are needed for the
>> use cases you are thinking of: clean build vs incremental, or maybe some
>> others.
>
> I'll look at adding two possible implementation (1:1:1 source:scan:ddi
> versus N:1:N where N can be all-at-once or per-target) graphs.
>
>> Here is what I was thinking of:
>> - Clean builds vs incremental builds
>> - Linux, where processes are cheap, vs Windows, with fewer processes and more threads
>> - Multi-computer builds: what data is pushed over the network and what data is pulled over the network.
>> - Object/Module BMI/Binary/Module Map caching vs no caching.
>
> These are probably good as notes on when one approach might be preferred
> over the other. I'm more than willing to go over build graphs on a
> whiteboard at Kona, but I don't know that multiple pages of incremental
> diagrams while trying to describe execution strategies of them is going
> to be easy to digest.

You might be right; it is hard to say how much detail to write here. I think we are all stuck in our own little worlds, so it is hard to see how this works for everyone. Maybe one good example is better than a lot of small half-done ones.

>
>> Maybe even making this more concrete and talk about command lines of some of these use cases:
>> At least we should talk about:
>> - do modules change anything with:
>>   - include paths: -I<dir> vs -isystem
>>   - object/library path: -L<dir>
>> - is there overlap with:
>>   - module BMI paths (-fmodules-cache-path=<directory> vs -fprebuilt-module-path=<directory>)
>>   - module map paths/files: -fmodule-map-file=
>

I was hoping that if we enumerated the different ways of making modules and their different command lines, I would get the data I was thinking of.

I am trying to understand where we are with the merged modules proposal. The last time I looked at things was the Modules TS. From my reading of the merged proposal, it sounds like we can do both what we did in the Modules TS and what we could do with the older Clang modules. These work very differently, but now maybe we can do both at once with import "some-header.h"; vs import foo;?

What I want to know is: what do we think the possible inputs and outputs are for each phase of the build process? In trying to figure out what styles of inputs and outputs we like for our build tools, enumerating the current set of possible inputs and outputs seems like a good idea. At the very least, some day we will have to teach some or all of these possibilities.

The kind of issue I am looking for is, for example: with the Modules TS, why did we type "export module foo;"? Why do we need the "foo"; would this not be the name of the file? I.e., there was a command line where you needed to write the name of the module's BMI file (/module:output obj\foo.ifc), so what is the name of the module: the name of the BMI file or the name in the .cppm/.ixx? Why do we need both? (I think 2.3 seems to call out some reasons, but Clang's documentation on module maps talks about making many modules out of many headers with one map. That is a very different thing from the one-to-one relation I was playing with in the Modules TS anyway.)

Sorry I am asking so many questions. This is great stuff.

> Eh, these might be too low-level for what we're describing. We list what
> is important for compilers to provide in §7.1. The actual flag spellings
> aren't (that) important. In any case, I would hope that it would not.
> Changing semantics of flags as fundamental as `-I` or `-L` based on
> `-fmodules` or `-std=c++2a` is not going to be fun for build tools to
> implement.
>
>> - Artifact Hashing (?? How do dependencies work with this? See
>> what the process writes out and assume a dependency? Maybe I
>> don’t understand this.)
>
> This is strictly a dependency detection strategy. `mtime` is common, but
> hashing before saying "dirty" is another viable strategy.
>
> The rest of this is off-topic here, but I'll add my 2¢.

I see. I was thinking you would name the output BMI based on the hash of the input or something. So this is like dependency detection in SCons.
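
To check my understanding, here is a minimal sketch (Python, not taken from SCons or any real build tool) of hashing used as a dirty check rather than as a way to name outputs. The function names and the saved (mtime, digest) record are assumptions made up for illustration.

```python
import hashlib
import os


def content_hash(path):
    """Return the SHA-256 hex digest of the file's bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def is_dirty(path, recorded):
    """Decide whether `path` changed since the last build.

    `recorded` is a hypothetical (mtime_ns, digest) pair that the build
    tool saved after the previous successful build.
    """
    old_mtime_ns, old_digest = recorded
    if os.stat(path).st_mtime_ns == old_mtime_ns:
        # Cheap check: timestamp unchanged, assume contents unchanged.
        return False
    # Timestamp moved: hash before saying "dirty", so a touch or a
    # re-sync that rewrites identical bytes does not trigger a rebuild.
    return content_hash(path) != old_digest
```

With only `mtime`, the second branch would always return True; the hash just filters out timestamp changes that did not change the contents.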

>
>> Note the use case I am trying to understand is EA uses include paths
>> as a layer enforcement mechanism. IE lower layer rendering can’t
>> include high level gameplay. But gameplay can include rendering. We
>> currently have a different set of includes for each library. A game is
>> built out of about 300 to 400 of these libraries. Since we know what
>> library uses what other libraries we can use this to understand what
>> include paths are necessary. These include path dependencies are
>> different from a library's linkage dependencies. You might use a header
>> from a library but as you only use inline functions you don’t need to
>> link to it and therefore you don’t need to build it first. This can be
>> a good speed up when building DLLs.
>
> For CMake, there is a potential to add `$<COMPILE_ONLY>` genex to add
> usage requirements, but ignore the target at link time. Other build
> tools could certainly implement analogous semantics.
>
> https://gitlab.kitware.com/cmake/cmake/issues/18049#note_496112
>
>> What I am worried about with the EA include path layering enforcement
>> is:
>> - We are very close to running out of command line (on Windows) as we
>> will have 100s of include paths. (High-level, application-level
>> modules will need just about every library after all.) With modules
>> I am not sure what the equivalent of include paths is, but it would
>> seem like we need 2x the command line for module paths, if not more.
>> - We have had this system for a long time, so we probably have duplicate
>> include file names. If we reduced the number of include paths we
>> might hit these problems.
>
> Response files help a lot here.

Yes, this seems like an easy problem at first glance. The problem is that we generate Visual Studio SLN files, and these do not really support the huge projects we are trying to build very well. If EA moved away from SLN files it would be easy to fix. We have done this before and then came back to Visual Studio SLN files, as then we get a GUI to make non-permanent changes to the SLN files. (Since we glob source files, we regenerate our SLN on a sync.)
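
For the case where the build tool does control the command line, the response-file idea can be sketched roughly as below. This is only an illustration in Python: the function name, the .rsp path, and the quoting rule are assumptions, and the 32,767-character budget is the Windows CreateProcess limit. Quoting and escaping inside response files differ between compilers, so a real tool would need per-compiler handling.

```python
# Assumed budget: Windows CreateProcess accepts at most 32,767 characters.
MAX_CMDLINE = 32767


def with_response_file(compiler, args, rsp_path, limit=MAX_CMDLINE):
    """Return a command list, spilling `args` into `rsp_path` if too long.

    Hypothetical helper: writes one argument per line and passes
    `@rsp_path` to the compiler when the joined command would exceed
    `limit` characters.
    """
    full = [compiler] + list(args)
    if len(" ".join(full)) <= limit:
        return full  # short enough, no response file needed
    with open(rsp_path, "w") as f:
        for arg in args:
            # Simplistic quoting: only wrap arguments containing spaces.
            f.write(f'"{arg}"\n' if " " in arg else arg + "\n")
    return [compiler, "@" + rsp_path]
```

With a few hundred `-I` flags per library this keeps the visible command line to two tokens, but it does nothing for the duplicate header name problem.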

>
>> - We have 100s of include paths. I worry this is not very efficient.
>> If the OS has a good directory cache maybe this is good enough; however,
>> it could be very slow otherwise. I am not sure if other companies do
>> this type of thing.
>
> VTK's build can have ~100 `-I` flags for parts using "lots" of VTK. It
> certainly has *an* effect, but I don't know how much number-wise.
>
> --Ben

It is nice to hear that other people are hitting similar issues here.

Scott

Received on 2019-02-10 01:53:08