Date: Sat, 9 Feb 2019 19:17:14 -0800
Nice story. Was this on Linux?
We once sorted our include paths by the number of hits on each path, and it gave a 10% gain in clean build times. But that was a long time ago, before Windows 10 even, so I am not sure whether the better caching in Windows 10 has helped since. We have since lost the tech to do this and have not bothered to rewrite it.
This is a pain, as having one include path per library works very well: you can’t include things you should not. Keep your layering enforced by your build system.
I am not sure how many modules we will have per library yet, but if it is close to 1 to 1, this problem might be big enough to want to fix. It would help if we had a directory caching service that could very quickly tell you which of the 100s of include directories you really need to check. The way I look at it right now, we will have the same number of module search paths as include paths (as this would just be the easiest thing to do in the build system), so whatever this time is now will double.
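For anyone who wants to try the sort-by-hits idea, here is a minimal sketch of how it could be done. It is not the tool we had; the function names `count_hits` and `sort_by_hits` are made up for illustration, and it assumes you already have the list of header names your sources include (for example from a dependency scan).

```python
from collections import Counter
from pathlib import Path


def count_hits(include_dirs, headers):
    """Count, per include directory, how many headers resolve there first."""
    hits = Counter({d: 0 for d in include_dirs})
    for header in headers:
        for d in include_dirs:
            if (Path(d) / header).is_file():
                hits[d] += 1
                break  # stop at the first match, as a compiler search would
    return hits


def sort_by_hits(include_dirs, headers):
    """Return include_dirs ordered by descending hit count (ties keep order)."""
    hits = count_hits(include_dirs, headers)
    return sorted(include_dirs, key=lambda d: -hits[d])
```

Note that reordering is only safe if no header name exists in more than one directory; with duplicate include file names, a different order can make the compiler pick a different file.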
Scott
> On Feb 9, 2019, at 17:31, Ben Craig <ben.craig_at_[hidden]> wrote:
>
> The project I build most often has roughly ~150 include paths. We've got a crazy conglomeration of gnu make generating FASTBuild projects, held together with python, perl, and developer tears.
>
> I recently improved the preprocessing time of that project from 7 minutes 30 seconds to 4 minutes 30 seconds by moving the boost include from position ~100 in the include path to position ~4. So yeah, long include paths can have an impact on build times.
>
>> -----Original Message-----
>> From: tooling-bounces_at_[hidden] <tooling-bounces_at_[hidden]> On
>> Behalf Of Scott Wardle
>> Sent: Saturday, February 9, 2019 6:53 PM
>> To: ben.boeckel_at_[hidden]
>> Cc: WG21 Tooling Study Group SG15 <tooling_at_[hidden]>;
>> michael_spencer_at_[hidden]
>> Subject: [EXTERNAL] Re: [Tooling] Modules feedback
>>
>>
>>
>>> On Feb 9, 2019, at 2:12 PM, Ben Boeckel <ben.boeckel_at_[hidden]>
>> wrote:
>>>
>>>> On Sat, Feb 09, 2019 at 00:01:07 -0800, Scott Wardle wrote:
>>>> I think you are totally right, that is what I could use here. I would
>>>> love a diagram that shows what processes or stages are needed for the
>>>> use cases you are thinking of: clean build vs incremental, or maybe
>>>> some others.
>>>
>>> I'll look at adding two possible implementation (1:1:1 source:scan:ddi
>>> versus N:1:N where N can be all-at-once or per-target) graphs.
>>>
>>>> Here is what I was thinking of:
>>>> - Clean builds vs incremental builds
>>>> - Linux processes are cheap vs Windows: fewer processes, more threads
>>>> - Multi computer build: what data is pushed over the network, what data is
>>>> pulled over the network.
>>>> - Object/Module BMI/Binary/Module Map caching vs no caching.
>>>
>>> These are probably good as notes on when one approach might be
>>> preferred over the other. I'm more than willing to go over build
>>> graphs on a whiteboard at Kona, but I don't know that multiple pages
>>> of incremental diagrams while trying to describe execution strategies
>>> of them is going to be easy to digest.
>>
>> You might be right; it is hard to say how much detail to write here. I think we
>> are all stuck in our own little worlds, so it is hard to see how this works for
>> everyone. Maybe one good example is better than a lot of small half-done
>> ones.
>>
>>>
>>>> Maybe even make this more concrete and talk about the command lines of
>>>> some of these use cases. At least we should talk about:
>>>> - do modules change anything with:
>>>>   - include paths: -I<dir> vs -isystem
>>>>   - object/library path: -L<dir>
>>>> - is there overlap with:
>>>>   - module BMI paths (-fmodules-cache-path=<directory> vs
>>>>     -fprebuilt-module-path=<directory>)
>>>>   - module map path/files: -fmodule-map-file=
>>>
>>
>> I was hoping that if we enumerated the different ways of making modules and
>> their different command lines, I would get the data I was thinking of.
>>
>> I am trying to understand where we are with the merged modules proposal.
>> The last time I looked at things was the Modules TS. From my reading of the merged
>> modules proposal, it sounds like we can do what we did in the Modules TS and what
>> we could do with older Clang modules. These two work very differently, but
>> now maybe we can do both at once with import "some-header.h";
>> vs import foo;?
>>
>> What I want to know is: what do we think the possible inputs and outputs are
>> for each phase of the build process? In trying to figure out what styles of
>> inputs and outputs we like for our build tools, enumerating the current set of
>> possible inputs and outputs seems like a good idea. At the very least, some
>> day we will have to teach some or all of these possibilities.
>>
>> The kind of issue I am looking for is, for example, with the Modules TS: why did we
>> type "export module foo;"? Why do we need the "foo"? Would this not be the
>> name of the file? I.e. there was a command line where you needed to write
>> the name of the module's BMI file (/module:output obj\foo.ifc), so what is the
>> name of the module: the name of the BMI file or the name in the .cppm/.ixx?
>> Why do we need both? (I think 2.3 seems to call out some reasons, but
>> Clang's doc on module maps talks about making many modules out of many
>> headers with one map, a very different thing from the one-to-one relation I
>> was playing with in the Modules TS anyway.)
>>
>> Sorry I am asking so many questions. This is great stuff.
>>
>>
>>> Eh, these might be too low-level for what we're describing. We list what
>>> is important for compilers to provide in §7.1. The actual flag spellings
>>> aren't (that) important. In any case, I would hope that it would not.
>>> Changing semantics of flags as fundamental as `-I` or `-L` based on
>>> `-fmodules` or `-std=c++2a` is not going to be fun for build tools to
>>> implement.
>>>
>>>> - Artifact Hashing (?? How do dependencies work with this? See
>>>> what the process writes out and assume a dependency? Maybe I
>>>> don’t understand this.)
>>>
>>> This is strictly a dependency detection strategy. `mtime` is common, but
>>> hashing before saying "dirty" is another viable strategy.
>>>
>>> The rest of this is off-topic here, but I'll add my 2¢.
>>
>> I see, I was thinking you would name the output BMI based on the hash of
>> the input or something. So this is like dependency in SCons.
>>
>>>
>>>> Note the use case I am trying to understand is that EA uses include paths
>>>> as a layer enforcement mechanism. I.e. lower layer rendering can’t
>>>> include high level gameplay, but gameplay can include rendering. We
>>>> currently have a different set of includes for each library. A game is
>>>> built out of about 300 to 400 of these libraries. Since we know which
>>>> library uses which other libraries, we can use this to understand which
>>>> include paths are necessary. These include path dependencies are
>>>> different from a library's link dependencies. You might use a header
>>>> from a library, but as you only use inline functions you don’t need to
>>>> link to it, and therefore you don’t need to build it first. This can be
>>>> a good speed up when building DLLs.
>>>
>>> For CMake, there is a potential to add `$<COMPILE_ONLY>` genex to add
>>> usage requirements, but ignore the target at link time. Other build
>>> tools could certainly implement analogous semantics.
>>>
>>> https://gitlab.kitware.com/cmake/cmake/issues/18049#note_496112
>>>
>>>> What I am worried about with the EA include path layering enforcement
>>>> is:
>>>> - We are very close to running out of command line (on Windows), as we
>>>> will have 100s of include paths. (High level, application level
>>>> modules will need just about every library after all.) With modules
>>>> I am not sure what the equivalent of include paths is, but it would
>>>> seem like we need 2x the command line for module paths, if not more.
>>>> - We have had this system for a long time, so we probably have duplicate
>>>> include file names. If we reduced the number of include paths we
>>>> might hit these problems.
>>>
>>> Response files help a lot here.
>>
>> Yes, this seems like an easy problem at first glance. The problem is that we
>> generate Visual Studio SLN files, and these do not really support the huge projects
>> that we are trying to build very well. If EA moved away from SLN files it
>> would be easy to fix. We have done this before, and then came back to Visual
>> Studio SLN files, as then we get a GUI to do non-permanent changes to the
>> SLN files. (Since we glob source files, we regen our SLN on a sync.)
>>
>>>
>>>> - We have 100s of include paths. I worry this is not very efficient.
>>>> If the OS has a good directory cache maybe this is good enough; however,
>>>> it could be very slow otherwise. I am not sure if other companies do
>>>> this type of thing.
>>>
>>> VTK's build can have ~100 `-I` flags for parts using "lots" of VTK. It
>>> certainly has *an* effect, but I don't know how much number-wise.
>>>
>>> --Ben
>>
>> It is nice to hear that other people are hitting similar issues here.
>>
>> Scott
>>
>> _______________________________________________
>> Tooling mailing list
>> Tooling_at_[hidden]
>> http://www.open-std.org/mailman/listinfo/tooling
> _______________________________________________
> Tooling mailing list
> Tooling_at_[hidden]
> http://www.open-std.org/mailman/listinfo/tooling
Received on 2019-02-10 04:17:19