Date: Wed, 4 Dec 2024 21:11:09 +0500
>
> I don't know that HMake solves the distributed build challenges.
HMake's design is not a hindrance here either; it can be refined with
practice.
> There are mismatches between this kind of distributed build tooling and
> designs that require compilation processes to block on communication with
> other compilation processes and/or build system processes because:
>
I don't want to underestimate the challenge, but I don't see a
theoretical or technical reason why this kind of model could not work with
distributed computing and caching. We can break the problem down into two
parts.
1) central caching
This is comparatively easy. In the central cache, we store the hash of
(compile command plus file contents) mapped to its output (the object file,
or the object file plus BMI), and the same for all of its dependencies. If
no entry matches the hash, we compile and store the result; otherwise we
use the hash to discover all the dependency relationships and fetch the
output files, including the output files of the dependencies.
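To make the idea concrete, here is a minimal sketch of such a cache. It is
purely illustrative: the class, its methods, and the entry layout are my
own invented names, not part of HMake or any existing cache service.

```python
import hashlib

# Minimal sketch of the central-cache idea above; all names illustrative.
class CentralCache:
    def __init__(self):
        # key -> (outputs, keys of dependencies)
        self.entries = {}

    @staticmethod
    def key(compile_command, file_content):
        # The hash of (compile command plus file contents) addresses the entry.
        h = hashlib.sha256()
        h.update(compile_command.encode())
        h.update(file_content)
        return h.hexdigest()

    def store(self, key, outputs, dep_keys):
        # outputs: e.g. {"object": ..., "bmi": ...}; dependencies are
        # stored the same way under their own keys.
        self.entries[key] = (outputs, dep_keys)

    def fetch(self, key):
        # On a hit, walk the recorded dependency relationships and fetch
        # the output files of the dependencies as well.
        if key not in self.entries:
            return None  # miss: the client compiles, then stores
        outputs, dep_keys = self.entries[key]
        results = {key: outputs}
        for dep_key in dep_keys:
            sub = self.fetch(dep_key)
            if sub is None:
                return None  # a missing dependency makes this a miss
            results.update(sub)
        return results
```

A hit thus returns the outputs for the translation unit together with the
outputs of everything it depends on, without invoking the compiler.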
2) distributed computing
This is more difficult. The remote machine can coordinate the build of an
imported module, but if it encounters an imported header file or header
unit, it must not resolve the file path itself. Instead, it is the build
system's responsibility to apply the header importation semantics, get the
file built somewhere, send the output back to the remote machine, and then
receive the final result. However, the need for this would be reduced in
the presence of 1.
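The interaction could look something like the following toy sketch. It is
purely illustrative: the "compiler" is faked, and the callback stands in
for the build system performing the header importation semantics.

```python
# Toy sketch of the remote-worker interaction described above.
# Everything here is illustrative; no real compiler is involved.
def remote_compile(source_lines, request_header_unit):
    """'Compile' line by line; on an imported header, do not resolve the
    path locally -- block on the build system via request_header_unit,
    which returns that header unit's built output."""
    outputs = []
    for line in source_lines:
        if line.startswith('import "'):
            header = line.split('"')[1]
            # The build system builds the header unit somewhere and
            # sends its output back to this worker.
            outputs.append(request_header_unit(header))
        else:
            outputs.append("compiled(" + line + ")")
    return outputs
```

The key point is only that path resolution and header-unit builds happen on
the build-system side, while the remote worker blocks on the reply.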
The need for both of the above will lessen with scanning-less support for
C++20 modules and header units. These are significant features and will be
pursued sometime in the future.
Best,
Hassan Sajjad
On Wed, Dec 4, 2024 at 5:24 PM Bret Brown <mail_at_[hidden]> wrote:
> I don't know that HMake solves the distributed build challenges.
>
> To be clear on the distributed build use case being referenced, it is that
> one can wrap calls to the compiler. Then the inputs to that command are
> provided to a distributed compilation service that provides features like
> scheduling, caching, build environment provisioning, compute capacity, etc.
> This is how sccache and recc work in particular.
>
> Here are some docs describing how to get started with recc:
> https://buildgrid.gitlab.io/buildgrid/user/using_recc.html
>
> There are mismatches between this kind of distributed build tooling and
> designs that require compilation processes to block on communication with
> other compilation processes and/or build system processes because:
>
> 1. The approach requires a scanning process that can up front know what
> inputs need to be provided in a distributed build environment and digested
> into a content-addressable hash for caching. A result format for this kind
> of scanning process is what P1689 describes.
>
> 2. If you skip that, you would need a way for a distributed compiler
> instance to gracefully block and communicate at distance to whatever other
> services are needed to complete the compilation process.
>
> In short, do not assume any compilation processes or build system
> processes are necessarily running on the same machine. Or running at all.
> In the case of a distributed cache hit, the wrapper behaves like a compiler
> by retrieving previous compilation results and writing them to disk. Not
> actually invoking the compiler via caching for a given translation unit is
> a primary motivating feature for this workflow.
>
> I hope that helps.
> Bret
>
> On Wed, Dec 4, 2024, 11:21 Hassan Sajjad via SG15 <sg15_at_[hidden]>
> wrote:
>
>> It is quite hard (explained why by several people in this thread).
>>> This is the reason why pretty much everyone gave up on supporting
>>> header units.
>>
>> It's good that you wrote "pretty much" before "everyone". The second
>> part of your statement is quite accurate, but I disagree with the first
>> part. If a build system is better designed (most importantly, with
>> support for dynamic targets), like HMake, then supporting header units
>> is not "quite hard".
>>
>> Best,
>> Hassan Sajjad
>>
>> On Wed, Dec 4, 2024 at 3:38 PM Boris Kolpackov via SG15 <
>> sg15_at_[hidden]> wrote:
>>
>>> Mathias Stearn via SG15 <sg15_at_[hidden]> writes:
>>>
>>> > Preface: I'm not saying we _should_ do this, just that it isn't
>>> > quite as hard on the build system.
>>>
>>> It is quite hard (explained why by several people in this thread).
>>> This is the reason why pretty much everyone gave up on supporting
>>> header units.
>>>
>>>
>>> > the second scan pass could treat them just like header units (which
>>> > don't require a BMI before scanning)
>>>
>>> They don't require BMI before scanning only if you can recreate
>>> the header importation semantics with the textual "import". To
>>> my knowledge, only Clang has such a mode and the MSVC folks told
>>> me they have no plans to provide anything like this.
>>> _______________________________________________
>>> SG15 mailing list
>>> SG15_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg15
>>>
>> _______________________________________________
>> SG15 mailing list
>> SG15_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg15
>>
>
Received on 2024-12-04 16:11:21