Date: Fri, 26 May 2023 13:06:20 +0200
On Wed, May 24, 2023 at 5:06 PM Tom Honermann via SG15 <
sg15_at_[hidden]> wrote:
> On 5/24/23 10:14 AM, Daniel Ruoso via SG15 wrote:
>
> On Tue, May 23, 2023 at 9:54 PM, Tom Honermann <tom_at_[hidden]> wrote:
>
> What subset of environments is it not implementable in? The paper doesn't actually say as far as I can tell. What is a concrete example?
>
> It is not implementable in open-ended build systems, such as what we
> have at Bloomberg.
>
> I still don't see an issue. It would be very helpful if, as Jens
> suggested, you provide a concrete example of what you perceive to be the
> problem.
>
> Concretely, what would happen is that if I do an `apt-get update &&
> apt-get dist-upgrade` it could result in a full clean build in my
> incremental build because the list of importable headers or the
> arguments to those could change.
>
> We seem to have a clear disagreement regarding this. Let's try to separate
> the possibilities:
>
> *1: A header file that was not previously importable becomes importable.*
>
> You have been claiming that this situation requires that dependency
> information for all TUs in the build be re-generated. I claim that only the
> TUs that previously included that header file need to be re-scanned (and
> rebuilt if dependencies actually changed). We aren't both right. What is
> your argument for why I am wrong? Please provide an example that
> illustrates the problem. In particular, please show an example of a TU that
> did not previously include the header file but that requires re-scanning
> because that header file is now importable.
>
> *2: A previously importable header becomes non-importable.*
>
> The issues here are similar to the situation in which a previously
> deprecated/obsoleted header file is removed. At least some build systems
> will report an error in this case because a dependent file (e.g., a BMI
> file for the no longer importable header file) might no longer be found
> (and no rules are present to generate it). The solution in these cases is
> to regenerate dependency information (for the affected TUs). If such an
> error does not occur (e.g., because a BMI file is still present), then
> build systems that use the common approach of using dependency information
> that is generated as a side effect of compilation will be fine (because
> prior dependencies will be satisfied and re-compilation will produce
> dependencies that no longer reference the BMI).
>
> Note that I am assuming that the BMI file corresponding to an importable
> header is tracked with the dependency information (a build system that does
> not track BMI files is not sound because it won't behave correctly when BMI
> files are modified from outside the build system).
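
To make that concrete, here's roughly what the side-effect approach
looks like in ninja terms (a minimal sketch; rule and file names are
made up by me, not taken from any real build):

    # Deps are recorded as a side effect of compilation via a depfile;
    # ninja stores them in its deps log and replays them on the next run.
    rule cxx
      command = c++ -MD -MF $out.d -c $in -o $out
      depfile = $out.d
      deps = gcc

    build obj/user.o: cxx src/user.cpp

    # The first compile records that user.o read foo.h's BMI. If foo.h
    # later stops being importable, that recorded dep is either satisfied
    # by a stale BMI still on disk (and dropped from the deps log on the
    # next recompile), or treated as missing, at which point some build
    # systems error out and others simply consider the object dirty.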
>
> *3: A new (public) header file (that might or might not be importable) is
> added.*
>
> This technically requires a full re-scan of the project because of the
> possibility of conditional inclusion based on __has_include. And TUs that
> gain a new dependency must be rebuilt.
>
I think basically every build system screws this up today. Compilers
lack a way of saying which files the current build depends on *NOT*
existing. In theory those files could be listed in the deps output;
however, current build systems don't behave correctly when files that
don't exist appear there, so that isn't a viable strategy. I'm not
saying that's a _good_ thing, but we tolerate it today and it isn't
often a problem in practice. I think we are mostly saved by the fact
that a change affecting a __has_include() check usually also touches
some other file that *is* actually included. That said, there is a
similar problem when a new header is introduced earlier in the include
path and isn't detected because it wasn't there when the TU was last
built. This *has* bitten us a few times.
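
To illustrate (a made-up deps entry, reusing the cxx rule from the
sketch above; none of this is real compiler output):

    # obj/a.o was compiled when only /usr/include/foo.h existed, so the
    # depfile it produced says just:
    #
    #   obj/a.o: src/a.cpp /usr/include/foo.h
    #
    # If vendor/foo.h appears later and -Ivendor sorts earlier in the
    # include path, none of the *recorded* inputs changed, so this edge
    # still looks up to date. A hypothetical "depends on vendor/foo.h
    # NOT existing" entry would catch it, but make dies with "No rule to
    # make target 'vendor/foo.h'" for deps that don't exist, and ninja
    # would likely treat the edge as perpetually dirty.
    build obj/a.o: cxx src/a.cpp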
> I readily admit that many build systems do not handle these kinds of
> changes as efficiently as they could. But these are not new problems; as
> I've previously stated, these situations are very similar to those for
> generated header files.
>
They differ from generated headers in some important ways, especially
if you never have compile tasks that are inputs to generated headers,
e.g., because you use an interpreted language (or anything other than
C++) to generate your headers. That allows a very simple technique:
anything that preprocesses (which includes a dep scanner) gets an
order-only dependency on all generated headers. This works fine today,
but the same technique can't be used for importable headers, whose
BMIs are themselves produced by compile tasks.
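
Concretely, the order-only trick looks something like this in ninja
(names illustrative; the generator command is hypothetical):

    # All generated headers funnel through one phony target...
    rule gen_header
      command = python3 gen_version.py $in > $out
    build gen/version.h: gen_header version.h.in
    build all_gen_headers: phony gen/version.h

    # ...and everything that preprocesses is ordered after it ("||"), so
    # the first scan or compile can't run before the headers exist, while
    # later mtime bumps to them don't by themselves dirty every consumer
    # (real dependencies still arrive via the depfile).
    build obj/foo.o: cxx src/foo.cpp || all_gen_headers

    # For header units this would be a cycle: the tasks that produce
    # BMIs are themselves compiles, so you can't order every compile
    # after every BMI.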
> I'll also state that I don't find the example of an incremental rebuild
> following an apt-get update or apt-get dist-upgrade compelling. I have
> yet to use a build system that I would trust to handle such situations
> (primarily because timestamp information between the distribution-provided
> header files and previously built objects is no longer correlated in a
> meaningful way; assuming the package manager installs header files with a
> preserved timestamp; a system that matches file contents is necessary in
> that case).
>
1) @Daniel: In almost every case a dist-upgrade *should* result in a
full rebuild! If the dist-upgrade upgraded either your compiler or libc
(which I think it usually does), then any build system that doesn't
rebuild everything is broken. That said, the default mtime-checking
approach won't always notice that change (which is what Tom was
pointing out), however...

2) @Tom: It *is* possible to teach at least ninja to correctly handle
backdated files, although admittedly it is a massive pain in the ass,
and it probably isn't possible to make it work for dynamic deps like
those from headers, since they are handled specially. But I have used
this technique
<https://github.com/RedBeard0531/mongo_module_ninja/blob/6853091b0a9c2e0ec926df44ed6e0d4a53ca6b2c/build.py#L328-L355>
(which uses this helper program
<https://github.com/RedBeard0531/mongo_module_ninja/blob/master/touch_compiler_timestamps.py>)
to do that, at least for the compiler binary itself. It works by having
a task that depends on the compiler and outputs two files: one with a
timestamp that matches the compiler's mtime (the "thenfile") and
another that uses the current time (the "nowfile"). All build tasks
that depend on the compiler should also depend on the nowfile. If the
compiler is upgraded, then even if the install backdates it to a time
before the nowfile (which I think is a terrible mistake by distros, but
not one that I can fix!), it will still be newer than the thenfile. The
timestamping task is therefore considered dirty and the nowfile is
updated, which in turn marks everything that depends on it dirty. This
was critical for us because the build system is responsible for
generating the tarball for chrooting into on remote builds, and it
would be catastrophic for correctness if the compiler used on remote
builds were different from the one used locally. I suspect it is
possible to do something similar for make, but I am much less of an
expert at (ab)using make. Really, I wish make would die off now that
ninja exists, since everything make can do, ninja can do better
(although admittedly sometimes requiring contortions).
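
For reference, the wiring looks roughly like this (my reconstruction in
ninja terms; the helper's exact command line is a guess, and the cxx
rule is the one from the earlier sketch):

    # One task regenerates both stamp files whenever the compiler's
    # mtime no longer matches what we last recorded.
    rule compiler_stamp
      command = ./touch_compiler_timestamps.py $in compiler.then compiler.now

    # compiler.then gets mtime == the compiler's mtime; compiler.now gets
    # the current time. A replacement compiler, even one backdated by the
    # package manager, still has a newer mtime than the *old* compiler's
    # (which is compiler.then's mtime), so this edge goes dirty and
    # refreshes compiler.now.
    build compiler.then compiler.now: compiler_stamp /usr/bin/c++

    # Every compile depends on compiler.now, so a compiler upgrade
    # dirties the world exactly once.
    build obj/foo.o: cxx src/foo.cpp | compiler.now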
> The paper doesn't quantify costs in any way. The closest it comes is (correctly) noting that dependency scanning for header units requires computing imported macros in a bottom up way that is not required for named modules. But the paper doesn't quantify that cost. Is it a 5% hit? A 95% hit? Linear with respect to the number of header units? Other papers have offered quantification; see P1441 (Are modules fast?) for example.
>
> The cost is quite binary, actually. Every time the list of importable
> headers or the arguments to those change, it results in a full clean
> build.
>
> Why a clean build? If the dependency information didn't actually change
> for a given TU, why would its previously built objects have to be rebuilt
> (assuming its dependent header files weren't actually modified)?
>
> Tom.
>
> daniel
> _______________________________________________
> SG15 mailing list
> SG15_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg15
>
Received on 2023-05-26 11:06:36