On Fri, May 26, 2023 at 3:32 PM Daniel Ruoso <daniel@ruoso.com> wrote:
Em sex., 26 de mai. de 2023 às 06:36, Mathias Stearn via SG15 <sg15@lists.isocpp.org> escreveu:
Many of your messages seem to assume that there is a single dependency scan. In this case by the use of the singular "the dependency scan" noun phrase, but other messages have made it more explicitly. That isn't the only model. You can also have a per-TU dependency scan.

I'm not sure what you mean there.

Megascan: clang-scan-deps compile-commands.json | script-that-converts-its-output-into-ninja-dyndeps-files

Per TU scan: hypothetical-scan-deps my-file.cpp.or.h  | script-that-converts-its-output-into-ninja-dyndeps-files

With the "megascan" approach you do a single scan for the entire build that outputs all of the dyndep files. Hopefully it internally does sufficient caching so that incremental builds are very fast.

With the per-TU approach, each compile job (including for header units) has its own scan task. This means that you only need to scan files that are transitively imported by some root set of files. The moderately tricky thing is that when you import a header, you need to be able to internally handle that as-if the header is preprocessed in its own independent context with its own set of flags. And because once you do scan my-file.cpp, you will then need to compute the dyndeps for imported-header.hpp (which again, is only scanned if anything ended up importing it) this relies on a lot of data sharing between independent runs of hypothetical-scan-deps, which to me, implies they are implemented by calling into a server (discussed below)

I will note that many of the late-breaking papers (eg https://wg21.link/p1857r3) for modules in C++20 were intended to ensure that a variety of very important optimizations and caching strategies would still be correct-by-spec. For example, we were very careful to ensure that it is possible to find anything that can affect the list of imports, #includes and the TU's module name looking only at preprocessor directives, and that the scanner can either quickly skip over non-directives and/or cache a "skeleton" consisting of just those directives that could impact the scan results.
In my mind, the best way to do it is to have a persistent (local!) server that caches the important information about each file (including non-importable headers), and each per-TU scan task is really just a command that calls into the server to evaluate the scan rooted at that TU.

I'm also not sure what you mean there.

Not sure if it would be super informative to you but this is the scanner I started on a while ago, but haven't touched in years. While implementing a (partial) preprocessor was a lot of fun, one of my conclusions is that it is basically impossible to exactly match the behavior of a given compiler with a separate impl, and that is critical. In addition to all of the IDB and UB in [cpp], compilers don't offer any way to query for their supported __has_cpp_attributes() and similar things. I also haven't spent much time on it due to a change in life priorities since becoming a father. I have another child due soon, which is why I will not be able to make it to Varna even though it is close to home.

That said, I still think the general approach would be the best, but it would require that each compiler make their own scanner server (or that someone who has already invested significant resources in making a preprocessor compatible with other compilers do so). The general idea was to have a cache of two maps. One is cache from file path to an "AST" of the preprocessor directives in that file, eg with an #if node with a condition to evaluate and the then and else cases as children*. This means once all of its files are in the first cache that you can very quickly "scan" a TU by walking this cache of parsed ASTs without needing to do any I/O or parsing large files where the vast majority of bytes are not preprocessor directives. Additionally you would have a second cache of {file path, input defines} -> {output defines, included headers, imported headers and modules, declared module name and kind} (This is from 2-3 year old memory, so I'm sure I'm forgetting some inputs and outputs, so please forgive any inaccuracy). This would let you only process each header once for a given set of input defines across the whole build. For header units and a well-behaved build that means exactly once. And because this would be done in a persistent server, you can take advantage of the shared caching across "independent" per-TU scan tasks within a single build and across multiple incremental builds.

Of course as with any cache, the devil is in the details. You would need to know how and when to purge old and unneeded data, and you would need to be very careful to detect when files had changed between builds. (If you are editing or generating source files within a build, the onus is on you to ensure that your dependencies ensure that the file is in a stable state prior to the first command reading it in the build)

*I recognize that the standard doesn't require an #included file to be well balanced internally, it only requires that when you finish a full TU that it is well balanced. That said all files in practice tend to be well balanced so it seems reasonable for an implementation to simply detect this case and punt to a much slower implementation for TU that do nonsense like that.
 2. Doesn't require the build system to dynamically generate new nodes on the build graph. The full set of nodes needs to be prepared beforehand, but edges can be added after the fact. The dependency scanning runs as part of the build and dynamically add those edges (e.g.: CMake does this with ninja's dyndep)
 3. Doesn't require the build system to compare the output of a command to see if it changed before deciding if a downstream target needs to be redone.
I don't think most build systems require *both* 2 and 3 at the same time. For example ninja requires 2, and not 3 (sort of*), and make requires 3 and not 2. I don't see anything wrong with requiring different approaches for different build systems that are adapted to the strengths and weaknesses of the underlying build system.

Would you mind explaining how an implementation with ninja would work? It would need to be able to perform the correct dependency scan (which depends on the list of importable headers and their arguments) for both the clean build (when we don't know what importable headers could be used by a TU) and for incremental builds afterwards, without a change in the list of importable headers and their arguments resulting in an invalidation of all translation units?

Sure. My POC modules build system using a per-TU scan is at https://github.com/RedBeard0531/cxx_modules_builder. I built it in a week as a skunkworks project. It has been untouched for even longer than the scanner, because there were no working scanners so it needed to use a very hacky "scanner" implementation (I shouldn't even call it that), which is what convinced me that I needed to try building a real one, which was much more work. While the docs mention a necessary patch to ninja, it was eventually merged and is now available in stock ninja.

For each header unit, which needed to be listed in the build.yml, it would emit 3 tasks: 1) a SCAN to scan it, 2) a HEADER_UNIT to build the .pcm file (I still think 2-phase is the best compilation strategy FWIW), and 3) a HEADER_UNIT_CXX to produce its .o file. (Again, the command used for "scanning" is very much not production grade, but I think that is the only part of this strategy that isn't viable)

(In case email makes a mess of the ninja file, or if you'd prefer to read with syntax highlighting, I put the full example file up at https://gist.github.com/RedBeard0531/53a21a01bff067ebe68bd509ae67fe8e)

rule SCAN
    command = $CXX $CXXFLAGS -w -E -x $KIND $in -o /dev/null -MD -MF $out.d -MT $out && $MAKE_NINJA --scan $out.d $in $out $PCMFLAGS_FILE
    description = SCAN $in
    deps = gcc
    depfile = $out.d
    restat = 1

    command = $CXX $CXXFLAGS @$PCMFLAGS_FILE $MODULE_CODEGEN -c -x c++-header -fmodule-name=$in --precompile $$(realpath $in) -o $out -MD -MF $out.d
    description = HEADER_UNIT $in
    deps = gcc
    depfile = $out.d
    restat = 1

    command = $CXX $DEBUGFLAGS -c $in -o $out
    description = HEADER_UNIT_CXX $out

build build/_usr_include_c++_v1_algorithm.pcm.dd: SCAN $
    /usr/include/c++/v1/algorithm | $CXX $MAKE_NINJA make_ninja.yml
  KIND = c++-header
  PCM_FILE = build/_usr_include_c++_v1_algorithm.pcm
  PCMFLAGS_FILE = build/_usr_include_c++_v1_algorithm.pcm.flags
build build/_usr_include_c++_v1_algorithm.pcm: HEADER_UNIT $
    /usr/include/c++/v1/algorithm | $CXX || $
    build/_usr_include_c++_v1_algorithm.pcm.dd maybe_mod_scans
  dyndep = build/_usr_include_c++_v1_algorithm.pcm.dd
  PCMFLAGS_FILE = build/_usr_include_c++_v1_algorithm.pcm.flags
build build/_usr_include_c++_v1_algorithm.o: HEADER_UNIT_CXX $
    build/_usr_include_c++_v1_algorithm.pcm | $CXX

You can find the logic used for generating the .dd and .flags files here

And for each declared source file, it would generate 3 similar but slightly different tasks: 1) MAYBE_MODULES_SCAN to scan, 2) CXXPRE to go from .cpp -> .pcm and 3) CXX to go from *either* .cpp or .pcm  -> .o.

    # Please don't look too closely at this command. Your eyes may bleed...
    command = $
        echo '#line 1 "$in"' > $out.rewrite.cpp && $
        LC_ALL=C sed $MODULE_SCAN_SED_SCRIPT < $in >> $out.rewrite.cpp && $
        $CXX $CXXFLAGS -w -E -x $KIND -iquote $$(dirname $in) -Xclang -main-file-name -Xclang $in $out.rewrite.cpp -o $out.ii -MD -MF $out.d -MT $out && $
        LC_ALL=C gawk $MODULE_SCAN_AWK_SCRIPT < $out.ii > $out.ii.post && $
        $MAKE_NINJA --maybe-module-scan $out.d $out.ii.post $in $out $PCMFLAGS_FILE
    description = MAYBE_MODULE_SCAN $in
    deps = gcc
    depfile = $out.d
    restat = 1

    command = $CXX $CXXFLAGS --precompile -x c++-module -c @$out.flags $in -MD -MF $out.d -MT $out && touch $out
    description = CXXPRE $in
    deps = gcc
    depfile = $out.d

# TODO split out cpp->o and pcm->o into separate rules and remove -Wno-unused-command-line-argument
rule CXX
    command = rm -f $out.d; $CXX $CXXFLAGS -Wno-unused-command-line-argument -c @$FLAGS_FILE -MD -MF $out.d -o $out && if [ ! -e $out.d ]; then echo '$out : ' > $out.d; fi
    description = CXX $in
    deps = gcc
    depfile = $out.d

build build/simple_duplicate.mpp.o.dd: MAYBE_MODULE_SCAN $
    simple/duplicate.mpp | $CXX $MAKE_NINJA make_ninja.yml
  KIND = c++-module
  PCM_FILE = build/simple_duplicate.mpp.o
  PCMFLAGS_FILE = build/simple_duplicate.mpp.o.flags
build build/simple_duplicate.mpp.o: CXX simple/duplicate.mpp | $CXX || $
    build/simple_duplicate.mpp.o.dd maybe_mod_scans
  dyndep = build/simple_duplicate.mpp.o.dd
  FLAGS_FILE = build/simple_duplicate.mpp.o.flags
build build/simple_duplicate.mpp.pcm: CXXPRE simple/duplicate.mpp | $CXX || $
    build/simple_duplicate.mpp.o.dd maybe_mod_scans
  dyndep = build/simple_duplicate.mpp.o.dd
  FLAGS_FILE = build/simple_duplicate.mpp.o.flags

The "maybe module scanner" used a bit of trickery to set up the .o.dd and .o.flags files to automatically use the 2-step build for non-module units and non-importable module units (ie `module blah;` without export or a partition), but use the 3 step build for other types of module units (this logic is at the bottom of the function).  Another interesting detail is that in order to allow all scanning (and flag generation) to happen in parallel is to make use of a "build/mod_links/" directory where the pcm file for a named module lives a "build/mod_links/module.name.pcm". This is why the CXXPRE and CXX commands don't have the "real" pcm file as $out/$in (respectively) in the flags passed to clang in the command. Instead, the scanner writes the actual pcm path as the output in the .pcm.flags file and the .pcm file (or raw .cpp file for 2-step) into the .o.flags file.

Admittedly there is a LOT going here, some of which might seem like black magic if isn't already understood, and some of it (especially the scanners!) is certainly hacky. I also think it would be a lot cleaner if ninja allowed dyndeps files to contribute variables to be filled in when evaluating the node's command line. But I think this approach will both work and provide ideal parallelism along with minimal and correct rebuilds (modulo a few cases called out in comments that I decided not to handle out of lazyness/time limits even though they would be possible).  Of course if you can find a counter example, I'd love to hear about it.
To be clear, this is an honest question, not rhetorical or confrontational. If someone can show me how to make this work, I'm ready to change my mind*, I am just not seeing how this can work.


* I am even open to ditching make if ninja accepts the patch for integrating with the make job server upstream. We currently can't use ninja in production because the outside orchestration uses the make job server to control parallelism across different projects.

I would also love it if that patch lands...