ISOCPP SG15 List: Re: P2898R0: Importable Headers are Not Universally Implementable

From: Mathias Stearn <redbeard0531+isocpp_at_[hidden]>
Date: Fri, 26 May 2023 17:57:00 +0200

On Fri, May 26, 2023 at 5:39 PM Mathias Stearn <
redbeard0531+isocpp_at_[hidden]> wrote:

>
>
> On Fri, May 26, 2023 at 3:32 PM Daniel Ruoso <daniel_at_[hidden]> wrote:
>
>> Em sex., 26 de mai. de 2023 às 06:36, Mathias Stearn via SG15 <
>> sg15_at_[hidden]> escreveu:
>>
>>> Many of your messages seem to assume that there is a single dependency
>>> scan. In this case by the use of the singular "the dependency scan" noun
>>> phrase, but other messages have made it more explicitly. That isn't the
>>> only model. You can also have a per-TU dependency scan.
>>>
>>
>> I'm not sure what you mean there.
>>
>
> Megascan: clang-scan-deps compile-commands.json |
> script-that-converts-its-output-into-ninja-dyndeps-files
>
> Per TU scan: hypothetical-scan-deps my-file.cpp.or.h |
> script-that-converts-its-output-into-ninja-dyndeps-files
>
> With the "megascan" approach you do a single scan for the entire build
> that outputs all of the dyndep files. Hopefully it internally does
> sufficient caching so that incremental builds are very fast.
>
> With the per-TU approach, each compile job (including for header units)
> has its own scan task. This means that you only need to scan files that are
> transitively imported by some root set of files. The moderately tricky
> thing is that when you import a header, you need to be able to internally
> handle that as-if the header is preprocessed in its own independent context
> with its own set of flags. And because once you do scan my-file.cpp, you
> will then need to compute the dyndeps for imported-header.hpp (which again,
> is only scanned if anything ended up importing it) this relies on a lot of
> data sharing between independent runs of hypothetical-scan-deps, which to
> me, implies they are implemented by calling into a server (discussed below)
>
> I will note that many of the late-breaking papers (eg
> https://wg21.link/p1857r3) for modules in C++20 were intended to ensure
> that a variety of very important optimizations and caching strategies would
> still be correct-by-spec. For example, we were very careful to ensure that
> it is possible to find anything that can affect the list of imports,
> #includes and the TU's module name looking only at preprocessor directives,
> and that the scanner can either quickly skip over non-directives and/or
> cache a "skeleton" consisting of just those directives that could impact
> the scan results.
>
>
>> In my mind, the best way to do it is to have a persistent (local!) server
>>> that caches the important information about each file (including
>>> non-importable headers), and each per-TU scan task is really just a command
>>> that calls into the server to evaluate the scan rooted at that TU.
>>>
>>
>> I'm also not sure what you mean there.
>>
>
> Not sure if it would be super informative to you but this
> <https://github.com/RedBeard0531/cxx_scanner> is the scanner I started on
> a while ago, but haven't touched in years. While implementing a (partial)
> preprocessor was a lot of fun, one of my conclusions is that it is
> basically impossible to exactly match the behavior of a given compiler with
> a separate impl, and that is critical. In addition to all of the IDB and UB
> in [cpp], compilers don't offer any way to query for their supported
> __has_cpp_attributes() and similar things. I also haven't spent much time
> on it due to a change in life priorities since becoming a father. I have
> another child due soon, which is why I will not be able to make it to Varna
> even though it is close to home.
>
> That said, I still think the general approach would be the best, but it
> would require that each compiler make their own scanner server (or that
> someone who has already invested significant resources in making a
> preprocessor compatible with other compilers do so). The general idea was
> to have a cache of two maps. One is cache from file path to an "AST" of the
> preprocessor directives in that file, eg with an #if node with a condition
> to evaluate and the then and else cases as children*. This means once all
> of its files are in the first cache that you can very quickly "scan" a TU
> by walking this cache of parsed ASTs without needing to do any I/O or
> parsing large files where the vast majority of bytes are not preprocessor
> directives. Additionally you would have a second cache of {file path, input
> defines} -> {output defines, included headers, imported headers and
> modules, declared module name and kind} (This is from 2-3 year old memory,
> so I'm sure I'm forgetting some inputs and outputs, so please forgive any
> inaccuracy). This would let you only process each header once for a given
> set of input defines across the whole build. For header units and a
> well-behaved build that means exactly once. And because this would be done
> in a persistent server, you can take advantage of the shared caching across
> "independent" per-TU scan tasks within a single build and across multiple
> incremental builds.
>
> Of course as with any cache, the devil is in the details. You would need
> to know how and when to purge old and unneeded data, and you would need to
> be very careful to detect when files had changed between builds. (If you
> are editing or generating source files within a build, the onus is on you
> to ensure that your dependencies ensure that the file is in a stable state
> prior to the first command reading it in the build)
>
> *I recognize that the standard doesn't require an #included file to be
> well balanced internally, it only requires that when you finish a full TU
> that it is well balanced. That said all files in practice tend to be well
> balanced so it seems reasonable for an implementation to simply detect this
> case and punt to a much slower implementation for TU that do nonsense like
> that.
>
>
>> 2. Doesn't require the build system to dynamically generate new nodes on
>>>> the build graph. The full set of nodes needs to be prepared beforehand, but
>>>> edges can be added after the fact. The dependency scanning runs as part of
>>>> the build and dynamically add those edges (e.g.: CMake does this with
>>>> ninja's dyndep)
>>>> 3. Doesn't require the build system to compare the output of a command
>>>> to see if it changed before deciding if a downstream target needs to be
>>>> redone.
>>>>
>>> I don't think most build systems require *both* 2 and 3 at the same
>>> time. For example ninja requires 2, and not 3 (sort of*), and make requires
>>> 3 and not 2. I don't see anything wrong with requiring different approaches
>>> for different build systems that are adapted to the strengths and
>>> weaknesses of the underlying build system.
>>>
>>
>> Would you mind explaining how an implementation with ninja would work? It
>> would need to be able to perform the correct dependency scan (which depends
>> on the list of importable headers and their arguments) for both the clean
>> build (when we don't know what importable headers could be used by a TU)
>> and for incremental builds afterwards, without a change in the list of
>> importable headers and their arguments resulting in an invalidation of all
>> translation units?
>>
>
Sorry, there was a detail that I forgot to mention explicitly that is
important to understanding how this answers your questions. The key is that
this leans *very* heavily on ninja's restat=1 reature and a write_if_changed
function
<https://github.com/RedBeard0531/cxx_modules_builder/blob/e13174805992c7aa10e8fbc9b1452cbfc0f66ecc/modules_builder.py#L92-L99>
in
the scanner to be sure to only touch files if their contents change. This
means that even if we need to redo a scan (which we would because all scans
depend on the yaml file holding the list of importable headers), we *don't*
need to recompile unless it actually would affect that compile.

It is possible that I could teach the build system to avoid re-scans if
just the list of header units changed in a way that didn't affect that TU's
scan. However, I haven't thought much about that because a) the list of HUs
changing should be moderately rare relative to the normal edit-compile-test
cycle, b) I expect the scanner to do enough of its own caching and
work-sharing that rescanning isn't catastrophic, and c) this was a 1-week
POC and there where much more important problems to solve first.

>
> Sure. My POC modules build system using a per-TU scan is at
> https://github.com/RedBeard0531/cxx_modules_builder. I built it in a week
> as a skunkworks project. It has been untouched for even longer than the
> scanner, because there were no working scanners so it needed to use a very
> hacky "scanner" implementation
> <https://github.com/RedBeard0531/cxx_modules_builder/blob/e13174805992c7aa10e8fbc9b1452cbfc0f66ecc/header.ninja#L8-L46>
> (I shouldn't even call it that), which is what convinced me that I needed
> to try building a real one, which was much more work. While the docs
> mention a necessary patch to ninja, it was eventually merged
> <https://github.com/ninja-build/ninja/pull/1534> and is now available in
> stock ninja.
>
> For each header unit
> <https://github.com/RedBeard0531/cxx_modules_builder/blob/master/modules_builder.py#L127-L160>,
> which needed to be listed in the build.yml, it would emit 3 tasks: 1) a
> SCAN to scan it, 2) a HEADER_UNIT to build the .pcm file (I still think
> 2-phase is the best compilation strategy FWIW), and 3) a HEADER_UNIT_CXX to
> produce its .o file. (Again, the command used for "scanning" is very much
> not production grade, but I think that is the only part of this strategy
> that isn't viable)
>
> (In case email makes a mess of the ninja file, or if you'd prefer to read
> with syntax highlighting, I put the full example file up at
> https://gist.github.com/RedBeard0531/53a21a01bff067ebe68bd509ae67fe8e)
>
> rule SCAN
> command = $CXX $CXXFLAGS -w -E -x $KIND $in -o /dev/null -MD -MF
> $out.d -MT $out && $MAKE_NINJA --scan $out.d $in $out $PCMFLAGS_FILE
> description = SCAN $in
> deps = gcc
> depfile = $out.d
> restat = 1
>
> rule HEADER_UNIT
> command = $CXX $CXXFLAGS @$PCMFLAGS_FILE $MODULE_CODEGEN -c -x
> c++-header -fmodule-name=$in --precompile $$(realpath $in) -o $out -MD -MF
> $out.d
> description = HEADER_UNIT $in
> deps = gcc
> depfile = $out.d
> restat = 1
>
> rule HEADER_UNIT_CXX
> command = $CXX $DEBUGFLAGS -c $in -o $out
> description = HEADER_UNIT_CXX $out
>
> build build/_usr_include_c++_v1_algorithm.pcm.dd: SCAN $
> /usr/include/c++/v1/algorithm | $CXX $MAKE_NINJA make_ninja.yml
> KIND = c++-header
> PCM_FILE = build/_usr_include_c++_v1_algorithm.pcm
> PCMFLAGS_FILE = build/_usr_include_c++_v1_algorithm.pcm.flags
> build build/_usr_include_c++_v1_algorithm.pcm: HEADER_UNIT $
> /usr/include/c++/v1/algorithm | $CXX || $
> build/_usr_include_c++_v1_algorithm.pcm.dd maybe_mod_scans
> dyndep = build/_usr_include_c++_v1_algorithm.pcm.dd
> PCMFLAGS_FILE = build/_usr_include_c++_v1_algorithm.pcm.flags
> build build/_usr_include_c++_v1_algorithm.o: HEADER_UNIT_CXX $
> build/_usr_include_c++_v1_algorithm.pcm | $CXX
>
> You can find the logic used for generating the .dd and .flags files here
> <https://github.com/RedBeard0531/cxx_modules_builder/blob/e13174805992c7aa10e8fbc9b1452cbfc0f66ecc/modules_builder.py#L245-L283>
>
> And for each declared source file
> <https://github.com/RedBeard0531/cxx_modules_builder/blob/e13174805992c7aa10e8fbc9b1452cbfc0f66ecc/modules_builder.py#L162-L198>,
> it would generate 3 similar but slightly different tasks: 1)
> MAYBE_MODULES_SCAN to scan, 2) CXXPRE to go from .cpp -> .pcm and 3) CXX to
> go from *either* .cpp or .pcm -> .o.
>
> rule MAYBE_MODULE_SCAN
> # Please don't look too closely at this command. Your eyes may bleed...
> command = $
> echo '#line 1 "$in"' > $out.rewrite.cpp && $
> LC_ALL=C sed $MODULE_SCAN_SED_SCRIPT < $in >> $out.rewrite.cpp && $
> $CXX $CXXFLAGS -w -E -x $KIND -iquote $$(dirname $in) -Xclang
> -main-file-name -Xclang $in $out.rewrite.cpp -o $out.ii -MD -MF $out.d -MT
> $out && $
> LC_ALL=C gawk $MODULE_SCAN_AWK_SCRIPT < $out.ii > $out.ii.post &&
> $
> $MAKE_NINJA --maybe-module-scan $out.d $out.ii.post $in $out
> $PCMFLAGS_FILE
> description = MAYBE_MODULE_SCAN $in
> deps = gcc
> depfile = $out.d
> restat = 1
>
> rule CXXPRE
> command = $CXX $CXXFLAGS --precompile -x c++-module -c @$out.flags $in
> -MD -MF $out.d -MT $out && touch $out
> description = CXXPRE $in
> deps = gcc
> depfile = $out.d
>
> # TODO split out cpp->o and pcm->o into separate rules and remove
> -Wno-unused-command-line-argument
> rule CXX
> command = rm -f $out.d; $CXX $CXXFLAGS
> -Wno-unused-command-line-argument -c @$FLAGS_FILE -MD -MF $out.d -o $out &&
> if [ ! -e $out.d ]; then echo '$out : ' > $out.d; fi
> description = CXX $in
> deps = gcc
> depfile = $out.d
>
> build build/simple_duplicate.mpp.o.dd: MAYBE_MODULE_SCAN $
> simple/duplicate.mpp | $CXX $MAKE_NINJA make_ninja.yml
> KIND = c++-module
> PCM_FILE = build/simple_duplicate.mpp.o
> PCMFLAGS_FILE = build/simple_duplicate.mpp.o.flags
> build build/simple_duplicate.mpp.o: CXX simple/duplicate.mpp | $CXX || $
> build/simple_duplicate.mpp.o.dd maybe_mod_scans
> dyndep = build/simple_duplicate.mpp.o.dd
> FLAGS_FILE = build/simple_duplicate.mpp.o.flags
> build build/simple_duplicate.mpp.pcm: CXXPRE simple/duplicate.mpp | $CXX
> || $
> build/simple_duplicate.mpp.o.dd maybe_mod_scans
> dyndep = build/simple_duplicate.mpp.o.dd
> FLAGS_FILE = build/simple_duplicate.mpp.o.flags
>
> The "maybe module scanner
> <https://github.com/RedBeard0531/cxx_modules_builder/blob/e13174805992c7aa10e8fbc9b1452cbfc0f66ecc/modules_builder.py#L320-L392>"
> used a bit of trickery to set up the .o.dd and .o.flags files to
> automatically use the 2-step build for non-module units and non-importable
> module units (ie `module blah;` without export or a partition), but use the
> 3 step build for other types of module units (this logic is at the bottom
> of the function). Another interesting detail is that in order to allow all
> scanning (and flag generation) to happen in parallel is to make use of a
> "build/mod_links/" directory where the pcm file for a named module lives a
> "build/mod_links/module.name.pcm". This is why the CXXPRE and CXX commands
> don't have the "real" pcm file as $out/$in (respectively) in the flags
> passed to clang in the command. Instead, the scanner writes the actual pcm
> path as the output in the .pcm.flags file and the .pcm file (or raw .cpp
> file for 2-step) into the .o.flags file.
>
> Admittedly there is a LOT going here, some of which might seem like black
> magic if isn't already understood, and some of it (especially the
> scanners!) is certainly hacky. I also think it would be a lot cleaner if
> ninja allowed dyndeps files to contribute variables to be filled in when
> evaluating the node's command line. But I think this approach will both
> work and provide ideal parallelism along with minimal and correct rebuilds
> (modulo a few cases called out in comments that I decided not to handle out
> of lazyness/time limits even though they would be possible). Of course if
> you can find a counter example, I'd love to hear about it.
>
>
>> To be clear, this is an honest question, not rhetorical or
>> confrontational. If someone can show me how to make this work, I'm ready to
>> change my mind*, I am just not seeing how this can work.
>>
>> daniel
>>
>> * I am even open to ditching make if ninja accepts the patch for
>> integrating with the make job server upstream. We currently can't use ninja
>> in production because the outside orchestration uses the make job server to
>> control parallelism across different projects.
>>
>
> I would also love it if that patch lands...
>

Received on 2023-05-26 15:57:14