
Re: P2898R0: Importable Headers are Not Universally Implementable

From: Mathias Stearn <redbeard0531+isocpp_at_[hidden]>
Date: Fri, 26 May 2023 17:39:46 +0200
On Fri, May 26, 2023 at 3:32 PM Daniel Ruoso <daniel_at_[hidden]> wrote:

> On Fri, May 26, 2023 at 06:36, Mathias Stearn via SG15 <
> sg15_at_[hidden]> wrote:
>
>> Many of your messages seem to assume that there is a single dependency
>> scan. In this case it shows in the use of the singular "the dependency
>> scan" noun phrase, but other messages have stated it more explicitly.
>> That isn't the only model. You can also have a per-TU dependency scan.
>>
>
> I'm not sure what you mean there.
>

Megascan: clang-scan-deps compile-commands.json |
script-that-converts-its-output-into-ninja-dyndeps-files

Per TU scan: hypothetical-scan-deps my-file.cpp.or.h |
script-that-converts-its-output-into-ninja-dyndeps-files

With the "megascan" approach you do a single scan for the entire build that
outputs all of the dyndep files. Hopefully it internally does sufficient
caching so that incremental builds are very fast.
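For concreteness, a converter script at the end of either pipeline might emit one dyndep file per TU along these lines (file names are illustrative, not the output of any real tool):

```ninja
# Hypothetical dyndep file for one object file. It registers the pcm of
# each imported header unit as an implicit input, discovered at build time.
ninja_dyndep_version = 1
build build/main.o: dyndep | build/_usr_include_c++_v1_algorithm.pcm
```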

With the per-TU approach, each compile job (including for header units) has
its own scan task. This means that you only need to scan files that are
transitively imported by some root set of files. The moderately tricky
thing is that when you import a header, you need to be able to internally
handle it as if the header were preprocessed in its own independent context
with its own set of flags. And because, once you do scan my-file.cpp, you
will then need to compute the dyndeps for imported-header.hpp (which,
again, is only scanned if anything ended up importing it), this relies on a
lot of data sharing between independent runs of hypothetical-scan-deps,
which to me implies they are implemented by calling into a server
(discussed below).

I will note that many of the late-breaking papers (e.g.
https://wg21.link/p1857r3) for modules in C++20 were intended to ensure
that a variety of very important optimizations and caching strategies would
still be correct-by-spec. For example, we were very careful to ensure that
it is possible to find anything that can affect the list of imports,
#includes, and the TU's module name by looking only at preprocessor
directives, and that the scanner can either quickly skip over
non-directives and/or cache a "skeleton" consisting of just those
directives that could impact the scan results.
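As a rough illustration of that "skeleton", here is a deliberately simplified Python sketch (my naming, not from any real scanner) that keeps only the lines that could affect scan results; a real implementation must also handle comments, raw string literals, and the full [cpp] grammar:

```python
import re

def directive_skeleton(source: str) -> list[str]:
    """Keep only preprocessor directives plus import/module lines.

    Simplified sketch: ignores comments and raw strings, which a real
    scanner must handle.
    """
    # Splice backslash-newline continuations first, as phase 2 does.
    spliced = source.replace("\\\n", "")
    skeleton = []
    for line in spliced.splitlines():
        stripped = line.strip()
        # Directives start with '#'; import/module declarations also
        # matter for C++20 modules dependency scanning.
        if stripped.startswith("#") or re.match(
            r"(export\s+)?(import|module)\b", stripped
        ):
            skeleton.append(stripped)
    return skeleton
```

Walking such a skeleton instead of the full file is what lets the scanner skip the vast majority of bytes in a typical TU.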


> In my mind, the best way to do it is to have a persistent (local!) server
>> that caches the important information about each file (including
>> non-importable headers), and each per-TU scan task is really just a command
>> that calls into the server to evaluate the scan rooted at that TU.
>>
>
> I'm also not sure what you mean there.
>

Not sure if it would be super informative to you, but this
<https://github.com/RedBeard0531/cxx_scanner> is the scanner I started a
while ago but haven't touched in years. While implementing a (partial)
preprocessor was a lot of fun, one of my conclusions is that it is
basically impossible to exactly match the behavior of a given compiler with
a separate impl, and that matching is critical. In addition to all of the
IDB and UB in [cpp], compilers don't offer any way to query their supported
__has_cpp_attribute() values and similar things. I also haven't spent much
time on it due to a change in life priorities since becoming a father. I
have another child due soon, which is why I will not be able to make it to
Varna even though it is close to home.

That said, I still think the general approach would be the best, but it
would require that each compiler make its own scanner server (or that
someone who has already invested significant resources in making a
preprocessor compatible with other compilers do so). The general idea was
to have a cache of two maps. One is a cache from file path to an "AST" of
the preprocessor directives in that file, e.g. with an #if node holding a
condition to evaluate and the then and else cases as children*. This means
that once all of a TU's files are in the first cache, you can very quickly
"scan" the TU by walking this cache of parsed ASTs without needing to do
any I/O or parse large files where the vast majority of bytes are not
preprocessor directives. Additionally you would have a second cache of
{file path, input defines} -> {output defines, included headers, imported
headers and modules, declared module name and kind}. (This is from 2-3
year old memory, so I'm sure I'm forgetting some inputs and outputs;
please forgive any inaccuracy.) This would let you process each header
only once for a given set of input defines across the whole build. For
header units and a well-behaved build that means exactly once. And because
this would be done in a persistent server, you can take advantage of the
shared caching across "independent" per-TU scan tasks within a single
build and across multiple incremental builds.
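The two-map design can be sketched in a few lines of Python. All names here are illustrative, and `parse`/`evaluate` stand in for the real directive parsing and preprocessor evaluation:

```python
class ScannerServer:
    """Minimal sketch of a persistent scanner server with two caches."""

    def __init__(self, parse, evaluate):
        self.parse = parse        # path -> directive "AST"
        self.evaluate = evaluate  # (ast, input defines) -> scan result
        self.ast_cache = {}       # cache 1: file path -> parsed directives
        self.scan_cache = {}      # cache 2: (path, input defines) -> result

    def scan(self, path, input_defines=frozenset()):
        key = (path, frozenset(input_defines))
        if key not in self.scan_cache:
            if path not in self.ast_cache:
                # Each file is parsed at most once per server lifetime.
                self.ast_cache[path] = self.parse(path)
            self.scan_cache[key] = self.evaluate(self.ast_cache[path], key[1])
        return self.scan_cache[key]
```

Because the server outlives individual scan tasks, repeated scans of the same header with the same macro environment hit cache 2, and a changed environment only re-evaluates the already-parsed AST from cache 1.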

Of course, as with any cache, the devil is in the details. You would need
to know how and when to purge old and unneeded data, and you would need to
be very careful to detect when files had changed between builds. (If you
are editing or generating source files within a build, the onus is on you
to make sure your dependencies guarantee the file is in a stable state
prior to the first command in the build that reads it.)

*I recognize that the standard doesn't require an #included file to be
well balanced internally; it only requires that conditionals are balanced
by the time you finish a full TU. That said, in practice all files tend to
be well balanced, so it seems reasonable for an implementation to simply
detect the unbalanced case and punt to a much slower implementation for
TUs that do nonsense like that.
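Detecting that fast-path-eligible case is cheap. A sketch (my naming; directive spellings like `# if` with interior whitespace are ignored for brevity):

```python
def is_internally_balanced(skeleton):
    """True if every conditional opened in this file is closed in it,
    so the per-file directive-AST cache applies; otherwise the scanner
    would punt to a slower whole-TU path."""
    depth = 0
    for line in skeleton:
        head = line.split(None, 1)[0] if line.split() else ""
        if head in ("#if", "#ifdef", "#ifndef"):
            depth += 1
        elif head == "#endif":
            if depth == 0:
                return False  # closes a conditional opened in another file
            depth -= 1
    return depth == 0  # no conditional left open at end of file
```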


> 2. Doesn't require the build system to dynamically generate new nodes on
>>> the build graph. The full set of nodes needs to be prepared beforehand, but
>>> edges can be added after the fact. The dependency scanning runs as part of
>>> the build and dynamically add those edges (e.g.: CMake does this with
>>> ninja's dyndep)
>>> 3. Doesn't require the build system to compare the output of a command
>>> to see if it changed before deciding if a downstream target needs to be
>>> redone.
>>>
>> I don't think most build systems require *both* 2 and 3 at the same time.
>> For example ninja requires 2, and not 3 (sort of*), and make requires 3 and
>> not 2. I don't see anything wrong with requiring different approaches for
>> different build systems that are adapted to the strengths and weaknesses of
>> the underlying build system.
>>
>
> Would you mind explaining how an implementation with ninja would work? It
> would need to be able to perform the correct dependency scan (which depends
> on the list of importable headers and their arguments) for both the clean
> build (when we don't know what importable headers could be used by a TU)
> and for incremental builds afterwards, without a change in the list of
> importable headers and their arguments resulting in an invalidation of all
> translation units?
>

Sure. My POC modules build system using a per-TU scan is at
https://github.com/RedBeard0531/cxx_modules_builder. I built it in a week
as a skunkworks project. It has been untouched for even longer than the
scanner, because there were no working scanners, so it needed to use a
very hacky "scanner" implementation
<https://github.com/RedBeard0531/cxx_modules_builder/blob/e13174805992c7aa10e8fbc9b1452cbfc0f66ecc/header.ninja#L8-L46>
(I shouldn't even call it that). That is what convinced me that I needed
to try building a real one, which was much more work. While the docs
mention a necessary patch to ninja, it was eventually merged
<https://github.com/ninja-build/ninja/pull/1534> and is now available in
stock ninja.

For each header unit
<https://github.com/RedBeard0531/cxx_modules_builder/blob/master/modules_builder.py#L127-L160>,
which needed to be listed in the build.yml, it would emit 3 tasks: 1) a
SCAN to scan it, 2) a HEADER_UNIT to build the .pcm file (I still think
2-phase is the best compilation strategy FWIW), and 3) a HEADER_UNIT_CXX to
produce its .o file. (Again, the command used for "scanning" is very much
not production grade, but I think that is the only part of this strategy
that isn't viable)

(In case email makes a mess of the ninja file, or if you'd prefer to read
with syntax highlighting, I put the full example file up at
https://gist.github.com/RedBeard0531/53a21a01bff067ebe68bd509ae67fe8e)

rule SCAN
    command = $CXX $CXXFLAGS -w -E -x $KIND $in -o /dev/null -MD -MF $out.d -MT $out && $
        $MAKE_NINJA --scan $out.d $in $out $PCMFLAGS_FILE
    description = SCAN $in
    deps = gcc
    depfile = $out.d
    restat = 1

rule HEADER_UNIT
    command = $CXX $CXXFLAGS @$PCMFLAGS_FILE $MODULE_CODEGEN -c -x c++-header $
        -fmodule-name=$in --precompile $$(realpath $in) -o $out -MD -MF $out.d
    description = HEADER_UNIT $in
    deps = gcc
    depfile = $out.d
    restat = 1

rule HEADER_UNIT_CXX
    command = $CXX $DEBUGFLAGS -c $in -o $out
    description = HEADER_UNIT_CXX $out

build build/_usr_include_c++_v1_algorithm.pcm.dd: SCAN $
    /usr/include/c++/v1/algorithm | $CXX $MAKE_NINJA make_ninja.yml
  KIND = c++-header
  PCM_FILE = build/_usr_include_c++_v1_algorithm.pcm
  PCMFLAGS_FILE = build/_usr_include_c++_v1_algorithm.pcm.flags
build build/_usr_include_c++_v1_algorithm.pcm: HEADER_UNIT $
    /usr/include/c++/v1/algorithm | $CXX || $
    build/_usr_include_c++_v1_algorithm.pcm.dd maybe_mod_scans
  dyndep = build/_usr_include_c++_v1_algorithm.pcm.dd
  PCMFLAGS_FILE = build/_usr_include_c++_v1_algorithm.pcm.flags
build build/_usr_include_c++_v1_algorithm.o: HEADER_UNIT_CXX $
    build/_usr_include_c++_v1_algorithm.pcm | $CXX

You can find the logic used for generating the .dd and .flags files here
<https://github.com/RedBeard0531/cxx_modules_builder/blob/e13174805992c7aa10e8fbc9b1452cbfc0f66ecc/modules_builder.py#L245-L283>

And for each declared source file
<https://github.com/RedBeard0531/cxx_modules_builder/blob/e13174805992c7aa10e8fbc9b1452cbfc0f66ecc/modules_builder.py#L162-L198>,
it would generate 3 similar but slightly different tasks: 1) a
MAYBE_MODULE_SCAN to scan it, 2) a CXXPRE to go from .cpp -> .pcm, and 3)
a CXX to go from *either* .cpp or .pcm -> .o.

rule MAYBE_MODULE_SCAN
    # Please don't look too closely at this command. Your eyes may bleed...
    command = $
        echo '#line 1 "$in"' > $out.rewrite.cpp && $
        LC_ALL=C sed $MODULE_SCAN_SED_SCRIPT < $in >> $out.rewrite.cpp && $
        $CXX $CXXFLAGS -w -E -x $KIND -iquote $$(dirname $in) $
            -Xclang -main-file-name -Xclang $in $out.rewrite.cpp $
            -o $out.ii -MD -MF $out.d -MT $out && $
        LC_ALL=C gawk $MODULE_SCAN_AWK_SCRIPT < $out.ii > $out.ii.post && $
        $MAKE_NINJA --maybe-module-scan $out.d $out.ii.post $in $out $PCMFLAGS_FILE
    description = MAYBE_MODULE_SCAN $in
    deps = gcc
    depfile = $out.d
    restat = 1

rule CXXPRE
    command = $CXX $CXXFLAGS --precompile -x c++-module -c @$out.flags $in $
        -MD -MF $out.d -MT $out && touch $out
    description = CXXPRE $in
    deps = gcc
    depfile = $out.d

# TODO split out cpp->o and pcm->o into separate rules and remove
# -Wno-unused-command-line-argument
rule CXX
    command = rm -f $out.d; $CXX $CXXFLAGS -Wno-unused-command-line-argument $
        -c @$FLAGS_FILE -MD -MF $out.d -o $out && $
        if [ ! -e $out.d ]; then echo '$out : ' > $out.d; fi
    description = CXX $in
    deps = gcc
    depfile = $out.d

build build/simple_duplicate.mpp.o.dd: MAYBE_MODULE_SCAN $
    simple/duplicate.mpp | $CXX $MAKE_NINJA make_ninja.yml
  KIND = c++-module
  PCM_FILE = build/simple_duplicate.mpp.o
  PCMFLAGS_FILE = build/simple_duplicate.mpp.o.flags
build build/simple_duplicate.mpp.o: CXX simple/duplicate.mpp | $CXX || $
    build/simple_duplicate.mpp.o.dd maybe_mod_scans
  dyndep = build/simple_duplicate.mpp.o.dd
  FLAGS_FILE = build/simple_duplicate.mpp.o.flags
build build/simple_duplicate.mpp.pcm: CXXPRE simple/duplicate.mpp | $CXX || $
    build/simple_duplicate.mpp.o.dd maybe_mod_scans
  dyndep = build/simple_duplicate.mpp.o.dd
  FLAGS_FILE = build/simple_duplicate.mpp.o.flags

The "maybe module scanner
<https://github.com/RedBeard0531/cxx_modules_builder/blob/e13174805992c7aa10e8fbc9b1452cbfc0f66ecc/modules_builder.py#L320-L392>"
used a bit of trickery to set up the .o.dd and .o.flags files to
automatically use the 2-step build for non-module units and non-importable
module units (i.e. `module blah;` without export or a partition), but use
the 3-step build for other types of module units (this logic is at the
bottom of the function). Another interesting detail is that, in order to
allow all scanning (and flag generation) to happen in parallel, it makes
use of a "build/mod_links/" directory where the pcm file for a named
module lives at "build/mod_links/module.name.pcm". This is why the CXXPRE
and CXX commands don't have the "real" pcm file as $out/$in (respectively)
in the flags passed to clang in the command. Instead, the scanner writes
the actual pcm path as the output in the .pcm.flags file, and the .pcm
file (or raw .cpp file for 2-step) into the .o.flags file.
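That flags-file scheme can be sketched as follows. The file naming and contents here are illustrative assumptions, not the exact layout modules_builder.py writes:

```python
def plan_flags(build_dir, src, module_name=None, importable=False):
    """Sketch: decide what goes into the .pcm.flags and .o.flags files.

    The "real" pcm path appears only inside the flags files; importers
    always reference the stable build/mod_links/<name>.pcm alias, which
    is what lets all flag generation run in parallel.
    """
    stem = src.replace("/", "_")
    real_pcm = f"{build_dir}/{stem}.pcm"
    if importable:
        # 3-step build: scan, CXXPRE (cpp -> pcm), CXX (pcm -> o).
        return {
            "pcm_flags": f"-o {real_pcm}",  # written into the .pcm.flags file
            "o_flags": real_pcm,            # CXX compiles the pcm
            "alias": f"{build_dir}/mod_links/{module_name}.pcm",
        }
    # 2-step build: scan, then CXX compiles the raw source directly.
    return {"pcm_flags": None, "o_flags": src, "alias": None}
```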

Admittedly there is a LOT going on here, some of which might seem like
black magic if it isn't already understood, and some of it (especially the
scanners!) is certainly hacky. I also think it would be a lot cleaner if
ninja allowed dyndep files to contribute variables to be filled in when
evaluating the node's command line. But I think this approach will both
work and provide ideal parallelism along with minimal and correct rebuilds
(modulo a few cases called out in comments that I decided not to handle
out of laziness/time limits, even though they would be possible). Of
course, if you can find a counterexample, I'd love to hear about it.


> To be clear, this is an honest question, not rhetorical or
> confrontational. If someone can show me how to make this work, I'm ready to
> change my mind*, I am just not seeing how this can work.
>
> daniel
>
> * I am even open to ditching make if ninja accepts the patch for
> integrating with the make job server upstream. We currently can't use ninja
> in production because the outside orchestration uses the make job server to
> control parallelism across different projects.
>

I would also love it if that patch lands...

Received on 2023-05-26 15:40:03