Really appreciate the write up!
Thanks!
* The terms "source to source" and "source to binary" seem a bit confusing to me
Yes, it is really hard to communicate the whole concept with a few words, I'll think more about this terminology and see how to make that more clear.
* Cheap Module Discoverability: The second paragraph maybe could benefit from a few more words [...] at least to my knowledge there's either a full pre-scan to discover the dependencies
Yes, but you would be scanning files that you are expecting to build anyway.
A bit, this might be getting a bit jumbled up with a few different directions that exist/are being implemented.
Google's Bazel-like monorepo (Clang Header Modules, rather than C++20 modules) starts with an explicit dependency graph from BUILD files, but then also does a prescan restricted to that scope (as you say, "files that you are expecting to build anyway"), not the whole monorepo. It did this pre-modules too, to reduce the set of header files that needed to be used as input to a given compilation: one library might depend on libraries providing thousands of headers, but maybe only a handful of those headers are transitively used by a specific translation unit in the library.
clang-scan-deps (https://llvm.org/devmtg/2019-04/slides/TechTalk-Lorenz-clang-scan-deps_Fast_dependency_scanning_for_explicit_modules.pdf), on the other hand, works, as far as I understand it, by using the existing header include path search order (& then modulemap discovery from there) to figure out which modules are required - and can handle "source to binary" builds this way.
But that's only applicable when the build flags are the same & we have an established search path for headers and discovery process for finding the modulemap from the header - which is what motivates R2 to provide the equivalent thing for C++20 modules as is already available for headers and Clang Header Modules.
However, when you have unrelated module files available in the system, you'd need to read all of them, just to throw them away because they were not actually going to be used by your build.
In other words, each libfoo-dev you install in your Debian system would make the build more expensive.
* Cheap Module Discoverability R2: [...] You're suggesting "within the scope of ABI uniformity" - so one set of configs for all the unixy, itanium compilers that already try to be command line compatible, another set for, say MSVC+clang-cl, etc.
Right, since we're trying to focus on requirements, I can't really say the need goes beyond that.
However, my engineering brain would very much appreciate if the interoperability became universal.
* Dependency Graph of Modules External to the Build: [...] My understanding is the spec was designed to make this cheap - to ensure that not a lot of C++ parsing complexity would be required to read the start of a module to discover its dependencies. Do you find this not to be the case, or to be inadequate for practical purposes?
I'll review that part of the spec one more time, and either expand or remove the section.
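To make the "cheap to discover" point concrete, here is a minimal sketch of what reading only the start of a module unit could look like. It is an illustration under loud assumptions, not how any compiler or scanner actually works: `scan_preamble` is a hypothetical helper, it does not run the preprocessor (so imports guarded by `#if` are not evaluated), it ignores block comments, and it assumes one declaration per line.

```python
import re

# Hypothetical helper for illustration only; not part of any real tool.
# Matches `module ...;`, `export module ...;`, `import ...;`, `export import ...;`.
_DECL = re.compile(r'^\s*(?:export\s+)?(module|import)\b\s*([^;]*);')

def scan_preamble(source: str):
    """Return (module_name, imports) found in the leading declarations."""
    name = None
    imports = []
    for line in source.splitlines():
        stripped = line.strip()
        # Skip blanks, line comments and preprocessor directives.
        # Assumption: directives are skipped, not evaluated.
        if not stripped or stripped.startswith("//") or stripped.startswith("#"):
            continue
        m = _DECL.match(stripped)
        if m is None:
            break  # first ordinary declaration ends the preamble
        kind, rest = m.group(1), m.group(2).strip()
        if kind == "module":
            # `module;` opens the global module fragment and
            # `module :private;` opens the private fragment: neither names the module.
            if rest and not rest.startswith(":"):
                name = rest
        else:
            imports.append(rest)
    return name, imports

example = """\
module;
#include <vector>
export module foo.bar;
import std.core;
export import :part;

export int f();
"""
print(scan_preamble(example))
```

The scan stops at the first line that is not a module or import declaration, which is the property that would keep discovery cheap. A real scanner would still need the preprocessor, which is where most of the cost and the flag-dependence come from.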
* Compiler-independent module parsing: [...] though I think my general understanding has been that this would still be viable, but these BMI artefacts would be a local build product in some build-specific cache directory.
Exactly, which works well for "source to source" builds. However, how would libblabla-dev on a Debian system provide it when the BMI is not interoperable even across different minor versions of the same compiler?
It wouldn't - note that the BMI isn't even interoperable over different flags, potentially (optimizations, #defines, etc) - so it's not feasible to ship BMIs for all possible user configurations. (I think MSVC is shipping their BMI for some basic user configurations - makes building something like "hello world" a bit easier if the basic flag set is available precompiled - but still if you pass a few different compiler flags you'll have to build your own BMIs).
Pretty sure this has been the documented/explicit plan for vendors (certainly my understanding of the plan for Clang and GCC) so far: it's not supported to do a BMI-only distribution, build systems will always need to be able to build new BMIs under some configurations, even if precompiled BMIs could be provided to cover a subset of cases.
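As a rough illustration of why a BMI ends up being a local, per-configuration build product, a build system could key its BMI cache on everything that may affect the BMI's contents. The sketch below is an assumption-laden example: `bmi_cache_key` is a hypothetical function, and the choice of inputs (compiler identity string, flag list, module source bytes) is illustrative rather than what any particular build system does. A real key would presumably also have to cover the BMIs of imported modules.

```python
import hashlib

def bmi_cache_key(compiler_id: str, flags: list[str], module_source: bytes) -> str:
    """Hypothetical cache key: changes whenever compiler, flags or source change."""
    h = hashlib.sha256()
    parts = (compiler_id.encode(), "\0".join(flags).encode(), module_source)
    for part in parts:
        # Length-prefix each part so adjacent fields cannot run together.
        h.update(len(part).to_bytes(8, "little"))
        h.update(part)
    return h.hexdigest()

src = b"export module foo;\nexport int f();\n"
release = bmi_cache_key("clang 15.0.1", ["-O2", "-DNDEBUG"], src)
debug = bmi_cache_key("clang 15.0.1", ["-O0", "-g"], src)
newer = bmi_cache_key("clang 15.0.2", ["-O2", "-DNDEBUG"], src)
print(release != debug, release != newer)
```

With a key like this, a precompiled BMI shipped by a vendor is just a cache entry that happens to match a few common configurations. Any other combination of flags or compiler version misses the cache and has to be built locally.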
a module may need particular flags to be compiled correctly and those flags may differ from those used to build the consumer of the module (I'm not 100% sure that supporting this situation is necessary
In "source to source" builds this happens naturally, because CMake would know that this module belongs to this target, and therefore would know to do the module parsing in the context of that target.
In "source to binary" we don't have the context of how that module is meant to be parsed, we only have the context for this particular translation unit.
Other than in something like a pkg-config file, yeah?
* Cheap Parsing of Modules External to the Build: [...] but this is talking about a feature of both the output format of the compiler (BMI) and also about where these output files are stored/accessed/how they can be concurrently generated, I take it? So that they could be shared between different builds happening on the same machine?
Correct, I think it's reasonable to have an "as optimized as possible" representation for the context of the current build. What I'm advocating for is an additional format that provides a middle ground between the BMI and having to do a complete parse. One example could be having the module file completely preprocessed already.
*nod* I can see how that's a nice to have - I don't think anyone's signed up/planning to build something like that as yet and I don't think it's necessary to the success of the feature - but just as the LSP and DAP have come about and are making headway in simplifying debugger/IDE integration with a variety of tools, there's probably room for something similar with modules to facilitate certain amounts of tooling that couldn't otherwise parse modules or would significantly benefit from some summary information. Exactly how much/what use cases it supports is a bit trickier when it's an on-disk representation that must be generated ahead of time without live knowledge of the particular needs of these thin clients (whereas the LSP and DAP don't cause extra work for the implementation when features are unused).
- Dave