Two broad points:
First, Tom refers to a singular dependency scanner, build system, and target implementation.
Not quite. My intent was to state that they must behave consistently; for example, the dependency scanner, build system, and compiler must all be in sync when it comes to how #include directives are mapped to source files. Likewise, they must be in sync when it comes to whether a particular #include directive is rewritten to an import declaration or whether an import declaration names an importable header. Basically, they all require the same input with regard to (a subset of) the parameters of the abstract machine.
I can't speak to use of C++20 header units or explicit builds of Clang modules. What I can say is that, while at Coverity, I triaged many bug reports from customers that were using Clang modules with implicit module builds (the bug reports were unrelated to use of modules in most cases). I can't offer a report count or a percentage. (I can also say that the inability to use preprocessed output to reproduce a problem encountered in the field is a major PITA).
In practice, projects can and do target multiples of each, and build systems might not even exist. I also consider static analysis tools, interactive editing environments, remote build workflows, language servers from other toolchains, and so on. I cannot bring myself to hand-wave that this is a simple matter of programming when these tools all need to support professional development workflows by presenting a coherent understanding of a codebase to the end user. I can see how treating importable headers as just an optimization on textual inclusion provides coherency. Alternatively, I can see how consistent and portable rules about implicit importation are coherent as well. I'm not seeing other designs that work, just statements of faith that they should and do in cases where there's a sufficiently smart build system.
Speaking of evidence, I think evidence that importable headers work well outside of well-governed and comprehensive build systems (i.e., monorepos) is thin so far. Specifics would help a lot here.
Maybe that lack of support is because, in practice, implicitly built modules, despite their limitations, actually work pretty well.
Second, asking for that evidence is partly tongue in cheek because I think the biggest point against importable headers is the paucity of investment to support them at the tooling and training level. I'm willing to be proven wrong though. For instance, maybe parties who find this feature straightforward and supportable can find ways to contribute build system enhancements, importability linters, packaging metadata, how-to guides, and so on to bring them closer to being settled and teachable technology. Otherwise, we might end up with a feature that is disappointingly niche (or dead) for practical reasons if not technical ones. At some point, that's reason enough to destandardize or at least narrow the scope of the feature, at least in my mind.
In other words, why are we making significant progress [1] on making named modules widely useful but not making the same headway on importable headers?
Cultural concerns are a possibility; the people working on this might just have a preference for named modules.
Tom.
I do appreciate all the engagement on this issue, by the way. It's clear to me that the path forward is by improving our collective understanding of the opportunities and challenges here.
Bret
[1] I feel like investment in named modules is also below what is justified, but at least that feature is getting somewhere.
On Tue, May 23, 2023, 21:54 Tom Honermann via SG15 <sg15@lists.isocpp.org> wrote:
On 5/23/23 11:54 AM, Daniel Ruoso wrote:
What subset of environments is it not implementable in? The paper doesn't actually say as far as I can tell. What is a concrete example?
On Mon, May 22, 2023 at 16:17, Tom Honermann <tom@honermann.net> wrote:
First, I don't know what is meant by "universally implementable". The paper doesn't offer a definition of the term and its use within the paper doesn't present an intuitive meaning, at least not for me (note that "universally implementable" appears only in the title and in an assertion in section 2.1 in the discussion of #pragma once).
The Abstract covers what I mean. Essentially, the C++ specification needs to be implementable in all the places where C++ is currently used, and the specification of Importable Headers currently fails that criterion. It is only implementable in a subset of environments where C++ is used.
The paper doesn't quantify costs in any way. The closest it comes is (correctly) noting that dependency scanning for header units requires computing imported macros in a bottom-up way that is not required for named modules. But the paper doesn't quantify that cost. Is it a 5% hit? A 95% hit? Linear with respect to the number of header units? Other papers have offered quantification; see P1441 (Are modules fast?) for example.
If the concern is that dependency scanning for importable headers is dependent on which implementation the result of the dependency scanning will be used to construct a build system for, then I agree that dependency scanning does not produce an implementation-independent result.
It's not about it being implementation-independent or not; it's about the costs being prohibitive in a lot of environments where C++ is used. Everything is technically possible, but every decision has costs, some of which may be unaffordable for some use cases. IMHO, the costs of Importable Headers are unaffordable for environments that use open-ended build systems, such as systems that have dependencies expressed entirely by producing binary artifacts that are used as dependencies of other builds. This is the case at Bloomberg, but it's also the case for systems like Conan and vcpkg, as well as most GNU/Linux distributions.
Indeed; I believe it is equivalent to requirements for support of generated headers.
But I also don't find that to be particularly concerning either; absent a (possibly implementation-dependent and often complicated) include search path, it is not possible, in general, to universally map #include directives to source files either.
Correct. Prior to modules, that had a specific cost. The build system needs to incorporate the source inclusions into the dependency graph after the initial compilation such that incremental builds are done correctly. With named modules, the build system needs to be able to incorporate dependencies *prior* to a clean build. This required a change in ninja, for instance. And this is what has allowed us to have a proper plan on how we're going to get that implemented.
With Importable Headers, however, the dependency scanning itself has a dependency on additional information before any source file is even read; I'll dive into this in the next paragraphs.
Yes. This is what I meant by "a dependency scanner or build system must match the implementation-defined behavior for the targeted implementations."
Section 2.1 states:
This identity problem has always been a complicated topic for the C++ specification, the `#pragma once` directive has been supported by various implementations in varying degrees of compatibility, but it cannot be universally implemented because we don’t have a way of specifying what is the thing that should be included only “once” given the way that header search works.
This issue does exist (and has for a very long time) but I don't see how it is relevant. Since header to source file mapping is implementation-defined, a dependency scanner or build system must match the implementation-defined behavior for the targeted implementations.
The dependency scanning needs to be able to map how the importable header is specified in order to understand which `#include` directives can be turned into imports. That means we need to establish an identity between the files actually opened by the dependency scanning process and the tokens used in both `#include` and `import` directives before the dependency scanning even runs.
However, the impacted TUs are only those that already had a #include directive for the header. TUs that didn't are not affected. I do appreciate that optimizing for this case requires understanding precisely what changed and that the simplest approach is to just perform the whole scan. This seems like a QoI issue to me.
Section 2.2 states:
The cost of that approach, however, is that we create a significant bottleneck in the dependency chain of any given object. Changing the list of Importable Headers or the Local Preprocessor Arguments for any one of them will result in a complete invalidation of the dependency scanning for all translation units in the project.
I think the bold text is not quite correct. Changing whether a header is importable or not only invalidates the TUs that include/import it.
Not quite, it invalidates the dependency scanning itself, because the switch from source inclusion to import (even if it's still spelled as `#include` in the source) can change the way the preprocessor handles the code after the import, which can change the output of the dependency scanning. The problem, again, is that this information is needed before the dependency scanning even runs. If that information is an input to the dependency scanning, it means changing it invalidates it.
Likewise, changing how a header unit is built only invalidates the TUs that import it (which may affect additional TUs if the importing TU is also an importable one).
Again, changing the initial preprocessor state of the importable header unit invalidates the dependency scanning itself for the entire project, since the dependency scanning has to emulate the behavior of the import. And the dependency scanning needs to run before we know anything else about the code.
Per above, it seems that we disagree on this point.
The problem is equivalent to an update to a generated header file. Generated headers likewise need to be built and scanned as part of dependency generation. But updating one of them (via an update to the generator or another of the inputs) doesn't invalidate the dependency information for the entire project; it just invalidates the dependencies for the TUs that include the generated header.
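A minimal sketch of the two invalidation models being debated here may help. It is an illustration only: `IncludeGraph`, `fine_grained_invalidation`, `coarse_invalidation`, and the file names are hypothetical and do not come from any real build system. The first function models the view that only TUs reaching the changed header go stale; the second models a build system that treats the importable-header list as an opaque input to every scan rule.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model: each TU maps to the headers it includes or imports
// (transitively). The file names used below are made up.
using IncludeGraph = std::map<std::string, std::set<std::string>>;

// Fine-grained model: only TUs that reach the changed header become stale.
std::set<std::string> fine_grained_invalidation(const IncludeGraph& graph,
                                                const std::string& changed_header) {
    std::set<std::string> stale;
    for (const auto& entry : graph) {
        if (entry.second.count(changed_header) != 0) {
            stale.insert(entry.first);
        }
    }
    return stale;
}

// Coarse model: the importable-header list is an input to every scan rule,
// so a build system that cannot look inside it marks every TU stale.
std::set<std::string> coarse_invalidation(const IncludeGraph& graph) {
    std::set<std::string> stale;
    for (const auto& entry : graph) {
        stale.insert(entry.first);
    }
    return stale;
}

// Usage example with a made-up three-TU project.
void invalidation_example() {
    const IncludeGraph graph = {
        {"a.cpp", {"util.h", "log.h"}},
        {"b.cpp", {"log.h"}},
        {"c.cpp", {}},
    };
    assert(fine_grained_invalidation(graph, "util.h").size() == 1);  // only a.cpp
    assert(coarse_invalidation(graph).size() == 3);                  // every TU
}
```

Which model applies in practice depends on whether the build system can inspect what changed inside its inputs, which is the point under dispute.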
That sounds right given the restrictions on module units in [module.import]p1.
So, instead of "will result", I would substitute "might result". I don't see any reason why a build system can't cache these results and update them when they are found to be violated;
I'm not sure what you mean by that. The way that this works for named modules is:
1. You start with the compiler command line for the translation unit, and use that to calculate the command line for dependency scanning (input is only the build configuration).
2. The dependency scanning for that translation unit produces P1689 output. The inputs at this point are just the command line options and the contents of the file. The output is only module-level dependencies, because source inclusion is only needed to augment the incremental build.
3. The dependency information is collated by the build system in order to generate the input module map for each TU. This, again, does not depend on source inclusion information. And if the build system uses a predictable naming scheme for modules, it doesn't need to know how the other TU was produced; it can just leave the dependency edge to be tracked by the build system.
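Step 3 above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `ScanResult` is a hypothetical stand-in for one rule of a P1689-style scan result, and the `bmi/<name>.bmi` naming scheme is invented for the example rather than taken from any compiler or build system.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one rule of a P1689-style scan result.
struct ScanResult {
    std::string source;                   // translation unit that was scanned
    std::vector<std::string> provides;    // module names this TU exports
    std::vector<std::string> requires_;   // module names this TU imports
};

// Assumed predictable naming scheme (invented here): "math" -> "bmi/math.bmi".
std::string bmi_path(const std::string& module_name) {
    assert(!module_name.empty());
    return "bmi/" + module_name + ".bmi";
}

using ModuleMap = std::map<std::string, std::string>;  // module name -> BMI path

// Collate scan results into one module map per TU. Only module-level
// dependencies are consulted. Because the naming scheme is predictable,
// the collation never needs to know which TU provides a module.
std::map<std::string, ModuleMap> collate_module_maps(
        const std::vector<ScanResult>& results) {
    std::map<std::string, ModuleMap> maps;
    for (const ScanResult& result : results) {
        ModuleMap& entry = maps[result.source];
        for (const std::string& name : result.requires_) {
            entry[name] = bmi_path(name);
        }
    }
    return maps;
}
```

For example, given one TU that provides `math` and a `main.cpp` that requires it, the module map for `main.cpp` would contain the single entry `math -> bmi/math.bmi`.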
As well as any generated header files that are included in the TU (typically none of course).
The end result is that for the initial build, the only inputs for a translation unit are the command line, the sources for the translation unit and the files provided by other targets in the build system that are listed in the module map.
No, the invalidation only occurs if the TU actually uses that input.
When Importable Headers are introduced, we have an input to the dependency scanning process, which means that the rule to produce the module mapper depends on the list of importable headers and the arguments to those. The module mapper data is an input to the actual translation, which will invalidate the translation unit itself.
Per prior statements, I disagree.
The end result is that any change to that input invalidates the builds of all translation units in the project.
MSVC and Clang with explicit modules assume that, yes. Clang modules is smarter about it when modules are implicitly built. I haven't played with Clang with explicitly built modules to know if it diagnoses incorrect assumptions.
Some build systems may be able to work around that issue in various ways. The way Clang Header Modules and the early MSVC adoption work around it is by assuming that the dependency scanning doing a plain source inclusion is equivalent to the actual dependencies that will be needed later. It is considered a user error if that is not true.
I would argue that is a QoI concern, but I otherwise agree.
IMHO, that is not a "correct" implementation of the specification. The specification requires a correct dependency scanning to emulate the import, by starting a new preprocessor context based on the arguments used when translating the header unit and merging the final state into the state of the TU doing the import. Which means it needs to know the list of header units and the arguments for each of those.
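The difference between scanning an import and scanning a plain inclusion can be sketched with a toy model of macro state. Everything here is hypothetical: `MacroState`, the `FLAVOR` and `MODE` macros, and the function names are invented for illustration, and the handling of conflicting macro definitions is deliberately left out.

```cpp
#include <cassert>
#include <map>
#include <string>

using MacroState = std::map<std::string, std::string>;  // macro name -> replacement text

// Hypothetical header: it defines MODE as the value of FLAVOR when FLAVOR is
// visible, and as "default" otherwise. Stands in for real preprocessing.
MacroState macros_defined_by_header(const MacroState& visible) {
    MacroState defined;
    const auto flavor = visible.find("FLAVOR");
    defined["MODE"] = (flavor != visible.end()) ? flavor->second : std::string("default");
    return defined;
}

// Merge the header's macros into the TU's state. Simplification: the real
// rules for conflicting definitions are not modeled here.
MacroState merge_macros(MacroState base, const MacroState& extra) {
    for (const auto& entry : extra) {
        base[entry.first] = entry.second;
    }
    return base;
}

// Scanning a textual #include: the header sees the including TU's macros.
MacroState scan_as_include(const MacroState& tu_state) {
    return merge_macros(tu_state, macros_defined_by_header(tu_state));
}

// Scanning an import: the header is preprocessed in a fresh context built from
// the header unit's own arguments, then its macros are merged into the TU.
MacroState scan_as_import(const MacroState& tu_state, const MacroState& header_unit_args) {
    return merge_macros(tu_state, macros_defined_by_header(header_unit_args));
}

// Usage example: the same header yields different scan results.
void scan_example() {
    const MacroState tu_state = {{"FLAVOR", "fast"}};
    assert(scan_as_include(tu_state).at("MODE") == "fast");
    assert(scan_as_import(tu_state, {}).at("MODE") == "default");
}
```

In this model the scanner cannot compute the import result without `header_unit_args`, which is why the list of header units and their arguments becomes an input to scanning.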
Ah, that we agree on.
Doing the correct thing will mean any environment using a build system that doesn't know how to optimize based on the contents of the output of each command will have to invalidate the entire build whenever information about importable headers changes.
Come on. You have to do better than that. It certainly isn't catastrophically bad if the scanning step only takes 2 seconds.
I would like to see some real numbers before concluding that there is a problem to be solved though.
A change in the information about importable headers causing the entire build to be invalidated is catastrophically bad, and will be completely unaffordable in many environments where C++ is used.
That's the thing. Named modules do require reinventing build systems. None of the major projects I've worked on except for Clang has used a build system that can handle named modules or be reasonably extended to do so. But they can all handle (implicitly built) header units just fine and get an advantage by doing so.
Header units don't have to perform as well as named modules; they just have to perform better than source inclusion and have an adoption cost less than named modules to be a potentially attractive and viable solution.
The comparison is not between Header Units and Source Inclusion; the comparison is about what happens in real adoption on existing build systems without the need to completely reinvent how those work.
Agreed. The dependency scanner and build system must be aligned with the behavior of the implementation they target.
Section 2.3 states:
This is going to be particularly challenging if the ecosystem ends up in a situation where different compilers make different choices about how to handle the implicit replacement of `#include` by the equivalent `import`.
Every implementation will need to be told which header files are importable.
I don't mean how they take it as input. But it is implementation-dependent whether an `#include` is or isn't replaced by the equivalent `import`, which means that the same project using the same information about what are the importable headers to be used could still end up with different results.
If used correctly, that is true. But I think it is just as true for header units when used correctly.
Section 3.1 states:
The main restriction that enables a interoperable use of pre-compiled headers is that the translation unit has to use it as a preamble to the translation unit, meaning the precompiled header is guaranteed not to be influenced by any other code in the translation unit
I think I understand what you are trying to say here, but I find the wording awkward.
The important bit about pre-compiled headers is that the source inclusion is fundamentally equivalent to using the pre-compiled header.
Kind of. The intent is that, if a header file is importable, then #include and import will behave the same and, if the header file is not importable, #include does source inclusion and import is ill-formed. So, when used correctly and consistently, there is only one semantic that is expressed.
This is not true for Clang Header Modules (in principle, it's assumed the user will only choose to declare clang header modules for cases where that should be true). It is also not true for Importable Headers in general. There are two entirely different semantics which can result in entirely different programs.
The first sentence is not quite true. Clang modules allow for BMI creation and selection to be dependent on macro definitions defined for the importing TU. See the config_macros declaration.
The important point is that the working of the preprocessor is fundamentally different from that of source inclusion. And that so far all early experiences rely on the user not selecting the wrong header to be importable, and there is no mechanism to detect whether the user chose the wrong thing or not. It will just fail to compile if you're lucky, or result in ODR violations otherwise.
Agreed, much like source inclusion results in ODR violations when a header file is included in multiple TUs that have distinct preprocessor state that the header file is sensitive to. Header units don't (fully) solve that problem (they do a little bit thanks to some of the semantic differences such as not being sensitive to the preprocessor state of the importing TU and preventing use of macros that are differently defined between the importing TU and an imported TU).
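A small single-file sketch of the sensitivity being described, assuming a hypothetical header `widget.h` whose layout depends on a macro `WIDGET_BIG` (both names are invented for illustration). The header text is pasted inline to show what textual inclusion sees; the comments describe what would differ if the same header were imported as a header unit.

```cpp
#include <cstddef>

// Defined by the including TU before the header text is reached.
#define WIDGET_BIG 1

// ---- begin contents of a hypothetical "widget.h" ----
struct Widget {
#if WIDGET_BIG
    char payload[64];
#else
    char payload[8];
#endif
};
// ---- end of "widget.h" ----

// With textual inclusion, the header sees WIDGET_BIG from the including TU.
static_assert(sizeof(Widget) == 64, "textual inclusion sees the TU's macro");

// If widget.h were instead imported as a header unit that was built without
// WIDGET_BIG on its own command line, the #if above would see an undefined
// identifier (treated as 0) and payload would be 8 bytes. A TU that includes
// the header textually and another that imports it would then disagree on
// the layout of Widget, which is an ODR violation.
constexpr std::size_t widget_payload_size() { return sizeof(Widget::payload); }
```

Nothing in this sketch would be diagnosed by the compiler in the mixed include/import case, which matches the concern that a wrong choice of importable header may only surface as an ODR violation.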
Tom.
daniel
SG15 mailing list
SG15@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg15