ISOCPP SG15 List: Re: P2898R0: Importable Headers are Not Universally Implementable

From: Bret Brown <mail_at_[hidden]>
Date: Tue, 23 May 2023 22:44:35 -0400

Two broad points:

First, Tom refers to a singular dependency scanner, build system, and target
implementation. In practice, projects can and do target multiples of each,
and build systems might not even exist. I also consider static analysis
tools, interactive editing environments, remote build workflows, language
servers from other toolchains, and so on. I cannot bring myself to
hand-wave that this is a simple matter of programming when these tools all
need to support professional development workflows by presenting a coherent
understanding of a codebase to the end user. I can see how treating
importable headers as just an optimization on textual inclusion provides
coherency. Alternatively, I can see how consistent and portable rules about
implicit importation are coherent as well. I'm not seeing other designs
that work, just statements of faith that they should and do in cases where
there's a sufficiently smart build system. Speaking of evidence, I think
evidence that importable headers work well outside of well-governed and
comprehensive build systems (i.e., monorepos) is thin so far. Specifics
would help a lot here.

Second, asking for that evidence is partly tongue in cheek because I think
the biggest point against importable headers is the paucity of investment
to support them at the tooling and training level. I'm willing to be proven
wrong though. For instance, maybe parties who find this feature
straightforward and supportable can find ways to contribute build system
enhancements, importability linters, packaging metadata, how-to guides, and
so on to bring them closer to being settled and teachable technology.
Otherwise, we might end up with a feature that is disappointingly niche (or
dead) for practical reasons if not technical ones. At some point, that's
reason enough to destandardize or at least narrow the scope of the feature,
at least in my mind.

In other words, why are we making significant progress [1] on making named
modules widely useful but not making the same headway on importable headers?

I do appreciate all the engagement on this issue, by the way. It's clear to
me that the path forward is by improving our collective understanding of
the opportunities and challenges here.

Bret

[1] I feel like investment in named modules is also below what is
justified, but at least that feature is getting somewhere.

On Tue, May 23, 2023, 21:54 Tom Honermann via SG15 <sg15_at_[hidden]>
wrote:

> On 5/23/23 11:54 AM, Daniel Ruoso wrote:
>
> On Mon, May 22, 2023 at 16:17, Tom Honermann <tom_at_[hidden]> wrote:
>
> First, I don't know what is meant by "universally implementable". The paper doesn't offer a definition of the term and its use within the paper doesn't present an intuitive meaning, at least not for me (note that "universally implementable" appears only in the title and in an assertion in section 2.1 in the discussion of #pragma once).
>
> The Abstract covers what I mean. Essentially, the C++ specification
> needs to be implementable in all the places where C++ is currently
> used, and the specification of Importable Headers currently fails that
> criterion. It is only implementable in a subset of environments where
> C++ is used.
>
> What subset of environments is it not implementable in? The paper doesn't
> actually say as far as I can tell. What is a concrete example?
>
> If the concern is that dependency scanning for importable headers is dependent on which implementation the result of the dependency scanning will be used to construct a build system for, then I agree that dependency scanning does not produce an implementation independent result.
>
> It's not about it being implementation-independent or not, it's about
> the costs being prohibitive in a lot of environments where C++ is
> used. Everything is technically possible, but every decision has
> costs, some of which may be unaffordable to some use cases. IMHO, the
> costs of Importable Headers are unaffordable for environments that use
> open-ended build systems, such as systems that have dependencies
> expressed entirely by producing binary artifacts that are used as
> dependencies of other builds. This is the case at Bloomberg, but it's
> also the case for systems like Conan, vcpkg, as well as most GNU/Linux
> distributions.
>
> The paper doesn't quantify costs in any way. The closest it comes is
> (correctly) noting that dependency scanning for header units requires
> computing imported macros in a bottom up way that is not required for named
> modules. But the paper doesn't quantify that cost. Is it a 5% hit? A 95%
> hit? Linear with respect to the number of header units? Other papers have
> offered quantification; see P1441 (Are modules fast?)
> <https://wg21.link/p1441> for example.
>
> But I also don't find that to be particularly concerning either; absent a (possibly implementation dependent and often complicated) include search path, it is not possible, in general, to universally map #include directives to source files either.
>
> Correct. Prior to modules, that had a specific cost. The build system
> needs to incorporate the source inclusions into the dependency graph
> after the initial compilation such that incremental builds are done
> correctly.
>
> With named modules, the build system needs to be able to incorporate
> dependencies *prior* to a clean build. This required a change in
> ninja, for instance. And this is what has allowed us to have a proper
> plan on how we're going to get that implemented.
>
> Indeed; I believe it is equivalent to requirements for support of
> generated headers.
>
> With Importable Headers, however, the dependency scanning itself has a
> dependency on additional information before any source file is even
> read. I'll dive into this in the next paragraphs.
>
> Section 2.1 states:
>
> This identity problem has always been a complicated topic for the C++ specification, the `#pragma once` directive has been supported by various implementations in varying degrees of compatibility, but it cannot be universally implemented because we don’t have a way of specifying what is the thing that should be included only “once” given the way that header search works.
>
> This issue does exist (and has for a very long time) but I don't see how it is relevant. Since header to source file mapping is implementation-defined, a dependency scanner or build system must match the implementation-defined behavior for the targeted implementations.
>
> The dependency scanning needs to be able to map how the importable
> header is specified in order to understand which `#include` directives
> can be turned into imports.
>
> That means we need to establish an identity between the files actually
> opened by the dependency scanning process to the tokens used in both
> `#include` and `import` directives before the dependency scanning even
> runs.
>
> Yes. This is what I meant by "a dependency scanner or build system must
> match the implementation-defined behavior for the targeted implementations."
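To make the identity requirement above concrete, here is a minimal sketch of mapping a header-name, as spelled in `#include` or `import`, to a single normalized key. It assumes a purely lexical notion of identity and an in-memory set of known files; `resolve_header` and its parameters are hypothetical names, not part of any real scanner. A real tool would also have to match the target implementation's search rules, symlinks, and case sensitivity.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: try each search directory in order and return the
// normalized path of the first match. `known_files` stands in for the file
// system so that the sketch stays deterministic.
std::optional<std::string> resolve_header(
    const std::string& spelled,
    const std::vector<std::string>& search_dirs,
    const std::set<std::string>& known_files) {
  assert(!spelled.empty());
  for (const auto& dir : search_dirs) {
    // lexically_normal() collapses "." and ".." without touching the disk.
    const std::string candidate =
        (fs::path(dir) / spelled).lexically_normal().generic_string();
    if (known_files.count(candidate) != 0) return candidate;
  }
  return std::nullopt;  // not found on this search path
}
```

Two spellings such as `lib/a.h` and `lib/../lib/a.h` resolve to the same key here, while reordering the search directories can silently select a different file. That is why the list of importable headers only has meaning relative to one specific search configuration.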
>
> Section 2.2 states:
>
> The cost of that approach, however, is that we create a significant bottleneck in the dependency chain of any given object. Changing the list of Importable Headers or the Local Preprocessor Arguments for any one of them will result in a complete invalidation of the dependency scanning for all translation units in the project.
>
> I think the bold text is not quite correct. Changing whether a header is importable or not only invalidates the TUs that include/import it.
>
> Not quite, it invalidates the dependency scanning itself, because the
> switch from source inclusion to import (even if it's still spelled as
> `#include` in the source) can change the way the preprocessor handles
> the code after the import, which can change the output of the
> dependency scanning.
>
> The problem, again, is that this information is needed before the
> dependency scanning even runs. If that information is an input to the
> dependency scanning, it means changing it invalidates it.
>
> However, the impacted TUs are only those that already had a #include
> directive for the header. TUs that didn't are not affected. I do appreciate
> that optimizing for this case requires understanding precisely what changed
> and that the simplest approach is to just perform the whole scan. This
> seems like a QoI issue to me.
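The narrower invalidation argued for in the reply above can be sketched as a lookup over cached scan results. This is only an illustration, assuming the build system keeps, for each TU, the set of headers it reached during the previous scan; `ScanCache` and `tus_to_rescan` are hypothetical names. A build system that cannot inspect the outputs of earlier commands would have to rescan everything instead.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical cache: for each TU, the headers it reached (directly or
// transitively) during the previous dependency scan.
using ScanCache = std::map<std::string, std::set<std::string>>;

// Returns the TUs whose scan has to be redone when the importability (or the
// build arguments) of `header` changes. TUs that never reached the header
// keep their cached scan result.
std::set<std::string> tus_to_rescan(const ScanCache& cache,
                                    const std::string& header) {
  assert(!header.empty());
  std::set<std::string> affected;
  for (const auto& [tu, headers] : cache) {
    if (headers.count(header) != 0) affected.insert(tu);
  }
  return affected;
}
```

The conservative alternative is to return every key in the cache, which is the whole-project invalidation the paper describes.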
>
> Likewise, changing how a header unit is built only invalidates the TUs that import it (which may affect additional TUs if the importing TU is also an importable one).
>
> Again, changing the initial preprocessor state of the importable
> header unit invalidates the dependency scanning itself for the entire
> project, since the dependency scanning has to emulate the behavior of
> the import. And the dependency scanning needs to run before we know
> anything else about the code.
>
> Per above, it seems that we disagree on this point.
>
> The problem is equivalent to an update to a generated header file.
> Generated headers likewise need to be built and scanned as part of
> dependency generation. But updating one of them (via an update to the
> generator or another of the inputs) doesn't invalidate the dependency
> information for the entire project; it just invalidates the dependencies
> for the TUs that include the generated header.
>
> So, instead of "will result", I would substitute "might result". I don't see any reason why a build system can't cache these results and update them when they are found to be violated;
>
> I'm not sure what you mean by that. The way that this works for named
> modules is:
>
> 1. You start with the compiler command line for the translation unit,
> and use that to calculate the command line for dependency scanning
> (input is only the build configuration)
> 2. The dependency scanning for that translation unit produces P1689.
> The inputs at this point are just the command line options and the
> contents of the file. The output is only module-level dependencies,
> because source inclusion is only needed to augment the incremental
> build.
> 3. The dependency information is collated by the build system in order
> to generate the input module map for each TU. This, again, does not
> depend on source inclusion information. And if the build system uses a
> predictable naming scheme for modules, it doesn't need to know how the
> other TU was produced, it can just leave the dependency edge to be
> tracked by the build system.
>
> That sounds right given the restrictions on module units in
> [module.import]p1 <http://eel.is/c++draft/module.import#1>.
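The collation in step 3 above can be pictured as a topological sort over per-TU scan results. The following is a minimal sketch, not the P1689 format itself: `ScanResult` and `build_order` are hypothetical names, and modules provided by other targets are simply treated as external and skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-TU scan result: which module the TU provides (if any)
// and which modules it imports.
struct ScanResult {
  std::string source;
  std::optional<std::string> provides;
  std::vector<std::string> imports;
};

// Orders the TUs so that every provider is built before its importers.
// Throws std::runtime_error if the imports form a cycle.
std::vector<std::string> build_order(const std::vector<ScanResult>& scans) {
  std::map<std::string, std::size_t> provider;  // module name -> index of its TU
  for (std::size_t i = 0; i < scans.size(); ++i) {
    if (scans[i].provides) provider[*scans[i].provides] = i;
  }

  std::vector<int> state(scans.size(), 0);  // 0 unvisited, 1 in progress, 2 done
  std::vector<std::string> order;
  std::function<void(std::size_t)> visit = [&](std::size_t i) {
    if (state[i] == 2) return;
    if (state[i] == 1) throw std::runtime_error("import cycle at " + scans[i].source);
    state[i] = 1;
    for (const auto& name : scans[i].imports) {
      const auto it = provider.find(name);
      if (it != provider.end()) visit(it->second);  // external modules are skipped
    }
    state[i] = 2;
    order.push_back(scans[i].source);
  };
  for (std::size_t i = 0; i < scans.size(); ++i) visit(i);

  assert(order.size() == scans.size());
  return order;
}
```

Nothing in this sketch needs source-inclusion information: the only inputs are the per-TU provides/imports facts, which matches the point made in steps 2 and 3.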
>
> The end result is that for the initial build, the only inputs for a
> translation unit are the command line, the sources for the translation
> unit and the files provided by other targets in the build system that
> are listed in the module map.
>
> As well as any generated header files that are included in the TU
> (typically none of course).
>
> When Importable Headers are introduced, we have an input to the
> dependency scanning process, which means that the rule to produce the
> module mapper depends on the list of importable headers and the
> arguments to those. The module mapper data is an input to the actual
> translation, which will invalidate the translation unit itself.
>
> No, the invalidation only occurs if the TU actually uses that input.
>
> The end result is that any change to that input invalidates the builds
> of all translation units in the project.
>
> Per prior statements, I disagree.
>
> Some build systems may be able to work around that issue in various
> ways. The way Clang Header Modules and the early MSVC adoption work
> around it is by assuming that the dependency scanning doing a plain
> source inclusion is equivalent to the actual dependencies that will be
> needed later. It is considered a user error if that is not true.
>
> MSVC and Clang with explicit modules assume that, yes. Clang modules are
> smarter about it when modules are implicitly built. I haven't played with
> Clang with explicitly built modules to know if it diagnoses incorrect
> assumptions.
>
> IMHO, that is not a "correct" implementation of the specification. The
> specification requires a correct dependency scanning to emulate the
> import, by starting a new preprocessor context based on the arguments
> used when translating the header unit and merging the final state into
> the state of the TU doing the import. Which means it needs to know the
> list of header units and the arguments for each of those.
>
> I would argue that is a QoI concern, but I otherwise agree.
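The difference between the two behaviors a scanner has to emulate can be shown with a toy model of macro state. This is a sketch under strong simplifications: macros are a plain name-to-value map, a header is modeled as a function over that map, and a merge lets the last definition win. `MacroState`, `Header`, `include_textually`, and `import_header_unit` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

using MacroState = std::map<std::string, std::string>;
// Toy stand-in for a header: a function that reads and updates macro state.
using Header = std::function<void(MacroState&)>;

// Textual inclusion: the header sees, and mutates, the includer's own state.
void include_textually(const Header& header, MacroState& tu) {
  assert(header);
  header(tu);
}

// Import: the header is preprocessed in a fresh context seeded only with the
// arguments used to build the header unit, and its final macros are then
// merged into the importer (simplification: the last definition wins).
void import_header_unit(const Header& header, const MacroState& unit_args,
                        MacroState& tu) {
  assert(header);
  MacroState unit = unit_args;  // independent of the importing TU
  header(unit);
  for (const auto& [name, value] : unit) tu[name] = value;
}
```

For a header that defines `API_LEVEL` as `2` when `USE_V2` is set and `1` otherwise, a TU that defines `USE_V2` before the directive ends up with `API_LEVEL` equal to `2` under textual inclusion and `1` under import with empty unit arguments. Any later `#if API_LEVEL == 2` that guards another `import` then changes what the scanner reports, which is why the importable-header list and arguments are inputs to the scan.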
>
> Doing the correct thing will mean any environment using a build system
> that doesn't know how to optimize based on the contents of the output
> of each command will have to invalidate the entire build whenever
> information about importable headers changes.
>
> Ah, that we agree on.
>
> I would like to see some real numbers before concluding that there is a problem to be solved though.
>
> A change in the information about importable headers causing the
> entire build to be invalidated is catastrophically bad, and will be
> completely unaffordable in many environments where C++ is used.
>
> Come on. You have to do better than that. It certainly isn't
> catastrophically bad if the scanning step only takes 2 seconds.
>
> Header units don't have to perform as well as named modules; they just have to perform better than source inclusion and have an adoption cost less than named modules to be a potentially attractive and viable solution.
>
> The comparison is not Header Units versus Source Inclusion. The
> comparison is what happens in real adoption on existing build
> systems without the need to completely reinvent how those work.
>
> That's the thing. Named modules *do* require reinventing build systems.
> None of the major projects I've worked on except for Clang has used a build
> system that can handle named modules or be reasonably extended to do so.
> But they can all handle (implicitly built) header units just fine and get
> an advantage by doing so.
>
> Section 2.3 states:
>
> This is going to be particularly challenging if the ecosystem ends up in a situation where different compilers make different choices about how to handle the implicit replacement of `#include` by the equivalent `import`.
>
> Every implementation will need to be told which header files are importable.
>
> I don't mean how they take it as input. But it is
> implementation-dependent whether an `#include` is or isn't replaced by
> the equivalent `import`, which means that the same project using the
> same information about what are the importable headers to be used
> could still end up with different results.
>
> Agreed. The dependency scanner and build system must be aligned with the
> behavior of the implementation they target.
>
> Section 3.1 states:
>
> The main restriction that enables a interoperable use of pre-compiled headers is that the translation unit has to use it as a preamble to the translation unit, meaning the precompiled header is guaranteed not to be influenced by any other code in the translation unit
>
> I think I understand what you are trying to say here, but I find the wording awkward.
>
> The important bit about pre-compiled headers is that the source
> inclusion is fundamentally equivalent to using the pre-compiled
> header.
>
> If used correctly, that is true. But I think it is just as true for header
> units when used correctly.
>
> This is not true for Clang Header Modules (in principle, it's
> assumed the user will only choose to declare clang header modules for
> cases where that should be true). It is also not true for Importable
> Headers in general. There are two entirely different semantics which
> can result in entirely different programs.
>
> Kind of. The intent is that, if a header file is importable, then #include
> and import will behave the same and, if the header file is not
> importable, #include does source inclusion and import is ill-formed. So,
> when used correctly and consistently, there is only one semantic that is
> expressed.
>
> The first sentence is not quite true. Clang modules allow for BMI creation and selection to be dependent on macro definitions defined for the importing TU. See the config_macros declaration.
>
> The important point is that the working of the preprocessor is
> fundamentally different from that of source inclusion. And that so far
> all early experiences rely on the user not selecting the wrong header
> to be importable, and there is no mechanism to detect whether the user
> chose the wrong thing or not. It will just fail to compile if you're
> lucky, or result in ODR violations otherwise.
>
> Agreed, much like source inclusion results in ODR violations when a header
> file is included in multiple TUs that have distinct preprocessor state that
> the header file is sensitive to. Header units don't (fully) solve that
> problem (they do a little bit thanks to some of the semantic differences
> such as not being sensitive to the preprocessor state of the importing TU
> and preventing use of macros that are differently defined between the
> importing TU and an imported TU).
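The source-inclusion hazard described above can be simulated in a single file. This is an illustrative sketch only: the two namespaces stand in for two translation units, and the macro `CONFIG_HEADER_BODY` stands in for the text of a header that is sensitive to a configuration macro (`EXTRA_FIELD`, a hypothetical name). In a real program both definitions would be named `Config` in the same scope, and the mismatch would be an ODR violation that no diagnostic is required to catch.

```cpp
// Stand-in for the body of a header whose contents depend on a macro that
// the including TU may or may not have defined.
#define CONFIG_HEADER_BODY struct Config { int id; EXTRA_FIELD };

namespace tu_a {  // simulated TU built without the extra field
#define EXTRA_FIELD
CONFIG_HEADER_BODY
#undef EXTRA_FIELD
}  // namespace tu_a

namespace tu_b {  // simulated TU built with the extra field
#define EXTRA_FIELD double extra;
CONFIG_HEADER_BODY
#undef EXTRA_FIELD
}  // namespace tu_b

// The same header text produced two incompatible layouts.
static_assert(sizeof(tu_a::Config) != sizeof(tu_b::Config),
              "the two simulated TUs disagree about Config");
```

A header unit is built once from its own starting state, so its importers all see one definition, which is the partial improvement mentioned above. It does not help when the same header is still textually included elsewhere under a different configuration.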
>
> Tom.
>
> daniel
>

Received on 2023-05-24 02:44:49