ISOCPP SG15 List: Re: P2898R0: Importable Headers are Not Universally Implementable

From: Ben Boeckel <ben.boeckel_at_[hidden]>
Date: Fri, 26 May 2023 07:46:57 -0400

On Fri, May 26, 2023 at 11:23:18 +0200, Mathias Stearn via SG15 wrote:
> On Wed, May 24, 2023 at 8:07 PM Jens Maurer via SG15 <sg15_at_[hidden]>
> wrote:
> > On 24/05/2023 16.23, Daniel Ruoso wrote:
> > > Now, let's imagine that we declare `foo.h` to be an importable header.
> >
> > And that's a user bug right there. foo.h is simply not supposed to
> > be treated as an importable header.
>
>
> I disagree pretty strongly for both philosophical and practical reasons. I
> think that the vast majority of headers should be treated as importable,
> the only exceptions are weird cases that really need textual inclusion like
> X-macros and the like.
>
> Philosophical: My take is that by declaring foo.h to be an importable
> header (and one where #includes should be treated as imports), you are
> saying that you want foo.h to have a single consistent meaning for an
> entire build (or at
> least for each program image). In this world you would be saying that qux.h
> doesn't get to change the meaning of foo.h, it must take the meaning
> defined by the builder of the program image. It is free to define FOO_ARG1
> for _its own use_ but that is explicitly prevented from affecting foo.h.
> And that is a good thing!

Note that we currently are in "flag soup" land where flags just kind of
"exist" and any semantic meaning is left up to anything caring to look
at them "long enough" (usually only the compiler cares; build systems
will look for specific flags that "matter" enough to them but generally
pass everything else on through after a quoting pass). If I have
TU-local flags, should they affect imported headers? It's hard to say
for any given flag without context. `-Dsome_local_feature_toggle=1`:
yeah, probably local. `-DEXPORT_MY_SYMBOLS`: depends…is the header
"mine" or "someone else's"? `-mtune=core2`: eh…maybe? `-mno-avx512f`:
only if something checks intrinsics/relevant preprocessor state?
`-ffast-math`: ABI-affecting in principle, but maybe these APIs don't
talk floating point to each other.
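To make the "it depends" concrete, here is a minimal sketch of what a build tool would have to maintain if it tried to sort flags by scope. The `FlagScope` categories, the `classify` helper and both lookup tables are hypothetical. In particular, filing `-mtune=`/`-ffast-math` under "toolchain" is an assumption made for illustration, not a claim about their real semantics. Note that most flags can only land in "unknown", because nothing in the flag string itself says where it applies.

```python
from enum import Enum


class FlagScope(Enum):
    LOCAL = "local"          # only affects the TU being compiled
    TOOLCHAIN = "toolchain"  # must reach every TU and every BMI alike
    UNKNOWN = "unknown"      # needs context; must be treated as BMI-affecting


# Hypothetical, hand-maintained tables. Nothing in today's flag strings
# carries this information, which is the whole problem.
LOCAL_DEFINES = {"some_local_feature_toggle"}
TOOLCHAIN_PREFIXES = ("-mtune=", "-march=", "-ffast-math")


def classify(flag: str) -> FlagScope:
    if flag.startswith("-D"):
        name = flag[2:].split("=", 1)[0]
        if name in LOCAL_DEFINES:
            return FlagScope.LOCAL
        # e.g. -DEXPORT_MY_SYMBOLS: depends on whose header it is.
        return FlagScope.UNKNOWN
    if flag.startswith(TOOLCHAIN_PREFIXES):
        return FlagScope.TOOLCHAIN
    # Anything unlisted (-mno-avx512f, -O2, ...) falls back to "unknown".
    return FlagScope.UNKNOWN
```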

So the core issue (to me) is separating out what flags apply where. We
have, historically, only had a set of strings to communicate these
things to the compiler. We've mostly just said "any sufficiently bad ODR
violation is detectable at link/test time" and swept it under the rug.
Now we need to figure out, when scanning, "does this flag matter to this
header import?" and, if so, consider whether this requires a new BMI for
each imported header. IIUC, Clang basically says a blanket "yes" to
this for anything but the absolutely most trivial of flags (flags like
`-v`, `-save-*`, `-print-*`, `-Q*`, `-time` fit here, but I have a hard
time seeing anything of "more than compiler introspection" importance
being in this set).
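The blanket "yes" amounts to a compatibility key computed over almost the entire flag list. The sketch below approximates that policy. It is not Clang's actual logic: `bmi_key` and the `IGNORABLE` patterns are hypothetical, and the patterns simply mirror the introspection flags listed above. Everything else is hashed in its original order, since reordering flags (e.g. `-DA` vs `-UA`) can change preprocessor state.

```python
import hashlib
from fnmatch import fnmatchcase

# Flags assumed to be pure compiler introspection (glob patterns mirroring the
# list above); every other flag is treated as affecting the BMI.
IGNORABLE = ("-v", "-save-*", "-print-*", "-Q*", "-time")


def bmi_key(flags: list[str]) -> str:
    """Hash the BMI-relevant flags, keeping their relative order."""
    relevant = [f for f in flags if not any(fnmatchcase(f, p) for p in IGNORABLE)]
    # "\0" cannot appear inside a command-line argument, so it is a safe separator.
    return hashlib.sha256("\0".join(relevant).encode()).hexdigest()
```

Two TUs could share a header unit's BMI only if their keys match, which with a policy this strict is rare outside of tightly controlled builds.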

Now on the build side, we have a problem to solve here. We don't know
what headers are imported when generating the build graph (though we
know which are *importable*). We discover this when scanning…which is
done during the build. Now I know `llbuild` and `build2` are fancier and
can craft new "things to do" on the fly, but `make` and `ninja` are far
less capable here[1]. If we don't have a command prepared when we wrote
out the graph, we can't make it appear magically (without generating
new build graph descriptions and reporting "please insert token^W^Wrun
the build tool to continue"). So we need to know up-front what
*possible* BMIs may be needed and prepare commands for each. With
scanning, we'll know which we actually need and only run those.
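For a static-graph tool like `make` or `ninja`, this means a two-phase split. Every candidate command is written out when the graph is generated, and the scan results only select a subset at build time. A minimal sketch under those assumptions follows. The `compiler` and `--make-bmi` argv entries are placeholders, not real tool options, and `write_out_candidates`/`select_after_scan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BmiCommand:
    header: str
    target: str
    argv: tuple[str, ...]


def write_out_candidates(visible: dict[str, list[str]],
                         flags: dict[str, list[str]]) -> dict[tuple[str, str], BmiCommand]:
    """Graph-generation time: one command per (importable header, target)."""
    cmds = {}
    for target, headers in visible.items():
        for h in headers:
            # "compiler" and "--make-bmi" are placeholders, not real options.
            argv = ("compiler", *flags[target], "--make-bmi", h)
            cmds[(h, target)] = BmiCommand(h, target, argv)
    return cmds


def select_after_scan(cmds: dict[tuple[str, str], BmiCommand],
                      imports_found: dict[str, set[str]]) -> list[BmiCommand]:
    """Build time: keep only the commands the scan says are actually needed."""
    return [c for (h, t), c in cmds.items() if h in imports_found.get(t, set())]
```

For example, with targets `A` (seeing `x.h` and `y.h`) and `B` (seeing `x.h`), three commands are written out. If only a TU in `A` imports `x.h`, just one of them runs.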

The only way I can see this working is if there is some way of
determining which flags are "mine" versus "what the header wants" versus
"toolchain". CMake has some notion of the first two but the last is kind
of "whatever is in `CMAKE_CXX_FLAGS`" hoping that those flags and any
flags hidden in compiler wrappers don't do nasty things behind your
back. I can do this in CMake, but how do I tell a scanner about all of
these things:

- a list of importable headers
- flags to use for each of those headers (if they are indeed imported)
- flags for the TU we're scanning
- which of those flags are "local" versus "give to everything"
  (toolchain)
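Written down as a data structure, the information in that list could look roughly like this. `ScanRequest` and `flags_for_import` are hypothetical: no existing scanner accepts exactly this input. The rule that an imported header gets the TU's non-local flags followed by its own flags is one plausible policy, assumed here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ScanRequest:
    """Hypothetical scanner input; no existing scanner takes exactly this."""
    source: str
    importable_headers: set[str]
    header_flags: dict[str, list[str]]  # flags to use if that header is imported
    tu_flags: list[str]                 # flags for the TU being scanned
    local_flags: set[str] = field(default_factory=set)  # TU flags kept out of imports

    def flags_for_import(self, header: str) -> list[str]:
        """Toolchain (non-local) TU flags, then the header's own flags."""
        if header not in self.importable_headers:
            raise KeyError(f"{header} is not importable; it would be #included")
        toolchain = [f for f in self.tu_flags if f not in self.local_flags]
        return toolchain + self.header_flags.get(header, [])
```

Every field here is also an input to the scan, so a change to any of them forces a rescan.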

Note that *any* of this changing means that the scan needs to be
performed again[2] (this is the "if your set of importable headers
changes, it invalidates the whole build" claim because the scan
rerunning means you rerun the build unless you use something like
fingerprinting rather than timestamps). So…great, we can now properly
scan at least. But what are we scanning for? To know which BMIs to
actually build. For any given "header may be imported by a TU", I need
to construct a command to make a suitable BMI. If I cannot scrub
TU-local flags from affecting a BMI, I need to schedule a BMI per
importable header per target[3] that can "see" (and therefore import)
that header (this is using CMake visibility rules; nothing ISO C++
related). Maybe I can collapse things down and say "targets A and B can
share imported headers X, Y, and Z", but to do this I need to know if
some flag "matters" or not (and I have to assume that they all do,
including down to any relative ordering); see P2581R2 "Built Module
Interface Compatibility Identifiers".
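The "collapse things down" step can be sketched as grouping (header, flag set) pairs. Under the conservative assumption that every flag matters, including its position, two targets share a BMI for a header only if their flag lists are identical. `plan_bmis` is a hypothetical helper, and per-file flags are ignored here.

```python
from collections import defaultdict


def plan_bmis(visible: dict[str, list[str]],
              target_flags: dict[str, list[str]]) -> dict[tuple, list[str]]:
    """Map (header, flag tuple) -> targets that can share one BMI.

    Assumes every flag matters, in order, so only identical flag lists share.
    """
    groups = defaultdict(list)
    for target, headers in visible.items():
        key = tuple(target_flags[target])
        for h in headers:
            groups[(h, key)].append(target)
    return dict(groups)
```

With targets A and B both compiled with `-O2` and seeing X, Y and Z, plus a target C that adds `-DNDEBUG` and sees only X, this yields four BMIs instead of seven. A single differing flag on C is enough to give it its own copy of X.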

So the state of "how header units can work in practice"[4] (AFAIK) is
currently something along the lines of:

- schedule a BMI build for every importable header "visible" to each
  target (because we have no idea which imports which when we write out
  these commands…unless you want to insert "remake the build graph" into
  the build path for any edit to a file which can use `import`);
- scan everything (importable headers per flag set that "sees" them
  (whether actually imported anywhere or not), TUs which may use
  `import`, and TUs which may use `export`) to discover the import
  graph;
- collate the scanning output from all of this to compute dependencies
  to BMI builds; and
- pass along this dependency information to the build tool and let it do
  the actual work.
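The collation step in the list above might look like the following sketch. It assumes scan results have already been parsed into plain dictionaries: TU scans map a source to its target and imported headers, and header-unit scans map a (header, target) BMI to the headers it imports in turn. `collate` is a hypothetical name. The walk starts from the TUs and follows header-unit imports transitively. Any header that was scanned but never reached is exactly the wasted work described below.

```python
def collate(tu_scans: dict[str, tuple[str, set[str]]],
            header_scans: dict[tuple[str, str], set[str]]):
    """Return (BMIs actually needed, node -> BMIs it depends on).

    A BMI is identified by (header, target); nodes are TUs or BMIs.
    """
    edges: dict = {}
    needed: set[tuple[str, str]] = set()
    stack: list[tuple[str, str]] = []
    for tu, (target, imports) in tu_scans.items():
        edges[tu] = {(h, target) for h in imports}
        stack.extend(edges[tu])
    while stack:  # follow header units that import other header units
        bmi = stack.pop()
        if bmi in needed:
            continue
        needed.add(bmi)
        _, target = bmi
        edges[bmi] = {(h, target) for h in header_scans.get(bmi, set())}
        stack.extend(edges[bmi])
    return needed, edges
```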

The scariest parts, to me, are the first two points. There are a *lot*
of commands that need to be written out (most of which are never used),
and a *lot* of useless scanning (though we can't know ahead of time that
any of it is useless). What does this do to build times? No idea. I
haven't built it to get hard numbers (not that I have compilers to
test it with anyway). But the outlook doesn't seem good to me.
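A rough way to size this, assuming the naive per-target scheduling from the list above and no sharing between targets: let T be the set of targets, H(t) the importable headers visible to target t, U(t) the TUs of t that may use `import` or `export`, and N the set of BMIs the scan shows are actually needed.

```latex
W_{\mathrm{BMI}} = \sum_{t \in T} |H(t)|, \qquad
W_{\mathrm{scan}} = \sum_{t \in T} \bigl(|H(t)| + |U(t)|\bigr), \qquad
|N| \le W_{\mathrm{BMI}}
```

Only the commands in N ever run; the difference W_BMI - |N| (plus the header scans behind it) is the overhead in question. With hypothetical numbers such as 50 targets each seeing 200 importable headers, 10,000 BMI commands get written out.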

Does it mean header units are impossible or broken? No. But it does, to
me, mean that they are far less useful: getting this "transitional" bit
working reliably seems to be more work than just transitioning directly
to named modules, and it may not actually improve build times
(individual TU compilation may indeed improve, but we're scheduling so
much extra work to try to make it faster that it probably washes out in
the end).

--Ben

[1] `make` can do it with generated rules, but unifying consistent
usages is a race condition so you end up with "assume everything is
consistent", "every single import gets its own BMI", or "serialize
post-scan and unify consistent usages in…some way and have a really
fancy build-wide collator (phased appropriately for codegen from built
tools) to figure this out".

[2] I suppose one could remember whether a file was "important" to the
scan and elide it on changes to such things, but I have no idea how to
express "dependencies/flags which may be pruned if the primary
dependency doesn't change between runs" in `make` or `ninja` build
graphs.
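Outside of `make`/`ninja`, the pruning idea in footnote [2] might look like this when combined with content fingerprints instead of timestamps. The scan records digests of only the inputs it actually consulted, and a change to anything else does not trigger a rescan. `needs_rescan` and `digest` are hypothetical helpers; how a scanner would report what it consulted is left open.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def needs_rescan(recorded: dict[str, str], current: dict[str, bytes]) -> bool:
    """recorded: inputs the previous scan consulted -> their content digests.

    Inputs the scan never consulted are ignored; a consulted input that is
    missing or whose content changed forces a rescan.
    """
    return any(name not in current or digest(current[name]) != d
               for name, d in recorded.items())
```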

[3] Technically per-file even, but I think I'm comfortable saying that
any source-specific flags are TU-local and therefore "ignorable" in
practice for BMI purposes even if compilers may be upset about it at the
moment.

[4] Outside of tightly controlled environments where flag consistency is
enforced and the build graph is expansive enough (basically "no external
dependencies"). In practice this means "Google-like
monorepos" which excludes all standard Linux distro patterns and any
project that doesn't vendor every C++ tidbit above the standard library;
good luck integrating two of these such projects into your own codebase
if they use different versions of Boost or some other common dependency.

Received on 2023-05-26 11:47:00