Re: [isocpp-ext] Can we expect that all C++ source files can have the same suffix?

From: Ben Boeckel <ben.boeckel_at_[hidden]>
Date: Mon, 25 Apr 2022 21:19:05 -0400
[ What follows is a personal opinion, not that of my role on CMake. I am
  also not an implementor, but hopefully I can at least clear some
  things up from my experience as a build systems guy. ]

On Mon, Apr 25, 2022 at 18:41:46 -0400, Patrice Roy via Ext wrote:
> I think a piece from this discussion is missing : there seems to be strong
> resistance from some implementers as to supporting Tom's "Congrats Gaby"
> hello-world-style program that would only depend on a modularized standard
> library (let's leave Boost and other well-known but non-std libraries for
> the moment). This resistance would be hard to explain to users without
> knowing more about the reasons for this resistance.
>
> Would an implementer care to explain why this seems so unreasonable without
> a build system? Ideally, comparing the "old-style" (lexically included
> headers) approach to the modules-based approach.
>
> From this, it would at least be easier to explain to beginners why just
> compiling their simple, standard-library-only programs requires more
> tooling than it used to. Everyone would benefit from that knowledge, or so
> it seems to me. My users are game programmers; they are experienced, they
> use build systems, but they also compile small test code manually at the
> command line and if they cannot use modules for this, they will ask why and
> I would really like to have an answer. It's not a sand-castle vs skyscraper
> issue; it's something they will need to know to integrate it in their
> workflow.

Note that I go further than just "standard-library-only" here. The
standard library is not immune to flags passed on the command line: it
can transform itself based on things like `-ffast-math` and other
ABI-affecting flags, which puts it right back into the "cannot treat as
built-in" bucket, and such flags are common enough that shipping
prebuilts per configuration is infeasible. Not to mention Linux setups
where `clang` uses `libstdc++`, or Apple where `gcc` uses `libc++`,
where the stdlib is suddenly *not* trivially "associated with the
compiler" and is much closer to "just another external dependency". Or
that some toolchains have historically reused platform standard
libraries rather than bringing their own (IIRC, pre-OneAPI `icc` and
`pgi` have done this, though their direct applicability to C++20 is
likely "low"), or projects such as STLport which have been standalone
standard library implementations.
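
As one concrete instance of that kind of ABI sensitivity (libstdc++'s
dual-ABI switch; my example here, a sketch rather than an exhaustive
illustration): flipping a single define changes the layout of a
standard type, so a standard library artifact built one way cannot be
blindly reused by TUs built the other way.

    // Compile with g++ on a libstdc++ platform, once with
    // -D_GLIBCXX_USE_CXX11_ABI=0 and once with -D_GLIBCXX_USE_CXX11_ABI=1.
    // The reported size of std::string differs between the two builds.
    #include <cstdio>
    #include <string>

    int main() {
        std::printf("sizeof(std::string) = %zu\n", sizeof(std::string));
    }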

I find the "requires more tooling" to be because the standard refuses to
talk about code in anything other than in abstract "TU" components.
While this has merits, it does have its costs. Because the standard does
not talk about what `import foo;` means other than through verbiage like
"makes names reachable", handling such code isn't grounded in anything
beyond "implementers will provide mechanisms to make such imports have
meaning". It has no relation to filesystems (be it conventional or
"archive as filesystem" FUSE-like interfaces to other things that can be
treated as filesystems in some way), so there needs to be some mechanism
to translate `import foo;` into "here's what that means to this TU".
Right now, we only have flags like `-reference` (MSVC) or
`-fmodule-mapper=` (GCC) to specify these things, but filling these out
is the hard part. Now, the compiler can certainly try to answer this on
its own with some to-be-decided-upon rules, but C++ projects
historically end up throwing all kinds of semantically meaningful
metadata (read: compiler flags) on top of what *their* module means that
any default setup that make any such guess unsuitable for some
substantial portion of the userbase (cf. `FOO_IS_SHARED` defines for
library visibility macro expansion, `BUILT_WITH_SOME_OPTIONAL_DEP`
defines altering available APIs, `-Ofast` for some performance-sensitive
component, `WITH_DEBUG_MEMBERS`, etc.).
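
As a concrete sketch of that gap (a hypothetical module `foo` and
`foo_answer()` are assumed here; the flag spellings are the ones
current implementations document, and they vary by version):

    // uses-foo.cpp -- nothing in this file says where the built module
    // interface for `foo` lives; the standard only promises that its
    // exported names become reachable after the import.
    import foo;

    int main() { return foo_answer(); }  // assume `foo` exports foo_answer()

    // Something outside the source text has to close that gap for every
    // importing TU, e.g.:
    //   cl /std:c++20 /reference foo=foo.ifc /c uses-foo.cpp
    //   g++ -std=c++20 -fmodules-ts -fmodule-mapper=map.txt -c uses-foo.cpp
    //   clang++ -std=c++20 -fmodule-file=foo=foo.pcm -c uses-foo.cpp
    // Filling in those paths and mappings is precisely the part that
    // needs a build system (or some other agreed-upon mechanism).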

I'll also note that backwards compatibility has a *lot* of value in
minimizing churn of known-working code, but it also ends up welding
shut doors that one might want to use. Just as an example, the list
representation in CMake makes the `;` an absolute landmine and
complicates safely passing CMake values around. Would it be nice if one
could just do `cmake -Dfoo=${foo}` to pass a value along to some build
command? Sure, but breaking every non-playground CMake project in the
process is not worth that price.

Could C++ have said things like "the source encoding must be compatible
with module lookup namespaces" or "filenames must correlate with module
names"? Sure. But then folks on non-UTF-8 platforms or in non-Unicode
locales get upset. Could C++ have then said "modules must be
self-contained" and allowed compilers to figure out what to do based on
the source alone? Sure, but then, absent some other more structured
mechanism, there'd be things like `#pragma flag` ifdeffery preludes or
`/** semantic comment */` hacks to do what is possible today (for prior
art, see Rust's `#![feature()]` and `#![cfg()]` attributes, Haskell's
`{-# LANGUAGE #-}` syntax, Python's magic `from __future__` mechanism,
or CMake's policy scopes).

But it didn't. Did everyone understand that C++ chose a module system
isomorphic to Fortran's instead of something like Python or Rust?
Unlikely. But it's what we have. I can foresee projects being built
using tools that cobble together a Rust-like or Haskell-like "here's a
pile of sources and high-level dependency metadata, please build it"
experience, but the problem happens when a project wants to interface
with external code *not* using this pattern. All manner of digital ink
has been spilled about Cargo not "playing well" with
non-Rust-centered build systems (Cabal is largely the same, but,
rounding, "no one" is using Haskell in this way). Sure, Python and Rust
both have "here's some C or C++ code, please build it" helper tools, but
trying to use these to build existing projects that have long leveraged
the flexibility C and C++ builds have offered (say, HDF5) is like
bringing a squeaky toy toolbox to a construction site: it's just not
going to cut it for many widely-used existing projects. Consuming and
understanding extant external code is fraught under such a model and
that's where a lot of C++'s value is to large projects.

Now, what do I think it would take to make this stuff much more
possible within the limits we have? SG15 is discussing it. What has
been proposed (though I am not aware of a paper number as yet) is
basically some sidecar metadata to say something to the effect of "here
is what C++ information is important to consume this project". Rust has
this as crate metadata (not typically distributed) and just needs to be
told "here is a compiled crate, please use it"; Python has some
mechanism for its packages as well (including `.pth` files and other
things that have accumulated over the years) that can supplement
available packages. These tools know how to handle such metadata and
consume it natively. Given that there's already an implementation out
there, and the sidecar metadata hasn't even been formally proposed,
trying to mandate any such metadata at this point is like starting to
build a cart for a horse that is already standing at the starting gate.
So, C++ build systems are the level at which this is dealt with at this
point (though that doesn't preclude some basic support from compilers
themselves, it is not trivial and I don't foresee implementers chomping
at the bit to put even more fractally detailed work onto their plates).
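
To make "here is what C++ information is important to consume this
project" a bit more concrete: no such format exists or has been
proposed in a paper I can point to, so the following is an entirely
invented illustration of the *kind* of content meant (names and paths
are made up):

    {
      "name": "boost-regex",
      "compile": {
        "defines": ["BOOST_REGEX_DYN_LINK"],
        "include-dirs": ["/path/to/boost/include"]
      },
      "link": {
        "libraries": ["/path/to/boost/lib/libboost_regex.so"]
      }
    }

A build system consuming something like this would still have to turn
it into compiler- and linker-specific flags; the file only describes
the requirements.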

In short, I would describe it as "with great power comes great
responsibility". The power of modules to consume APIs more precisely,
rather than being equivalent to a fancy:

    xargs -a included-files cat | $(CC) /dev/stdin

and hoping everything seeing the same content gets the same idea of
what's going on, now comes with the responsibility to tell the compiler
more about dependencies beyond "look here for API descriptions and hand
this file to the linker" while hoping none of the following have
occurred (the first of these is sketched below):

  - flags were specified that modify the headers in some meaningful way
  - the wrong library was given to the linker
  - different TUs were given different directories for the same include
  - TUs disagree on what other dependencies used in the API mean
    (`_ITERATOR_DEBUG_LEVEL`, Boost's `NDEBUG`-optional members, etc.)
  - the wrong headers were used for the library (e.g., macOS's SDK
    Python headers with a Homebrew Python library)
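
Here is a minimal sketch of that first failure mode, reusing the
`WITH_DEBUG_MEMBERS` example from earlier (the struct and header are
hypothetical):

    // widget.h -- a shared header whose contents depend on a
    // consumer-supplied define.
    struct Widget {
    #ifdef WITH_DEBUG_MEMBERS
        int debug_counter = 0;  // present only when the define is set
    #endif
        int value = 0;
    };
    // If a.cpp is compiled with -DWITH_DEBUG_MEMBERS and b.cpp without,
    // both include the same header text, both compile, the link succeeds,
    // and the two TUs silently disagree about sizeof(Widget) and where
    // `value` lives: an ODR violation nothing is required to diagnose.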

Would it be ideal to just say something along the lines of:

    $(CC) -fdepend-on=/path/to/boost-regex.latest.json \
      -c -o uses-boost.o \
      uses-boost.cpp
    $(CC) -fdepend-on=/path/to/boost-regex.latest.json \
      -fdependency-metadata-output=uses-boost.1.0.0.json \
      -shared -o uses-boost.so \
      uses-boost.o

Yes, I'd love it. But we're not there yet, and until then we'll need
build systems to dig into any such `boost-regex.latest.json` and
translate it into flags to pass to the compilers that exist today.
Unfortunately, given the standard library's sensitivity to consumer
patterns, it is also subject to such things. It could be supported in
the absolute simplest situations, but the path for that is *very*
narrow and people will stray off the beaten path far too easily
(`clang`/`clang-tidy` on Linux and `gcc` on macOS being the most common
cases I can think of without even considering compiler flag
interactions).

--Ben

Received on 2022-04-26 01:19:08