sg15: Re: [SG15] Draft: Requirements for Usage of C++ Modules at Bloomberg

From: David Blaikie <dblaikie_at_[hidden]>
Date: Sat, 12 Jun 2021 15:29:23 -0700

On Sat, Jun 12, 2021 at 2:18 PM Daniel Ruoso <daniel_at_[hidden]> wrote:

> On Sat, Jun 12, 2021, 15:31 David Blaikie <dblaikie_at_[hidden]> wrote:
>
>> Really appreciate the write up!
>>
>
> Thanks!
>
>> * The terms "source to source" and "source to binary" seem a bit confusing
>> to me
>>
>
> Yes, it is really hard to communicate the whole concept with a few words,
> I'll think more about this terminology and see how to make that more clear.
>
>> * Cheap Module Discoverability: The second paragraph maybe could benefit
>> from a few more words [...] at least to my knowledge there's either a full
>> pre-scan to discover the dependencies
>>
>
> Yes, but you would be scanning files that you are expecting to build
> anyway.
>

A bit - though this might be getting jumbled up with a few different
directions that exist or are being implemented.

Google's Bazel-like monorepo (Clang Header Modules, rather than C++20
modules) starts with an explicit dependency graph from BUILD files, but
then also does a prescan restricted to that scope (as you say, "files that
you are expecting to build anyway" - not the whole monorepo). It did this
pre-modules too, to reduce the set of header files that needed to be used
as input to a given compilation: one library might depend on libraries
providing thousands of headers, but maybe only a handful of those headers
are transitively used by a specific translation unit in the library.

clang-scan-deps (
https://llvm.org/devmtg/2019-04/slides/TechTalk-Lorenz-clang-scan-deps_Fast_dependency_scanning_for_explicit_modules.pdf
) on the other hand works, as far as I understand it, by using the existing
header include path search order (& then modulemap discovery from there) to
figure out which modules are required - and can handle "source to binary"
builds this way.

But that's only applicable when the build flags are the same & we have an
established search path for headers and discovery process for finding the
modulemap from the header - which is what motivates R2 to provide the
equivalent thing for C++20 modules as is already available for headers and
Clang Header Modules.
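
To make the lookup order above concrete, here is a rough sketch of the idea:
walk the include search path in order, take the first directory that has the
header, then look for a module map next to it. This is only an illustration,
not how clang-scan-deps is implemented. The names resolve_header,
find_modulemap and ExistsFn are hypothetical, the file-existence check is
injected so the logic can be tried without a real filesystem, and the
assumption that the module map is a file called module.modulemap sitting in
the header's own directory is a simplification.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical: injected file-existence check, so the lookup logic can be
// exercised without touching a real filesystem.
using ExistsFn = std::function<bool(const std::string&)>;

// Returns the first "<dir>/<header>" that exists, walking the search
// directories in order (as an -I style include path would), or an empty
// string when no directory provides the header.
std::string resolve_header(const std::vector<std::string>& search_dirs,
                           const std::string& header,
                           const ExistsFn& exists) {
  assert(!header.empty());
  for (const std::string& dir : search_dirs) {
    const std::string candidate = dir + "/" + header;
    if (exists(candidate)) return candidate;
  }
  return std::string();
}

// Simplifying assumption: the module map, if any, is a file named
// "module.modulemap" in the same directory as the resolved header.
// Returns an empty string when there is none.
std::string find_modulemap(const std::string& header_path,
                           const ExistsFn& exists) {
  const std::string::size_type slash = header_path.rfind('/');
  if (slash == std::string::npos) return std::string();
  const std::string candidate =
      header_path.substr(0, slash) + "/module.modulemap";
  return exists(candidate) ? candidate : std::string();
}
```

The point of the sketch is that both steps depend on the search path and on
a naming convention that every tool agrees on, which is the piece C++20
modules do not yet have.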

> However, when you have unrelated modules files available in the system,
> you'd need to read all of them, just to throw it away because they were not
> actually going to be used by your build.
>
> In other words, each libfoo-dev you install in your Debian system would
> make the build more expensive.
>

>> * Cheap Module Discoverability R2: [...] You're suggesting "within the
>> scope of ABI uniformity" - so one set of configs for all the unixy, itanium
>> compilers that already try to be command line compatible, another set for,
>> say MSVC+clang-cl, etc.
>>
>
> Right, since we're trying to focus on requirements, I can't really say the
> need goes beyond that.
>
> However, my engineering brain would very much appreciate if the
> interoperability became universal.
>
>> * Dependency Graph of Modules External to the Build: [...] My
>> understanding is the spec was designed to make this cheap - to ensure that
>> not a lot of C++ parsing complexity would be required to read the start of
>> a module to discover its dependencies. Do you find this not to be the
>> case, or to be inadequate for practical purposes?
>>
>
> I'll review that part of the spec one more time, and either expand or
> remove the section.
>
>> * Compiler-independent module parsing: [...] though I think my general
>> understanding has been that this would still be viable, but these BMI
>> artefacts would be local build product in some build-specific cache
>> directory.
>>
>
> Exactly, which works well for "source to source" builds, however how would
> libblabla-dev on a Debian system provide it when the BMI is not
> interoperable even on different minor versions of the same compiler?
>

It wouldn't - note that the BMI isn't even interoperable over different
flags, potentially (optimizations, #defines, etc) - so it's not feasible to
ship BMIs for all possible user configurations. (I think MSVC is shipping
their BMI for some basic user configurations - makes building something
like "hello world" a bit easier if the basic flag set is available
precompiled - but still if you pass a few different compiler flags you'll
have to build your own BMIs).

Pretty sure this has been the documented/explicit plan for vendors
(certainly my understanding of the plan for Clang and GCC) so far: it's not
supported to do a BMI-only distribution, build systems will always need to
be able to build new BMIs under some configurations, even if precompiled
BMIs could be provided to cover a subset of cases.
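
One way to picture why a shipped BMI only covers a subset of cases: a build
system that caches BMIs has to key each one on everything that can change
its contents. The sketch below is a hypothetical illustration, not any
vendor's actual scheme. BuildConfig and bmi_cache_key are made-up names, the
compiler strings used with it are placeholders, and treating every flag as
significant (and order-sensitive) is a deliberately conservative assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of the configuration a BMI was built under.
struct BuildConfig {
  std::string compiler;            // vendor plus exact version string
  std::vector<std::string> flags;  // flags that may affect the BMI
};

// Builds a cache key from the module name, the exact compiler and every
// flag in order. Conservative assumption: any difference at all means the
// cached BMI cannot be reused and a new one has to be built.
std::string bmi_cache_key(const std::string& module_name,
                          const BuildConfig& config) {
  assert(!module_name.empty());
  std::string key = module_name + "|" + config.compiler;
  for (const std::string& flag : config.flags) {
    key += "|" + flag;
  }
  return key;
}
```

With a key like this, a consumer building with -O0 or a different compiler
minor version than the packager used simply misses the cache and falls back
to building its own BMI from the module source.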

>> a module may need particular flags to be compiled correctly and those
>> flags may differ from those used to build the consumer of the module (I'm
>> not 100% sure that supporting this situation is necessary
>>
>
> In "source to source" builds this happens naturally, because CMake would
> know that this module belongs to this target, and therefore would know to
> do the module parsing in the context of that target.
>
> In "source to binary" we don't have the context of how that module is
> meant to be parsed, we only have the context for this particular
> translation unit.
>

Other than in something like a pkg-config file, yeah?

>> * Cheap Parsing of Modules External to the Build: [...] but this is talking
>> about a feature of both the output format of the compiler (BMI) and also
>> about where these output files are stored/accessed/how they can be
>> concurrently generated, I take it? So that they could be shared between
>> different builds happening on the same machine?
>>
>
> Correct, I think it's reasonable to have an "as optimized as possible"
> representation for the context of the current build. What I'm advocating
> for is an additional format that provides a middle ground between the BMI
> and having to do a complete parse. One example could be having the module
> file completely preprocessed already.
>

*nod* I can see how that's a nice to have - I don't think anyone's signed
up/planning to build something like that as yet and I don't think it's
necessary to the success of the feature - but just as the LSP and DAP have
come about and are making headway in simplifying debugger/IDE integration
with a variety of tools, there's probably room for something similar with
modules to facilitate certain amounts of tooling that couldn't otherwise
parse modules or would significantly benefit from some summary information.
Exactly how much/what use cases it supports is a bit trickier when it's an
on-disk representation that must be generated ahead of time without live
knowledge of the particular needs of these thin clients (whereas the LSP
and DAP don't cause extra work for the implementation when features are
unused).

- Dave

Received on 2021-06-12 17:29:37