sg15: Re: [SG15] Draft: Requirements for Usage of C++ Modules at Bloomberg

From: David Blaikie <dblaikie_at_[hidden]>
Date: Sat, 12 Jun 2021 12:31:01 -0700

Really appreciate the write up!

Here are a few thoughts/minor details:

* The terms "source to source" and "source to binary" seem a bit confusing
to me: when I see "source to source" I think of source-to-source
conversions (like source-to-source optimizations that rewrite source code
into better source code). I take it the unwritten extra word is
"dependency" (eg: "source to source dependency" and "source to binary
dependency"), which reads well enough/makes sense in my head - though maybe
"monorepo" (or "full source build" or the like) and "prebuilt dependencies"
would also capture this in more common terminology? Not sure.
* In the "Compilation Database" subsection, all the text after the
quotation "tools based on the C++ abstract Syntax Tree" is lighter grey,
rather than returning to black after the quotation (or perhaps the
quotation is meant to be black too but picked up the colouring from
wherever it was copy/pasted from).
* Cheap Module Discoverability: There's an extraneous "do" at the end of
the second paragraph
* Cheap Module Discoverability: The second paragraph could perhaps benefit
from a few more words about "as there’s no expectation that this will
represent additional parsing beyond what would be necessary for building
the project anyway." - at least to my knowledge there's either a full
pre-scan to discover the dependencies, or extra build files that explicitly
state the dependencies so that the search is more restricted. Still some
extra work - but it's bounded/feasible compared to the "source to binary"
case where there's no way to answer "where are the modules".
* Cheap Module Discoverability R2: Anything to be said about the
portability (or lack thereof) of such discovery? I guess the intent would
be that it be similar to something like pkg-config, so using the standards
cited earlier in the document with regard to compiler command line
compatibility, etc (where those are available - and where they aren't,
these things could be non-portable - having separate filesystem hierarchies
for each set of instructions) - but yeah, perhaps that's getting too far into
the details. I appreciate that the document tries to avoid the
weeds/details & seems to make a good choice about how much detail to go
into, and how much not to. Ah, I see this comes up in R4 - so you're
suggesting "within the scope of ABI uniformity" - so one set of configs for
all the unixy, Itanium-ABI compilers that already try to be command line
compatible, another set for, say MSVC+clang-cl, etc.
* Dependency Graph of Modules External to the Build: "Having to perform
the discoverability of module dependencies topologically for modules
outside of the current build will result in an undesirable performance
penalty. Therefore it’s important that the module dependency graph should
be exposed in a way that is cheaper to parse."
   My understanding is the spec was designed to make this cheap - to ensure
that not a lot of C++ parsing complexity would be required to read the
start of a module to discover its dependencies. Do you find this not to be
the case, or to be inadequate for practical purposes?
* Compiler-independent module parsing: This might've conflated a few
different things. The first paragraph talks about/suggests that an
implementation-specific BMI would not be feasible for "source to binary"
builds - though my general understanding has been that this would
still be viable, with these BMI artefacts being local build products in
some build-specific cache directory. Regardless (and this is why it might be
clearer to omit this paragraph/detail from the document), the rest of this
section seems relevant - yes, a module may need particular flags to be
compiled correctly and those flags may differ from those used to build the
consumer of the module (I'm not 100% sure that supporting this situation is
necessary - for example, -D flags could be #defines inside the module
rather than required to be passed externally - but arguably the
pkg-config-like thing could also disable or enable certain warning flags,
though that could also be done with #pragmas, even if it may be tidier to do
in these separate pkg-config-like files).
* Cheap Parsing of Modules External to the Build: "However, when you are
consuming a significant amount of modules from outside your build system,
having to parse the original source code for all those modules can easily
become a scalability problem."
  The goal is that modules reduce the redundancy within a given build or
tool (ie: that build or tool can reuse module descriptions across a whole
build of a given project) - but not beyond that scope.
  While some sets of tools/compilers might interoperate on a single BMI
(e.g. clang, clang-tidy, etc. could all read the same BMIs, and so be
cheaper than, say, a project that uses gcc to compile but clang-tidy to do
static analysis), beyond that it's not really feasible owing to the
complexity and variability of compiler internal representations.
  Perhaps with time folks will find some different tradeoffs in that
space, but I don't think it'll happen in the medium term at least.
  (But perhaps I'm misunderstanding the point here, since R7 talks about
"file-based module discoverability" having this feature. I take it this is
about both the output format of the compiler (the BMI) and where these
output files are stored/accessed/how they can be concurrently generated, so
that they could be shared between different builds happening on the same
machine?)
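On the "Dependency Graph of Modules External to the Build" point, here's a rough sketch of why I'd expect reading a module unit's dependencies to be cheap: everything needed sits in the preamble (optional "module;", the module declaration, then imports), so a scanner can stop at the first ordinary declaration. This is NOT a real dependency scanner. It assumes one declaration per line, no block comments, and no conditional compilation around module/import lines; it skips preprocessor directives without evaluating them, and it doesn't model partitions or the implicit import of an implementation unit. A real scanner would still need to run the preprocessor. `ModulePreamble` and `scanPreamble` are made-up names for illustration.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// What the preamble of one module unit says about it.
struct ModulePreamble {
    std::string name;                  // from "export module X;" or "module X;"
    bool isInterface = false;          // true only for "export module X;"
    std::vector<std::string> imports;  // from "import Y;" or "export import Y;"
};

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// If `line` has the form "<prefix><operand>;", return the trimmed operand.
static std::optional<std::string> afterPrefix(const std::string& line,
                                              const std::string& prefix) {
    if (line.size() <= prefix.size() || line.back() != ';' ||
        line.compare(0, prefix.size(), prefix) != 0)
        return std::nullopt;
    return trim(line.substr(prefix.size(), line.size() - prefix.size() - 1));
}

// Reads only the preamble: stops at the first line that is not blank, a line
// comment, "module;", a preprocessor directive (skipped, NOT evaluated), a
// module declaration, or an import declaration.
ModulePreamble scanPreamble(const std::string& source) {
    ModulePreamble result;
    std::istringstream in(source);
    std::string raw;
    while (std::getline(in, raw)) {
        const std::string line = trim(raw);
        if (line.empty() || line.rfind("//", 0) == 0 || line == "module;" ||
            line[0] == '#')
            continue;
        if (auto n = afterPrefix(line, "export module ")) {
            result.name = *n;
            result.isInterface = true;
            continue;
        }
        if (auto n = afterPrefix(line, "module ")) {
            result.name = *n;
            continue;
        }
        if (auto n = afterPrefix(line, "export import ")) {
            result.imports.push_back(*n);
            continue;
        }
        if (auto n = afterPrefix(line, "import ")) {
            result.imports.push_back(*n);
            continue;
        }
        break;  // first ordinary declaration: the preamble is over
    }
    return result;
}
```

The point is that the work is bounded by the size of the preamble, not by the size of the module's interface.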
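Also on the "Dependency Graph of Modules External to the Build" point: once each module's imports are known (however they were discovered), the "topologically" part is ordinary graph ordering, and I'd expect the cost concern to be in discovering the edges rather than in ordering them. A minimal sketch using Kahn's algorithm, assuming a hypothetical `ModuleGraph` map from module name to the modules it imports; `buildOrder` is likewise a made-up name, not any build system's API.

```cpp
#include <map>
#include <queue>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical input: each module name mapped to the modules it imports.
using ModuleGraph = std::map<std::string, std::vector<std::string>>;

// Returns an order in which BMIs could be built so that every module's
// imports are built before it (Kahn's algorithm). Throws on an import cycle.
std::vector<std::string> buildOrder(const ModuleGraph& graph) {
    std::map<std::string, int> pending;                         // unbuilt imports per module
    std::map<std::string, std::vector<std::string>> importers;  // reverse edges
    for (const auto& [mod, imports] : graph) {
        pending[mod];  // make sure every module has an entry
        for (const auto& dep : imports) {
            pending[dep];  // an imported module may not appear as a key itself
            ++pending[mod];
            importers[dep].push_back(mod);
        }
    }

    std::queue<std::string> ready;  // modules whose imports are all built
    for (const auto& [mod, count] : pending)
        if (count == 0) ready.push(mod);

    std::vector<std::string> order;
    while (!ready.empty()) {
        std::string mod = ready.front();
        ready.pop();
        order.push_back(mod);
        for (const auto& user : importers[mod])
            if (--pending[user] == 0) ready.push(user);
    }
    if (order.size() != pending.size())
        throw std::runtime_error("module import cycle detected");
    return order;
}
```

The ordering is linear in the number of modules plus import edges, so even for a large set of external modules it shouldn't be the bottleneck.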
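On the "Cheap Parsing of Modules External to the Build" point: the way I think about BMI reuse is that a BMI is only reusable by a consumer using the same compiler and compatible flags, which is why I'd expect BMIs to live in a build-specific (or at most machine-local) cache. A rough sketch of keying such a cache follows. `BmiConfig` and `cacheKey` are hypothetical, the compiler identifier format is an assumption, and which flags actually affect BMI compatibility is compiler-specific knowledge this sketch doesn't have.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical description of what a BMI was (or would be) produced with.
struct BmiConfig {
    std::string compiler;            // e.g. "clang-12"; an assumed identifier format
    std::vector<std::string> flags;  // flags assumed to affect BMI compatibility
    std::string module;              // module name, e.g. "app.core"
};

// Builds a canonical cache key. Flags are sorted so that ordering differences
// don't split the cache; that assumes flag order is irrelevant, which is NOT
// true in general (e.g. -D/-U of the same macro). A '|' inside a flag could
// also make two keys collide; a real tool would need a sturdier encoding.
std::string cacheKey(BmiConfig cfg) {
    std::sort(cfg.flags.begin(), cfg.flags.end());
    std::string key = cfg.compiler;
    for (const auto& f : cfg.flags) key += '|' + f;
    key += '|' + cfg.module;
    return key;
}
```

Two builds on the same machine could then share a BMI only when their keys match; a different compiler (or compiler version) always yields a different key, which reflects why sharing beyond that scope isn't really feasible.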

On Wed, Jun 9, 2021 at 3:57 PM Daniel Ruoso via SG15 <sg15_at_[hidden]>
wrote:

> Hello,
>
> This is still just a draft, and therefore it doesn't reflect an official
> position. But I'm looking for some early feedback.
>
> The draft is attached as PDF, but here's a snippet from the introduction,
> which describes the paper:
>
> ---
>
> Bloomberg has a code base with tens of thousands of independent C++
> projects, which are integrated together with a package-manager approach
> with aggressive dependency rebuilds to ensure coherency of all those
> projects into what we call a “distribution snapshot”.
>
> That distribution snapshot contains prebuilt artifacts. Most developers
> will create a build context that contains only the source code they’re
> expecting to change, and build against an installed location where
> artifacts (headers, archives, pkg-config files) are deployed.
>
> This stands in contrast with the practice of a “monorepo”. At Bloomberg we
> explain that distinction by the categories “Source-to-Source Builds” versus
> “Source-to-Binary Builds”. We will start by exploring those concepts, and
> how they reflect real-world practice.
>
> It is our understanding that Bloomberg’s experience is not dissimilar to
> most Free/Libre Open Source Software communities, so while this document is
> heavily influenced by our particular experience, this document tries to
> frame it in a way that is not Bloomberg-specific.
>
> After we describe the state of the world before the adoption of modules,
> we will break down different requirements for the successful adoption of
> modules for organizations that follow similar patterns.
>
> ---
>
> Let me know what you folks think,
>
> daniel
> _______________________________________________
> SG15 mailing list
> SG15_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg15
>

Received on 2021-06-12 14:31:15