Really appreciate the write up!

Here are a few thoughts/minor details:

* The terms "source to source" and "source to binary" seem a bit confusing to me. When I see "source to source" I think of source-to-source transformations (like optimizations that rewrite source code into better source code). I take it the unwritten extra word is "dependency" (eg: "source to source dependency" and "source to binary dependency"), which reads well enough/makes sense in my head, though maybe "monorepo" (or "full source build" or the like) and "prebuilt dependencies" would capture this in more common terminology? Not sure.
* In the "Compilation Database" subsection, all the text after the quotation "tools based on the C++ abstract Syntax Tree" is lighter grey, rather than returning to black after the quotation (or perhaps the quotation is meant to be black too but picked up the colouring from wherever it was copy/pasted from).
* Cheap Module Discoverability: There's an extraneous "do" at the end of the second paragraph
* Cheap Module Discoverability: The second paragraph could maybe benefit from a few more words about "as there’s no expectation that this will represent additional parsing beyond what would be necessary for building the project anyway." At least to my knowledge, there's either a full pre-scan to discover the dependencies, or extra build files that explicitly state the dependencies so that the search is more restricted. That's still some extra work, but it's bounded/feasible compared to the "source to binary" case, where there's no way to answer "where are the modules".
* Cheap Module Discoverability R2: Anything to be said about the portability (or lack thereof) of such discovery? I guess the intent would be that it be similar to something like pkg-config, so using the standards cited earlier in the document with regard to compiler command line compatibility, etc (where those are available - and where they aren't, these things could be non-portable, with separate filesystem hierarchies for each set of instructions) - but yeah, perhaps that's getting too far into the details. I appreciate that the document tries to avoid the weeds & seems to make a good choice about how much detail to go into. Ah, I see this comes up in R4 - so you're suggesting "within the scope of ABI uniformity" - so one set of configs for all the unixy, Itanium-ABI compilers that already try to be command line compatible, another set for, say, MSVC+clang-cl, etc.
* Dependency Graph of Modules External to the Build: "Having to perform the discoverability of module dependencies topologically for modules outside of the current build will result in an undesirable performance penalty. Therefore it’s important that the module dependency graph should be exposed in a way that is cheaper to parse."
   My understanding is the spec was designed to make this cheap - to ensure that not a lot of C++ parsing complexity would be required to read the start of a module to discover its dependencies. Do you find this not to be the case, or to be inadequate for practical purposes?
* Compiler-independent module parsing: This might've conflated a few different things. The first paragraph suggests that an implementation-specific BMI would not be feasible for "source to binary" builds - though my general understanding has been that this would still be viable, with these BMI artefacts being local build products in some build-specific cache directory. Regardless (& this is why it might be clearer to omit this paragraph/detail from the document), the rest of this section seems relevant - yes, a module may need particular flags to be compiled correctly, and those flags may differ from those used to build the consumer of the module. (I'm not 100% sure that supporting this situation is necessary - for example, -D flags could be #defines inside the module rather than required to be passed externally. Arguably the pkg-config-like thing could also disable or enable certain warning flags; that could also be done with #pragmas, but may be tidier in these separate pkg-config-like files.)
* Cheap Parsing of Modules External to the Build: "However, when you are consuming a significant amount of modules from outside your build system, having to parse the original source code for all those modules can easily become a scalability problem."
  The goal is that modules reduce the redundancy within a given build or tool (ie: that build or tool can reuse module descriptions across a whole build of a given project) - but not beyond that scope.
  While some set of tools/compilers might interoperate on a single BMI - such as clang, clang-tidy, etc, could all read the same BMIs and so be cheaper than, say, a project that uses gcc to compile, but clang-tidy to do static analysis - beyond that it's not really feasible owing to the complexity and variability of the compiler internal representations.
  Perhaps with time folks will find some different tradeoffs in that space - but I don't think it'll happen in the medium term at least.
  (but perhaps I'm confusing the point here - since R7 talks about the "file-based module discoverability" having this feature, but this is talking about a feature of both the output format of the compiler (BMI) and also about where these output files are stored/accessed/how they can be concurrently generated, I take it? So that they could be shared between different builds happening on the same machine?)
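
To make the "read the start of a module to discover its dependencies" point a bit more concrete, here's a rough sketch (Python, purely illustrative) of the kind of cheap scan I have in mind. It is not how any real compiler or build tool does it. It assumes one declaration per line, no macros or #if blocks affecting the imports, and that in a module unit all imports come before other declarations; a real scanner would need to run the preprocessor. The function name and the module names in the example are made up.

```python
import re

# Simplified patterns for the declarations that can appear in a module preamble.
# Assumption: each declaration sits on its own line, with no macro expansion involved.
_MODULE_DECL = re.compile(r"^(export\s+)?module\s+([\w.:]+)\s*;")
_IMPORT_DECL = re.compile(r"^(export\s+)?import\s+([\w.:]+|<[^>]+>|\"[^\"]+\")\s*;")


def scan_preamble(source: str) -> dict:
    """Return the module name and the imports found at the top of a module unit.

    Hypothetical helper: stops at the first ordinary declaration after the
    module declaration, so the rest of the file is never read.
    """
    name = None
    imports = []
    for raw in source.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop line comments
        if not line or line.startswith("#") or line == "module;":
            continue  # blank line, preprocessor directive, or global module fragment marker
        m = _MODULE_DECL.match(line)
        if m and name is None:
            name = m.group(2)
            continue
        m = _IMPORT_DECL.match(line)
        if m:
            imports.append(m.group(2))
            continue
        if name is not None:
            break  # first non-import declaration: the preamble is over
    return {"module": name, "imports": imports}


example = """\
module;
#include <cstdio>
export module acme.widgets;   // hypothetical module name
import acme.core;
export import :shapes;
import <vector>;

export int area(int w, int h);
"""

print(scan_preamble(example))
```

The scan never looks past the first ordinary declaration, which is the property I understood the spec to be aiming for. Whether that is cheap enough when walking a large graph of external modules is the question above.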

On Wed, Jun 9, 2021 at 3:57 PM Daniel Ruoso via SG15 <sg15@lists.isocpp.org> wrote:
Hello,

This is still just a draft, and therefore it doesn't reflect an official position. But I'm looking for some early feedback.

The draft is attached as PDF, but here's a snippet from the introduction, which describes the paper:

---

Bloomberg has a code base with tens of thousands of independent C++ projects, which are integrated together with a package-manager approach with aggressive dependency rebuilds to ensure coherency of all those projects into what we call a “distribution snapshot”.

That distribution snapshot contains prebuilt artifacts. Most developers will create a build context that contains only the source code they’re expecting to change, and build against an installed location where artifacts (headers, archives, pkg-config files) are deployed.

This stands in contrast with the practice of a “monorepo”. At Bloomberg we explain that distinction by the categories “Source-to-Source Builds” versus “Source-to-Binary Builds”. We will start by exploring those concepts, and how they reflect real-world practice.

It is our understanding that Bloomberg’s experience is not dissimilar to most Free/Libre Open Source Software communities, so while this document is heavily influenced by our particular experience, this document tries to frame it in a way that is not Bloomberg-specific.

After we describe the state of the world before the adoption of modules, we will break down different requirements for the successful adoption of modules for organizations that follow similar patterns.

---

Let me know what you folks think,

daniel