Re: [Tooling] [isocpp-modules] Path to modules with old bad build systems

From: Steve Downey
Date: Fri, 8 Mar 2019 06:54:48 -0500

We will need a mechanism for `/usr/include` style directories that have
multiple packages independently installed. Perhaps a `module.modulemap.d`
with a file per header? A single file registry will decay quickly.
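
A minimal sketch of how a tool might consume such a `module.modulemap.d` directory, assuming each installed package drops its own fragment file there and the tool simply merges them in a stable order. The fragment file names and the helper names `collect_fragments` and `merged_module_map` are hypothetical, not part of any existing tool.

```python
from pathlib import Path


def collect_fragments(include_dir):
    """Return the fragment files under include_dir/module.modulemap.d, sorted by name."""
    fragment_dir = Path(include_dir) / "module.modulemap.d"
    if not fragment_dir.is_dir():
        return []
    # Sorting gives a deterministic order no matter how packages were installed.
    return sorted(p for p in fragment_dir.iterdir() if p.is_file())


def merged_module_map(include_dir):
    """Concatenate all fragments into one logical module map text."""
    return "\n".join(p.read_text() for p in collect_fragments(include_dir))
```

Because every package owns exactly one file, installing or removing a package never requires editing a shared registry.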

On Fri, Mar 8, 2019, 01:10 Tom Honermann wrote:

> Thank you for this write up, Ben. This all strongly matches what I've
> been thinking.
> The listed constraints suggest some requirements:
> 1. Means to determine if a header file is a modular header unit.
> 2. Means to map module IDs to module interface unit source file names.
> 3. That pre-built packages provide:
> 3.1. The above information for their modular headers and module interface
> units (packaged software can't require consumers to perform a scanning
> step).
> 3.2. The source files for module interface units (packaged software can't
> require consumers to consume BMIs).
> Clang modules has already solved these problems in a way that I believe 1)
> has proven deployable, and 2) programmers have shown willingness to use.
> The following is heavily based on how Clang modules works today.
> Elaborating on #1 above. P1103R2 states in 2.3.4p2:
> "When a *#include* appears within non-modular code, if the named header
> file is known to correspond to a legacy header unit, the implementation
> treats the *#include* as an import of the corresponding legacy header
> unit. The mechanism for discovering this correspondence is left
> implementation-defined; there are multiple viable strategies here (such as
> explicitly building legacy header modules and providing them as input to
> downstream compilations, or introducing accompanying files describing the
> legacy header structure) and we wish to encourage exploration of this
> space. An implementation is also permitted to not provide any mapping
> mechanism, and process each legacy header unit independently."
> For the purposes of the TR, we'll need to define a mechanism for
> nominating a header file as a header unit. Clang modules accomplishes this
> via a module.modulemap file that must be co-located with the header file.
> When processing a #include directive, Clang searches include paths normally
> for a matching header file and, if a module.modulemap file is present,
> scans it to see if the header file is associated with a defined module. If
> it is, then the header is treated as a header unit, otherwise (or if module
> support is disabled), it is treated as a traditional header. I like this
> approach for several reasons:
> 1. It doesn't require any new search paths. Existing include paths
> suffice.
> 2. Having the module map file co-located with the header files it governs
> avoids complicated path matching needs.
> If such a module map file is also used to map module IDs to module
> interface unit source files, then we can also avoid requiring separate
> search paths for module interface units. An obvious consequence of this is
> that module interface unit source files would need to be present in an
> include path.
> Here's a bikeshed example of a hypothetical module map file. Please
> ignore concerns regarding syntax for now.
> // Definition of a header unit module:
> header module foo {
>   header: "foo.h";
> }
> // Definition of a module ID and corresponding module interface unit
> // source file:
> module bar {
>   source_file: "bar.cppmi";
> }
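
To make the bikeshed syntax concrete, here is a rough sketch of a reader for it. It handles only the two entry forms shown in the example, and the function name `parse_module_map` and the regular expression are assumptions made for illustration; this is not Clang's module map grammar.

```python
import re

# Matches: [header] module NAME { KEY: "VALUE"; }
_ENTRY = re.compile(
    r'(?P<kind>header\s+)?module\s+(?P<name>[\w.]+)\s*\{\s*'
    r'(?P<key>header|source_file)\s*:\s*"(?P<path>[^"]+)"\s*;\s*\}'
)


def parse_module_map(text):
    """Return (header_units, modules).

    header_units maps a header file name to its header unit module name.
    modules maps a module ID to its module interface unit source file.
    """
    text = re.sub(r'//[^\n]*', '', text)  # naive removal of // line comments
    header_units, modules = {}, {}
    for m in _ENTRY.finditer(text):
        if m.group('kind') and m.group('key') == 'header':
            header_units[m.group('path')] = m.group('name')
        elif not m.group('kind') and m.group('key') == 'source_file':
            modules[m.group('name')] = m.group('path')
    return header_units, modules
```

Fed the example above, this yields `{"foo.h": "foo"}` for header units and `{"bar": "bar.cppmi"}` for named modules.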
> In principle, if we establish strong conventions for associating module ID
> and module interface unit source file names, then we can allow implicitly
> defined modules without requiring them to be present in a module map file.
> Note that header units cannot be implicit, however, since an implementation
> must assume that header files are just headers unless otherwise informed.
> With this model, a compiler/tool need only be supplied with include paths
> just as they are today. Resolving a module import (whether via #include,
> import <>, or import id) only requires scanning include paths for matching
> header names and/or module map files.
> Tom.
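
The lookup Tom describes for `#include` can be sketched roughly as follows: search the include paths as usual, then check for a co-located `module.modulemap` that names the header. This is an illustration of the description above under simplifying assumptions, not Clang's actual implementation; `classify_include` and its return values are hypothetical, and the map is read with a deliberately minimal pattern for the bikeshed `header: "..."` entry.

```python
import re
from pathlib import Path

_HEADER_ENTRY = re.compile(r'header\s*:\s*"([^"]+)"')


def classify_include(header_name, include_paths, modules_enabled=True):
    """Return (kind, path) for the first match on include_paths, or None if not found."""
    for inc in include_paths:
        candidate = Path(inc) / header_name
        if not candidate.is_file():
            continue
        # The module map must sit in the same directory as the header it governs.
        module_map = candidate.parent / "module.modulemap"
        if modules_enabled and module_map.is_file():
            listed = _HEADER_ENTRY.findall(module_map.read_text())
            if candidate.name in listed:
                return ("header-unit", candidate)
        # No map, header not listed, or module support disabled: traditional header.
        return ("textual", candidate)
    return None
```

No search path beyond the existing include paths is consulted, which is the property the message argues for.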
On 2/23/19 1:17 PM, Ben Craig wrote:
> I would like to find a way for users to decouple the upgrading of tools
> from the migration to modules. I've got a half-baked suggestion on how to
> do so. I think this has the potential to make the upgrade from C++17 to
> C++20 roughly the same cost to users as the upgrade from C++14 to C++17.
> This was discussed some in the impromptu tooling session on Friday at Kona
> 2019.
> The no-build-system-upgrade constraint implies other constraints:
> 1. No up-front scanning of the source to find module name and dependency
> information, because a lot of current build systems don't have a scan
> step.
> 2. No dynamic dependencies between TUs. Many current build systems assume
> that the .cpp -> .o[bj] transformation is trivially parallelizable.
> 3. No upgrade of build tool executables. This has to work with versions
> of "make", "ninja", and "cmake" from 10+ years ago.
> 4. No drastically different file formats to parse (like binary module
> interfaces).
> 5. You _can_ add compiler / linker flags.
> The scheme I have in mind would result in no build throughput improvements
> with the old bad build systems, but I think it would still provide the
> isolation benefits of modules and be conforming. When the user is able to
> upgrade their build system, they can start getting the build throughput
> improvements.
> The general idea is to treat the module interface file as a glorified
> header (Gaby has mentioned this possibility in various venues). When the
> user passes --strawman-slow-modules to the compiler, the compiler does a
> textual inclusion of the module interface file (no BMI involved at all).
> The textual inclusion would likely involve placing a #pragma
> strawman-module begin(name-of-module) directive, with a #pragma
> strawman-module end(name-of-module) directive at the end of the module
> text. Each TU will duplicate this work. If the compiler can emit this
> text file, then it can be distributed using existing technologies that are
> expecting preprocessed files. This is similar in nature to clang's
> -frewrite-modules (I think that's the right spelling).
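
As a rough illustration of the textual-inclusion idea, the sketch below replaces each `import name;` line with the text of the module interface, bracketed by the strawman pragmas from the message. It is a toy preprocessor under stated assumptions: `expand_imports` is a hypothetical name, interfaces are supplied as an in-memory mapping, and nested imports, `import <header>` forms, and real C++ tokenization are not handled.

```python
import re

# Recognizes a line consisting only of `import name;` (dotted names allowed).
_IMPORT = re.compile(r'^\s*import\s+([\w.]+)\s*;\s*$')


def expand_imports(source, interfaces):
    """Replace each `import name;` line with the interface text, wrapped in pragmas."""
    out = []
    for line in source.splitlines():
        m = _IMPORT.match(line)
        if m is None:
            out.append(line)
            continue
        name = m.group(1)
        out.append(f"#pragma strawman-module begin({name})")
        out.extend(interfaces[name].splitlines())
        out.append(f"#pragma strawman-module end({name})")
    return "\n".join(out)
```

Every translation unit that imports `bar` repeats this expansion, which is why the scheme offers isolation but no build throughput gain.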
> So this requires that compilers support this textual modules approach. It
> also requires that the compiler be able to find the module interface files
> without requiring the (dumb) build system to scan in advance. The
> "easiest" (and slow) way to make this happen is to require that module
> names correspond to file names, and that compilers provide a search path.
> I am well aware that this isn't fast, but this general scheme is intended
> for build system compatibility. Vendors should also provide a faster thing
> that can be used by newer build systems. Compilers can also provide a
> command line override to say where a creatively named module can be found.
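
The name-to-file convention with a search path and a per-module override could look something like the sketch below. The `.cppmi` extension is borrowed from the module map example elsewhere in this thread and is only an assumption, as are the function name `find_module_interface` and the shape of the `overrides` mapping.

```python
from pathlib import Path


def find_module_interface(name, search_paths, overrides=None, extension=".cppmi"):
    """Locate the interface file for module `name`, or return None if not found."""
    overrides = overrides or {}
    # An explicit override wins, for modules whose file name does not follow the convention.
    if name in overrides:
        return Path(overrides[name])
    # Otherwise probe each search directory in order for <name><extension>.
    for directory in search_paths:
        candidate = Path(directory) / (name + extension)
        if candidate.is_file():
            return candidate
    return None
```

The probing happens inside the compiler on every import, so the build system never needs a scan step; the cost is repeated file system lookups.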
> Users would still need to build each module (as they have to build each
> .cpp) in order for all symbols to get defined. This might disappoint some
> people that think that textual modules will provide behavior similar to
> "unity" / "blob" builds. Non-inline function definitions in an imported
> module wouldn't have a strong linker definition (wrong words there, sorry)
> in importers... they would only be provided in the TU that defines that
> module.
> All of this is intended to allow a fully conforming modules
> implementation. It also does not preclude additional build options
> intended for new, smart, fast build systems. To the contrary, this is an
> area in which I encourage investigation and research.
> Let me know if there are holes in this plan, and if it sounds reasonable
> to implement. Also let me know if this sounds like it won't help in
> keeping your existing tool or build system chugging along.
Received on 2019-03-08 12:55:04