Re: [Tooling] [isocpp-modules] Path to modules with old bad build systems

From: Tom Honermann <tom_at_[hidden]>
Date: Fri, 8 Mar 2019 01:10:09 -0500
Thank you for this write-up, Ben.  This all strongly matches what I've
been thinking.

The listed constraints suggest some requirements:
1. Means to determine if a header file is a modular header unit.
2. Means to map module IDs to module interface unit source file names.
3. That pre-built packages provide:
3.1. The above information for their modular headers and module
interface units (packaged software can't require consumers to perform a
scanning step).
3.2. The source files for module interface units (packaged software
can't require consumers to consume BMIs).

Clang modules has already solved these problems in a way that I believe
1) has proven deployable, and 2) programmers have shown willingness to
use.  The following is heavily based on how Clang modules works today.

Elaborating on #1 above.  P1103R2 states in 2.3.4p2:

    "When a *#include* appears within non-modular code, if the named
    header file is known to correspond to a legacy header unit, the
    implementation treats the *#include* as an import of the
    corresponding legacy header unit. The mechanism for discovering this
    correspondence is left implementation-defined; there are multiple
    viable strategies here (such as explicitly building legacy header
    modules and providing them as input to downstream compilations, or
    introducing accompanying files describing the legacy header
    structure) and we wish to encourage exploration of this space. An
    implementation is also permitted to not provide any mapping
    mechanism, and process each legacy header unit independently."

For the purposes of the TR, we'll need to define a mechanism for
nominating a header file as a header unit.  Clang modules accomplishes
this via a module.modulemap file that must be co-located with the header
file.  When processing a #include directive, Clang searches include
paths normally for a matching header file and, if a module.modulemap
file is present, scans it to see if the header file is associated with a
defined module.  If it is, then the header is treated as a header unit;
otherwise (or if module support is disabled), it is treated as a
traditional header.  I like this approach (a minimal example follows the
list below) for several reasons:
1. It doesn't require any new search paths.  Existing include paths suffice.
2. Having the module map file co-located with the header files it
governs avoids complicated path matching needs.
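
For reference, a minimal module map in today's Clang modules syntax looks
roughly like this (the library and header names are hypothetical):

// module.modulemap, co-located with foo.h somewhere on an include path
module foo {
  header "foo.h"
  export *
}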

If such a module map file is also used to map module IDs to module
interface unit source files, then we can also avoid requiring separate
search paths for module interface units.  An obvious consequence of this
is that module interface unit source files would need to be present in
an include path.

Here's a bikeshed example of a hypothetical module map file.  Please
ignore concerns regarding syntax for now.

// Definition of a header unit module:
header module foo {
   header: "foo.h";
}
// Definition of a module ID and its corresponding module interface unit
// source file:
module bar {
   source_file: "bar.cppmi";
}
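
Given a module map like the one above, a consuming translation unit might
look like the following sketch; both lines should resolve using nothing
more than ordinary include paths:

// consumer.cpp (illustrative)
#include "foo.h"   // foo.h is found on an include path; the co-located
                   // module map nominates it as header unit foo, so this
                   // #include is treated as an import of that header unit
import bar;        // resolved by finding a module map on an include path
                   // that maps module ID bar to bar.cppmi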

In principle, if we establish strong conventions for associating module
ID and module interface unit source file names, then we can allow
implicitly defined modules without requiring them to be present in a
module map file.  Note, however, that header units cannot be implicit,
since an implementation must assume that header files are just headers
unless otherwise informed.
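
For example, under a hypothetical "module ID + .cppmi" naming convention,
an import could resolve with no module map entry at all:

import baz;   // no module map entry; the compiler simply looks for
              // baz.cppmi on the include paths (hypothetical convention)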

With this model, a compiler/tool need only be supplied with include
paths just as they are today. Resolving a module import (whether via
#include, import <>, or import id) only requires scanning include paths
for matching header names and/or module map files.
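
Concretely, an invocation under this model could look much like today's,
with no module-specific search paths or prebuilt BMIs involved (the
driver name and flags below are ordinary, illustrative spellings):

c++ -std=c++2a -I include -I third_party/include -c consumer.cpp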

Tom.

On 2/23/19 1:17 PM, Ben Craig wrote:
> I would like to find a way for users to decouple the upgrading of
> tools from the migration to modules.  I've got a half-baked suggestion
> on how to do so.  I think this has the potential to make the upgrade
> from C++17 to C++20 roughly the same cost to users as the upgrade
> from C++14 to C++17.  This was discussed some in the impromptu tooling
> session on Friday at Kona 2019.
>
> The no-build-system-upgrade constraint implies other constraints:
> 1. No up-front scanning of the source to find module name and
> dependency information, because a lot of current build systems don't
> have a scan step.
> 2. No dynamic dependencies between TUs.  Many current build systems
> assume that the .cpp -> .o[bj] transformation is trivially parallelizable.
> 3. No upgrade of build tool executables.  This has to work with
> versions of "make", "ninja", and "cmake" from 10+ years ago.
> 4. No drastically different file formats to parse (like binary module
> interfaces).
> 5. You _can_ add compiler / linker flags.
>
> The scheme I have in mind would result in no build throughput
> improvements with the old bad build systems, but I think it would
> still provide the isolation benefits of modules and be conforming. 
> When the user is able to upgrade their build system, they can start
> getting the build throughput improvements.
>
> The general idea is to treat the module interface file as a glorified
> header (Gaby has mentioned this possibility in various venues).  When
> the user passes --strawman-slow-modules to the compiler, the compiler
> does a textual inclusion of the module interface file (no BMI involved
> at all).  The textual inclusion would likely involve placing a #pragma
> strawman-module begin(name-of-module) directive, with a #pragma
> strawman-module end(name-of-module) directive at the end of the module
> text.  Each TU will duplicate this work.  If the compiler can emit
> this text file, then it can be distributed using existing technologies
> that are expecting preprocessed files.  This is similar in nature to
> clang's -frewrite-modules (I think that's the right spelling).
>
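
A minimal sketch of what the textually expanded TU might look like,
assuming the #pragma spellings above; the module name and its contents
are hypothetical:

// consumer.cpp after --strawman-slow-modules expansion (illustrative)
#pragma strawman-module begin(m)
// ...textually included contents of m's module interface file...
export module m;
export int f();                        // declaration visible to the importer
export inline int g() { return 42; }   // inline definitions usable directly
#pragma strawman-module end(m)
// ...rest of consumer.cpp, with names exported by m now visible...
int use() { return f() + g(); }
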
> So this requires that compilers support this textual modules
> approach.  It also requires that the compiler be able to find the
> module interface files without requiring the (dumb) build system to
> scan in advance.  The "easiest" (and slow) way to make this happen is
> to require that module names correspond to file names, and that
> compilers provide a search path.  I am well aware that this isn't
> fast, but this general scheme is intended for build system
> compatibility.  Vendors should also provide a faster mechanism that can
> be used by newer build systems.  Compilers can also provide a command
> line override to say where a creatively named module can be found.
>
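
A sketch of the slow-but-compatible lookup described above; the file
extension and the extra flag are placeholders, not real options:

// Module names correspond to file names somewhere on a search path:
//   src/bar.cppmi          -> module bar
//   src/widgets.gui.cppmi  -> module widgets.gui
// Hypothetical invocation ("--strawman-module-path" is a placeholder):
//   c++ --strawman-slow-modules --strawman-module-path=src -c consumer.cpp
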
> Users would still need to build each module (as they have to build
> each .cpp) in order for all symbols to get defined. This might
> disappoint some people who think that textual modules will provide
> behavior similar to "unity" / "blob" builds.  Non-inline function
> definitions in an imported module wouldn't be emitted as strong symbol
> definitions in importers' object files... they would only be emitted by
> the TU that defines that module.
>
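
A sketch of that consequence, assuming a module bar whose interface lives
in bar.cppmi (names are hypothetical):

// bar.cppmi -- compiled on its own, like any other .cpp
export module bar;
export int f() { return 1; }   // non-inline: the strong definition is
                               // emitted only when bar.cppmi is compiled

// consumer.cpp -- under the strawman, "import bar;" textually includes
// bar.cppmi, but f() is still only referenced here, never defined here
import bar;
int g() { return f(); }
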
> All of this is intended to allow a fully conforming modules
> implementation.  It also does not preclude additional build options
> intended for new, smart, fast build systems.  On the contrary, this
> is an area where I encourage investigation and research.
>
> Let me know if there are holes in this plan, and if it sounds
> reasonable to implement.  Also let me know if this sounds like it
> won't help in keeping your existing tool or build system chugging along.
>
>
>
> _______________________________________________
> Modules mailing list
> Modules_at_[hidden]
> Subscription: http://lists.isocpp.org/mailman/listinfo.cgi/modules
> Link to this post: http://lists.isocpp.org/modules/2019/02/0089.php



Received on 2019-03-08 07:10:13