Date: Sat, 23 Feb 2019 16:05:51 -0500
On Sat, Feb 23, 2019 at 18:17:54 +0000, Ben Craig wrote:
> I would like to find a way for users to decouple the upgrading of
> tools from the migration to modules. I've got a half-baked suggestion
> on how to do so. I think this has the potential to make the upgrade
> from C++17 to C++20 roughly the same cost to users as the upgrade from
> a C++14 to C++17. This was discussed some in the impromptu tooling
> session on Friday at Kona 2019.
The following are responses based on my understanding; please point out
where I've missed or misunderstood something.
I think it should be made clear that this is an upgrade-path suggestion,
not how modules should be expected to work in the longer term, once
build systems and tools support better build strategies for modules. We
should be encouraging build systems and tools to adopt those new
strategies, not to rely on the following mechanisms forever.
> The no-build-system-upgrade constraint implies other constraints:
> 1. No up-front scanning of the source to find module name and
> dependency information, because a lot of current build systems don't
> currently have a scan step.
I think compilers should be encouraged to provide these tools. This is
about providing mechanisms which do not require them for correct builds
(I'm not worried about efficient builds for this build model, because by
skipping the scanning you're already forfeiting that kind of build).
> 2. No dynamic dependencies between TUs. Many current build systems
> assume that the .cpp -> .o[bj] transformation is trivially
> parallelizable.
This means that the compiler will be doing BMI (or whatever) compilation
implicitly, behind the scenes. It also means that the build system is
opting not to care about that. This does not mean that what is actually
going on has to be a completely opaque black box: the compiler can keep
track of what it's doing in whatever backend communication channel it
has for coordinating multiple invocations and provide a log of its work.
I'm imagining something like a hypothetical `module_commands.json`
format analogous to `compile_commands.json`.
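To be concrete (and purely hypothetical; neither the file name nor any
of these keys exist today), I'd imagine entries along these lines:

    [
      {
        "module": "my.module",
        "source": "src/my_module.cppm",
        "bmi": ".module-cache/my.module.bmi",
        "imports": ["other.module"],
        "command": "c++ --strawman-slow-modules -std=c++2a -c src/my_module.cppm"
      }
    ]

That would let tools reconstruct what the compiler did implicitly, much
as `compile_commands.json` does for ordinary compilations.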
> 3. No upgrade of build tool executables. This has to work with
> versions of "make", "ninja", and "cmake" from 10+ years ago.
I'd recommend taking CMake off of this list. CMake requires more
intimate knowledge of how compilers work than "simple" build executors
like make and ninja. For example, CMake from a few years ago can tell
compilers "please enable C++11", but, because C++17 was not a thing yet,
those CMake-provided mechanisms have no functionality for C++17. There
are ways to do it, but I don't believe the answer from the CMake triage
side is likely to be anything other than "upgrade CMake to a version
which supports the features you're requesting".
I'll also note that CMake has implemented Fortran modules with POSIX
make (possibly nmake as well; it appears so from the code, but I'd like
to double-check that) since 2008, using a strategy similar to that of
D1483, so the newer approach is already possible there. It is ninja
which requires a patch (one which has been around for 3 years without
being merged upstream, though not for lack of trying). Other
implementations of ninja (e.g., shake and samurai, though I'm sure
others exist) likely need updating as well.
> 4. No drastically different file formats to parse (like binary module
> interfaces).
Exposed to the build system, at least. I imagine there will be internal
formats for persisting this information on disk. I think the best that
can be asked for is documentation, but I don't know that that is
something build systems can require (tools doing static analysis may
need to know, however).
> 5. You _can_ add compiler / linker flags.
But whether this results in different BMIs is up to the implementation.
This process is already asking the compiler to do all of this work; if
the compiler determines that certain compiler and linker flags require
separate BMI files, that is part of its implementation of this strategy.
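For example (a sketch only; no implementation is committed to any such
layout), an implementation might key its implicit BMI cache on a hash of
the relevant flags:

    .module-cache/
      1a2b3c4d/my.module.bmi   # built with {-std=c++2a, -O2}
      5e6f7a8b/my.module.bmi   # built with {-std=c++2a, -fno-exceptions}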
> The scheme I have in mind would result in no build throughput
> improvements with the old bad build systems, but I think it would
> still provide the isolation benefits of modules and be conforming.
> When the user is able to upgrade their build system, they can start
> getting the build throughput improvements.
Agreed. My reading is that this allows old build systems to stay
"correct", at the cost of a possibly severe drop in how "efficient" a
build they can provide until they support the new build strategies.
> The general idea is to treat the module interface file as a glorified
> header (Gaby has mentioned this possibility in various venues). When
> the user passes --strawman-slow-modules to the compiler, the compiler
> does a textual inclusion of the module interface file (no BMI involved
> at all). The textual inclusion would likely involve placing a #pragma
> strawman-module begin(name-of-module) directive, with a #pragma
> strawman-module end(name-of-module) directive at the end of the module
> text. Each TU will duplicate this work. If the compiler can emit
> this text file, then it can be distributed using existing technologies
> that are expecting preprocessed files. This is similar in nature to
> clang's -frewrite-modules (I think that's the right spelling)
I think a concrete example would be good here. Note that it has been
mentioned that -frewrite-modules is considered a debugging facility in
Clang, so something like a (hypothetical) -frewrite-modules=for-dist
mode may be required.
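As a sketch of what I have in mind (the pragma spelling is the
strawman's; the exact expansion is an assumption on my part):

    // m.cppm, as written by the user:
    export module m;
    export int f();

    // importer.cpp, as written by the user:
    import m;
    int g() { return f(); }

    // importer.cpp, as conceptually compiled under
    // --strawman-slow-modules: the import is replaced by the
    // pragma-delimited text of the interface (no BMI involved).
    #pragma strawman-module begin(m)
    export module m;
    export int f();
    #pragma strawman-module end(m)
    int g() { return f(); }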
I also wonder how this is substantially different from a scan step
(though the derivation of discovered dependencies is not required for
it).
> So this requires that compilers support this textual modules approach.
> It also requires that the compiler be able to find the module
> interface files without requiring the (dumb) build system to scan in
> advance. The "easiest" (and slow) way to make this happen is to
> require that module names correspond to file names, and that compilers
> provide a search path. I am well aware that this isn't fast, but this
> general scheme is intended for build system compatibility. Vendors
> should also provide a faster thing that can be used by newer build
> systems.
I'd rather the new model be made available first, so that build systems
and tools can start working against it. All the infrastructure required
on the compiler side feels, to me, like it will take longer to develop
than the build system and tool upgrades, once there are compilers to
develop against.
> Compilers can also provide a command line override to say
> where a creatively named module can be found.
Module maps are a thing in GCC and Clang, so they already have this.
Bikeshedding the format so that it can be shared among implementations
would be a nice-to-have for the TR.
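The information content is small; something with the following shape
would do (a hypothetical format, not what GCC or Clang ship today):

    # module-name        interface-file
    my.module            src/my_module.cppm
    creatively.named     third_party/completely_unrelated_name.ixx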
> Users would still need to build each module (as they have to build
> each .cpp) in order for all symbols to get defined. This might
> disappoint some people that think that textual modules will provide
> behavior similar to "unity" / "blob" builds. Non-inline function
> definitions in an imported module wouldn't have a strong linker
> definition (wrong words there, sorry) in importers... they would only
> be provided in the TU that defines that module.
I think I need implementer input on this wording. GCC and Clang (AFAIU)
probably don't care, since they don't have strong module ownership. This
also bears on the question of whether a module interface can carry
object code when a BMI is extracted from it for use in compiling other
TUs.
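A sketch of the case in question (my reading of the proposed behavior,
not settled semantics):

    // m.cppm
    export module m;
    export int f() { return 42; }  // non-inline, attached to module m

    // importer.cpp, compiled with textual inclusion of m.cppm: it
    // sees the body of f() textually, but must emit no out-of-line
    // definition of f(); only the TU that builds module m itself
    // provides one, so the link still sees exactly one definition.
    import m;
    int g() { return f(); }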
> All of this is intended to allow a fully conforming modules
> implementation. It also does not preclude additional build options
> intended for new, smart, fast, build systems. To the contrary, this
> is an area that I encourage investigation and research.
As I said above, I think we at least want to know where we're headed as
well.
> Let me know if there are holes in this plan, and if it sounds
> reasonable to implement. Also let me know if this sounds like it
> won't help in keeping your existing tool or build system chugging
> along.
I think CMake will likely aim to go straight to the "end result", since
we have the infrastructure already and, from talking with the main
compiler implementers, it is likely that such support can land alongside
C++20 modules support. However, that is something to discuss with the
other CMake developers, because it may have implications for the
non-Ninja and non-Makefiles generators (basically the IDE generators),
and we'll need to implement this strategy for them as well.
Thanks,
--Ben