sg15: [Tooling] Minutes from 2019-01-31 SG15 Modules Tooling Interactions Telecon

From: Bryce Adelstein Lelbach aka wash <brycelelbach_at_[hidden]>
Date: Thu, 31 Jan 2019 13:56:18 -0800

ISO C++ SG15 Modules Tooling Interactions
2019-01-31 Telecon Minutes

Attendees:
Bryce Adelstein Lelbach (Chairing)
Ben Craig (Minute Taker)
Rene Rivera
Isabella Muerte (Izzy)
Colby Pike
Corentin Jabot
Dalton Woodard
Dan Kalowsky
Gabriel Dos Reis (GDR)
JF Bastien, Apple
Michael Spencer, Apple
Bruno Cardoso Lopes, Apple
David Blaikie, Google
Richard Smith, Google
Brad King, Kitware
Ben Boeckel, Kitware
Robert Maynard, Kitware
Matthew Woehlke, Kitware
Bill Hoffman, Kitware
Zack Galbreath, Kitware
Vassil Vassilev
Steve Downey, Bloomberg
Thierry Lavoie
Tom Honermann

Bryce: Get everyone on the same page as to the nature of the concerns.
Then talk about possible solutions and mitigations.

Presenting: P1427R0
Rene: Made a list of separate issues. Three classes of issues.
    General.
    Source tools.
    Build systems

One key thing is that build systems overwhelmingly rely on host process
output to figure out DAG issues. We have to live with that.

Most complex part of build systems is figuring out the DAG. Problem
with modules is that we need to calculate the DAG in advance. The
current model lets you build once and get a DAG as a side effect.
Modules make you precompile the dependencies.
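
A minimal sketch of what "calculating the DAG in advance" means, assuming
the import relationships are already known. The module names (app, net,
log) are hypothetical and not from the presentation. With headers any
compile order works and the dependency list falls out as a side effect;
with modules the order below has to exist before the first compile starts.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each module maps to the set of modules it imports.
imports = {
    "app": {"net", "log"},
    "net": {"log"},
    "log": set(),
}

def build_order(imports):
    """Return an order in which every module comes after the modules it imports."""
    return list(TopologicalSorter(imports).static_order())

order = build_order(imports)
print(order)
```

The hard part discussed in the meeting is obtaining the `imports` table
in the first place, which this sketch takes as given.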

Titus: You hit on the crux... "that's the model we would like to keep"

Grafik Robot: Hasn't been a good way to tackle this so far.

Izzy: P1302 is a solution... Two NBs are requesting that it get put on
the EWG schedule. 5 NBs against modules right now. With P1302 that
drops down to one, maybe none.

GDR: How did you get the NB data?

Bryce: Let's not speculate on voting or where NBs will land on this call.

Izzy: This is from me speaking with heads

GDR: Asking because going into the last meeting, all NBs were yes.

Izzy: The problem is what we are discussing here. P1300 also talked
about this some. Requiring a patched ninja to build Fortran scared
people.

GDR: It's up to the implementation how headers / modules are found.
Even having a library filesystem doesn't solve this problem. Build
machine and target machine may have different kinds of filesystems.

Izzy: P1302 talks about this, covers filesystems from z/OS.

Bryce: Back to extraction of dependencies in modular code. Is there
an example showing why it is difficult? How does this prevent you
from building a fast parallel build system?

Steve Downey: Biggest reason is that the statement that you are
importing a module doesn't give you enough to figure out what the
actual dependency is. There is an intermediate location. That is
hard for current build systems to handle. Build systems are very
expensive. I run a build and see that a file depends on module1. I
have no idea what that means until I see a file that exports module1.
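
A minimal sketch of the indirection described here, assuming simplified
regular expressions in place of a real preprocessor and C++ parser. The
file names and the `resolve` helper are hypothetical. An `import module1;`
only becomes a file-level dependency once some other file that exports
`module1` has been seen.

```python
import re

# Simplified patterns: no preprocessor, comments, or partitions are handled.
EXPORT_RE = re.compile(r"^\s*export\s+module\s+([\w.]+)\s*;", re.MULTILINE)
IMPORT_RE = re.compile(r"^\s*(?:export\s+)?import\s+([\w.]+)\s*;", re.MULTILINE)

def resolve(sources):
    """Return (file -> files it depends on, file -> imports with no known provider)."""
    provider = {}
    for path, text in sources.items():
        for name in EXPORT_RE.findall(text):
            provider[name] = path
    edges, unresolved = {}, {}
    for path, text in sources.items():
        for name in IMPORT_RE.findall(text):
            if name in provider:
                edges.setdefault(path, set()).add(provider[name])
            else:
                unresolved.setdefault(path, set()).add(name)
    return edges, unresolved

sources = {
    "main.cpp": "import module1;\nint main() { return f(); }\n",
    "impl/stuff.cpp": "export module module1;\nexport int f() { return 0; }\n",
}

# With both files scanned, the import resolves to a concrete file.
print(resolve(sources))
# With only main.cpp scanned, "module1" is just a name with no file behind it.
print(resolve({"main.cpp": sources["main.cpp"]}))
```

Nothing in `main.cpp` hints that the provider lives in `impl/stuff.cpp`,
which is why every candidate file has to be scanned first.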

GDR: Doesn't require a build system change. Have an extra tool on the
side that can figure out the dependency and extraction.

Izzy: That is not what compilers are pitching right now. GCC is
looking to open a socket.

Downey: That's what we used to do in the 80s. This is in the wrong
phase of the build. If we get the build order wrong, we can pull in
an old module.

GDR: Several things getting mixed up here. Not saying a third party
tool is desired, but that it is a possible solution. Not expecting
everyone to copy the GCC solution. In his environment, it's a good
solution, but not necessarily in every environment. This is
exploration.

Downey: You don't have a first pass with header files. It's a side
effect of compilation.

GDR: Incremental compilation happens differently. Need to be open minded.

Kitware: Allow the preprocessor to do one step. That's what Fortran
does. When we do that import step, we don't know the defines and all
the other flags associated with that import. Can't use the current
ones, because that's not what will be passed to that module.

Rene: You may not know what the name of the module is until you run
the preprocessor or the entire compiler.
Izzy: In SD if PP attempts to unroll and it came from a header file,
the preamble ends. This is a little-discussed topic.
Apple (Microsoft): Preamble can have macros. No restriction.

Rene: Thoughts that legacy headers aren't going to work out.

GDR: Legacy headers came from the Clang impl of modules / ATOM.

Izzy: Disagreeing with this part of the paper.

Downey: The concern with legacy headers is finding them. Not being
able to look at a piece of code and understand whether it is an import
or not.

GDR: If anywhere in your program you import it, but you include it
elsewhere, it can be replaced with an import.

Corentin: Non-modular code, all the code that exists today. Build
system will need to scan everything to look for imports in order to
deal with includes. Need to parse everything in advance.

Rene: We don't have implementation experience. Not just with modules
themselves, but with any non-trivial system with build tools. We have
Boris's implementation, and there's a little CMake work that may be
getting done.

Kitware: I don't think build2 has seen what CMake has seen. build2
predates macros coming out of modules.

Izzy: Working on backporting modules to C++17. Would be a 1000 source
file code base.

Presenting: P1441
Rene: Simulated testing. Only so much can be done with existing
implementations. Initial good parallel perf on a laptop. Trying to
get some measurements on typical DAG depth between 26 - 37.

Bryce: What takeaways are you getting from this graph and experiment?

Rene: There is a perf bonus if you are on a single machine without a
lot of parallelization.

Downey: 1000 thread distribution systems are common.

Izzy: This is done with the modules TS

Rene: This is done with the latest GCC with the merged module
proposal. But it doesn't cover module partitions. Just covering DAG
problems. GCC Farm machine with 128 jobs, you lose all module perf
advantages.

Izzy: Is this a result of the preprocessor looking at all the includes
in parallel where modules are doing things sequentially?

Downey: Don't have published results yet, but it looks like this is
just work starvation. There isn't enough available work at a time to
fill out the 128 jobs. It is building in DAG order that is the
problem. It has to compare against the embarrassingly parallel nature
of .cpp files right now.
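
A minimal sketch of the work-starvation effect, assuming every
compilation costs one time step and using a hypothetical `makespan`
helper. It compares 128 independent translation units with the extreme
case of a 128-deep import chain. It does not model modules making each
step cheaper, so it only illustrates the scheduling limit, not overall
build time.

```python
def makespan(deps, jobs):
    """Time steps needed to build unit-cost tasks with `jobs` parallel workers.

    deps maps each task to the set of tasks that must finish before it starts.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    done, steps = set(), 0
    while remaining:
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle")
        batch = ready[:jobs]          # run as many ready tasks as we have workers
        for task in batch:
            del remaining[task]
        done.update(batch)
        steps += 1
    return steps

n = 128
# Header-style build: every translation unit is independent.
flat = {f"tu{i}": set() for i in range(n)}
# Extreme module-style build: each unit imports the previous one.
chain = {f"m{i}": ({f"m{i-1}"} if i else set()) for i in range(n)}

print(makespan(flat, 128), makespan(chain, 128))
```

With 128 workers the flat build finishes in one step while the chain
still takes 128, because only one task is ever ready at a time.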

Izzy: 1 file and 30 partitions would allow you to get more parallelization.

Tom: Now when you are building the implementation unit, you can't
distribute them because you need them all on the same machine.

GDR: If you have a lot of sync nodes, you lose out. And we've been
saying that since the beginning. Modules aren't the equivalent of
Java classes. Nice thing about a module partition is that it lets you
organize your source code how you want. Helps so that you don't have
as many nodes on the graph.

Tom: We spend so much time talking about build systems. It's just the
first thing we deal with. Tons of other tools that consume and
generate C++. We talk about techniques that could be used for build
systems, but that has to be replicated to the other tools. Clang
tools are a success because they were an optimization on headers.
Tools didn't need to retrofit.

Richard: Perf numbers, they are artificial. It neglects a lot of
relevant facts. Total amount of C++ you have to parse to get to the
end of a file is the same. Communication overhead makes things look
bad. We have experience and concrete data on this. On a scratch build
you don't get much of an advantage. But you use much less time from
your build farm, and much less time on incremental builds.

Rene: This is definitely an early test, lots of limitations.

Kitware: Macros, preprocessor and BMI interaction are still a big
concern. If the module is built with different flags than what the
header was built with, that's a problem.

Izzy: We have to focus on our build systems, or else we can't get
anywhere. Have to do all these workarounds because the compilers
don't handle dependencies. Lack of incremental numbers is a problem.
Would like to see more effort from the compiler vendors. If you're
going to tell me that I need to launch a compiler and preprocessor
before every build to make things faster, I'm not going to believe
you. Especially if it is on an OS with slow process launches.

Richard: Clang modules experience, explicitly building modules as
actions in the build system, not building them automatically. Build
system extracts them. 100000+ modules in the system.

Boris: Got exactly the opposite results from Rene. Modules beat any DAG
graph at scale on up to 20 cores. Here are some of my numbers for
comparison at DAG depth of 36:

4-core/8-thread i7:

headers/make: 8.14
headers/build2: 9.47
modules/build2: 1.91

20-core/20-thread Xeon (HT disabled):

headers/make: 3.82
headers/build2: 3.99
modules/build2: 2.44
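
The ratios implied by these figures can be worked out as below. This is
a small sketch, taking the 4-core modules/build2 figure as 1.91; the
minutes do not state the units, and the `speedups` helper is
hypothetical.

```python
# Timings as recorded in the minutes (units are not stated there).
timings = {
    "4-core/8-thread i7": {
        "headers/make": 8.14, "headers/build2": 9.47, "modules/build2": 1.91,
    },
    "20-core/20-thread Xeon": {
        "headers/make": 3.82, "headers/build2": 3.99, "modules/build2": 2.44,
    },
}

def speedups(row, baseline="modules/build2"):
    """Ratio of each header-based time to the modules time, rounded to 2 places."""
    return {k: round(v / row[baseline], 2) for k, v in row.items() if k != baseline}

for machine, row in timings.items():
    print(machine, speedups(row))
```

On these numbers the header builds take roughly 4.3x to 5x as long as
the modules build on the i7, and roughly 1.6x as long on the Xeon.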

Rene: Trying to figure out why the numbers are different. We need to
figure out what is happening.

GDR: Dependency extraction. Don't need BMI to extract dependencies. We
expect it to be possible to look at the source code to extract the
dependencies. Izzy made the point that compiler writers aren't
putting in the work to make building easy. Only way to move the
needle is to write papers and explain the problems.

GDR: Clang modules are formalized with legacy headers.

Corentin: Always come back to letting the compiler do everything?
Modules should be implicit and not need build system support. That
could be the solution and help out tools too.

GDR: Should talk more with the implementers to understand what is going on.

Michael: Same header search path for legacy header units. Import
doesn't export macros. Current preamble allows you to process
everything in parallel, you only need to pause if you interleave
things in a nasty way.

Izzy / Michael: Can stop when you see the first non import declaration.

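A minimal sketch of that early stop, assuming already-preprocessed text
with one declaration per line. The `preamble_imports` helper and the
example source are hypothetical, and the patterns are a simplification
of the real grammar.

```python
import re

# Simplified patterns: a named module declaration, and an import of a
# named module or an angle-bracket header unit.
MODULE_RE = re.compile(r"(?:export\s+)?module\s+[\w.:]+\s*;")
IMPORT_RE = re.compile(r"(?:export\s+)?import\s+([\w.:]+|<[^>]+>)\s*;")

def preamble_imports(text):
    """Collect imports, stopping at the first declaration that is not an import."""
    found = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue                  # skip blank lines and line comments
        if MODULE_RE.fullmatch(line):
            continue                  # the module declaration opens the preamble
        match = IMPORT_RE.fullmatch(line)
        if match is None:
            break                     # first non-import declaration: stop scanning
        found.append(match.group(1))
    return found

example = (
    "export module app;\n"
    "import net;\n"
    "import <vector>;\n"
    "\n"
    "int f();\n"
    "import late;\n"
)
print(preamble_imports(example))
```

The scanner never reads past `int f();`, so the rest of the file does
not need to be parsed to learn the file's imports.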
Bruno: Implicit builds do a lot more work than explicit builds.

Tom: Richard, in the code base you cited that has many thousands of
modules, are module consumers using #include directives? Or is some
other mechanism (import or module declarations) being used?
Richard: Yes, using #includes.
Tom: So this is not an example of modules as proposed.
Richard: It does demonstrate DAG throughput. Doesn't demonstrate
toolability though.

Downey: Parsing module interface unit, concern isn't that we need to
do that, it's that we can't find it. The disconnect between the file
name and the export is difficult.
GDR: The Manifest should help with that.
Izzy: P1302 was written to solve this.
Downey: The manifest gets really complicated in an open world.
GDR: Depends on how it is done.
Downey: It can look like package management. We understand how _we_
could do it in a controlled environment, but not how to do it for
everyone.

Kitware: Would like to see the ability to do preprocessing, module
dependencies, header dependencies. That would help a lot. We have
our own Fortran parser, and it does some of that.

JFB: So we want something like a dependency scanner. For one level
deep. Please describe your inputs and outputs.

Kitware: It's not ideal, as it requires an extra pass, we don't want
to do multiple extra passes. Reducing passes is key.

Corentin: Every language that has a module system, all of them have a
direct mapping between the file name / path and the module name.

JFB: Swift doesn't.

Izzy: For those languages, the compiler is also a build system.
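
For illustration, a direct mapping of the kind Corentin describes could
look like the sketch below. The convention shown (dots become
directories, a `src` root, a `.cppm` extension) and the `module_to_path`
helper are assumptions made for this example, not something proposed in
the meeting.

```python
from pathlib import PurePosixPath

def module_to_path(name, root="src", ext=".cppm"):
    """Hypothetical convention: each dot in the module name becomes a directory level."""
    return PurePosixPath(root, *name.split(".")).with_suffix(ext)

print(module_to_path("foo.bar"))
print(module_to_path("log"))
```

Under such a rule a tool can locate a module's interface from its name
alone, without scanning other files.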

GDR: Please be more open about how we want to do these things.

Corentin: Everybody will experiment with different solutions on
different platforms, and there will be a lot of failed interop. We
need a common ground.

GDR: We can address some of this with standing documents. Having a
direct mapping is not something I recommend... might work for other
languages.

Richard: No standard for how #include works, but that works fine.
Don't need something standardized to make it work.

Corentin: Took a long time to get it to work.

Bryce: Suggestion about SD has been in the back of my mind for a while.

Izzy: Should not be hierarchies, but a single level directory can be
used as a module unit / container.

-- 
Bryce Adelstein Lelbach aka wash
ISO C++ Committee Member
HPX and Thrust Developer
CUDA Convert and Reformed AVX Junkie
--

Received on 2019-01-31 22:56:46