The indirect consumption approach implies that all of the source code must be available either at the original filesystem paths or within the BMI itself. (The latter case would require us to implement our own virtual filesystem layer to access the bundled source code; we can't rely on extracting it to the customer's file system.) In some cases, we may still require that the native compiler version used to produce the BMI be installed on the system, because we depend on probing the native compiler to determine how to emulate certain behaviors. However, if implementors constrain their support of distributable BMIs to matching versions of the native compiler, then this latter concern is not a problem, since the compiler version for the invocation that is consuming the BMI already matches (closely enough) the version used to produce the BMI.
The above information would enable us, upon observing consumption of a BMI whose construction we did not observe, to use the information from the BMI to construct our own variant. We still face some challenges in this area (for example, data races when generating our own variant while capturing a parallel build), but I believe such technical difficulties are not a large impediment.
What would be helpful is if each implementor were to provide a utility with a common interface (like c++filt) that enabled extraction of the above information from a BMI. Perhaps the tool would produce JSON output with the above information (command line, working directory, environment variables, etc.) and offer the ability to generate a .zip file with the bundled source code.
I would be interested in hearing from others how useful the above would be for other tool providers.
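To make the JSON idea concrete, here is a minimal sketch of what one record from such an extraction utility might look like. No such tool or schema exists today as far as this thread states; every field name, the placeholder compiler name "cxx", and the example paths are assumptions made purely for illustration.

```python
import json


def make_bmi_metadata(command_line, working_directory, environment, sources):
    """Build a hypothetical metadata record for one BMI.

    All field names are assumptions for illustration; they do not
    describe any existing compiler's BMI format or tool output.
    """
    return {
        "command_line": list(command_line),
        "working_directory": working_directory,
        "environment": dict(environment),
        "sources": list(sources),
    }


# Placeholder values only: "cxx" stands in for whichever compiler built the BMI.
record = make_bmi_metadata(
    command_line=["cxx", "-DNDEBUG", "-c", "m.cppm"],
    working_directory="/build/project",
    environment={"CPATH": "/opt/include"},
    sources=["m.cppm"],
)

# Serialize the record the way a command-line tool might print it,
# then parse it back the way a consuming tool would read it.
text = json.dumps(record, indent=2, sort_keys=True)
parsed = json.loads(text)
```

A consuming tool would only need a JSON parser to read such a record, which is the main appeal of a common interface over per-vendor BMI readers.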
Re-sending to SG15 and modules lists
Also attached the replies/questions on the old tooling group.
We discussed the topic at the last SG15 meeting and I’d like to collect more opinions/questions/scenarios where BMI reuse or at least extracting some data might be necessary or beneficial.
From: Olga Arkhipova
Sent: Thursday, May 23, 2019 5:31 PM
To: Tooling@isocpp.open-std.org; Gabriel Dos Reis <email@example.com>; Anna Gringauze <firstname.lastname@example.org>; Lukasz Mendakiewicz <email@example.com>; Cameron DaCamara <Cameron.DaCamara@microsoft.com>
Subject: BMI distribution and reading BMI data
I’d like to discuss the BMI usage and distribution (reuse) topics – can we do this at tomorrow’s SG15 meeting? Or later?
BMI distribution (reuse)
Currently, built modules (BMIs) are very similar to static libraries from a build perspective:
1. They are specific to the compiler version (i.e. can only be used by a compiler that is binary compatible with the one which produced them).
2. If they depend on other modules, the BMIs of those modules need to be present too for a successful build.
3. A number of the compiler switches which were used to build the module should match the compiler switches for the source which uses this module.
So distribution of BMIs currently has limitations similar to those of distributing built static libraries:
· has strict requirements on the versions of the compiler and of other libraries used
· limited to the platforms, architectures and #defines it is built for.
BMI distribution definitely has a performance advantage for the builds which meet all restrictions and requirements, i.e. the same ones which can use built static libraries.
If the BMI's restriction to a specific compiler and exact command line can be weakened somehow, or at least some data can be extracted from all BMIs, the performance advantage of BMI distribution can apply more widely.
Scenarios where extracting at least some data from BMIs is needed
VS IntelliSense (EDG)
Visual Studio and VS Code support not only MSVC, but also clang and gcc.
VS uses the EDG compiler as its IntelliSense engine, which currently supports MSVC, Clang and gcc modes. As the performance of EDG compilation is critical, ideally EDG should be able to use modules already built by MSVC, clang and gcc.
Linters (as-you-type code analysis) require additional data specified in the source (annotations, pragmas, attributes, contracts). Ideally, this information should always be present in the BMI, independent of whether the code has been compiled for analysis or code generation.
o Alternative: producing a new BMI for linters and IntelliSense, making it slower.
Note: MSVC will have an option to include the original input source (not TU-expanded) into the IFCs.
Clang-cl and clang-gcc
Currently, clang-cl is (almost) ABI compatible with cl. I believe the same is true for clang and gcc.
Should Clang-cl be able to use modules produced by MSVC and vice versa?
Should Clang-gcc be able to use modules produced by gcc and vice versa?
If BMIs are distributed together with their sources (like modules for MS standard libs), build systems might want to check whether the available BMIs are actually compatible with the current build settings and, if not, produce a different BMI from the source.
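A build system's compatibility check could look roughly like the sketch below. It assumes the BMI's build settings have already been extracted into a dictionary; the keys (compiler version, target, defines) follow the restrictions listed earlier in this message, but the names and the exact-equality comparison are assumptions. A real compiler may only require a subset of switches to match.

```python
def is_compatible(bmi_info, build):
    """Return True if the BMI looks reusable under the current settings.

    Exact equality is a deliberate simplification; the keys are
    hypothetical and not taken from any real BMI format.
    """
    keys = ("compiler_version", "target", "defines")
    return all(bmi_info[k] == build[k] for k in keys)


def choose_action(bmi_info, build):
    """Reuse the distributed BMI if compatible, else rebuild from source."""
    return "reuse" if is_compatible(bmi_info, build) else "rebuild"


# Placeholder settings recorded for a distributed BMI.
distributed = {
    "compiler_version": "1.2.3",
    "target": "x64",
    "defines": {"NDEBUG": "1"},
}

# Same settings: the distributed BMI can be reused.
same_build = dict(distributed)

# A differing #define forces a new BMI to be produced from the source.
debug_build = dict(distributed, defines={"_DEBUG": "1"})

action_same = choose_action(distributed, same_build)
action_debug = choose_action(distributed, debug_build)
```

The fallback to "rebuild" is only possible when the sources ship alongside the BMI, which is why this check matters mainly in that distribution model.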
Static analysis (background code analysis, code analysis at build)
Static analysis often requires additional data computed from the source which is normally not stored in the BMI. Such additional information is produced by static analysis tools during a separate analysis phase of the module and needs to be stored into a different BMI file.
o Alternative 1: Always adding extra info which is not always needed is an unjustifiable performance expense.
o Alternative 2: Adding this information to an already created BMI file creates build system complications.
The format of additional information is not defined at this point – the tools decide how to read and write the data. We recommend storing the data in a way that can be consumed by other tools and compilers.
BMI data to extract
At minimum, the following data should be extractable from any BMI
· Module source file name/location (as it was during the module build)
· Compilation options used to build the BMI, especially the ones which have to match in the source using this module (#defines, etc.)
· Referenced modules and their BMIs (unless included in the compilation options)
· Static analysis data
o Imported header units and the referenced header
o Additional information stored by static analysis tools
o Anything that is possible to extract from header files using a compiler frontend – i.e. types, symbols, ASTs, source information, annotations (declspecs), pragmas, attributes, etc (for consumption by code analysis)
Should we encourage the compiler vendors to provide a way to do this for the BMIs they produce?
_______________________________________________ Modules mailing list Modules@lists.isocpp.org Subscription: http://lists.isocpp.org/mailman/listinfo.cgi/modules Link to this post: http://lists.isocpp.org/modules/2019/05/0430.php