sg15: [Tooling] BMI distribution and reading BMI data

From: Olga Arkhipova <olgaark_at_[hidden]>
Date: Fri, 24 May 2019 00:30:57 +0000

Hi all,
I'd like to discuss BMI usage and distribution. Can we do it at tomorrow's SG15 meeting, or later?
Thanks,
Olga

BMI distribution

Currently, built modules (BMIs) are very similar to static libraries from a build perspective:
1. They are specific to the compiler version (i.e. they can only be used by a compiler that is binary-compatible with the one that produced them).
2. If they depend on other modules, the BMIs of those modules need to be present too for a successful build.
3. A number of the compiler switches used to build the module need to match the switches used to compile the source that uses this module.

So distributing BMIs currently has the same limitations as distributing built static libraries:
* strict requirements on the versions of the compiler and of the other libraries used
* usable only on the platforms and architectures, and with the #defines, they were built for.

BMI distribution definitely has a performance advantage for builds that meet all these restrictions and requirements, i.e. the same builds that can use prebuilt static libraries.

If the BMI's dependence on a specific compiler and an exact command line could be relaxed somehow, or if at least some data could be extracted from any BMI, the performance advantage of BMI distribution would apply much more widely.

Scenarios where extracting at least some data from BMIs is needed

VS IntelliSense (EDG)
Visual Studio and VS Code support not only MSVC, but also Clang and GCC.
VS uses the EDG compiler as its IntelliSense engine, which currently supports MSVC, Clang and GCC modes. Because EDG compilation performance is critical, ideally EDG should be able to use modules already built by MSVC, Clang and GCC.

Linters (as-you-type code analysis) require additional data specified in the source (annotations, pragmas, attributes, contracts). Ideally, this information should always be present in the BMI, regardless of whether the code has been compiled for analysis or for code generation.
o Alternative: produce a separate BMI for linters and IntelliSense, which makes them slower.

Note: MSVC will have an option to include the original input source (not the TU-expanded one) in the IFCs.

Clang-cl and clang-gcc
Currently, clang-cl is (almost) ABI-compatible with cl. I believe the same is true for clang and gcc.
Should Clang-cl be able to use modules produced by MSVC and vice versa?
Should Clang-gcc be able to use modules produced by gcc and vice versa?

Build systems
If BMIs are distributed together with their sources (like the modules for the MS standard libraries), build systems might want to check whether the available BMIs are actually compatible with the current build settings and, if not, produce a new BMI from the source.
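As a rough illustration of that check, here is a C++ sketch. It assumes a build system can read a compiler id, a compiler version, the #defines and a list of relevant switches from a BMI; today this is not generally possible, and `BmiSettings`, `isCompatible` and `chooseAction` are hypothetical names, not part of any compiler or build-system API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical summary of what a build system would need to know about a BMI.
// None of these fields correspond to an actual compiler's BMI/IFC format.
struct BmiSettings {
    std::string compilerId;                       // e.g. "msvc", "clang", "gcc" (illustrative)
    std::string compilerVersion;                  // BMIs are tied to a binary-compatible compiler
    std::map<std::string, std::string> defines;   // #defines in effect when the BMI was built
    std::vector<std::string> significantOptions;  // other switches that must match (ordered, for simplicity)
};

// Returns true if a prebuilt BMI can be reused for a TU compiled with `current`.
// Assumption: exact equality is required; a real compiler may tolerate some differences.
bool isCompatible(const BmiSettings& bmi, const BmiSettings& current) {
    return bmi.compilerId == current.compilerId &&
           bmi.compilerVersion == current.compilerVersion &&
           bmi.defines == current.defines &&
           bmi.significantOptions == current.significantOptions;
}

enum class BmiAction { UsePrebuilt, RebuildFromSource };

// Reuse the shipped BMI when it is compatible with the current build settings,
// otherwise produce a new BMI from the distributed module source.
BmiAction chooseAction(const BmiSettings& shipped, const BmiSettings& current) {
    return isCompatible(shipped, current) ? BmiAction::UsePrebuilt
                                          : BmiAction::RebuildFromSource;
}
```

Exact matching is the conservative choice: it never reuses an incompatible BMI, at the cost of rebuilding in cases where a compiler might have accepted the prebuilt one.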

Static analysis (background code analysis, code analysis at build)
Static analysis often requires additional data computed from the source that is normally not stored in the BMI. This additional information is produced by static analysis tools during a separate analysis phase of the module and needs to be stored in a separate BMI file.
o Alternative 1: always adding the extra information, even when it is not needed, is an unjustifiable performance expense.
o Alternative 2: adding this information to an already-created BMI file complicates the build system.
The format of the additional information is not defined at this point; each tool decides how to read and write its data. We recommend storing the data in a way that can be consumed by other tools and compilers.

Other scenarios?

BMI data to extract

At minimum, the following data should be extractable from any BMI:

* Module source file name/location (as it was during the module build)
* Compilation options used to build the BMI, especially those that have to match in the source using this module (#defines, etc.)
* Referenced modules and their BMIs (unless included in the compilation options)
* Static analysis data
o Imported header units and the referenced headers
o Additional information stored by static analysis tools
o Anything that can be extracted from header files using a compiler frontend, i.e. types, symbols, ASTs, source information, annotations (declspecs), pragmas, attributes, etc. (for consumption by code analysis)
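One way to picture the minimum data listed above is as a compiler-neutral record that any tool could read from a BMI. The C++ sketch below is hypothetical: `BmiMetadata` and `missingDependencies` are illustrative names, and no vendor currently exposes BMI contents in this shape. It shows one use of the "referenced modules" entry: since the BMIs of dependencies must be present for a successful build, a tool can report the missing ones before invoking the compiler.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, compiler-neutral view of the minimum data a BMI could expose.
// Field names are illustrative only; no vendor format defines them.
struct BmiMetadata {
    std::string moduleName;
    std::string sourcePath;                                 // module source location at build time
    std::vector<std::string> compileOptions;                // options used to build the BMI
    std::map<std::string, std::string> referencedModules;  // module name -> BMI path
    std::vector<std::string> importedHeaderUnits;
    std::map<std::string, std::string> toolData;            // opaque extra data keyed by tool name
};

// Returns the names of referenced modules whose BMI paths are not in `availableBmis`.
std::vector<std::string> missingDependencies(const BmiMetadata& meta,
                                             const std::set<std::string>& availableBmis) {
    std::vector<std::string> missing;
    for (const auto& [name, bmiPath] : meta.referencedModules) {
        if (availableBmis.count(bmiPath) == 0) {
            missing.push_back(name);
        }
    }
    return missing;
}
```

Keeping analysis output in an opaque `toolData` map mirrors the point that the format of additional information is left to each tool.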

Should we encourage the compiler vendors to provide a way to do this for the BMIs they produce?

Received on 2019-05-24 02:31:02