Re: [Tooling] Portable Module Representation

From: Dmitry Kozhevnikov <dmitry.kozhevnikov_at_[hidden]>
Date: Sun, 18 Mar 2018 22:56:11 +0300
Thank you for the insights, I've been thinking about similar topics for
quite some time now.

I believe that just the module source code (plus everything its
compilation depends on, such as header search paths, preprocessor
definitions, type sizes, etc.) would be a good start.
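
To make that concrete, here is a minimal sketch of what such a "source plus
compilation context" description could look like. All names here
(ModuleBuildContext, to_frontend_args) are hypothetical and not taken from
any existing tool or build system; the -I/-D spelling just follows the
common compiler driver convention.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of what a tool needs in order to re-parse a
// module interface from its source. Not an existing format.
struct ModuleBuildContext {
    std::string interface_source;                    // path to the module interface unit
    std::vector<std::string> header_search_paths;    // directories passed as -I
    std::map<std::string, std::string> definitions;  // macros passed as -D (name -> value)
    std::size_t pointer_size;                        // one example of a target type size, in bytes
};

// Render the context as driver-style arguments that a frontend could consume.
// Type sizes are kept in the struct but not rendered here.
std::vector<std::string> to_frontend_args(const ModuleBuildContext& ctx) {
    std::vector<std::string> args;
    for (const auto& dir : ctx.header_search_paths) {
        args.push_back("-I" + dir);
    }
    for (const auto& def : ctx.definitions) {
        args.push_back(def.second.empty() ? "-D" + def.first
                                          : "-D" + def.first + "=" + def.second);
    }
    args.push_back(ctx.interface_source);
    return args;
}
```

A tool that already has a parser (or calls into a compiler frontend) could
take these arguments as-is, which is why I think shipping the source is a
reasonable baseline.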

Tools are expected to handle C++ code anyway (since it might appear not
in a module, but in the source file itself), and those unwilling to do
so can call into an actual compiler frontend (which is fairly trivial now
thanks to the libclang and LibTooling efforts).

Your idea might still be nice as a performance bonus if these "portable
representations" could be distributed alongside the BMIs from the
build farms, or with the library binaries.

I haven't yet got your points about the compatibility, API versions, and so

> That universal representation would be such, that, every conforming
> compiler must generate an identical (or directly comparable ) file given
> the same ( preprocessed) source file.
> ...
> Given two universal representations generated from different versions of
> the same (preprocessed) sources, one should be able to determine if the
> versions are identical, API compatible, or not API compatible - That
> implies to formally describe what constitutes an API break in a module
> (Assuming consumers follow the rules for API consumption described in
> Titus' CppCon talk)
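
For reference, the three-way classification described in the quote can be
sketched as below, under a deliberately naive assumption: that a module's
API is nothing more than a set of exported declaration signatures, and that
only removals break it. The names (ExportedApi, classify) are hypothetical.
A real definition of an API break would be much subtler; for example, adding
an overload can change which function an existing call resolves to.

```cpp
#include <algorithm>
#include <set>
#include <string>

enum class ApiChange { Identical, Compatible, Breaking };

// Hypothetical model: an API is just a set of exported declaration signatures.
using ExportedApi = std::set<std::string>;

ApiChange classify(const ExportedApi& before, const ExportedApi& after) {
    if (before == after) {
        return ApiChange::Identical;
    }
    // std::includes needs sorted ranges; std::set iterates in sorted order.
    const bool nothing_removed =
        std::includes(after.begin(), after.end(), before.begin(), before.end());
    return nothing_removed ? ApiChange::Compatible : ApiChange::Breaking;
}
```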

Is it important? Is it even achievable? IMO, it's impossible.

In the real world:
- all compilers are non-conforming (at least because of the present defects),
- there are still some wording defects in the standard, which are resolved
by the implementations in implementation-defined ways (possibly differently).

So, when parsing the code for a module, a compiler might hit one of these
corner cases, producing a different "universal representation" as a result.
Because of this, IMO it's close to impossible for one compiler to use the
output of a different one for actual, proper compilation.

However, some tools (like IDEs, documentation generators, etc) arguably can
use it on a "best-effort" basis when some subtle corner case differences are

> that IDE would be able to parse them without the need for the modules
> sources or compiled interfaces to be present,

If said IDE is clang-based, you also need a way to convert the "universal
representation" back to the clang AST. It seems like a huge task. The whole
public interface of the clang AST is accessible to clients, and it's very
detailed, so reconstructing everything to a sensible state might be
non-trivial (especially given that the "universal representation" would
likely be some lowest common denominator across different compilers). Maybe
someone more familiar with clang internals than I am could comment on this.

If said IDE is using its own custom parser (there are at least 7 notable
implementations that I know of, some of them more alive than others),
its developers would also have to do similar work (which seems way more
time-consuming than just using the existing engine to parse the module
source code, especially for engines which are now in maintenance mode).

Dmitry Kozhevnikov
CLion developer

Received on 2018-03-18 20:56:13