sg15: Re: [Tooling] Portable Module Representation

From: Dmitry Kozhevnikov <dmitry.kozhevnikov_at_[hidden]>
Date: Sun, 18 Mar 2018 22:56:11 +0300

Thank you for the insights, I've been thinking about similar topics for
quite some time now.

I believe that just the module source code (and everything its compilation
depends on, such as header search paths, preprocessor definitions, type
sizes, etc.) would be a good start.
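
As a rough illustration of what would have to travel with the module source,
here is a minimal sketch. The names ModuleBuildInfo and to_flags are
hypothetical, and the -I/-D spellings assume a GCC/Clang-style command line.
Type sizes are kept as plain data, since they are a property of the target
rather than a single portable flag.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of what a tool needs, besides the module source itself,
// to parse the module the same way the original build did.
struct ModuleBuildInfo {
    std::string source_path;                         // module interface source file
    std::vector<std::string> header_search_paths;    // rendered as -I<dir>
    std::map<std::string, std::string> definitions;  // rendered as -D<name>[=<value>]
    std::map<std::string, std::size_t> type_sizes;   // e.g. {"long", 8}; target property
};

// Render the description as GCC/Clang-style frontend arguments.
// Type sizes are intentionally not emitted: they describe the target and
// would have to be checked by the tool, not passed as a flag.
std::vector<std::string> to_flags(const ModuleBuildInfo& info) {
    std::vector<std::string> flags;
    for (const auto& dir : info.header_search_paths)
        flags.push_back("-I" + dir);
    for (const auto& def : info.definitions) {
        if (def.second.empty())
            flags.push_back("-D" + def.first);
        else
            flags.push_back("-D" + def.first + "=" + def.second);
    }
    flags.push_back(info.source_path);
    return flags;
}
```

A tool that does not want to parse C++ itself could hand such an argument
list to a compiler frontend instead.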

Tools are expected to handle C++ code anyway (since it might appear not
in a module but in the source file itself), and those unwilling to do so
can call into an actual compiler frontend (which is fairly easy now thanks
to the libclang and LibTooling efforts).

Your idea might still be nice as a performance bonus if these "portable
representations" could be distributed alongside the BMIs from the
build farms, or with the library binaries.

I don't yet follow your points about compatibility, API versions, and so
on.

> That universal representation would be such, that, every conforming
> compiler must generate an identical (or directly comparable ) file given
> the same ( preprocessed) source file.
> ...
> Given two universal representations generated from different versions of
> the same (preprocessed) sources, once should be able to determine if the
> versions are identical, API compatible, or not API compatible - That
> implies to formally describe what constitutes an API break in a module
> (Assuming consumers follow the rules for API consumption described in
> Titus' CppCon talk)

Is it important? Is it even achievable? IMO, it's impossible.

In the real world:
- all compilers are non-conforming (if only because of their current defects),
- there are still some wording defects in the standard, which implementations
resolve in their own ways (possibly differently from one another).

So, when parsing the code for a module, a compiler might hit one of these
corner cases, producing a different "universal representation" as a result.
Because of this, IMO it's close to impossible for one compiler to use the
output of a different one for actual compilation.

However, some tools (like IDEs, documentation generators, etc.) can arguably
use it on a "best-effort" basis where subtle corner-case differences are
tolerable.

> that IDE would be able to parse them without the need for the modules
> sources or compiled interfaces to be present,

If said IDE is clang-based, you also need a way to convert the "universal
representation" back to the clang AST. That seems like a huge task. The whole
public interface of the clang AST is accessible to clients, and it's very
detailed, so reconstructing everything to a sensible state might be
non-trivial (especially since the "universal representation" would likely be
some lowest common denominator across different compilers). Maybe someone
more familiar with clang internals than I am could comment on this.

If said IDE uses its own custom parser (there are at least 7 notable
implementations that I know of, some of them more alive than others),
its developers would have to do similar work (which seems far more
time-consuming than just using the existing engine to parse the module
source code, especially for engines that are now in maintenance mode).

--
Dmitry Kozhevnikov
CLion developer

Received on 2018-03-18 20:56:13