Date: Sun, 18 Mar 2018 22:56:11 +0300
Thank you for the insights, I've been thinking about similar topics for
quite some time now.
I believe that just the module source code (and all the things its
compilation depends on, like header search paths, preprocessor definitions,
type sizes, etc.) would be a good start.
Tools are expected to handle C++ code anyway (since it might appear not in
a module, but in the source file itself), and those that are not willing to
do so can call into an actual compiler frontend (which is fairly trivial
now thanks to the libclang and libtooling efforts).
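For illustration, here is a minimal sketch of that second option: a tool
hands the module's source, together with the compilation inputs it depends
on (include paths, macro definitions, language mode), to libclang and lets
a real frontend do the parsing. The file name, the flags, and what gets
printed are made up for the example.

#include <clang-c/Index.h>
#include <cstdio>

int main() {
    // The same -I/-D/-std flags the module's real compilation would use.
    const char *args[] = { "-std=c++17", "-Iinclude", "-DSOME_CONFIG=1" };

    CXIndex index = clang_createIndex(/*excludeDeclarationsFromPCH=*/0,
                                      /*displayDiagnostics=*/1);
    CXTranslationUnit tu = clang_parseTranslationUnit(
        index, "my_module.cppm", args, 3,
        /*unsaved_files=*/nullptr, 0, CXTranslationUnit_None);
    if (!tu) {
        std::fprintf(stderr, "failed to parse\n");
        return 1;
    }

    // Walk the top-level declarations the frontend produced.
    clang_visitChildren(
        clang_getTranslationUnitCursor(tu),
        [](CXCursor cursor, CXCursor, CXClientData) -> CXChildVisitResult {
            CXString name = clang_getCursorSpelling(cursor);
            std::printf("decl: %s\n", clang_getCString(name));
            clang_disposeString(name);
            return CXChildVisit_Continue;
        },
        nullptr);

    clang_disposeTranslationUnit(tu);
    clang_disposeIndex(index);
    return 0;
}

Everything past those flags is the frontend's problem rather than the
tool's, which is the point.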
Your idea might still be nice as a performance bonus if these "portable
representations" could be distributed alongside the BMIs from the
build farms, or with the library binaries.
I haven't quite grasped your points about compatibility, API versions, and
so on yet.
> That universal representation would be such, that, every conforming
> compiler must generate an identical (or directly comparable ) file given
> the same ( preprocessed) source file.
> ...
> Given two universal representations generated from different versions of
> the same (preprocessed) sources, once should be able to determine if the
> versions are identical, API compatible, or not API compatible - That
> implies to formally describe what constitutes an API break in a module
> (Assuming consumers follow the rules for API consumption described in
> Titus' CppCon talk)
Is that important? Is it even achievable? IMO, it's impossible.
In the real world:
- all compilers are non-conforming (at least because of their current
defects),
- there are still wording defects in the standard, which implementations
resolve in implementation-defined ways (possibly differently).
So, when parsing the code of a module, a compiler might hit one of these
corner cases and produce a different "universal representation" as a result.
Because of this, IMO it's close to impossible for one compiler to use the
representation produced by a different compiler for the actual compilation.
However, some tools (like IDEs, documentation generators, etc.) could
arguably use it on a "best-effort" basis, where subtle corner-case
differences are tolerable.
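To make the "corner cases" point more concrete, here is a small contrived
snippet where two perfectly reasonable implementations record different
semantics simply because of implementation-defined properties (the
standard's defects and wording ambiguities only add more such cases):

#include <cstdint>

// Exact-match overload resolution: which overload is chosen depends on
// whether std::int64_t is 'long' (typical on 64-bit Linux) or 'long long'
// (Windows, macOS), so the recorded meaning of the call differs between
// implementations.
inline const char *width_of(long)      { return "int64_t is long"; }
inline const char *width_of(long long) { return "int64_t is long long"; }

inline const char *int64_alias() { return width_of(std::int64_t{0}); }

// Plain 'char' may be signed or unsigned; the value recorded for this
// constant is implementation-defined as well.
constexpr bool char_is_signed = (char(-1) < char(0));

Unless the target and every implementation-defined choice are pinned down
as additional inputs, two compilers cannot produce identical
representations of even this much, let alone of the genuinely ambiguous
corners of the language.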
> that IDE would be able to parse them without the need for the modules
> sources or compiled interfaces to be present,
If said IDE is clang-based, you also need a way to convert the "universal
representation" back to the clang AST. It seems like a huge task: the whole
clang AST's public interface is accessible to clients, and it is very
detailed, so reconstructing everything to a sensible state might be
non-trivial (especially since the "universal representation" would likely
be some lowest common denominator across different compilers). Maybe
someone more familiar with clang internals than I am could comment on this.
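For a sense of scale, below is a stripped-down libtooling tool (roughly the
boilerplate as of current clang; the category name and the printed fields
are arbitrary) showing the kind of detail a clang-based client routinely
pulls out of the AST: qualified names, fully resolved types, exact source
locations, implicit declarations. A reconstructed AST would have to provide
all of this consistently.

#include "clang/AST/ASTConsumer.h"
#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/FrontendAction.h"
#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/raw_ostream.h"
#include <memory>

using namespace clang;
using namespace clang::tooling;

static llvm::cl::OptionCategory ToolCategory("ast-detail-dump");

// Visits every function declaration and prints the details an IDE-like
// client typically relies on.
class DetailVisitor : public RecursiveASTVisitor<DetailVisitor> {
public:
  explicit DetailVisitor(ASTContext &Ctx) : Ctx(Ctx) {}

  bool VisitFunctionDecl(FunctionDecl *FD) {
    llvm::outs() << FD->getQualifiedNameAsString() << " : "
                 << FD->getType().getAsString() << " at "
                 << FD->getLocation().printToString(Ctx.getSourceManager())
                 << (FD->isImplicit() ? " (implicit)" : "") << "\n";
    return true; // keep traversing
  }

private:
  ASTContext &Ctx;
};

class DetailConsumer : public ASTConsumer {
  void HandleTranslationUnit(ASTContext &Ctx) override {
    DetailVisitor V(Ctx);
    V.TraverseDecl(Ctx.getTranslationUnitDecl());
  }
};

class DetailAction : public ASTFrontendAction {
public:
  std::unique_ptr<ASTConsumer>
  CreateASTConsumer(CompilerInstance &, llvm::StringRef) override {
    return std::make_unique<DetailConsumer>();
  }
};

int main(int argc, const char **argv) {
  CommonOptionsParser Options(argc, argv, ToolCategory);
  ClangTool Tool(Options.getCompilations(), Options.getSourcePathList());
  return Tool.run(newFrontendActionFactory<DetailAction>().get());
}

And that is only function declarations; an IDE needs the same fidelity for
every other kind of entity.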
If said IDE uses its own custom parser (there are at least 7 notable
implementations that I know of, some of them more alive than others), its
developers would have to do similar work, which seems far more
time-consuming than just using their existing engine to parse the module
source code, especially for the engines that are now in maintenance mode.
--
Dmitry Kozhevnikov
CLion developer