import std.core;
import widgets;
std::vector<widget>
get_some_widgets() {
  /* ... */
}
Now, consider what a tool, such as an editor, an indexer, a
formatter, a static analyzer, a translation tool such as SWIG, a
documentation generator, or any other tool that requires a
semantic representation of source code, will require in order to
perform its intended job. How will such a tool parse this
code? Specifically, how will it resolve the module import
declarations for std.core and widgets such
that declarations for std::vector and widget
are available in order to successfully parse the remainder of
the code? This email thread explores a few possible answers to
this question with the intent of starting a discussion that,
hopefully, will identify a common approach that all compiler and
tool implementors can agree to implement (while still allowing
for compiler/tool specific optimizations when available).
The TL;DR summary of the remainder of this email is:
Such implementation freedom has benefits, but it comes with a cost. If each tool imposes its own requirements for how module imports are resolved, what does that imply for their use? Each tool will require an answer to "where is the module interface unit source code for module X and what preprocessor and language dialect options do I use to translate it (for build mode Y)?", or "where is my cached module artifact for module X (for build mode Y)?". The answers to these questions will have to be supplied by a build system, a (generic or tool specific) environment configuration, or tool specific invocation options.
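To make those two questions concrete, here is a minimal C++ sketch of the information a tool would need to look up per module and build mode. Every name in it (`ModuleKey`, `ModuleRecipe`, `find_recipe`, `example_table`, and the paths and options in the sample data) is hypothetical and chosen only for illustration; it does not describe any existing compiler, build system, or tool.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical key: "module X for build mode Y".
typedef std::pair<std::string, std::string> ModuleKey;  // {module name, build mode}

// Hypothetical record: what a tool needs in order to translate the module
// interface unit itself, or to locate a previously cached module artifact.
struct ModuleRecipe {
    std::string interface_source;      // path to the module interface unit source
    std::vector<std::string> options;  // preprocessor and language dialect options
    std::string artifact;              // cached module artifact path; empty if none
};

typedef std::map<ModuleKey, ModuleRecipe> ModuleTable;

// Returns the recipe for (module, mode), or nullptr when nothing supplied one.
// Something (build system, environment configuration, or invocation options)
// has to populate the table before a tool can resolve an import.
const ModuleRecipe* find_recipe(const ModuleTable& table,
                                const std::string& module,
                                const std::string& mode) {
    ModuleTable::const_iterator it = table.find(ModuleKey(module, mode));
    return it == table.end() ? nullptr : &it->second;
}

// Illustrative data only; the paths and options are made up.
ModuleTable example_table() {
    ModuleTable table;
    table[ModuleKey("widgets", "debug")] =
        ModuleRecipe{"src/widgets.cppm", {"-std=c++2a", "-DWIDGETS_DEBUG=1"}, ""};
    table[ModuleKey("widgets", "release")] =
        ModuleRecipe{"src/widgets.cppm", {"-std=c++2a", "-DNDEBUG"}, "build/widgets.bmi"};
    return table;
}
```

In this sketch, a lookup for `std.core` fails because nothing has told the tool where that module lives, which is exactly the situation an editor or analyzer is in without some external source of configuration.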
Build system support is a reasonable requirement for compilation, but is not a reasonable requirement for many other tools. For example, it strikes me as unreasonable to require build systems to be augmented with explicit support for each of Vim, Emacs, Visual C++, VS Code, Xcode, CLion, Cevelop, Eclipse, etc. in order for the maintainers of any particular code base to use their preferred editor with advanced features like code completion. Likewise, it seems unreasonable to require tools like editors to be able to query any particular build system.
I asked the Xcode and Visual C++ developers how their respective editors would handle the code above. For Xcode, the answer is that, for features like code completion that depend on semantic analysis, the project will have to have been built first, and the editor will consume module artifacts produced during compilation; in other words, such features will only work when the code has been built and was built with a supported version of Clang. Visual C++ will likewise support consumption of module artifacts produced by the Microsoft compiler, but will additionally support configuration options to resolve module import declarations without the need for module artifacts. Should we expect editors like Vim, Emacs, CLion, Cevelop, etc. to be able to consume module artifacts? If so, for which (versions of which) compilers?
Some modules proponents have argued for a standardized module format that all tools could consume. So far, only Microsoft has invested in such an effort. Clang and gcc have both moved ahead with their own (highly optimized to their internal representation) module file formats. Concerns have been expressed regarding the viability of a common format due to performance requirements and the fidelity of the saved semantic model. Portions of the C++ language are implementation defined, so the semantic model stored by a producer may not match the model required by a consumer. Tool requirements also differ; compilers require a semantic description of exported entities and sufficient detail to emit useful diagnostics, but tools like static analyzers require comments, accurate and precise source location ranges including macro expansion contexts, locations of macro (un)definitions, locations of redundant and unused declarations, and much more (and yes, this information will be required for imported modules; the form of the declaration affects the analysis). A single format, even if limited in what it stores with fallback to textual analysis, is unlikely to be the best solution for all tools. My personal impression of the SG15 evening session in Jacksonville earlier this year is that this direction will not have consensus.
It has been suggested that a standardized API might overcome
some of the concerns expressed over a standardized format.
However, I would expect the same concerns regarding performance
and semantic models to apply here. To my knowledge, no designs
for such an API have been made public, nor has a collective
effort to design such an API materialized.
I believe sharing module artifacts, in any form, will prove to
be infeasible. For tools that already have an established
internal representation for C++ code, the cost of translating
the internal representation of another implementation, whether
via API or a common format, is very high (we know this from
experience at Coverity). For those familiar with the internal
representations used by gcc and Clang, consider what it would
take to translate one to the other. If I were assigned such a
task, the approach I would take is to use the internal
representation to generate source that closely reflects the
original source and that is then compiled by the other (this
would not be an easy task, nor is it necessarily possible
without loss of some information). I believe source code is a
better portable format than any binary format.
The LSP (language server protocol; https://langserver.org) provides a tool agnostic approach to avoiding the parsing question altogether by providing a protocol by which a client can request some semantic information such as code completion, hover text, and location information. The server (likely closely tied to a particular compiler) responds with information collected during a build (whether cached or on demand). Vim, Emacs, VS Code, CLion, and other editors have added or are adding support for it. While the LSP is useful for language agnostic tools, it isn't something that can scale to meet the semantic detail and performance requirements of language specific tools like static analyzers.
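For readers unfamiliar with the protocol, the following sketch shows roughly what a code completion request looks like on the wire. The `textDocument/completion` method, the `textDocument`/`position` parameters, and the `Content-Length` framing come from the LSP specification; the helper names `completion_request` and `frame_message` are hypothetical, and the JSON is assembled by hand only to keep the example self-contained.

```cpp
#include <string>

// Builds the JSON-RPC body of a textDocument/completion request.
// `line` and `character` are zero-based, as in the protocol.
// Simplification: `uri` is inserted verbatim; a real client would use a JSON
// library and escape the string properly.
std::string completion_request(int id, const std::string& uri,
                               int line, int character) {
    return "{\"jsonrpc\":\"2.0\",\"id\":" + std::to_string(id) +
           ",\"method\":\"textDocument/completion\",\"params\":{"
           "\"textDocument\":{\"uri\":\"" + uri + "\"},"
           "\"position\":{\"line\":" + std::to_string(line) +
           ",\"character\":" + std::to_string(character) + "}}}";
}

// Prefixes a message body with the Content-Length header that the protocol's
// base layer requires, followed by a blank line.
std::string frame_message(const std::string& body) {
    return "Content-Length: " + std::to_string(body.size()) + "\r\n\r\n" + body;
}
```

Note that the client never sees a parse tree or a module artifact; it sends a file position and receives a list of completion items. That is what makes the protocol tool agnostic, and also why it cannot supply the level of semantic detail a static analyzer needs.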
Many tools depend on the ability to consume standard library implementations produced by other vendors. The C++ standard will eventually prescribe modules such as std.core for standard library components, but these modules may be composed from many dependent modules, the structure of which is an implementation detail. A separate configuration approach for each tool might require that each tool be configured for the internal module topology of each of the Microsoft, libstdc++, libc++, etc. standard library implementations. Such an approach matches how we handle header files today; tools must be configured with include paths that include implementation dependent paths. But what if an implementor were to make their standard library modules available only via module artifacts (as Microsoft does today, though this is expected to change)? The Modules TS specifies (5.2 [lex.phases] p7) "It is implementation-defined whether the source for module interface units for modules on which the current translation unit has an interface dependency (10.7.3) is required to be available". It seems to me that withholding standard library module interface unit source code would be rather user hostile and I don't expect any implementations to do so; I believe that addition in the Modules TS is intended more for build system flexibility. Nevertheless, the potential for module interface unit source code to be absent is a concern for tools that are unable to consume module artifacts.
Historically, we've taken the individual tool configuration
approach for support of header files and, despite limitations,
it has sufficed. However, modules change one critical aspect
of such configuration. Previously, header files needed to be
consumable with the same set of include paths and macro
definitions as is used for the primary source file. Translating
module interface unit source code may require different, even
conflicting, include paths and macro definitions. Thus,
configuration will become more challenging. I think we should
strive for a better solution for modules.
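The following sketch illustrates why a single set of macro definitions no longer suffices. It assumes, purely for illustration, that a tool tracks one macro set for the importing source file and another for each module interface unit; the names `MacroSet` and `conflicting_macros` are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical representation: macro name -> replacement text.
typedef std::map<std::string, std::string> MacroSet;

// Returns the names of macros that are defined in both sets but with
// different replacement text. With header files, any such macro would be a
// configuration error, since headers are translated with the includer's
// definitions. With modules, both definitions may be correct at once: one
// for the importing file and one for the module interface unit.
std::vector<std::string> conflicting_macros(const MacroSet& importer,
                                            const MacroSet& interface_unit) {
    std::vector<std::string> names;
    for (MacroSet::const_iterator it = importer.begin(); it != importer.end(); ++it) {
        MacroSet::const_iterator other = interface_unit.find(it->first);
        if (other != interface_unit.end() && other->second != it->second) {
            names.push_back(it->first);
        }
    }
    return names;
}
```

A tool configured with only the importer's macro set would translate the interface unit under the wrong definitions for every name this function returns.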
If we can't require build system integration for all tools, and we can't rely on sharing module artifacts, and separate configuration for each tool would be challenging, where does this leave us?
I think we need an industry standard, tool agnostic solution that works for common environments (e.g., non-exotic environments in which source code is stored in files) and is supported by all compilers and tools. Tools can always offer opt-in features for build optimization that require build system augmentation (analogous to use of precompiled headers today).
What might such an industry standard approach look like? Here
is a sketch of a design:
Clearly, such a specification falls outside the scope of the C++ standard. However, we could provide a specification in the form of a TS that implementors can adhere to.
So, what do you think? Do you agree that there is a problem worth solving here? Is a common specification a feasible solution? Is standardizing such a specification useful and desirable? What requirements should be placed on the design? If you are a compiler or tool implementor, have you already been working on modules support? If so, what approaches have you been considering? Are they captured above? What is your preferred solution?