sg15: Re: [Tooling] [Ext] Modules and tooling: Resolving module import declarations

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 10 Sep 2018 11:47:55 -0400

On 09/10/2018 02:40 AM, Loïc Joly wrote:
> On 08/09/2018 at 04:16, Tom Honermann wrote:
>>
>>>
>>> For closed-source libraries, it may not even be an option, but a
>>> requirement. And I expect some of those library writers might be
>>> very happy if they could avoid delivering headers, but only a
>>> collection of pre-compiled module interfaces for the compilers they
>>> support.
>>
>> This is what I fear. If library providers were to do that, tools
>> that are unable to consume the provided module artifacts would be
>> unable to parse any source code that has a module interface
>> dependency on those libraries. Library providers that do this are
>> not just restricting what compilers their customers can use, they are
>> restricting what tools their customers can use on the customer's own
>> code (at least the subset of it that has a module interface
>> dependency on the library). I would consider this a pretty
>> user-hostile thing to do. I think we should make it as easy as possible
>> for library providers to enable their modular library to work with as
>> wide a range of tools as possible.
>
> What do you propose to do?

What I described in the original post: to establish a de facto
industry-standard, tool-agnostic approach for mapping module names to
module interface unit source code and translation requirements (for
various build modes). Established practice provides an incentive to
support the practice.
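
As a rough illustration only, a tool could consume such a mapping along
the lines of the sketch below. The line-oriented file format, the
ModuleEntry type, and the function names are all hypothetical; nothing
here is a proposed or existing format. Each line maps a module name to
its interface unit source file, followed by the flags needed to
translate it.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: how to find and translate one module interface unit.
struct ModuleEntry {
    std::string source;              // path to the module interface unit source
    std::vector<std::string> flags;  // translation requirements (defines, include paths, ...)
};

// Hypothetical format, one module per line:
//   <module-name> <source-path> [flag ...]
// Blank lines, lines starting with '#', and lines without a source path are skipped.
std::map<std::string, ModuleEntry> parse_module_map(const std::string& text) {
    std::map<std::string, ModuleEntry> result;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string name;
        if (!(fields >> name) || name[0] == '#') continue;
        ModuleEntry entry;
        if (!(fields >> entry.source)) continue;  // malformed: no source path
        std::string flag;
        while (fields >> flag) entry.flags.push_back(flag);
        result[name] = entry;
    }
    return result;
}

// Resolve an import declaration's module name; returns nullptr if unmapped.
const ModuleEntry* resolve_import(const std::map<std::string, ModuleEntry>& modules,
                                  const std::string& module_name) {
    auto it = modules.find(module_name);
    return it == modules.end() ? nullptr : &it->second;
}
```

A build system, IDE, or static analyzer could share the same parser and
differ only in what it does with the resolved source path and flags.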

If a library provider determines that it is in their (and hopefully
their users') best interests to restrict what tools their library can be
used with, I can't fault them. But it should be a conscious decision;
they should be aware that not providing module interface unit source
code has larger ramifications than not providing library source code.
The former would restrict common tools like build systems, IDEs, static
analyzers, source translators, etc., while the latter only restricts
compilers and linkers (and less often used tools like dynamic analyzers).

>
> I think closed-source library providers are happy to expose the
> interface of their code, but currently, in C++, headers contain many
> things related to implementation (private class members, template
> function bodies), and techniques to hide them have an impact on
> performance. So they provide the headers (or in some cases accept
> decreased performance/reduced features), but they would be very happy
> for a solution that allows them to avoid doing so. And I think
> delivering only compiled modules will be very attractive to them.

I think closed and open source library providers would both benefit from
having more ability to separate interface and implementation, and
modules do help in this regard by disallowing direct access to
non-exported entities. However, the above sounds like arguments for
security-via-obscurity. Module artifacts may make it more difficult to
easily view the defined entities, but all of the details are still
present and available in an unencrypted format. Modules are not an
effective mechanism for hiding intellectual property.

>
> I think the way for tools to help users of those libraries is not to
> hope those library writers will be kind enough to continue providing
> headers. The tools will need something (inside or outside of the
> standard) that can at least partially understand the binary module
> artefacts. I think there are three ways to do that (two of which you
> already mentioned):
>
> - Those artefacts have a standardized format. Tools can read that
> format. There has been strong reluctance from some compiler vendors
> against going that way. We will probably end up with too many formats
> for tools to be able to read them, even if they are published/open
> sourced...
> - Compilers provide a standardized API to read those. As you said,
> there does not seem to be much effort in this area, and we're not even
> sure the binary artefacts will contain all the data required by
> different tools.
> - Compilers provide an option to generate a pseudo header out of a
> binary module artefact. This pseudo header does not need to be
> equivalent to the headers that were used to generate the module. For
> instance, it can be aggregated in one file, it can omit private class
> members, it can omit inline function definitions, non-exported
> entities might in some cases be omitted... It's only used for tools to
> be able to know the interface of a module. A good aspect of this
> option is that it does not really require compiler vendors to agree on
> anything, so we might see it happening. A bad aspect is that we are
> not sure if it can be done with enough information for tools (can we
> extract comments, for instance, out of those binary artefacts?)

This third option was mentioned in the original paper on this subject
(http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0804r0.html#int_translate).

I don't believe it would be viable to omit the properties you
mentioned. Static analyzers, for example, need access to the size of
classes, the definitions of inline functions, and even the form of the
declaration in some cases.
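
A small sketch of the class size point, using hypothetical Widget types
that are not taken from any real library: if a generated pseudo header
dropped private members, a tool parsing it would compute a different
object size and layout than the compiler that built the library.

```cpp
#include <cstddef>

// The class as the library actually defines it.
class Widget {
public:
    int id() const { return id_; }
private:
    int id_ = 0;
    double weight_ = 0.0;
    char tag_[16] = {};
};

// What a tool would see if the pseudo header omitted private members.
class WidgetInterfaceOnly {
public:
    int id() const;
};

// The two views disagree on size, so any analysis that reasons about
// layout, copies, or buffer bounds would be working from wrong data.
static_assert(sizeof(Widget) > sizeof(WidgetInterfaceOnly),
              "omitting private members changes the apparent object size");

// Helper so a tool-like caller can compare the two views at run time.
inline std::size_t apparent_size_difference() {
    return sizeof(Widget) - sizeof(WidgetInterfaceOnly);
}
```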

Since compiler vendors define the format of module artifacts, I think it
would inevitably fall on them to implement such tools (perhaps with some
exceptions for implementors that maintain stable and publicly documented
formats). If common tooling were provided for this (e.g., utilities with
similar command line interfaces), it might be a feasible solution for at
least some tools. However, it still leaves tools needing explicit
support for each compiler in order to 1) identify a module artifact, and
2) invoke the correct tool for it. This approach may suffice to reduce
the O(MxN) configuration issue to O(M+N), but still leaves tools at the
mercy of module artifacts being present (e.g., that a (partial) build
has been completed).
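
To make the two steps concrete, here is a minimal sketch of what each
tool would still have to carry. It assumes that the file extension is
enough to identify the producing compiler, and the extractor utility
names and their shared "-o" option are invented for illustration; no
such utilities are known to exist.

```cpp
#include <map>
#include <string>

enum class Producer { Unknown, Clang, Msvc };

// Step 1: identify a module artifact.
// Assumption: the extension alone identifies the producing compiler.
Producer identify_artifact(const std::string& path) {
    const std::string::size_type dot = path.rfind('.');
    if (dot == std::string::npos) return Producer::Unknown;
    const std::string ext = path.substr(dot);
    if (ext == ".pcm") return Producer::Clang;
    if (ext == ".ifc") return Producer::Msvc;
    return Producer::Unknown;
}

// Step 2: invoke the correct tool for it.
// The utility names are hypothetical; the point is that a common command
// line shape lets one table entry per compiler serve every tool.
std::string extraction_command(const std::string& artifact,
                               const std::string& output) {
    static const std::map<Producer, std::string> extractors = {
        {Producer::Clang, "clang-module-to-source"},
        {Producer::Msvc, "msvc-module-to-source"},
    };
    const auto it = extractors.find(identify_artifact(artifact));
    if (it == extractors.end()) return "";  // unknown producer: nothing to run
    return it->second + " " + artifact + " -o " + output;
}
```

Each tool needs one such table (N tools) and each compiler ships one
utility (M compilers), but the lookup only helps when the artifact
already exists on disk.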

This approach has been used with Swift (see
https://github.com/apple/swift/blob/master/tools/SourceKit/docs/Protocol.md#module-interface-generation)
and it has downsides. The JetBrains folks could speak more to this, but
as an example, automated source code refactoring is limited.

Tom.

Received on 2018-09-10 17:48:01