sg15

Re: [Tooling] Modules naming

From: Corentin <corentin.jabot_at_[hidden]>
Date: Sat, 12 Jan 2019 10:14:11 +0100
On Sat, 12 Jan 2019 at 06:23 Tom Honermann <tom_at_[hidden]> wrote:

> On 1/11/19 6:04 PM, Corentin wrote:
>
>
>
> On Fri, 11 Jan 2019 at 23:31 Tom Honermann <tom_at_[hidden]> wrote:
>
>> On 1/11/19 9:29 AM, Ben Craig wrote:
>>
>> > The manifest file could be generated (based on information in source
>> code) and
>>
>> > usable by multiple tools, IDEs, and build systems. It need not be a
>> statically
>>
>> > maintained file. I like this approach because it appropriately
>> separates the
>>
>> > steps of identifying/building the module name/file map vs using the
>> module
>>
>> > name/file map (most tools don't want to be build systems).
>>
>>
>>
>> This sounds like a clang compilation database.
>>
>>
>
>> I don't think so. It is a database, but not one that stores the history
>> of how a build was performed. Rather, it stores some of the information
>> that would be needed to construct a compiler invocation (or for a tool/IDE
>> to parse source files that require resolving module names to source files).
>>
>
>
> It's exactly what a compilation database is (it includes information
> such as include directories, defines and some flags) for every translation
> unit.
>
> I think of the manifest file as more of a meta-build thing. I think of a
> compilation database as something that records the operations that were
> done as part of a build, not the instructions for how to do a build.
> Perhaps my perspective on this is unique.
>

I think it is. Tools now provide ways to generate that file without doing
the actual build, and it only specifies how to build translation units, not
whole programs.
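
To make that concrete, here is a minimal Python sketch of what a tool can
read out of a compilation database in the Clang JSON format ("directory",
"command", "file" per translation unit). The paths and flags in the sample
entry are made up for illustration, and the helper name flags_for is
hypothetical.

```python
import json
import shlex

# Hypothetical compile_commands.json content; paths and flags are made up.
COMPILE_COMMANDS = """
[
  {
    "directory": "/home/user/project/build",
    "command": "c++ -Iinclude -I/opt/lib/include -DNDEBUG -std=c++2a -c ../src/foo.cpp -o foo.o",
    "file": "../src/foo.cpp"
  }
]
"""

def flags_for(database_text, source_file):
    """Return (include_dirs, defines) recorded for one translation unit.

    Simplification: only the joined forms -Idir and -DNAME are recognized,
    not the separated form "-I dir".
    """
    for entry in json.loads(database_text):
        if entry["file"] == source_file:
            args = shlex.split(entry["command"])
            includes = [a[2:] for a in args if a.startswith("-I") and len(a) > 2]
            defines = [a[2:] for a in args if a.startswith("-D") and len(a) > 2]
            return includes, defines
    raise KeyError(source_file)

includes, defines = flags_for(COMPILE_COMMANDS, "../src/foo.cpp")
```

Each entry describes a single translation unit; nothing in it says how
the resulting objects are linked into a program.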

>
>
>> Such a thing is useful for non-build tools, and can be generated
>> automatically, as you said. I am concerned that generating such a file
>> early in the build process would cause performance problems. Each source
>> file would need to be partially parsed before the build DAG could be fully
>> formed. However, I don’t have benchmarks indicating how expensive an extra
>> pass over all the source files is.
>>
>> Without such a file (or equivalent information statically encoded in
>> "the" build system), the only alternative is to examine every source file
>> in order to construct the build DAG in memory anyway. How else could such
>> a scan be avoided?
>>
>
> Build systems will always need to parse every file at least once before
> invoking the build.
> They can hopefully extract both the name and the dependency in a single
> pass.
>
> Unless the information is already available elsewhere.
>
> I feel like this point keeps getting missed. We use many tools that
> require semantic analysis. Many of those tools today do not have anything
> that we would call a build system. They might be configured with a
> collection of include paths and macro definitions, but that is pretty much
> it. I think too much of this discussion is in regard to "the" build system
> and not enough on how IDEs/tools will function (preferably collaboratively)
> in a modular world. How are vim and emacs going to resolve module imports?
>

I think you are right: build systems and code analysis tools, including
IDEs, will need to cope differently.
The discussion of how symbols will be extracted from a file importing
modules is very important indeed, especially for your use case.

Note that if modules are not tied to the filename, they can easily be
renamed at any time by the developer, which implies that modifying a file
requires the mapping file you want to be regenerated.
The dependency graph of modules is much more dynamic.
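
A rough Python sketch of that regeneration problem, under simplifying
assumptions: the scanner is a plain regular expression (no preprocessing,
no comment handling, no partitions), files are held in a dictionary rather
than read from disk, and the names build_manifest and sources are
hypothetical.

```python
import re

# Matches a module interface declaration such as "export module example.foo;".
# Simplification: no preprocessing, no comment or string handling, no partitions.
MODULE_DECL = re.compile(r"^\s*export\s+module\s+([A-Za-z_][\w.]*)\s*;", re.MULTILINE)

def build_manifest(sources):
    """Map each declared module name to the file that declares it.

    `sources` maps a file name to its text (a stand-in for reading from disk).
    """
    manifest = {}
    for path, text in sources.items():
        match = MODULE_DECL.search(text)
        if match:
            manifest[match.group(1)] = path
    return manifest

sources = {
    "a.cpp": "export module example.foo;\nexport int f();\n",
    "b.cpp": "import example.foo;\nint main() { return f(); }\n",
}
manifest = build_manifest(sources)

# Renaming the module inside a.cpp invalidates the old manifest,
# even though no file was added, removed or renamed.
sources["a.cpp"] = sources["a.cpp"].replace("example.foo", "example.bar", 1)
stale = manifest != build_manifest(sources)
```

Any edit to a source file can change the mapping, so a generated manifest
has to be treated as a build output with every source file as an input.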


> The issue with lack of module mapping is that the build system can't
> really have a top-down
> approach that would ensure some nodes of the build graph are fully
> resolved soon after
> the build starts.
>
> I don't see this issue as an artifact of a module mapping requirement but
> rather the separate compilation model for modules. It is exactly
> equivalent to complications with generated headers today. Knowing the
> module interface unit file name doesn't tell you which source files have a
> dependency on it.
>

I was thinking the other way around.
When you know that foo.cpp depends on module bar, if you have an O(1)
mechanism to map bar to a file, you can prioritize the scanning of the
corresponding file until all transitive dependencies of foo.cpp are
resolved, and you can start to build the DAG towards that node without
waiting for the analysis of the whole project to be complete.
If the module mapping is O(n), the best strategy for the build system is
to open files "randomly" and fill in the blanks as it goes.
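
As an illustration only, here is a small Python sketch of the O(1) case.
It assumes a naming convention where module bar lives in bar.cppm; that
convention, the in-memory TREE and the function name transitive_deps are
all assumptions of the sketch, not anything a compiler enforces today.

```python
import re

# Matches "import bar;" and "export import bar;" (no header units, no partitions).
IMPORT = re.compile(r"^\s*(?:export\s+)?import\s+([A-Za-z_][\w.]*)\s*;", re.MULTILINE)

# In-memory stand-in for a source tree. The "<module>.cppm" naming
# convention is an assumption made for this sketch.
TREE = {
    "foo.cpp": "import bar;\nint main() {}\n",
    "bar.cppm": "export module bar;\nimport baz;\n",
    "baz.cppm": "export module baz;\n",
    "unrelated1.cppm": "export module unrelated1;\n",
    "unrelated2.cppm": "export module unrelated2;\n",
}

def transitive_deps(root, tree):
    """Follow imports from `root`, opening only the files that are needed."""
    opened, pending, deps = [], [root], []
    while pending:
        path = pending.pop()
        opened.append(path)
        for name in IMPORT.findall(tree[path]):
            if name not in deps:
                deps.append(name)
                pending.append(name + ".cppm")  # O(1) name -> file lookup
    return deps, opened

deps, opened = transitive_deps("foo.cpp", TREE)
```

Here only foo.cpp, bar.cppm and baz.cppm are opened. Without the
convention, finding the file that declares bar would mean scanning files
in the tree until one happens to contain "export module bar;".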

Does that make sense?



>
>
>
>> (There is another alternative; implicitly building modules on demand as
>> is done for Clang modules today. But to my knowledge, no implementors are
>> pursuing that for standard proposed modules).
>>
>
> I imagine this would solve a lot of issues: give the compiler a bunch of
> source files and let it figure things out. It's more or less how Rust and
> Go work, unless I'm mistaken?
>
> It solves some, but it has some scaling issues as well, particularly when
> a module must be built (and cached) multiple times to satisfy different
> sets of compilation options by different consumers. Also due to the need
> for parallel compiler invocations to synchronize.
>

I had not thought of that second point... that's... an interesting problem.
Gosh!



> Tom.
>
>
> I highly recommend this article about how that problem can be solved in D
>
> https://blog.thecybershadow.net/2018/11/18/d-compilation-is-too-slow-and-i-am-forking-the-compiler/
>
> I don't know if implementers would be willing to move in this direction?
>
> However, it's important to note that even if the compiler handles the
> module dependencies itself,
> it would also benefit from a fast module -> file mapping, even more so
> than build systems.
>
>
>
>> Tom.
>>
>>
>>
>>
>>
>> *From:* tooling-bounces_at_[hidden] <tooling-bounces_at_[hidden]>
>> <tooling-bounces_at_[hidden]> *On Behalf Of *Tom Honermann
>> *Sent:* Thursday, January 10, 2019 9:27 PM
>> *To:* WG21 Tooling Study Group SG15 <tooling_at_[hidden]>
>> <tooling_at_[hidden]>
>> *Subject:* Re: [Tooling] Modules naming
>>
>>
>>
>> On 1/10/19 5:17 PM, Ben Craig wrote:
>>
>> Can you elaborate more on the kinds of historical pains caused by tying a
>> #include directive to a file name? I know of issues with #pragma once, but
>> that feels like a distinct problem from file names.
>>
>>
>>
>> > This is based on decades of experience caused by header files.
>>
>> I think most of the participants in wg21 have years, often decades of
>> experience with header files. I know of plenty of issues with the
>> preprocessor, but I am not yet aware of any major problems on the file name
>> front. (ok, getting the right slashes can be annoying… but it’s not a huge
>> problem for me personally).
>>
>>
>>
>> I’m not a fan of the MANIFEST / module map approach in general. It
>> requires duplicating information that is already in the source. I get that
>> it has the potential to speed up builds, but I’d rather not have to update
>> another location when I add a new .cpp file to my project. Many build
>> systems allow for the user to make the tradeoff in whether they will use a
>> file system glob to enumerate their source, or require the user to list the
>> source manually. I usually fall into the file system glob crowd.
>>
>> The manifest file could be generated (based on information in source
>> code) and usable by multiple tools, IDEs, and build systems. It need not
>> be a statically maintained file. I like this approach because it
>> appropriately separates the steps of identifying/building the module
>> name/file map vs using the module name/file map (most tools don't want to
>> be build systems).
>>
>> Tom.
>>
>>
>>
>> *From:* tooling-bounces_at_[hidden] <tooling-bounces_at_[hidden]>
>> <tooling-bounces_at_[hidden]> * On Behalf Of *Gabriel Dos Reis
>> *Sent:* Thursday, January 10, 2019 3:15 PM
>> *To:* WG21 Tooling Study Group SG15 <tooling_at_[hidden]>
>> <tooling_at_[hidden]>
>> *Subject:* Re: [Tooling] Modules naming
>>
>>
>>
>> Microsoft strongly encourages its developers and customers to NOT tie a
>> module name with the containing source file of its interface. This is
>> based on decades of experience caused by header files. I would rather see
>> us move in the direction of some sort of MANIFEST file that maps modules to
>> source files and artifacts.
>>
>>
>>
>> *From:* tooling-bounces_at_[hidden] <tooling-bounces_at_[hidden]> *On
>> Behalf Of *Corentin
>> *Sent:* Thursday, January 10, 2019 6:53 AM
>> *To:* WG21 Tooling Study Group SG15 <tooling_at_[hidden]>
>> *Subject:* [Tooling] Modules naming
>>
>>
>>
>> Hello.
>>
>> I would like to suggest two modules related proposals that I think SG15
>> should look at.
>>
>>
>>
>> -* Compiler enforced mapping between module names and module interface
>> file (resource) name. *
>>
>>
>>
>> Currently, module interfaces can be declared in any file, which makes
>> dependency scanning more tedious than it needs to be and has performance
>> implications
>>
>> (The build system needs to open all files to gather a list of modules) -
>> notably when the build system tries to start building while the dependency
>> graph isn't yet complete.
>>
>>
>>
>> Tools ( ide, code servers, indexers, refactoring) may also greatly
>> benefit from an easier way to locate the source file which declares a
>> module.
>>
>>
>>
>> The specifics of the mapping are open to bikeshedding. However, I think
>> we would have better luck sticking to something simple like <module
>> identifier> <=> <file name>.<extension>
>>
>> (The standardese would mention *resource identifier* rather than
>> filename)
>>
>>
>>
>> - *A standing document giving guidelines for modules naming.*
>>
>>
>>
>> The goal is to take everything the community had to learn the hard way
>> about header naming over the past 30 years and apply it to modules by
>> providing a set of guidelines
>>
>> that could be partially enforced by build system vendors.
>>
>> Encouraging consistency and uniqueness of module identifiers across the
>> industry is I think a necessary step towards sane package management.
>>
>> Note that the standard requires uniqueness of modules identifiers within
>> (the standard definition of) a program but says little about a way to
>> ensure this uniqueness.
>>
>>
>>
>> Here is a rough draft of what I think would be good guidelines, partially
>> inspired by what is done by other languages facing similar issues.
>>
>> · *Prefix module names with an entity and/or a project name to
>> prevent modules from different companies, entities and projects from
>> declaring the same module names.*
>>
>> · *Exported top-level namespaces should have a name identical to
>> the project name used as part of the name of the module(s) from which it is
>> exported.*
>>
>> · *Do not export multiple top-level namespaces*
>>
>> · *Do not export entities in the global namespace outside of the
>> global module fragment.*
>>
>> · *Organize modules hierarchically.* For example, if both
>> modules example.foo and example.foo.bar exist as part of the public API
>> of example, example.foo should reexport example.foo.bar
>>
>> · *Avoid common names such as *util* and *core* for module name
>> prefix and top-level namespace names.*
>>
>> · *Use lower-case module names*
>>
>> · *Do not use characters outside of the basic source character
>> set in module name identifiers.*
>>
>> My hope is that these 2 proposals (whose impact on the standard is
>> minimal) would make it easier for current tooling to deal with modules
>>
>> while making it possible, for example, to design dependency managers and
>> build systems able to work at the module level.
>>
>>
>>
>> I'd love to gather feedback and opinions before going further in that
>> direction.
>>
>> Thanks a lot!
>>
>>
>>
>> Corentin
>>
>>
>>
>> PS: For a bit of background, I talked about these issues there
>>
>>
>>
>> https://cor3ntin.github.io/posts/modules_mapping/
>>
>> https://cor3ntin.github.io/posts/modules_naming/
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>>
>> Tooling mailing list
>>
>> Tooling_at_[hidden]
>>
>> http://www.open-std.org/mailman/listinfo/tooling
>>
>>
>>
>>
>
>

Received on 2019-01-12 10:14:26