Re: [SG15] P2473R0: Distributing C++ Module Libraries

From: Gabriel Dos Reis <gdr_at_[hidden]>
Date: Fri, 15 Oct 2021 18:25:11 +0000
[Isabella]

  * Tools like ninja have shown that stat-ing many files can be done quickly and efficiently even on platforms where poor performance is blamed on the filesystem. If the bigger issue is "getting a list of files", simply caching the list of files found on the previous run and comparing it to the new list of files is much faster.

As you may know, the MSVC toolset has used various techniques (including Bloom filters for caching) to mitigate the issue for years, but it IS still an issue at scale. The better solution is to avoid that issue altogether if we can.

-- Gaby

From: SG15 <sg15-bounces_at_[hidden]> On Behalf Of Isabella Muerte via SG15
Sent: Friday, October 15, 2021 11:09 AM
To: SG15 <sg15_at_[hidden]>
Cc: Isabella Muerte <imuerte_at_[hidden]>
Subject: Re: [SG15] P2473R0: Distributing C++ Module Libraries

On Fri, Oct 15, 2021, at 10:06, Gabriel Dos Reis via SG15 wrote:

  * While it's true that they don't have hierarchical significance, the name is a list of identifiers separated by dots. The natural word separator for the file system is a hierarchy.



Modules should reflect source code architecture, not a particular filesystem's idiosyncrasies.



'stat'ing the file system all the way is something we have 40+ years of painful experience with (#include); consuming an inode per module component (Perl's way) is something we have 30+ years of painful experience with.



Tools like ninja have shown that stat-ing many files can be done quickly and efficiently even on platforms where poor performance is blamed on the filesystem. If the bigger issue is "getting a list of files", simply caching the list of files found on the previous run and comparing it to the new list of files is much faster.

You don't need to stat the files until you need to see what has changed (and even then some platforms allow you to stat files as you walk the tree). If the list of files you require has changed, it's immaterial, as everything needs to be re-scanned, rechecked, and rebuilt as a consequence of lacking a module mapping format.
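To make the "cache the list of files and compare" idea concrete, here is a minimal C++17 sketch. It is only an illustration under assumptions: the names FileListDiff, list_files, and diff_file_lists are hypothetical and not taken from ninja, CMake, or any other tool mentioned in this thread, and how the cached list is stored between runs is left out.

```cpp
#include <algorithm>
#include <filesystem>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type: which paths appeared or disappeared since the
// previous run.
struct FileListDiff {
    std::vector<std::string> added;
    std::vector<std::string> removed;
    bool changed() const { return !added.empty() || !removed.empty(); }
};

// Walks `root` and returns the relative paths of all regular files.
// The file type is queried through the directory entry, which an
// implementation may answer from data cached during the walk.
std::set<std::string> list_files(const std::filesystem::path& root) {
    std::set<std::string> files;
    for (const auto& entry :
         std::filesystem::recursive_directory_iterator(root)) {
        if (entry.is_regular_file()) {
            files.insert(
                entry.path().lexically_relative(root).generic_string());
        }
    }
    return files;
}

// Compares the list cached from the previous run with the current one.
// Both inputs are sorted sets, so each difference is one linear pass.
FileListDiff diff_file_lists(const std::set<std::string>& cached,
                             const std::set<std::string>& current) {
    FileListDiff diff;
    std::set_difference(current.begin(), current.end(),
                        cached.begin(), cached.end(),
                        std::back_inserter(diff.added));
    std::set_difference(cached.begin(), cached.end(),
                        current.begin(), current.end(),
                        std::back_inserter(diff.removed));
    return diff;
}
```

A build tool following this approach would call list_files on each run, compare the result against the stored list with diff_file_lists, and fall back to a full re-scan only when changed() returns true.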

Most tools do not get a list of files before continuing to stat for changes.

At my previous job I modified the LLVM build (with the monorepo) to use file(GLOB sources CONFIGURE_DEPENDS), and CMake's check for new or removed files in the source lists took only 0.5 seconds on a 2015 SSD with 300MB/s reads under WSL2, where the files were stored on my Windows desktop but read from Linux. More time was spent linking than on any other operation. If a file was added or removed, CMake was going to have to reconfigure the entire build anyhow, and that's where some slowdown comes into play.

I saw nearly equivalent numbers for native NTFS; however, there is an issue within CMake and other tools where any API that might call CloseHandle on a file handle will sit and wait, as it is extremely slow. (As an aside, the Rust folks were able to greatly increase the performance of their rustup installer and cargo build tool on NTFS by simply pushing all calls to CloseHandle into another thread, and performance is now comparable to Linux.)

Izzy


We should exercise caution with reflexively copying whatever we are familiar with. For familiarity may not necessarily be the answer to our problems.



-- Gaby


From: SG15 <sg15-bounces_at_[hidden]> On Behalf Of Daniel Ruoso via SG15
Sent: Friday, October 15, 2021 9:38 AM
To: iain_at_[hidden]
Cc: Daniel Ruoso <daniel_at_[hidden]>; sg15_at_[hidden]
Subject: Re: [SG15] P2473R0: Distributing C++ Module Libraries



On Fri, Oct 15, 2021 at 12:22 PM Iain Sandoe <iain_at_[hidden]> wrote:
.. I had a question on the periods in module names (which might just mean I'm kinda new to the group and missed some previous design discussion)
These have no hierarchical significance to the compiler, what problem is it solving to make them have disk layout hierarchy in the tooling?



While it's true that they don't have hierarchical significance, the name is a list of identifiers separated by dots. The natural word separator for the file system is a hierarchy.



Now, apart from the strict reading of the standard, we have plenty of prior art in other languages for translating the word separator in the module name to the path separator in the file system, e.g. Perl, Python, Java, and Rust. Fortran doesn't seem to allow word separators in module names (I haven't read the entire docs for it), and Golang uses an opaque string (intended as a URI, IIUC) as the identifier.

It also allows the filesystem usage to be smarter, instead of ending up with a flat directory with all the module files in it.
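As a rough illustration of the translation described above, here is a C++17 sketch that turns the dots in a module name into directory levels. The helper name module_name_to_path and the default ".ixx" extension are assumptions made for the example; nothing here is specified by P2473R0 or the standard, and module partitions (names containing ':') are not handled.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>
#include <string_view>

// Hypothetical helper: maps "foo.bar.baz" to the relative path
// foo/bar/baz<extension>. Each dot-separated identifier becomes one
// path component; the last one becomes the file name.
std::filesystem::path module_name_to_path(std::string_view name,
                                          const std::string& extension = ".ixx") {
    std::filesystem::path result;
    std::size_t start = 0;
    while (true) {
        const std::size_t dot = name.find('.', start);
        const std::size_t end =
            (dot == std::string_view::npos) ? name.size() : dot;
        // Append the identifier between `start` and the next dot.
        result /= name.substr(start, end - start);
        if (dot == std::string_view::npos) {
            break;
        }
        start = dot + 1;
    }
    // Add the (assumed) file extension to the final component.
    result += extension;
    return result;
}
```

With this mapping, a module named foo.bar.baz would be looked up at foo/bar/baz.ixx relative to some search root, rather than in a single flat directory.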


_______________________________________________
SG15 mailing list
SG15_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/sg15



Received on 2021-10-15 13:25:24