Date: Fri, 15 Oct 2021 11:09:26 -0700
On Fri, Oct 15, 2021, at 10:06, Gabriel Dos Reis via SG15 wrote:
>> While it's true that they don't have hierarchical significance, the name is a list of identifiers separated by dots. The natural word separator for the file system is a hierarchy.
>
> Modules should reflect source code architecture, not a particular filesystem's idiosyncrasies.
>
> ‘stat’ing the file system all the way is something we have 40+ years of painful experience with (‘#include’); consuming an inode per module component (Perl’s way) is something we have 30+ years of painful experience with.
>
Tools like ninja have shown that stat-ing many files can be done quickly and efficiently even on platforms where poor performance is blamed on the filesystem. If the bigger issue is "getting a list of files", simply caching the list of files found on the previous run and comparing it to the new list of files is much faster.
You don't need to stat the files until you need to see what has changed (and even then, some platforms let you stat files as you walk the tree). If the list of files you require has changed, the cost is immaterial: without a module mapping format, everything needs to be re-scanned, rechecked, and rebuilt anyway.
Most tools *do not* get a list of files before continuing to stat for changes.
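To make the "cache the previous listing and compare" idea concrete, here is a minimal sketch in Python. It is not how ninja or CMake implement this; the function names, the JSON cache format, and the list of source suffixes are all assumptions made up for illustration.

```python
import json
import os

# Assumed set of source/module extensions; adjust for the project.
SUFFIXES = (".cpp", ".cppm", ".ixx")

def list_sources(root):
    """Walk the tree once and return the set of matching relative paths."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(SUFFIXES):
                full = os.path.join(dirpath, name)
                found.add(os.path.relpath(full, root))
    return found

def load_cached_listing(cache_path):
    """Return the listing saved by the previous run, or an empty set."""
    try:
        with open(cache_path, encoding="utf-8") as f:
            return set(json.load(f))
    except FileNotFoundError:
        return set()

def save_listing(cache_path, listing):
    """Persist the listing so the next run can compare against it."""
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(sorted(listing), f)

def listing_changed(previous, current):
    """Return (added, removed) as sorted lists of relative paths."""
    return sorted(current - previous), sorted(previous - current)
```

If both lists come back empty, the set of files is unchanged and only then is it worth stat-ing individual files for modifications; otherwise the build has to be reconfigured regardless.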
At my previous job I modified the LLVM build (the monorepo) to use file(GLOB sources CONFIGURE_DEPENDS). CMake's check for new or removed files in the source lists took only 0.5 seconds on a 2015 SSD with 300MB/s reads, under WSL2, with the files stored on my Windows desktop but read from Linux. More time was spent linking than on any other operation. If a file was added or removed, CMake was going to have to reconfigure the entire build anyhow, and that's where some slowdown comes into play.
I saw nearly equivalent numbers for native NTFS. However, there is an issue within CMake and other tools where any API that might call CloseHandle on a file handle will sit and wait, as it is extremely slow. (As an aside, the Rust folks were able to greatly improve the performance of their rustup installer and cargo build tool on NTFS by simply pushing all calls to CloseHandle onto another thread, and performance is now comparable to Linux.)
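As a rough illustration of the "push the close onto another thread" trick, here is a small Python sketch. It is not the rustup or cargo implementation; the DeferredCloser class and write_files helper are hypothetical names, and a single worker thread is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor

class DeferredCloser:
    """Hands file closes to a background thread so the caller never waits on them."""

    def __init__(self, workers=1):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def close_later(self, fileobj):
        # The close (including the final flush) runs on the worker thread.
        return self._pool.submit(fileobj.close)

    def shutdown(self):
        # Block until every queued close has finished.
        self._pool.shutdown(wait=True)

def write_files(paths_and_data, closer):
    """Write each (path, bytes) pair, deferring the close of every file."""
    for path, data in paths_and_data:
        f = open(path, "wb")
        f.write(data)
        closer.close_later(f)
```

The caller has to call shutdown() before relying on the files being fully written, since the flush happens as part of the deferred close.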
Izzy
> We should exercise caution with reflexively copying whatever we are familiar with. For familiarity may not necessarily be the answer to our problems.
>
> -- Gaby
>
> *From:* SG15 <sg15-bounces_at_[hidden]> *On Behalf Of *Daniel Ruoso via SG15
> *Sent:* Friday, October 15, 2021 9:38 AM
> *To:* iain_at_[hidden]
> *Cc:* Daniel Ruoso <daniel_at_[hidden]>; sg15_at_[hidden]
> *Subject:* Re: [SG15] P2473R0: Distributing C++ Module Libraries
>
>
> On Fri, Oct 15, 2021 at 12:22 PM Iain Sandoe <iain_at_[hidden]> wrote:
>> .. I had a question on the periods in module names (which might just mean I’m kinda new to the group and missed some previous design discussion)
>> These have no hierarchical significance to the compiler, what problem is it solving to make them have disk layout hierarchy in the tooling?
>>
>
> While it's true that they don't have hierarchical significance, the name is a list of identifiers separated by dots. The natural word separator for the file system is a hierarchy.
>
> Now, apart from the strict reading of the standard, we have plenty of prior art in other languages for the translation of the word separator in the module name to the path separator in the file system, e.g.: Perl, Python, Java, Rust. Fortran doesn't seem to allow word separators in module names (haven't read the entire docs for it), and Golang uses an opaque string (intended as a URI, IIUC) as the identifier.
>
> It also allows the filesystem usage to be smarter, instead of ending with a flat directory with all the module files in it.
>
>
> _______________________________________________
> SG15 mailing list
> SG15_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg15
>
Received on 2021-10-15 13:10:03