Date: Tue, 26 Apr 2022 10:48:34 -0400
Thanks Ben; this is a valuable answer, and I appreciate it.
Cheers!
On Mon, Apr 25, 2022 at 21:19, Ben Boeckel <ben.boeckel_at_[hidden]>
wrote:
> [ What follows is a personal opinion, not that of my role on CMake. I
> am also not an implementer, but hopefully I can at least clear some
> things up from my experience as a build-systems guy. ]
>
> On Mon, Apr 25, 2022 at 18:41:46 -0400, Patrice Roy via Ext wrote:
> > I think a piece of this discussion is missing: there seems to be strong
> > resistance from some implementers to supporting Tom's "Congrats Gaby"
> > hello-world-style program that would only depend on a modularized
> > standard library (let's leave Boost and other well-known but non-std
> > libraries aside for the moment). This resistance would be hard to
> > explain to users without knowing more about the reasons behind it.
> >
> > Would an implementer care to explain why this seems so unreasonable
> > without a build system? Ideally, comparing the "old-style" (lexically
> > included headers) approach to the modules-based approach.
> >
> > From this, it would at least be easier to explain to beginners why just
> > compiling their simple, standard-library-only programs requires more
> > tooling than it used to. Everyone would benefit from that knowledge, or
> > so it seems to me. My users are game programmers; they are experienced,
> > they use build systems, but they also compile small test code manually
> > at the command line, and if they cannot use modules for this, they will
> > ask why, and I would really like to have an answer. It's not a
> > sand-castle vs. skyscraper issue; it's something they will need to know
> > to integrate it into their workflow.
>
> Note that I go further than just "standard-library-only" here: the
> standard library is not immune to flags passed on the command line and
> can transform itself based on things like `-ffast-math` and other
> ABI-affecting flags, which put it right back into the "cannot treat as
> built-in" category. Such flags are common enough that shipping prebuilts
> per configuration is infeasible. Not to mention Linux setups where
> `clang` uses `libstdc++`, or Apple setups where `gcc` uses `libc++`,
> where the stdlib is suddenly *not* trivially "associated with the
> compiler" and is much closer to "just another external dependency". Or
> that some toolchains have historically reused platform standard
> libraries rather than bringing their own (IIRC, pre-OneAPI `icc` and
> `pgi` have done this, though their direct applicability to C++20 is
> likely "low"), or that projects such as STLport have been standalone
> standard library implementations.
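>
> A minimal sketch of that prebuilt problem, using GCC's current opt-in
> spelling (`use-std.cpp` is an invented TU that does `import <vector>;`;
> exact caching behavior varies by compiler and version):
>
>     # Build the <vector> header unit, then a TU that imports it:
>     g++ -std=c++20 -fmodules-ts -x c++-system-header vector
>     g++ -std=c++20 -fmodules-ts -c use-std.cpp
>     # Under an ABI-affecting flag, the header unit itself changes, so
>     # the artifacts above generally cannot be reused; a vendor would
>     # need one prebuilt per flag combination:
>     g++ -std=c++20 -fmodules-ts -ffast-math -x c++-system-header vector
>     g++ -std=c++20 -fmodules-ts -ffast-math -c use-std.cpp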
>
> I find the "requires more tooling" to be because the standard refuses to
> talk about code in anything other than abstract "TU" components. While
> this has merits, it also has costs. Because the standard does not say
> what `import foo;` means other than through verbiage like "makes names
> reachable", handling such code isn't grounded in anything beyond
> "implementers will provide mechanisms to make such imports have
> meaning". It has no relation to filesystems (be they conventional or
> "archive as filesystem" FUSE-like interfaces to other things that can be
> treated as filesystems in some way), so there needs to be some mechanism
> to translate `import foo;` into "here's what that means to this TU".
> Right now, we only have flags like `-reference` (MSVC) or
> `-fmodule-mapper=` (GCC) to specify these things, but filling these out
> is the hard part. Now, the compiler can certainly try to answer this on
> its own with some to-be-decided-upon rules, but C++ projects
> historically end up throwing all kinds of semantically meaningful
> metadata (read: compiler flags) on top of what *their* module means,
> which makes any such default guess unsuitable for some substantial
> portion of the userbase (cf. `FOO_IS_SHARED` defines for library
> visibility macro expansion, `BUILT_WITH_SOME_OPTIONAL_DEP` defines
> altering available APIs, `-Ofast` for some performance-sensitive
> component, `WITH_DEBUG_MEMBERS`, etc.).
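>
> As a concrete sketch of that translation step (the flag spellings below
> are real but still evolving across compiler versions; the module name
> and file names are invented):
>
>     # Build a named module's interface, then tell a consumer where the
>     # compiled module artifact lives:
>     clang++ -std=c++20 --precompile foo.cppm -o foo.pcm
>     clang++ -std=c++20 -fmodule-file=foo=foo.pcm -c consumer.cpp
>     # MSVC spells the same idea as /reference foo=foo.ifc; GCC instead
>     # consults a module mapper (-fmodule-mapper=) that resolves module
>     # names to CMI files.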
>
> I'll also note that backwards compatibility has a *lot* of value in
> minimizing churn of known-working code, but it also ends up welding shut
> doors that one might want to use. Just as an example, the list
> representation in CMake makes `;` an absolute landmine and complicates
> safely passing CMake values around. Would it be nice if one could just
> do `cmake -Dfoo=${foo}` to pass a value along to some build command?
> Sure, but breaking every non-playground CMake project in the process is
> not worth that price.
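>
> (For anyone who hasn't stepped on that landmine, a sketch; here
> `print-foo.cmake` is a hypothetical one-line script that prints `foo`:)
>
>     # Any ';' in the value is also CMake's list separator, so the value
>     # cannot be forwarded verbatim:
>     cmake -Dfoo="a;b" -P print-foo.cmake
>     # Inside the script, anything treating ${foo} as a list sees two
>     # elements, a and b; nothing marks the ';' as literal data.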
>
> Could C++ have said things like "the source encoding must be compatible
> with module lookup namespaces" or "filenames must correlate with module
> names"? Sure. But then folks on non-UTF-8 platforms or in non-Unicode
> locales get upset. Could C++ have instead said "modules must be
> self-contained" and allowed compilers to figure out what to do just
> based on the source? Sure, but then there'd be things like `#pragma
> flag` ifdeffery preludes or `/** semantic comment */` to do what is
> possible today without some other, more structured mechanism also being
> available (for prior art, see Rust's `#![feature()]` and `#![cfg()]`
> attributes, Haskell's `{-# LANGUAGE #-}` syntax, Python's magic `from
> __future__` mechanism, or CMake's policy scopes).
>
> But it didn't. Did everyone understand that C++ chose a module system
> isomorphic to Fortran's instead of something like Python's or Rust's?
> Unlikely. But it's what we have. I can foresee projects being built
> using tools that cobble together a Rust-like or Haskell-like "here's a
> pile of sources and high-level dependency metadata, please build it"
> experience, but the problem happens when a project wants to interface
> with external code *not* using this pattern. All manner of digital ink
> has been spilled about Cargo not "playing well" with non-Rust-centered
> build systems (Cabal is largely the same, but, rounding, "no one" is
> using Haskell in this way). Sure, Python and Rust both have "here's some
> C or C++ code, please build it" helper tools, but trying to use these to
> build existing projects that have long leveraged the flexibility C and
> C++ builds have offered (say, HDF5) is like bringing a squeaky toy
> toolbox to a construction site: it's just not going to cut it for many
> widely-used existing projects. Consuming and understanding extant
> external code is fraught under such a model, and that's where a lot of
> C++'s value is to large projects.
>
> Now, what do I think it would take to make this stuff much more possible
> within the limits we have? SG15 is discussing it. What has been proposed
> (though I am not aware of a paper number as yet) is basically some
> sidecar metadata saying something to the effect of "here is what C++
> information is important to consume this project". Rust has this as
> crate metadata (not typically distributed) and just needs to be told
> "here is a compiled crate, please use it"; Python has some mechanism for
> its packages as well (including `.pth` files and other things that have
> accumulated over the years) that can supplement available packages.
> These tools know how to handle this and consume it natively. Given that
> there's already an implementation out there, and the sidecar metadata
> hasn't even been formally proposed, trying to mandate any such metadata
> at this point is like starting to build a cart for a horse that is
> already standing at the starting gate. So, C++ build systems are the
> level at which this is dealt with at this point (though that doesn't
> preclude some basic support from compilers themselves, it is not trivial
> and I don't foresee implementers chomping at the bit to put even more
> fractally detailed work onto their plates).
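>
> Purely as an illustration of the idea (no such format exists; every
> field name below is invented), such a sidecar file might look like:
>
>     cat /path/to/boost-regex.latest.json
>     {
>       "name": "boost.regex",
>       "include-directories": ["/path/to/boost/include"],
>       "compile-definitions": ["BOOST_REGEX_DYN_LINK"],
>       "modules": { "boost.regex": "/path/to/boost.regex.cppm" },
>       "link-libraries": ["/path/to/libboost_regex.so"]
>     }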
>
> In short, I would describe it as "with great power comes great
> responsibility". The power of modules to consume APIs more precisely,
> beyond being equivalent to a fancy:
>
>     xargs -a included-files cat | $(CC) -x c++ /dev/stdin
>
> and hoping that everything seeing the same content gets the same idea of
> what's going on, now comes with the responsibility to tell the compiler
> more about dependencies beyond "look here for API descriptions and pass
> this file to the linker" and hoping that none of the following have
> occurred (a concrete sketch of one such failure follows the list):
>
> - specified flags that modify the headers in some meaningful way
> - gave the wrong library to the linker
> - gave different directories to different TUs for the same include
> - disagreed on what other dependencies used in the API mean
>   (`_ITERATOR_DEBUG_LEVEL`, Boost's `NDEBUG`-optional members, etc.)
> - gave the wrong headers for the library (e.g., macOS's SDK Python
>   headers with a Homebrew Python library)
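>
> For instance, a minimal sketch of the "different flags for different
> TUs" hazard (`a.cpp` and `b.cpp` are invented, both including a header
> whose class gains members under `WITH_DEBUG_MEMBERS`):
>
>     # Both TUs see the same header, but only one defines the macro that
>     # changes the class layout:
>     g++ -c -DWITH_DEBUG_MEMBERS=1 a.cpp
>     g++ -c b.cpp    # the define was forgotten here
>     g++ a.o b.o     # links fine; the ODR violation surfaces at runtime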
>
> Would it be ideal to just say something along the lines of:
>
>     $(CC) -fdepend-on=/path/to/boost-regex.latest.json \
>         -c -o uses-boost.o \
>         uses-boost.cpp
>     $(CC) -fdepend-on=/path/to/boost-regex.latest.json \
>         -fdependency-metadata-output=uses-boost.1.0.0.json \
>         -shared -o uses-boost.so \
>         uses-boost.o
>
> Yes, I'd love it. But we're not there yet, and until then we'll need
> build systems to dig into any such `boost-regex.latest.json` and
> translate it into flags to pass to the compilers that exist today.
> Unfortunately, given the standard library's sensitivity to consumer
> patterns, it is also subject to such things. It could be supported in
> the absolute simplest situations, but the path for that is *very* narrow
> and people will stray off the beaten path far too easily
> (`clang`/`clang-tidy` on Linux and `gcc` on macOS being the most common
> cases I can think of, without even considering compiler flag
> interactions).
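>
> Roughly, the translation a build system has to perform today in place
> of the imagined `-fdepend-on=` (clang spellings shown; every path and
> name below is invented):
>
>     clang++ -std=c++20 \
>         -I /path/to/boost/include \
>         -D BOOST_REGEX_DYN_LINK \
>         -fmodule-file=boost.regex=/path/to/boost.regex.pcm \
>         -c -o uses-boost.o uses-boost.cpp
>     clang++ -shared -o uses-boost.so uses-boost.o \
>         -L /path/to/boost/lib -lboost_regex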
>
> --Ben
>