Date: Thu, 2 Jun 2022 14:32:44 -0400
On Thu, Jun 02, 2022 at 11:57:11 -0400, Daniel Ruoso via SG15 wrote:
> I have been trying to work on laying out the requirements for the
> metadata format for modules distributed in pre-built libraries, and,
> in that context, I started thinking about how we describe importable
> headers.
CMake stores this information as:
```cmake
target_sources(Imported::Bar
  FILE_SET TYPE CXX_MODULE_HEADERS
  BASE_DIRS
    /usr/include/bar
  FILES
    /usr/include/bar/a/bad/name.h
    /usr/include/bar/a/bad/other.h)
```
`BASE_DIRS` may not overlap, so each file has one and only one name
relative to one of the base directories. This is what I think should
be used for the name of any importable header.
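To make that naming rule concrete, here is a minimal sketch in Python. It is
not CMake's implementation; the helper names `check_no_overlap` and
`logical_name` are hypothetical, and paths are treated as pure POSIX strings
without touching the filesystem.

```python
from pathlib import PurePosixPath


def check_no_overlap(base_dirs):
    """Raise ValueError if one base directory equals or is nested in another."""
    dirs = [PurePosixPath(d) for d in base_dirs]
    for i, a in enumerate(dirs):
        for b in dirs[i + 1:]:
            if a == b or a in b.parents or b in a.parents:
                raise ValueError(f"overlapping base directories: {a} and {b}")


def logical_name(path, base_dirs):
    """Return the header's path relative to the one base directory containing it."""
    check_no_overlap(base_dirs)
    p = PurePosixPath(path)
    for base in map(PurePosixPath, base_dirs):
        if base in p.parents:
            return str(p.relative_to(base))
    raise ValueError(f"{p} is not under any base directory")
```

Because overlapping base directories are rejected up front, at most one base
directory can contain a given file, so the name it returns is unambiguous.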
> While for named modules we have a clear understanding that the build
> system needs to assemble an unambiguous mapping of the logical name to
> a specific translation unit ahead of time, the same is not as clearly
> true for importable headers.
>
> My main current question is:
>
> Is it a requirement that the primary source file for
> a header unit being imported matches the source
> file that would be included given the specific
> compiler arguments?
>
> Let me try to give an example, let's say that:
>
> 1. libraries "foo", "bar" and "baz" are being used by a single project.
>
> 2. library "foo" contains a header file named "a/bad/name.h" in the
> include directory that it advertises (e.g.:
> /usr/include/foo/a/bad/name.h, with -I/usr/include/foo).
>
> 3. library "bar" also contains a header files named "a/bad/name.h"
> and "a/bad/other.h" in its include directory (e.g.:
> /usr/include/bar/a/bad/name.h and /usr/include/bar/a/bad/other.h, with
> -I/usr/include/bar)
>
> 4. library "baz" install other headers, and unfortunately ends up
> adding an include directory that overlaps with library "bar" (e.g.:
> -I/usr/include/bar/a/).
>
> 5. library "bar" advertises that "a/bad/name.h" and "a/bad/other.h"
> are importable headers
>
> 6. the translation unit of a project ends up with "-I/usr/include/foo
> -I/usr/include/bar -I/usr/include/bar/a" in the command line arguments
> for its translation units.
>
> 6. filea.cpp does "import <a/bad/name.h>"
>
> 7. fileb.cpp does "#include <a/bad/name.h>"
>
> 8. filec.cpp does "import <bad/other.h>"
>
> 9. filed.cpp does "#include <bad/other.h>"
>
> Given this scenario, here's a few questions:
>
> A. Should filea.cpp ignore the incoherent -I arguments and assume
> that the importable header can only mean the one that was advertised?
> Or should the build system and the compiler work together to map the
> import statement to a specific resolution and error when trying to
> translate filea.cpp since foo's header is not importable?
I have tests for duplicate names in the sandbox repo:
https://github.com/mathstuf/cxx-modules-sandbox/blob/master/link-use-mask/CMakeLists.txt
The resolution order is "once we see a module of name X, all others with
that name are ignored". Since CMake doesn't know that foo's headers are
importable, it will treat them as not importable (as we have no rules to
make BMIs for them).
However, that is for named modules (where conflicts are IFNDR AFAIK).
Headers instead set `unique-on-source-path` to `true`, which means that a
header unit is identified by the full path to its source, not by its
logical name. As long as the scanner and
compiler agree, this is no worse than having a local `zlib.h` that
shadows a system one pre-modules (where `<zlib.h>` and `"zlib.h"` could
get you different files).
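A rough sketch of what "identified by the full path" means for the scenario
above, in Python. The in-memory `FILES` set stands in for the real filesystem,
and `resolve` and `is_importable` are hypothetical helpers that assume a
plain first-match search over the `-I` directories; real compilers have more
search rules than this.

```python
from pathlib import PurePosixPath

# Files from the quoted scenario (a stand-in for the real filesystem).
FILES = {
    "/usr/include/foo/a/bad/name.h",
    "/usr/include/bar/a/bad/name.h",
    "/usr/include/bar/a/bad/other.h",
}
# The -I directories, in command-line order.
INCLUDE_DIRS = ["/usr/include/foo", "/usr/include/bar", "/usr/include/bar/a"]
# Header units keyed by source path rather than by logical name.
IMPORTABLE = {
    "/usr/include/bar/a/bad/name.h",
    "/usr/include/bar/a/bad/other.h",
}


def resolve(name, include_dirs=INCLUDE_DIRS, files=FILES):
    """Return the first existing file for <name>, searching directories in order."""
    for d in include_dirs:
        candidate = str(PurePosixPath(d) / name)
        if candidate in files:
            return candidate
    return None


def is_importable(name):
    """A spelling is importable only if the file it resolves to was advertised."""
    return resolve(name) in IMPORTABLE
```

Under these assumptions `<a/bad/name.h>` resolves to foo's copy, which nobody
advertised as importable, while both `<a/bad/other.h>` and `<bad/other.h>`
resolve to the same advertised file in bar.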
> B. Since "a/bad/name.h" is an importable header, is the compiler free
> to optimize away the include directive in fileb.cpp? Or should it work
> with the build system to determine that in this particular translation
> it cannot do that optimization?
The plan with CMake is that each importable header lives under one of the
`BASE_DIRS` from its `FILE_SET` declaration. Its path relative to that
directory is the logical name CMake will report during scanning, at least
(the full path may also be passed if needed). But as long as the scanner
reports what the compiler will do, I don't think it matters much.
> C. Even though library "bar" meant to offer <a/bad/other.h>, the
> incidental include statement added by library "baz" results in
> <bad/other.h> being a valid name for inclusion. Should filec.cpp find
> the same header unit and import it? Or should it give an error because
> the interface is not being used as expected?
I think it'd be possible to find it, but if `<a/bad/other.h>` is what we
tell the compiler is importable (rather than
`<a/bad/other.h>,/usr/include/bar/a/bad/other.h`), I don't know how that
determination could be made.
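To illustrate the difference between the two forms, here is a small Python
sketch. The two tables and the helper names are hypothetical, and the resolved
path is assumed to come from whatever include search the scanner or compiler
already performed.

```python
# Importable headers advertised by logical name only.
BY_NAME = {"a/bad/other.h"}

# Importable headers advertised as logical name plus source path.
BY_NAME_AND_PATH = {"a/bad/other.h": "/usr/include/bar/a/bad/other.h"}


def matches_by_name(spelling):
    """Match on the spelling used in the import; alternate spellings are missed."""
    return spelling in BY_NAME


def matches_by_path(resolved_path):
    """Match on the file the spelling resolved to, whatever the spelling was."""
    return resolved_path in BY_NAME_AND_PATH.values()
```

With only names, `bad/other.h` does not match anything; with the path
recorded, the file that `<bad/other.h>` resolves to
(`/usr/include/bar/a/bad/other.h` in this scenario) can be recognized.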
> D. Given that "#include <bad/other.h>" results in the inclusion of
> the same file that was advertised as an importable header, should the
> compiler be allowed to replace the include statement in filed.cpp by
> an import?
Not if it wasn't scanned that way (because CMake won't know how to tell
it where the BMI is if the scanner didn't report it as imported).
Another case to consider is where a symlink `/usr/include/fun ->
/usr/include/bar` exists and it gets placed onto the `-I` (or included
via `<fun/a/bad/other.h>`). I don't know of a good way to handle this
off-hand (repeat for junctions/reparse points on Windows). FWIW, I'm
also fine with saying "don't use symlinks that confuse things without
expecting things to get confused".
--Ben
Received on 2022-06-02 18:32:02