ISOCPP SG15 List: Re: "logical name" of importable headers

From: Daniel Ruoso <daniel_at_[hidden]>
Date: Thu, 2 Jun 2022 15:56:16 -0400

On Thu, Jun 2, 2022 at 14:32, Ben Boeckel
<ben.boeckel_at_[hidden]> wrote:
> `BASE_DIRS` may not overlap, so each file has one and only one name
> relative to one of the base directories. This is what I think should
> be used for the name of any importable header.

That is a bit tangential to my question, IIUC.

My main question is on how the build system communicates to the
compiler which importable headers exist:

Option 1: Name it as it appears in the import statement, without the
full path (e.g.: `<a/bad/name.h>`). Any import statement using that
name would consume the given header unit, regardless of what `-I` was
given on the compiler command line (i.e.: filea.cpp works and imports
bar's header unit). The compiler would also assume that if the token
after `#include` matches an importable header, it means the header
unit, regardless of the `-I`.

Option 2: Option 1, but don't assume you can replace an `#include` by
an `import`, since we don't actually have the path to the header file.

Option 3: Name it as the file logically formed by concatenating the
`-I` directory with the name in the `import` statement, following the
`-I` as well as the `""`-include lookup rules (e.g.:
`/opt/bb/include/bar/a/bad/name.h`). The compiler would then need to
resolve the path to a header that is being imported before matching it
against the list of importable headers. This means that `import
<bad/other.h>` and `import <a/bad/other.h>` would be equivalent, and
so would be `import "other.h"` from the same directory. But `import
<fun/a/bad/other.h>` would not work (for the case where `fun` is a
symlink that is also in the `-I`).

Option 4: Option 3, but normalize the files with `realpath` or some
other mechanism (e.g.: stat's device id + inode). This solves the
problems with symlinks as well as the usage of `..` in the import or
include, but it incurs a significant additional cost as canonicalizing
all those files will potentially result in a very large number of
system calls.

Option 5: Name it as a tuple of the name used in the import and the
path where it was found (e.g.: `<a/bad/other.h>,/usr/include/bar`).
This means the compiler would still need to resolve the location of
the imported header file, but the header unit would only be usable if
it was imported as expected. This would also mean `import "other.h"`
would not work unless it's explicitly declared that way, and it would
be a separate header unit in that case.

Option 6: Option 5, but normalize the directories with `realpath` or
some other mechanism (e.g.: stat's device id + inode). This would
solve the problems with symlinks to the directories, as long as the
import statement uses the same name.
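
To make the difference between these naming schemes concrete, here is a minimal Python sketch of the lookup key each family would produce. It is only an illustration under stated assumptions: the function names, the in-memory `files` set standing in for the filesystem, and the second `-I` directory are all hypothetical, and quote-include lookup and symlinks are not modeled.

```python
import posixpath

def find_dir(name, include_dirs, exists):
    """Return the first -I directory under which `name` exists, else None."""
    for d in include_dirs:
        if exists(posixpath.join(d, name)):
            return d
    return None

def key_by_spelling(name):
    """Options 1 and 2: the key is the spelling in the import; -I is ignored."""
    return name

def key_by_path(name, include_dirs, exists):
    """Option 3: the key is the -I directory concatenated with the spelling."""
    d = find_dir(name, include_dirs, exists)
    return None if d is None else posixpath.join(d, name)

def key_by_tuple(name, include_dirs, exists):
    """Option 5: the key is the pair (spelling, directory where it was found)."""
    d = find_dir(name, include_dirs, exists)
    return None if d is None else (name, d)

# Hypothetical setup: one header reachable through two -I directories.
files = {"/opt/bb/include/bar/a/bad/other.h"}
include_dirs = ["/opt/bb/include/bar", "/opt/bb/include/bar/a"]
exists = files.__contains__

long_name, short_name = "a/bad/other.h", "bad/other.h"

# Option 3 treats both spellings as the same header unit.
same_under_option3 = (key_by_path(long_name, include_dirs, exists)
                      == key_by_path(short_name, include_dirs, exists))

# Options 1/2 and Option 5 treat them as different names.
same_under_option1 = key_by_spelling(long_name) == key_by_spelling(short_name)
same_under_option5 = (key_by_tuple(long_name, include_dirs, exists)
                      == key_by_tuple(short_name, include_dirs, exists))
```

With this setup, only the Option 3 key collapses the two spellings into one header unit; Options 4 and 6 would additionally canonicalize the path or the directory before forming the key.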

None of those options seem like an obvious choice to me.

Option 1 would be the "cleanest", imho, but that is incredibly
backwards-incompatible.

Option 2 would be a compromise on the backwards-incompatibility, but
it would remove the "replace-include-by-import" optimization.

Option 4 would be the most backwards-compatible, but it's not clear to
me that we want that much backwards compatibility for import
statements, and it's likely very expensive.

Option 3 would remove the excessive cost of Option 4, but it would not
be resilient to symlinks or the usage of `..`. The limits would show
up as either failed imports or less clarity on when an include
statement gets replaced by an import.
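
The trade-off between Option 3 and Option 4 can be sketched as follows. This is a rough illustration, not how any compiler implements it: the helper names and the temporary directory layout are made up for the example, and it assumes a POSIX-like system where `os.symlink` is permitted. Each canonical or identity key costs at least one extra filesystem query per candidate, which is where the expense of Option 4 comes from.

```python
import os
import tempfile

def lexical_key(include_dir, name):
    """Option 3 style: plain concatenation, no filesystem access."""
    return os.path.join(include_dir, name)

def canonical_key(include_dir, name):
    """Option 4 style: resolve symlinks and `..` with realpath."""
    return os.path.realpath(os.path.join(include_dir, name))

def identity_key(include_dir, name):
    """Option 4 alternative: stat's device id + inode."""
    st = os.stat(os.path.join(include_dir, name))
    return (st.st_dev, st.st_ino)

def demo():
    """Count distinct keys for three spellings of the same header file."""
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        bar = os.path.join(root, "bar")
        os.makedirs(os.path.join(bar, "a", "bad"))
        with open(os.path.join(bar, "a", "bad", "other.h"), "w") as f:
            f.write("// empty header\n")
        # `fun` is a symlink to `bar`, living inside a directory given via -I.
        os.symlink(bar, os.path.join(root, "fun"))

        spellings = [
            (bar, "a/bad/other.h"),          # the straightforward spelling
            (bar, "a/bad/../bad/other.h"),   # same file, using `..`
            (root, "fun/a/bad/other.h"),     # same file, through the symlink
        ]
        return {
            "lexical": len({lexical_key(d, n) for d, n in spellings}),
            "canonical": len({canonical_key(d, n) for d, n in spellings}),
            "identity": len({identity_key(d, n) for d, n in spellings}),
        }
```

Calling `demo()` yields three distinct lexical keys but a single canonical key and a single identity key, so only the normalized schemes recognize the three spellings as one header.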

Options 5 and 6 are interesting compromise solutions, but they
compromise a lot. They solve the semantic problem of a header being
imported through something other than its intended interface, but at
the cost of either translating the same header unit many times or
failing the import statement entirely. It also won't be clear to the
user when an include would be replaced by an import.

I'm interested in hearing where folks stand on that.

I am, personally, partial to Option 2. I know replacing an `#include`
by an `import` is a neat optimization, but it generates a *lot* of
complexity.

Going with Option 2 also means we wouldn't need to provide the list of
importable headers to the dependency scanning step, since the output
would no longer depend on that list.

It would also be clearer to users, since in that case `#include`
never depends on the module mapping, and `import` never depends on
`-I`.
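
As a small illustration of that separation, here is a Python sketch of what the two lookups could look like under Option 2. Everything in it is an assumption made for the example: the function names, the `.bmi` file name, and the in-memory `files` set are hypothetical and do not describe any existing compiler interface.

```python
import posixpath

def resolve_import(name, header_units):
    """`import <name>`: looked up by spelling only; -I is never consulted."""
    return header_units.get(name)

def resolve_include(name, include_dirs, exists):
    """`#include <name>`: plain textual lookup; the mapping is never consulted."""
    for d in include_dirs:
        candidate = posixpath.join(d, name)
        if exists(candidate):
            return candidate
    return None

# Hypothetical mapping handed to the compiler by the build system.
header_units = {"a/bad/name.h": "bar-a-bad-name.bmi"}

# Hypothetical filesystem contents and include path.
files = {"/opt/bb/include/bar/a/bad/name.h"}
include_dirs = ["/opt/bb/include/bar"]

imported = resolve_import("a/bad/name.h", header_units)
included = resolve_include("a/bad/name.h", include_dirs, files.__contains__)
```

Changing `include_dirs` cannot change `imported`, and changing `header_units` cannot change `included`, which is why the dependency scanner would no longer need the list of importable headers.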

daniel

Received on 2022-06-02 19:56:28
