sg15

Re: "logical name" of importable headers

From: Nathan Sidwell <nathan_at_[hidden]>
Date: Fri, 3 Jun 2022 07:26:14 -0400
On 6/2/22 23:06, Steve Downey via SG15 wrote:
> The case of "config.h" being an importable header, and different across
> translation units in the same build has to be supported.
> `import "config.h"` has to handle collisions, as does the rewritten
> form for #include for "known to be importable". This may mean compilers
> have to stat include paths before concluding about import translation,
> and we have to distinguish absolute paths for the cached effective bmi.
> I don't think we can require uniqueness as we do for module names.

Indeed. Even in header-file land I don't think there's general
agreement about what constitutes 'the same header'. (And hence some
horrible corner cases of '#pragma once'/ #import)

1) Is the pathname the determiner? (what about systems that have a
mixture of case-folding and case-preserving file systems?)

2) Is the drive/inode (or equivalent) the determiner? (this would
resolve symlinks)

3) Is the [hash of] contents the determiner?

4) Is the sequence of post-phase-3 tokens the determiner?

I'd be surprised if there are compilers that choose #4, but I think #2
and #3 are used (and possibly #1?).
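
To make the first three candidates concrete, here is a minimal C++17
sketch of what each one would compare. The function names are
hypothetical and not taken from any compiler, and std::hash is only a
stand-in for a real content hash. #4 is left out because it needs a
preprocessor front end.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// #1: the pathname is the determiner. Purely lexical: "." and ".." are
// folded, but symlinks and case differences are never looked at.
std::string key_by_pathname(const fs::path& p) {
    return p.lexically_normal().generic_string();
}

// #2: the file-system entity is the determiner. fs::equivalent asks the
// OS whether both paths resolve to the same entity (device + inode on
// POSIX), so a symlink and its target compare equal. It throws if
// neither path exists.
bool same_by_file_identity(const fs::path& a, const fs::path& b) {
    return fs::equivalent(a, b);
}

// #3: the contents are the determiner. Verbatim copies in different
// places get the same key.
std::size_t key_by_contents(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    assert(in.is_open());  // sketch only: the header must be readable
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    return std::hash<std::string>{}(bytes);
}
```

Two verbatim copies of a header are 'the same' under #3 but different
under #1 and #2; a symlink to a header is 'the same' under #2 and #3
but different under #1.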

I like something like Ben's description of CMake -- a bunch of root
directories and a bunch of pathnames relative to a particular root. A
header-unit's name is its relative path and a unique identifier of the
root to which it is relative -- the containing project name if you will.
But even that runs into problems if there are cross-project symlinks
or cross-project verbatim copies (depending on which of the above
heuristics one chooses for 'same header file').
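
A rough sketch of that naming scheme, assuming non-overlapping roots
and purely lexical matching (so it is blind to symlinks and copies, as
noted above). `Root`, `HeaderUnitName` and `name_header` are names made
up for the illustration.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// A root directory plus a unique identifier for it (hypothetically the
// name of the containing project).
struct Root {
    std::string id;
    fs::path dir;
};

// The header-unit name: which root, and the path relative to that root.
struct HeaderUnitName {
    std::string root_id;
    std::string relative;
};

// Name a header by the first root that lexically contains it.
std::optional<HeaderUnitName> name_header(const std::vector<Root>& roots,
                                          const fs::path& header) {
    assert(!header.empty());
    const fs::path file = header.lexically_normal();
    for (const Root& r : roots) {
        const fs::path rel = file.lexically_relative(r.dir.lexically_normal());
        // Outside this root: the result is empty, ".", or starts with "..".
        if (rel.empty() || rel == "." || *rel.begin() == "..") continue;
        return HeaderUnitName{r.id, rel.generic_string()};
    }
    return std::nullopt;  // not under any known root
}
```

With roots {"bar", /opt/bb/include/bar} and {"sys", /usr/include}, the
file /opt/bb/include/bar/a/bad/name.h would be named ("bar",
"a/bad/name.h").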

> I'm pretty sure I could torture the wording to really require this, but
> I'm also pretty sure this is the intent.

It's my interpretation of the wording. Use the name by which the
compiler opens the header-file.

nathan

>
> On Thu, Jun 2, 2022 at 10:18 PM Gabriel Dos Reis via SG15
> <sg15_at_[hidden] <mailto:sg15_at_[hidden]>> wrote:
>
> To add to Olga's excellent summary:
>
> - MSVC looks at <header> and "header" as logical names of the
> headers, as written in the source code. For example, <vector> is
> not the same as <bar/vector> even if both might resolve to the same
> physical file found by the '#include' algorithm.
> - MSVC looks at <drive:/absolute/path> or "drive:/absolute/path"
> as *hard coded* ID for the header unit.
>
> MSVC recommends the standard notation of <header> or "header" as the
> preferred notation for headers (and it emits that in its BMI, the IFC
> file). That allows relocation and other forms of cloud builds where
> all that matters is what is written in the source code (for
> reproducibility), and not the exact location on the drive filesystem
> - imagine building in labs and distributing the result to consumers'
> machines, which differ from the fancy lab setup.
>
> If you ask for what we (SG15) should recommend: the logical name as
> normally written in the input source file, NOT the physical
> location of the resolution of the logical header or header file.
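
A small sketch of what keying on the logical name could look like,
assuming the key is simply the spelling in the source, delimiters
included. `LogicalNameMap` is a hypothetical name and this does not
describe how cl.exe is implemented.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Maps the spelling of a header-name, exactly as written in the source
// (including the <> or "" delimiters), to the BMI built for it.
class LogicalNameMap {
public:
    void add(const std::string& spelling, const std::string& bmi) {
        // Sketch-level sanity check: the key must carry its delimiter.
        assert(spelling.size() >= 2 &&
               (spelling.front() == '<' || spelling.front() == '"'));
        table_[spelling] = bmi;
    }

    // No file-system access: only the spelling is compared.
    std::optional<std::string> find(const std::string& spelling) const {
        const auto it = table_.find(spelling);
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, std::string> table_;
};
```

After `add("<vector>", "vector.ifc")`, a lookup of `<vector>` succeeds
while `<bar/vector>` and `"vector"` do not, whatever file they would
resolve to on disk.
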
>
> -- Gaby
>
> -----Original Message-----
> From: SG15 <sg15-bounces_at_[hidden]
> <mailto:sg15-bounces_at_[hidden]>> On Behalf Of Olga Arkhipova
> via SG15
> Sent: Thursday, June 2, 2022 6:58 PM
> To: sg15_at_[hidden] <mailto:sg15_at_[hidden]>; Ben
> Boeckel <ben.boeckel_at_[hidden] <mailto:ben.boeckel_at_[hidden]>>
> Cc: Olga Arkhipova <olgaark_at_[hidden]
> <mailto:olgaark_at_[hidden]>>
> Subject: Re: [SG15] "logical name" of importable headers
>
> >> My main question is on how the build system communicates to the
> compiler which importable headers exist:
>
> Yes, we've struggled with this question too and came up with the
> following:
>
> cl.exe /headerUnit switch has the following options
>
> /headerUnit header-filename=ifc-filename
> /headerUnit:quote [header-filename=ifc-filename]
> /headerUnit:angle [header-filename=ifc-filename]
>
> https://docs.microsoft.com/en-us/cpp/build/reference/headerunit?view=msvc-170
>
> In other words, header unit "logical name" can be a full path to .h
> or <a/b/header.h> or "header.h" forms similar to the ones used in
> the code, but is not required to exactly match the code usage (see
> below). The command line must contain all the -I options needed to
> find the imported .h, the same as for #include.
>
> The compiler would resolve the imported .h using include path and do
> the same for /headerUnit <> and "" options to obtain full paths.
> Then it will use the full paths to match the import and the header
> unit specified on the command line.
>
> In other words, the resolved header file path is used as a header
> unit ID.
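
The matching described here can be sketched as follows. This is an
illustration under simplifying assumptions, not MSVC's code: names are
passed without their delimiters, only the -I search is modelled (the
extra ""-include lookup is omitted), and file existence is injected as
a predicate so the example needs no real files. `resolve` and
`import_matches_header_unit` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

using Exists = std::function<bool(const std::string&)>;

// Return the first "<dir>/<name>" that exists, mimicking an -I search.
std::optional<std::string> resolve(const std::string& name,
                                   const std::vector<std::string>& include_dirs,
                                   const Exists& exists) {
    assert(!name.empty());
    for (const std::string& dir : include_dirs) {
        const std::string candidate = dir + "/" + name;
        if (exists(candidate)) return candidate;
    }
    return std::nullopt;
}

// An import uses a header unit when both names resolve to the same path.
bool import_matches_header_unit(const std::string& import_name,
                                const std::string& header_unit_name,
                                const std::vector<std::string>& include_dirs,
                                const Exists& exists) {
    const auto a = resolve(import_name, include_dirs, exists);
    const auto b = resolve(header_unit_name, include_dirs, exists);
    return a && b && *a == *b;
}
```

With -I /lib and -I /lib/a/b, the names header.h and a/b/header.h both
resolve to /lib/a/b/header.h, so they match even though they are
spelled differently.
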
>
> As file path is unique, there is no ambiguity. This also allows some
> flexibility in header unit "logical names" - as long as they
> resolve to the same path, the header unit BMI will be used. As
> symlinks are different file system entities, they obviously will not
> be matched to non-symlink locations. But I believe this is no
> different from header resolution today.
>
> This does require the .h file (and not only BMI) to be present on
> the machine (or rather file system) as well as a set of -I on the
> command line. But this is not different from today's headers usage
> and should not be a big problem.
>
> MSVC will not create BMIs on its own and always requires them to be
> specified on the command line.
>
> The build system knows which BMIs it needs to build from the
> following info:
> - user directly specifying the headers to be built as header units
> - scan data of the sources (if the build system supports automatic
> build of imported header units).
>
> In the last case the build system will recursively scan all imported
> headers and use original source base compilation options for header
> units' creation if they don't already exist.
>
> So to use a prebuilt header unit from a library the following will
> be needed
> - Directory of the header (or its parent dir) should be added to the
> include path (no different than today)
> - The "logical name" of the header unit (in the lib's metadata)
> would be <header.h> or <a/b/header.h> - whatever allows it to be
> found in that directory. The full path can also be used if the
> library (and the header unit) is built on the user's machine.
>
> I believe header units were designed to ease the transition from
> #includes to modules and from this perspective it is desirable to
> keep the resolution as similar as possible to what is used in #includes.
>
> Thanks,
> Olga
>
>
> -----Original Message-----
> From: SG15 <sg15-bounces_at_[hidden]
> <mailto:sg15-bounces_at_[hidden]>> On Behalf Of Daniel Ruoso
> via SG15
> Sent: Thursday, June 2, 2022 12:56
> To: Ben Boeckel <ben.boeckel_at_[hidden]
> <mailto:ben.boeckel_at_[hidden]>>
> Cc: Daniel Ruoso <daniel_at_[hidden] <mailto:daniel_at_[hidden]>>;
> Daniel Ruoso via SG15 <sg15_at_[hidden]
> <mailto:sg15_at_[hidden]>>
> Subject: Re: [SG15] "logical name" of importable headers
>
> On Thu, Jun 2, 2022 at 14:32, Ben Boeckel
> <ben.boeckel_at_[hidden] <mailto:ben.boeckel_at_[hidden]>> wrote:
> > `BASE_DIRS` may not overlap, so each file has one and only one name
> > relative to one of the base directories. This is what I think should
> > be used for the name of any importable header.
>
> That is a bit tangential to my question, IIUC.
>
> My main question is on how the build system communicates to the
> compiler which importable headers exist:
>
> Option 1: Name as it appears in the import statement, without the
> full path (i.e.: `<a/bad/name.h>`), which would mean that any import
> statement would consume the given header unit, regardless of what
> `-I` was given on the compiler command line (i.e.: filea.cpp works
> and imports bar's header unit), also assume that if the token after
> `#include` matches an importable header, it means the header unit,
> regardless of the `-I`.
>
> Option 2: Option 1, but don't assume you can replace an `#include`
> by an `import`, since we don't actually have the path to the header
> file.
>
> Option 3: Name it as the file logically formed by concatenating
> the `-I` with the `import` statement, or by the `""`-include rules (i.e.:
> `/opt/bb/include/bar/a/bad/name.h`), which means the compiler would
> need to resolve the path to a header that needs to be imported
> before matching it to the list of importable headers. This means
> that `import <bad/other.h>` and `import <a/bad/other.h>` would be
> equivalent, and so would be `import "other.h"` from the same
> directory. But `import <fun/a/bad/other.h>` would not work (for the
> case where `fun` is a symlink that is also in the `-I`).
>
> Option 4: Option 3, but normalize the files with `realpath` or
> some other mechanism (e.g.: stat's device id + inode). This solves
> the problems with symlinks as well as the usage of `..` in the
> import or include, but it incurs a significant additional cost as
> canonicalizing all those files will potentially result in a very
> large number of system calls.
>
> Option 5: Name it as a tuple of the name used in the import and
> the path where it was found (e.g.: `<a/bad/other.h>,/usr/include/bar`).
> This means the compiler would still need to resolve the location of
> the imported header file, but the header unit would only be usable
> if it was imported as expected. This would also mean `import
> "other.h"` would not work unless it's explicitly declared that way,
> and it would be a separate header unit in that case.
>
> Option 6: Option 5, but normalize the directories with `realpath`
> or some other mechanism (e.g.: stat's device id + inode). This would
> solve the problems with symlinks to the directories, as long as the
> import statement uses the same name.
>
> None of those options seem like an obvious choice to me.
>
> Option 1 would be the "cleanest", imho, but that is incredibly
> backwards-incompatible.
>
> Option 2 would be a compromise on the backwards-incompatibility, but
> it would remove the "replace-include-by-import" optimization.
>
> Option 4 would be the most backwards-compatible, but it's not clear
> to me that we want that much backwards compatibility for import
> statements, and it's likely very expensive.
>
> Option 3 would remove the excessive cost of Option 4, but it would
> not be resilient to symlinks or the usage of `..`. The limits would
> show up as either failed imports or less clarity on when an include
> statement gets replaced by an import.
>
> Options 5 and 6 are interesting compromise solutions, but they
> compromise a lot. They solve the semantic problem of not importing
> the header by the intended interface, but at the cost of the same
> header unit being translated many times, or of the import statement
> simply failing entirely. It's also not going to be clear to the user
> when an include would be replaced by an import.
>
> I'm interested in hearing where folks stand on that.
>
> I am, personally, partial to Option 2. I know it's a neat
> optimization, but it generates a *lot* of complexity.
>
> Going with Option 2 also means we wouldn't need to provide the list
> of importable headers to the dependency scanning step, since the
> output would no longer depend on that list.
>
> It would also be clearer to the users, since in that case
> `#include` never depends on the module mapping, and `import` never
> depends on `-I`.
>
> daniel
> _______________________________________________
> SG15 mailing list
> SG15_at_[hidden] <mailto:SG15_at_[hidden]>
> https://lists.isocpp.org/mailman/listinfo.cgi/sg15
>


-- 
Nathan Sidwell

Received on 2022-06-03 11:26:16