ISOCPP std-proposals List: Re: [std-proposals] Revising #pragma once

From: Gašper Ažman <gasper.azman_at_[hidden]>
Date: Sun, 1 Sep 2024 14:10:29 +0100

... Just so you are aware, we sometimes run over of commandline max length
because of the number of -I flags. Also note that it's common to have -I
dir/ -I dir/subdir/ on the commandline. How do you translate that to your
schema?

On Sun, Sep 1, 2024, 10:42 Sebastian Wittmeier via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> We established that
>
> - there are situations, two header files in different projects are the
> same, and have to be included twice (master header)
>
> - there are situations, two header files in different projects are the
> same, and have to be included once (zlib.h)
>
>
>
> Path:
>
> Each include is either
>
> - done by <>
>
> - done by "" relative to a different include file or the cpp file
>
>
>
> In the case of <> uniqueness is important to prevent libraries (or their
> headers) to be imported more than once.
>
>
>
> In the case of "" uniqueness, doubled imports within the project should be
> avoided. Circular imports would probably give an error anyway.
>
>
>
>
>
> If we strictly use the path in the C++ files (not the path in the file
> system), we should get
>
>
>
> [<systemheader_filepath>]/["projectheader_filepath"]
>
> with [] for optional
>
> E.g. <mylib/importall.h>/"mylib_internal/types.h"
>
>
>
>
>
> Why is this path not enough?
>
>
>
> The zlib.h case with headers-only libraries?
>
> Those should probably keep symbol-dependent include guards.
>
>
>
> Relying on this path should make once independent of the build environment
> and the storage of the files.
>
>
>
>
>
>
> -----Ursprüngliche Nachricht-----
> *Von:* Mike Reed via Std-Proposals <std-proposals_at_[hidden]>
> *Gesendet:* So 01.09.2024 09:57
> *Betreff:* Re: [std-proposals] Revising #pragma once
> *An:* Jason McKesson via Std-Proposals <std-proposals_at_[hidden]>;
> *CC:* Mike Reed <mike_at_[hidden]>; Tom Honermann <tom_at_[hidden]>;
> Of course one could argue that you went through all that pain *because*
> there isn't a standard on #once. If there were then everyone would
> implement it the same, devs would use it knowing exactly what to expect and
> you wouldn't have gone through all that pain.
> The problem is, is it possible to produce such a spec? Does the current
> proposal succeed where the previous proposal failed?
> If we introduce a new directive "#once" then there is not so much pressure
> to be backwards compatible with *all* #pragma once directive implentations
> (which is an impossible ask). Clearly, the behaviour of #include and #once
> are intimately linked. As already pointed out, #include fetches a
> resource, #once blocks that resource from being fetched more than once. So
> we simply (ha!) have to specify how the include tree is built...
>
> but with modules on the horizon is it worth it? Maybe, if it looks like
> modules are staying on the horizon :/
>
> On Sun, 1 Sept 2024, 06:32 Tom Honermann via Std-Proposals, <
> std-proposals_at_[hidden]> wrote:
>
> On 8/30/24 5:50 AM, Gašper Ažman wrote:
>
> If you knew up-front you wouldn't do it :).
>
> This happens though. There are people who generate code - whole include
> trees - and for plugins they end up looking very very similar or identical.
> And then as the build engineer my users complain that "their build doesn't
> work on the cloud build environment" and I have to somehow *find their bug*.
>
> Think about it. How the hell do you diagnose this silent noninclusion?
>
> In my experience, very carefully, over a long period of time, with much
> hair pulling and lots of WTFs.
>
> I'm going to relate my most memorable experience with this feature below,
> but I'll preface this by saying that the root cause in this case was a bug
> in the implementation. Readers are free to say, yeah, bugs happen, nothing
> to see here. and that is fair. The point is not (just) to relay yet another
> story from the trenches, but to enforce Gašper's point that debugging a
> case where #pragma once fails is about as hard as debugging UB or a
> miscompilation.
>
> I was a Coverity developer for ~10 years working on the Coverity C and C++
> front ends. Coverity is one of a number of tools in the unenviable position
> of having to emulate the behaviors (intended and otherwise) of many other
> compilers. The emulation requirements extend to compiler extensions,
> implementation-defined behavior, and even unspecified behavior. Coverity
> includes multiple C and C++ front end implementations, one of which is
> chosen to use to emulate the behavior of a given compiler. Exact matching
> of front end behavior for all supported compilers is not economically
> feasible, so different behavior becomes observable at times. Sometimes in
> surprising ways.
>
> It was uncommon for me to get directly involved in debugging a problem
> happening in a customer environment. A bug report came in that Coverity was
> failing to successfully compile parts of the customer's code; errors were
> being reported about undeclared names and failed overload resolution. The
> customer was using MSVC which, of course, was having no issues compiling
> their code. The first course of action in such cases is to request
> preprocessed output produced by both the native compiler (MSVC) and
> Coverity's front end (don't get me started about how much harder C++
> modules will make this first step if programmers start actually using C++
> modules, but I digress). The reported errors reproduced easily using the
> Coverity produced preprocessed output. Searching for the undeclared names
> from the error messages in the MSVC preprocessed output made it clear early
> on that large chunks of the source code were simply missing. Thanks to
> #line directives in the MSVC preprocessed output, it wasn't too hard to
> correlate the missing content with a particular source file and a request
> was sent to the customer to share that file, which they did. The file used
> a proper header guard (it did not use #pragma once), so I naturally went
> down the rabbit hole of trying to figure out why the guard macro was
> apparently defined when Coverity processed the file, but not when MSVC did.
> It isn't all that uncommon for Coverity customers to have to implement
> workarounds to address cases where Coverity doesn't exactly emulate some
> aspect of another compiler, so I assumed that the customer had probably
> applied a workaround for some other issue that was somehow now contributing
> to this issue and I started asking those "what did you do?" questions in
> the most polite corporate phrasing I could come up with.
>
> I didn't get to blame the customer for doing something silly. They weren't
> doing anything weird or unusual. I spent lots of time looking at code in
> our front end trying to imagine how we could possibly be processing a
> header guard incorrectly. The front end cached knowledge about header
> guards and used that information to avoid re-tokenizing a re-included
> source file. I wondered if, somehow, an invalid cache hit was happening,
> but nothing seemed wrong. Header guards are not rocket science.
>
> I wondered if include paths were somehow inconsistent and if Coverity was
> resolving the #include directive to a different file; perhaps an empty
> file or one with a different header guard macro. No such luck. I didn't get
> to blame the customer for having multiple copies of a file with
> inconsistent contents in different locations.
>
> It got worse. It turned out there wasn't a single file that was
> problematic. On different runs, content would be missing for different
> source files.
>
> I started asking about the customer build environment. That was when we
> got the first indication of where to look that was actually on the right
> path. The customer was building source code hosted on a network storage
> device. I wondered about some kind of weird filesystem corruption that
> caused an empty file to sometimes be served instead of the actual file
> contents. I hypothesized improbable bugs in which an empty file was only
> produced with the file system APIs that Coverity used and not with the ones
> that MSVC used. I asked the customer to verify that their network storage
> device had all updates applied.
>
> I tried replicating the problem in house using network storage devices
> available within the company. No luck of course.
>
> I eventually had to resort to sending the customer builds of Coverity with
> extra debugging instrumentation to run in their environment. I'm sure
> everyone reading this can appreciate that, when things get to the point
> that you are asking customers to run one-off modified builds in their
> production environment in order to figure out WTF is going wrong with your
> software, that things are just not going well. I was still focused on the
> header guard cache, so that is where I added some debug instrumentation.
> The header guard cache naturally relied on recognition of the same file
> being re-included and used inodes to do so. So I added logging which, after
> several iterations of narrowing the logging, much to my surprise,
> demonstrated that the same inode was being returned for distinct files. My
> head exploded.
>
> Remember when 640K was enough for everybody? These days, 64 bits isn't
> enough for an inode. Coverity was using the Win32
> GetFileInformationByHandle()
> <https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfileinformationbyhandle>
> function to retrieve file inodes (file IDs or indexes in the Win32
> documentation). The inode/ID/index information is returned in the
> BY_HANDLE_FILE_INFORMATION
> <https://learn.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information>
> structure; the combination of the dwVolumeSerialNumber, nFileIndexHigh,
> and nFileIndexLow data members provide for unique identification for any
> open file.
>
> Pause here and read that last sentence again. For any *open* file. The
> documentation isn't particularly explicit about this, but filesystems are
> free to reuse file index values once the last handle for a file is closed.
> In this case, the network storage device was apparently mapping its 128 bit
> inodes to a pool of 64-bit inodes that it recycled as files were closed.
> Coverity wasn't keeping previously included files open so its cached inodes
> were effectively dangling pointers.
>
> The fix was simple. We disabled use of inodes to recognize re-included
> files (on Windows). We could have considered reworking the code to call the
> newer Win32 GetFileInformationByHandleEx()
> <https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-getfileinformationbyhandleex>
> function to obtain a 128 bit file ID via the FILE_ID_INFO
> <https://learn.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_id_info>
> structure. I don't know if that would have been any more reliable; this
> function might also only guarantee stability of a file ID while a file is
> open; the documentation isn't clear on this. Rewriting the code to keep all
> files open for the duration of the compilation would have required
> significant change and would have required raising limits on number of open
> files.
>
> MSVC didn't hit this problem because it relies on canonical path
> comparisons rather than filesystem identity to recognize re-included files.
> If I remember correctly, it is possible to defeat the MSVC #pragma once
> implementation through use of sym links, hard links, junction points,
> etc... However, you just end up with double inclusion in that case;
> something that is much easier to diagnose than mysterious never inclusion.
>
> I was lucky that I was able to reference the MSVC preprocessed output in
> order to determine that chunks of source code were missing. Comparison with
> other implementations isn't necessarily an option for general compiler
> users.
>
> We were also lucky that we were able to just disable the broken inode
> dependent code. Thankfully, header guards Just WorkTM without need for
> fancy and complicated mechanisms. #pragma once can't.
>
> Tom.
>
>
> On Fri, Aug 30, 2024 at 9:30 AM Marcin Jaczewski <
> marcinjaczewski86_at_[hidden]> wrote:
>
> pt., 30 sie 2024 o 01:20 Gašper Ažman via Std-Proposals
> <std-proposals_at_[hidden]> napisał(a):
> >
> > Darlings,
> >
> > byte-identical is just plain incorrect. Consider.
> >
> > [library1/public.hpp]
> > #pragma once
> > #include "utils.hpp"
> >
> > [library1/utils.hpp]
> > class lib1 {};
> >
> > library2/public.hpp
> > #pragma once
> > #include "utils.hpp"
> >
> > [library2/utils.hpp]
> > class lib2 {};
> >
> > [main.cpp]
> > #include "library1/public.hpp"
> > #include "library2/public.hpp" # boom, library2/utils.hpp does not get
> included
> >
> > same-contents also means same-relative-include-trees. Congratulations.
> >
>
> Question is do you know upfront that files are problematic?
> Then we could do something like this:
> a) we only consider normalized paths
> b) use have option to add path mappings that are used in path comparison
>
> Like:
>
> We have folders `libs/fooV1` and `libs/fooV2`. Both folders are symlinks
> but by default when compiler load `libs/fooV1/bar.h` and `libs/fooV2/bar.h`
> the compiler considers them as separate files even when they are the same
> file.
> Now we add compiler flag `-PI libs/fooV1:libs/fooV2` (similar to `-I`) and
> now
> when compiler load `libs/fooV1/foo.h` he consider it as if he load
> `libs/fooV2/bar.h` and thus next loading of `libs/fooV2/bar.h` will be
> blocked.
>
> And this should be ver dumb process it could be cases when
> `libs/fooV1/bar.h`
> and `libs/fooV2/bar.h` are unrelated but if you make specific maping it
> will
> override even diffrnet files. This could be useful to hacking some broken
> includes like:
>
> ```
> -PI hack/foo/bar.h:libs/fooV2/bar.h
> ```
>
> and we can in our files do:
>
> ```
> #include "hack/foo/bar.h"
> #include "libs/fooV2/foo.h" // it will ignore `#include "bar.h"`
> ```
>
> Could mechnism like this work on your build system?
>
> > On Fri, Aug 30, 2024 at 12:15 AM Jeremy Rifkin via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
> >>
> >> > In this very thread there are examples showing why taking only the
> content into account doesn't work but it was brushed off as "that can be
> fixed".
> >>
> >> I'm sorry you feel I have brushed any presented examples off. I have
> >> found them all immensely helpful for consideration. It's hard for me
> >> to imagine times when you'd want the same include-guarded content
> >> included twice, however I found the example of a "main header"
> >> compelling. The example of a header that only defines macros you undef
> >> is also not impractical.
> >>
> >> However, there are also compelling examples for filesystem identity.
> >> Mainly the multiple mount point issue.
> >>
> >> I think both can be reasonable, however, I have been trying to
> >> understand the most probable failure modes. While I originally
> >> proposed a content-based definition, I do think a filesystem-based
> >> definition is closer to current semantics and expectations.
> >>
> >> Jeremy
> >>
> >> On Thu, Aug 29, 2024 at 5:24 PM Breno Guimarães via Std-Proposals
> >> <std-proposals_at_[hidden]> wrote:
> >> >
> >> > To add to that, the whole idea is to standardize standard practice.
> If the first thing you do is to change spec to something else, then you're
> not standardizing standard practice, you are adding a new feature that
> inconveniently clashes with an existing one.
> >> >
> >> > In this very thread there are examples showing why taking only the
> content into account doesn't work but it was brushed off as "that can be
> fixed".
> >> >
> >> > None of this make sense to me.
> >> >
> >> >
> >> > Em qui., 29 de ago. de 2024 18:59, Tiago Freire via Std-Proposals <
> std-proposals_at_[hidden]> escreveu:
> >> >>
> >> >> Again, hashing content... totally unnecessary.
> >> >>
> >> >> There's no need to identify "same content" which as far as I can see
> can be defeated by modifications that don't change the interpretation, like
> spaces, which although not technically a violation of "same content" it
> clearly defeats the intent.
> >> >>
> >> >> An include summons a resource, a pragma once bars that resources
> from bey re-summonable. That's it. File paths should be more than enough.
> >> >>
> >> >> I'm unconvinced that the "bad cases" are not just a product of bad
> build architecture, if done properly a compiler should never be presented
> with multiple alternatives of the same file. And putting such requirements
> on compilers puts an unnecessary burden on developers to support a scenario
> that it is that is arguably bad practice.
> >> >>
> >> >> The argument is "prgma once" is supported everywhere it is good, we
> should make it official in the standard, effectively no change to a
> compiler should occur as a consequence.
> >> >> If a change needs to occur, then in fact "your version" of what you
> mean by "pragma once" is actually "not supported" by all the major
> compilers.
> >> >>
> >> >> Current compiler support of "pragma once" and it's usage on cross
> platform projects have a particular way of dealing with dependencies in
> mind. That workflow works. It's pointless to have this discussion if you
> don't understand that flow, and you shouldn't tailor the tool to a workflow
> that doesn't exist to the detriment of all.
> >> >>
> >> >>
> >> >> ________________________________
> >> >> From: Std-Proposals <std-proposals-bounces_at_[hidden]> on
> behalf of Jeremy Rifkin via Std-Proposals <std-proposals_at_[hidden]>
> >> >> Sent: Thursday, August 29, 2024 9:56:18 PM
> >> >> To: Tom Honermann <tom_at_[hidden]>
> >> >> Cc: Jeremy Rifkin <rifkin.jer_at_[hidden]>;
> std-proposals_at_[hidden] <std-proposals_at_[hidden]>
> >> >> Subject: Re: [std-proposals] Revising #pragma once
> >> >>
> >> >> Performance should be fine if using a content definition. An
> implementation can do inode/path checks against files it already knows of,
> as a fast path. The first time a file is #included it’s just a hash+table
> lookup to decide whether to continue.
> >> >>
> >> >> Regarding the filesystem definition vs content definition question,
> while I think a content-based definition is robust I can see there is FUD
> about it and also an argument about current practice being a
> filesystem-based definition. It may just be best to approach this as
> filesystem uniqueness to the implementation’s ability, with a requirement
> that symbolic links/hard links are handled. This doesn’t cover the case of
> multiple mount points, but we’ve discussed that that’s impossible with
> #pragma once without using contents instead.
> >> >>
> >> >> Jeremy
> >> >>
> >> >> On Thu, Aug 29, 2024 at 13:06 Tom Honermann <tom_at_[hidden]>
> wrote:
> >> >>>
> >> >>> On 8/28/24 12:32 AM, Jeremy Rifkin via Std-Proposals wrote:
> >> >>>
> >> >>> Another question is whether the comparison should be post
> translation
> >> >>> phase 1.
> >> >>>
> >> >>> I gave this some thought while drafting the proposal. I think it
> comes
> >> >>> down to whether the intent is single inclusion of files or single
> >> >>> inclusion of contents.
> >> >>>
> >> >>> Indeed. The proposal currently favors the "same contents" approach
> and offers the following wording.
> >> >>>
> >> >>> A preprocessing directive of the form
> >> >>> # pragma once new-line
> >> >>> shall cause no subsequent #include directives to perform
> replacement for a file with text contents identical to this file.
> >> >>>
> >> >>> The wording will have to define what it means for contents to be
> identical. Options include:
> >> >>>
> >> >>> The files must be byte-for-byte identical. This makes source file
> encoding observable (which I would be strongly against).
> >> >>> The files must encode the same character sequence post translation
> phase 1. This makes comparisons potentially expensive.
> >> >>>
> >> >>> Note that the "same contents" approach obligates an implementation
> to consider every previously encountered file for every #include directive.
> An inode based optimization can help to determine if a file was previously
> encountered based on identity, but it doesn't help to reduce the costs when
> a file that was not previously seen is encountered.
> >> >>>
> >> >>>
> >> >>> Tom.
> >> >>>
> >> >>> Jeremy
> >> >>>
> >> >>> On Tue, Aug 27, 2024 at 3:39 PM Tom Honermann via Std-Proposals
> >> >>> <std-proposals_at_[hidden]> wrote:
> >> >>>
> >> >>> On 8/27/24 4:10 PM, Thiago Macieira via Std-Proposals wrote:
> >> >>>
> >> >>> On Tuesday 27 August 2024 12:35:17 GMT-7 Andrey Semashev via
> Std-Proposals
> >> >>> wrote:
> >> >>>
> >> >>> The fact that gcc took the approach to compare file contents I
> consider
> >> >>> a poor choice, and not an argument to standardize this
> implementation.
> >> >>>
> >> >>> Another question is whether a byte comparison of two files of the
> same size is
> >> >>> expensive for compilers.
> >> >>>
> >> >>> #once ID doesn't need to compare the entire file.
> >> >>>
> >> >>> Another question is whether the comparison should be post
> translation
> >> >>> phase 1. In other words, whether differently encoded source files
> that
> >> >>> decode to the same sequence of code points are considered the same
> file
> >> >>> (e.g., a Windows-1252 version and a UTF-8 version). Standard C++
> does
> >> >>> not currently allow source file encoding to be observable but a
> #pragma
> >> >>> once implementation that only compares bytes would make such
> differences
> >> >>> observable.
> >> >>>
> >> >>> Tom.
> >> >>>
> >> >>> --
> >> >>> Std-Proposals mailing list
> >> >>> Std-Proposals_at_[hidden]
> >> >>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
> >> >>
> >> >>
> >> >> --
> >> >> Std-Proposals mailing list
> >> >> Std-Proposals_at_[hidden]
> >> >> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
> >> >
> >> > --
> >> > Std-Proposals mailing list
> >> > Std-Proposals_at_[hidden]
> >> > https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
> >> --
> >> Std-Proposals mailing list
> >> Std-Proposals_at_[hidden]
> >> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
> >
> > --
> > Std-Proposals mailing list
> > Std-Proposals_at_[hidden]
> > https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>

Received on 2024-09-01 13:10:46