Re: [std-proposals] Revising #pragma once

From: Sebastian Wittmeier <wittmeier_at_[hidden]>
Date: Sun, 1 Sep 2024 11:42:10 +0200
We established that:

- there are situations where two header files in different projects are identical and both have to be included (the master-header case)
- there are situations where two header files in different projects are identical and must be included only once (the zlib.h case)

Path: each include is either

- done by <>
- done by "" relative to a different include file or the cpp file

In the <> case, uniqueness is important to prevent libraries (or their headers) from being imported more than once.

In the "" case, doubled imports within the project should be avoided. Circular imports would probably give an error anyway.

If we strictly use the path written in the C++ files (not the path in the file system), we should get

  [<systemheader_filepath>]/["projectheader_filepath"]   with [] for optional

E.g. <mylib/importall.h>/"mylib_internal/types.h"

Why is this path not enough?

The zlib.h case with header-only libraries? Those should probably keep symbol-dependent include guards.

Relying on this path should make once independent of the build environment and of where the files are stored.

-----Original Message-----
From: Mike Reed via Std-Proposals <std-proposals_at_[hidden]>
Sent: Sun 01.09.2024 09:57
Subject: Re: [std-proposals] Revising #pragma once
To: Jason McKesson via Std-Proposals <std-proposals_at_[hidden]>;
Cc: Mike Reed <mike_at_[hidden]>; Tom Honermann <tom_at_[hidden]>;

Of course one could argue that you went through all that pain *because* there isn't a standard on #once. If there were, then everyone would implement it the same, devs would use it knowing exactly what to expect, and you wouldn't have gone through all that pain.

The problem is, is it possible to produce such a spec? Does the current proposal succeed where the previous proposal failed?

If we introduce a new directive "#once" then there is not so much pressure to be backwards compatible with *all* #pragma once directive implementations (which is an impossible ask). Clearly, the behaviour of #include and #once are intimately linked. As already pointed out, #include fetches a resource, #once blocks that resource from being fetched more than once. So we simply (ha!) have to specify how the include tree is built... but with modules on the horizon, is it worth it? Maybe, if it looks like modules are staying on the horizon :/

On Sun, 1 Sept 2024, 06:32 Tom Honermann via Std-Proposals, <std-proposals_at_[hidden]> wrote:

On 8/30/24 5:50 AM, Gašper Ažman wrote:
> If you knew up-front you wouldn't do it :).
>
> This happens though. There are people who generate code - whole include trees - and for plugins they end up looking very, very similar or identical. And then, as the build engineer, my users complain that "their build doesn't work on the cloud build environment" and I have to somehow *find their bug*.
>
> Think about it. How the hell do you diagnose this silent non-inclusion?

In my experience, very carefully, over a long period of time, with much hair pulling and lots of WTFs. I'm going to relate my most memorable experience with this feature below, but I'll preface this by saying that the root cause in this case was a bug in the implementation. Readers are free to say "yeah, bugs happen, nothing to see here", and that is fair. The point is not (just) to relay yet another story from the trenches, but to reinforce Gašper's point that debugging a case where #pragma once fails is about as hard as debugging UB or a miscompilation.
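As a point of reference for the story below, this is the classic symbol-dependent include guard (the zlib.h style mentioned above); a minimal sketch, with an arbitrary guard macro name:

```
// Minimal sketch of a symbol-dependent include guard; the macro name is arbitrary.
// A second inclusion is skipped because the macro is already defined, and if the
// guard is ever defeated, the result is a visible redefinition error rather than
// a silent non-inclusion.
#ifndef MYLIB_TYPES_H_INCLUDED
#define MYLIB_TYPES_H_INCLUDED

struct my_type {
    int value;
};

#endif // MYLIB_TYPES_H_INCLUDED
```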
I was a Coverity developer for ~10 years working on the Coverity C and C++ front ends. Coverity is one of a number of tools in the unenviable position of having to emulate the behaviors (intended and otherwise) of many other compilers. The emulation requirements extend to compiler extensions, implementation-defined behavior, and even unspecified behavior. Coverity includes multiple C and C++ front end implementations, one of which is chosen to emulate the behavior of a given compiler. Exact matching of front end behavior for all supported compilers is not economically feasible, so different behavior becomes observable at times, sometimes in surprising ways.

It was uncommon for me to get directly involved in debugging a problem happening in a customer environment. A bug report came in that Coverity was failing to successfully compile parts of the customer's code; errors were being reported about undeclared names and failed overload resolution. The customer was using MSVC which, of course, was having no issues compiling their code.

The first course of action in such cases is to request preprocessed output produced by both the native compiler (MSVC) and Coverity's front end (don't get me started about how much harder C++ modules will make this first step if programmers start actually using C++ modules, but I digress). The reported errors reproduced easily using the Coverity-produced preprocessed output. Searching for the undeclared names from the error messages in the MSVC preprocessed output made it clear early on that large chunks of the source code were simply missing. Thanks to #line directives in the MSVC preprocessed output, it wasn't too hard to correlate the missing content with a particular source file, and a request was sent to the customer to share that file, which they did.

The file used a proper header guard (it did not use #pragma once), so I naturally went down the rabbit hole of trying to figure out why the guard macro was apparently defined when Coverity processed the file, but not when MSVC did. It isn't all that uncommon for Coverity customers to have to implement workarounds to address cases where Coverity doesn't exactly emulate some aspect of another compiler, so I assumed that the customer had probably applied a workaround for some other issue that was somehow now contributing to this one, and I started asking those "what did you do?" questions in the most polite corporate phrasing I could come up with.

I didn't get to blame the customer for doing something silly. They weren't doing anything weird or unusual. I spent lots of time looking at code in our front end trying to imagine how we could possibly be processing a header guard incorrectly. The front end cached knowledge about header guards and used that information to avoid re-tokenizing a re-included source file. I wondered if, somehow, an invalid cache hit was happening, but nothing seemed wrong. Header guards are not rocket science.

I wondered if include paths were somehow inconsistent and if Coverity was resolving the #include directive to a different file; perhaps an empty file or one with a different header guard macro. No such luck. I didn't get to blame the customer for having multiple copies of a file with inconsistent contents in different locations.

It got worse. It turned out there wasn't a single file that was problematic. On different runs, content would be missing for different source files. I started asking about the customer build environment.
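For readers who have not dug through preprocessed output before, the line markers that make this correlation possible look roughly like the following (a sketch only; the file names are invented and the exact marker format varies by compiler):

```
// A rough sketch of what preprocessed output looks like (file names invented).
// Each #line directive records which source file the following lines came
// from, so missing content can still be attributed to a specific header.
#line 1 "main.cpp"
#line 1 "include/widget.h"
// ...the contents of widget.h should appear here; if they do not, the
// surrounding #line directives still identify the file that was skipped...
#line 3 "main.cpp"
int main() { return 0; }
```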
That was when we got the first indication of where to look that was actually on the right path. The customer was building source code hosted on a network storage device. I wondered about some kind of weird filesystem corruption that caused an empty file to sometimes be served instead of the actual file contents. I hypothesized improbable bugs in which an empty file was only produced with the file system APIs that Coverity used and not with the ones that MSVC used. I asked the customer to verify that their network storage device had all updates applied. I tried replicating the problem in house using network storage devices available within the company. No luck, of course.

I eventually had to resort to sending the customer builds of Coverity with extra debugging instrumentation to run in their environment. I'm sure everyone reading this can appreciate that, when things get to the point that you are asking customers to run one-off modified builds in their production environment in order to figure out WTF is going wrong with your software, things are just not going well.

I was still focused on the header guard cache, so that is where I added some debug instrumentation. The header guard cache naturally relied on recognition of the same file being re-included and used inodes to do so. So I added logging which, after several iterations of narrowing the logging, much to my surprise, demonstrated that the same inode was being returned for distinct files. My head exploded.

Remember when 640K was enough for everybody? These days, 64 bits isn't enough for an inode.

Coverity was using the Win32 GetFileInformationByHandle() function to retrieve file inodes (file IDs or indexes in the Win32 documentation). The inode/ID/index information is returned in the BY_HANDLE_FILE_INFORMATION structure; the combination of the dwVolumeSerialNumber, nFileIndexHigh, and nFileIndexLow data members provides unique identification for any open file.

Pause here and read that last sentence again. For any open file. The documentation isn't particularly explicit about this, but filesystems are free to reuse file index values once the last handle for a file is closed. In this case, the network storage device was apparently mapping its 128-bit inodes to a pool of 64-bit inodes that it recycled as files were closed. Coverity wasn't keeping previously included files open, so its cached inodes were effectively dangling pointers.

The fix was simple: we disabled use of inodes to recognize re-included files (on Windows). We could have considered reworking the code to call the newer Win32 GetFileInformationByHandleEx() function to obtain a 128-bit file ID via the FILE_ID_INFO structure. I don't know if that would have been any more reliable; this function might also only guarantee stability of a file ID while a file is open; the documentation isn't clear on this. Rewriting the code to keep all files open for the duration of the compilation would have required significant changes and would have required raising limits on the number of open files.

MSVC didn't hit this problem because it relies on canonical path comparisons rather than filesystem identity to recognize re-included files. If I remember correctly, it is possible to defeat the MSVC #pragma once implementation through use of symlinks, hard links, junction points, etc. However, you just end up with double inclusion in that case; something that is much easier to diagnose than mysterious never-inclusion.
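For concreteness, here is a minimal sketch (not Coverity's actual code) of the two Win32 identity queries discussed above; the file name is invented, the 64-bit index is only guaranteed unique while a handle to the file is open, and the stability of the 128-bit ID after the handle is closed is, as noted, unclear from the documentation:

```
// Sketch only: querying both forms of Win32 file identity.
// Requires Windows 8+ headers for FILE_ID_INFO / FileIdInfo.
#include <windows.h>
#include <cstdio>

int main() {
    HANDLE h = CreateFileW(L"example.h", GENERIC_READ, FILE_SHARE_READ,
                           nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return 1;

    BY_HANDLE_FILE_INFORMATION info{};
    if (GetFileInformationByHandle(h, &info)) {
        // Volume serial plus 64-bit file index: only guaranteed unique
        // while some handle to the file remains open.
        std::printf("volume=%lu index=%08lx%08lx\n",
                    info.dwVolumeSerialNumber,
                    info.nFileIndexHigh, info.nFileIndexLow);
    }

    FILE_ID_INFO id{};
    if (GetFileInformationByHandleEx(h, FileIdInfo, &id, sizeof(id))) {
        // 128-bit file ID; the documentation does not clearly promise
        // stability after the last handle is closed either.
        std::printf("volume=%llu, 128-bit id retrieved\n",
                    static_cast<unsigned long long>(id.VolumeSerialNumber));
    }

    CloseHandle(h);
    return 0;
}
```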
I was lucky that I was able to reference the MSVC preprocessed output in order to determine that chunks of source code were missing. Comparison with other implementations isn't necessarily an option for general compiler users. We were also lucky that we were able to just disable the broken inode-dependent code.

Thankfully, header guards Just Work™ without need for fancy and complicated mechanisms. #pragma once can't.

Tom.

On Fri, Aug 30, 2024 at 9:30 AM Marcin Jaczewski <marcinjaczewski86_at_[hidden]> wrote:

On Fri, 30 Aug 2024 at 01:20, Gašper Ažman via Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> Darlings,
>
> byte-identical is just plain incorrect. Consider.
>
> [library1/public.hpp]
> #pragma once
> #include "utils.hpp"
>
> [library1/utils.hpp]
> class lib1 {};
>
> [library2/public.hpp]
> #pragma once
> #include "utils.hpp"
>
> [library2/utils.hpp]
> class lib2 {};
>
> [main.cpp]
> #include "library1/public.hpp"
> #include "library2/public.hpp" // boom, library2/utils.hpp does not get included
>
> same-contents also means same-relative-include-trees. Congratulations.
>

Question is: do you know upfront that files are problematic?
Then we could do something like this:
a) we only consider normalized paths
b) we have the option to add path mappings that are used in path comparison

Like: we have folders `libs/fooV1` and `libs/fooV2`. Both folders are symlinks, but by default when the compiler loads `libs/fooV1/bar.h` and `libs/fooV2/bar.h` it considers them separate files even when they are the same file.
Now we add a compiler flag `-PI libs/fooV1:libs/fooV2` (similar to `-I`), and when the compiler loads `libs/fooV1/bar.h` it treats it as if it had loaded `libs/fooV2/bar.h`, and thus the next loading of `libs/fooV2/bar.h` will be blocked.
And this should be a very dumb process: there could be cases where `libs/fooV1/bar.h` and `libs/fooV2/bar.h` are unrelated, but if you make a specific mapping it will override even different files.
This could be useful for hacking around some broken includes, like:
```
-PI hack/foo/bar.h:libs/fooV2/bar.h
```
and we can in our files do:
```
#include "hack/foo/bar.h"
#include "libs/fooV2/foo.h" // it will ignore `#include "bar.h"`
```
Could a mechanism like this work on your build system?

> On Fri, Aug 30, 2024 at 12:15 AM Jeremy Rifkin via Std-Proposals <std-proposals_at_[hidden]> wrote:
>>
>> > In this very thread there are examples showing why taking only the content into account doesn't work but it was brushed off as "that can be fixed".
>>
>> I'm sorry you feel I have brushed any presented examples off. I have found them all immensely helpful for consideration. It's hard for me to imagine times when you'd want the same include-guarded content included twice, however I found the example of a "main header" compelling. The example of a header that only defines macros you undef is also not impractical.
>>
>> However, there are also compelling examples for filesystem identity. Mainly the multiple mount point issue.
>>
>> I think both can be reasonable, however, I have been trying to understand the most probable failure modes. While I originally proposed a content-based definition, I do think a filesystem-based definition is closer to current semantics and expectations.
>>
>> Jeremy
>>
>> On Thu, Aug 29, 2024 at 5:24 PM Breno Guimarães via Std-Proposals <std-proposals_at_[hidden]> wrote:
>> >
>> > To add to that, the whole idea is to standardize standard practice. If the first thing you do is to change the spec to something else, then you're not standardizing standard practice, you are adding a new feature that inconveniently clashes with an existing one.
>> >
>> > In this very thread there are examples showing why taking only the content into account doesn't work but it was brushed off as "that can be fixed".
>> >
>> > None of this makes sense to me.
>> >
>> > On Thu, 29 Aug 2024, 18:59, Tiago Freire via Std-Proposals <std-proposals_at_[hidden]> wrote:
>> >>
>> >> Again, hashing content... totally unnecessary.
>> >>
>> >> There's no need to identify "same content", which, as far as I can see, can be defeated by modifications that don't change the interpretation, like spaces; although not technically a violation of "same content", that clearly defeats the intent.
>> >>
>> >> An include summons a resource; a pragma once bars that resource from being re-summonable. That's it. File paths should be more than enough.
>> >>
>> >> I'm unconvinced that the "bad cases" are not just a product of bad build architecture; if done properly, a compiler should never be presented with multiple alternatives of the same file. And putting such requirements on compilers puts an unnecessary burden on developers to support a scenario that is arguably bad practice.
>> >>
>> >> The argument is: "pragma once" is supported everywhere, it is good, we should make it official in the standard; effectively no change to a compiler should occur as a consequence.
>> >> If a change needs to occur, then in fact "your version" of what you mean by "pragma once" is actually "not supported" by all the major compilers.
>> >>
>> >> Current compiler support of "pragma once" and its usage in cross-platform projects have a particular way of dealing with dependencies in mind. That workflow works. It's pointless to have this discussion if you don't understand that flow, and you shouldn't tailor the tool to a workflow that doesn't exist, to the detriment of all.
>> >>
>> >>
>> >> ________________________________
>> >> From: Std-Proposals <std-proposals-bounces_at_[hidden]> on behalf of Jeremy Rifkin via Std-Proposals <std-proposals_at_[hidden]>
>> >> Sent: Thursday, August 29, 2024 9:56:18 PM
>> >> To: Tom Honermann <tom_at_[hidden]>
>> >> Cc: Jeremy Rifkin <rifkin.jer_at_[hidden]>; std-proposals_at_[hidden] <std-proposals_at_[hidden]>
>> >> Subject: Re: [std-proposals] Revising #pragma once
>> >>
>> >> Performance should be fine if using a content definition. An implementation can do inode/path checks against files it already knows of, as a fast path. The first time a file is #included it’s just a hash+table lookup to decide whether to continue.
>> >>
>> >> Regarding the filesystem definition vs content definition question, while I think a content-based definition is robust I can see there is FUD about it and also an argument about current practice being a filesystem-based definition.
>> >> It may just be best to approach this as filesystem uniqueness to the implementation’s ability, with a requirement that symbolic links/hard links are handled. This doesn’t cover the case of multiple mount points, but we’ve discussed that that’s impossible with #pragma once without using contents instead.
>> >>
>> >> Jeremy
>> >>
>> >> On Thu, Aug 29, 2024 at 13:06 Tom Honermann <tom_at_[hidden]> wrote:
>> >>>
>> >>> On 8/28/24 12:32 AM, Jeremy Rifkin via Std-Proposals wrote:
>> >>>
>> >>> Another question is whether the comparison should be post translation phase 1.
>> >>>
>> >>> I gave this some thought while drafting the proposal. I think it comes down to whether the intent is single inclusion of files or single inclusion of contents.
>> >>>
>> >>> Indeed. The proposal currently favors the "same contents" approach and offers the following wording.
>> >>>
>> >>> A preprocessing directive of the form
>> >>>   # pragma once new-line
>> >>> shall cause no subsequent #include directives to perform replacement for a file with text contents identical to this file.
>> >>>
>> >>> The wording will have to define what it means for contents to be identical. Options include:
>> >>>
>> >>> - The files must be byte-for-byte identical. This makes source file encoding observable (which I would be strongly against).
>> >>> - The files must encode the same character sequence post translation phase 1. This makes comparisons potentially expensive.
>> >>>
>> >>> Note that the "same contents" approach obligates an implementation to consider every previously encountered file for every #include directive. An inode-based optimization can help to determine if a file was previously encountered based on identity, but it doesn't help to reduce the costs when a file that was not previously seen is encountered.
>> >>>
>> >>> Tom.
>> >>>
>> >>> Jeremy
>> >>>
>> >>> On Tue, Aug 27, 2024 at 3:39 PM Tom Honermann via Std-Proposals <std-proposals_at_[hidden]> wrote:
>> >>>
>> >>> On 8/27/24 4:10 PM, Thiago Macieira via Std-Proposals wrote:
>> >>>
>> >>> On Tuesday 27 August 2024 12:35:17 GMT-7 Andrey Semashev via Std-Proposals wrote:
>> >>>
>> >>> The fact that gcc took the approach to compare file contents I consider a poor choice, and not an argument to standardize this implementation.
>> >>>
>> >>> Another question is whether a byte comparison of two files of the same size is expensive for compilers.
>> >>>
>> >>> #once ID doesn't need to compare the entire file.
>> >>>
>> >>> Another question is whether the comparison should be post translation phase 1. In other words, whether differently encoded source files that decode to the same sequence of code points are considered the same file (e.g., a Windows-1252 version and a UTF-8 version). Standard C++ does not currently allow source file encoding to be observable, but a #pragma once implementation that only compares bytes would make such differences observable.
>> >>>
>> >>> Tom.
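To make the fast path Jeremy describes and the inode-based optimization Tom mentions concrete, here is a minimal sketch (purely illustrative, not proposed wording or an existing implementation; the names are invented, and a real implementation would compare full contents rather than trust a hash):

```
// Illustrative sketch of a "same contents" #pragma once check with an
// identity fast path. Invented names; std::hash stands in for a real
// content hash, and a real implementation would verify full contents
// on a hash hit instead of trusting the hash alone.
#include <string>
#include <unordered_set>

class OnceRegistry {
    std::unordered_set<std::string> identities_; // canonical paths / inodes of #pragma once files
    std::unordered_set<std::size_t> hashes_;     // content hashes of #pragma once files

public:
    // Called once a #pragma once file has been fully read.
    void remember(const std::string& identity, const std::string& contents) {
        identities_.insert(identity);
        hashes_.insert(std::hash<std::string>{}(contents));
    }

    // Fast path: the resolved file is one we have already seen by identity.
    bool seen_identity(const std::string& identity) const {
        return identities_.count(identity) != 0;
    }

    // Slow path: a file seen for the first time; one hash plus table lookup.
    bool seen_contents(const std::string& contents) const {
        return hashes_.count(std::hash<std::string>{}(contents)) != 0;
    }
};
```

The #include handling would consult seen_identity() before opening the resolved file and seen_contents() after reading it, skipping the inclusion in either case.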
--
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2024-09-01 09:42:15