C++ Logo

std-proposals

Advanced search

Re: [std-proposals] Revising #pragma once

From: Tiago Freire <tmiguelf_at_[hidden]>
Date: Sun, 1 Sep 2024 13:18:31 +0000
We are discussing this in terms of a C++ feature, but the truth is the problems being described here are related to the tool being used and the inability to figure out proper invariants.

The proper use of a build tool is as important as the language it tries to compile.
But how the build tool works is not part of the standard.
And as far as I 'm able to tell, every single tool is lacking in some aspect or another.
We can try to ignore the problem, but this is the problem.
The reason why modules don't work right now, and they will not work any time soon (at least without a major revision), and they will not fix this problem despite some of you believing that it will, it's because they don't integrate well with a build process.

Had for example, your include files be defined externally by the build tool and passed into the compiler instead of being defined in files, you could:
* automatic do dependency propagation
* simpler header change detection for more efficient rebuilds
* more finely tailor pre-compiled headers units
* And would resolve re-entry problems without the need for pragma once or include guards (and we wouldn’t be having this argument)

And that process could much more easily migrate from header files to modules.

But this is not the reality we live in. Those tools don't exist. And people have not been as actively trying to improve processes (build systems are hard and demotivating). Which is the problem!!!



Stories like those of Gaspar's are quite familiar.
I've spent quite some time refining build processes for the several companies I've worked with, to make sure content is strictly curated, and de-duplicated. New libraries when integrated into are adopted into this process.
And then we have stories like those of Tom, which seems to be a downside of trying to go beyond "same file path" in order to be over accommodating to environments that duplicate their paths/files. Which although with good intentions, they have unintended side effects, and end up doing more harm than good.
Although relatable, and I’m sorry you had to deal with it.
But we have to realize that this problem is artificially created, if your filesystem didn't allow for such duplication, we wouldn't be having this conversation.

And that's the thing, I have fixed this duplication problem for the projects I work with, permanently, for good.
Developers would have to work intentionally and really really hard and to even cause problems in this space.
If a problem were to even happen, it would happen immediately to the user introducing the problem, preventing them from introducing issues to begin with, and it would be trivial to trace.

My main mode of failure in this space isn't file/path duplication, it's faulty include guards.
My main source of workload isn't having to resolve hard to trace bad inclusion issues, it's resolving mislabeled include guards or having to type them to begin with.
And pragma once, completely eliminates all of my problems in this space for my environment when include guards wouldn't.
And the fact that adoption of "pragma once" vs "include guards" is going up and not down, despite the fact that it is not standard, indicate that this is increasingly the experience of more and more developers.

We need better tools; we need better processes. That is the problem here, not that “pragma once” doesn’t work.
I don’t like to type pragma once either, the better solution would be “nothing”, but until we have better tools and better processes, it works, it works better.
Maybe not for you, but nobody is forcing you to use it. Either you like it or not, this exists today, and nobody is forcing you to use it.
And you should not force others to use include guards, when pragma once clearly works better for their environment.




________________________________
From: Std-Proposals <std-proposals-bounces_at_[hidden]p.org> on behalf of Mike Reed via Std-Proposals <std-proposals_at_[hidden]>
Sent: Sunday, September 1, 2024 9:57:15 AM
To: Jason McKesson via Std-Proposals <std-proposals_at_[hidden]>
Cc: Mike Reed <mike_at_[hidden]>; Tom Honermann <tom_at_[hidden]>
Subject: Re: [std-proposals] Revising #pragma once

Of course one could argue that you went through all that pain *because* there isn't a standard on #once. If there were then everyone would implement it the same, devs would use it knowing exactly what to expect and you wouldn't have gone through all that pain.
The problem is, is it possible to produce such a spec? Does the current proposal succeed where the previous proposal failed?
If we introduce a new directive "#once" then there is not so much pressure to be backwards compatible with *all* #pragma once directive implentations (which is an impossible ask). Clearly, the behaviour of #include and #once are intimately linked. As already pointed out, #include fetches a resource, #once blocks that resource from being fetched more than once. So we simply (ha!) have to specify how the include tree is built...

 but with modules on the horizon is it worth it? Maybe, if it looks like modules are staying on the horizon :/

On Sun, 1 Sept 2024, 06:32 Tom Honermann via Std-Proposals, <std-proposals_at_[hidden]<mailto:std-proposals_at_[hidden]>> wrote:
On 8/30/24 5:50 AM, Gašper Ažman wrote:
If you knew up-front you wouldn't do it :).

This happens though. There are people who generate code - whole include trees - and for plugins they end up looking very very similar or identical. And then as the build engineer my users complain that "their build doesn't work on the cloud build environment" and I have to somehow *find their bug*.

Think about it. How the hell do you diagnose this silent noninclusion?

In my experience, very carefully, over a long period of time, with much hair pulling and lots of WTFs.

I'm going to relate my most memorable experience with this feature below, but I'll preface this by saying that the root cause in this case was a bug in the implementation. Readers are free to say, yeah, bugs happen, nothing to see here. and that is fair. The point is not (just) to relay yet another story from the trenches, but to enforce Gašper's point that debugging a case where #pragma once fails is about as hard as debugging UB or a miscompilation.

I was a Coverity developer for ~10 years working on the Coverity C and C++ front ends. Coverity is one of a number of tools in the unenviable position of having to emulate the behaviors (intended and otherwise) of many other compilers. The emulation requirements extend to compiler extensions, implementation-defined behavior, and even unspecified behavior. Coverity includes multiple C and C++ front end implementations, one of which is chosen to use to emulate the behavior of a given compiler. Exact matching of front end behavior for all supported compilers is not economically feasible, so different behavior becomes observable at times. Sometimes in surprising ways.

It was uncommon for me to get directly involved in debugging a problem happening in a customer environment. A bug report came in that Coverity was failing to successfully compile parts of the customer's code; errors were being reported about undeclared names and failed overload resolution. The customer was using MSVC which, of course, was having no issues compiling their code. The first course of action in such cases is to request preprocessed output produced by both the native compiler (MSVC) and Coverity's front end (don't get me started about how much harder C++ modules will make this first step if programmers start actually using C++ modules, but I digress). The reported errors reproduced easily using the Coverity produced preprocessed output. Searching for the undeclared names from the error messages in the MSVC preprocessed output made it clear early on that large chunks of the source code were simply missing. Thanks to #line directives in the MSVC preprocessed output, it wasn't too hard to correlate the missing content with a particular source file and a request was sent to the customer to share that file, which they did. The file used a proper header guard (it did not use #pragma once), so I naturally went down the rabbit hole of trying to figure out why the guard macro was apparently defined when Coverity processed the file, but not when MSVC did. It isn't all that uncommon for Coverity customers to have to implement workarounds to address cases where Coverity doesn't exactly emulate some aspect of another compiler, so I assumed that the customer had probably applied a workaround for some other issue that was somehow now contributing to this issue and I started asking those "what did you do?" questions in the most polite corporate phrasing I could come up with.

I didn't get to blame the customer for doing something silly. They weren't doing anything weird or unusual. I spent lots of time looking at code in our front end trying to imagine how we could possibly be processing a header guard incorrectly. The front end cached knowledge about header guards and used that information to avoid re-tokenizing a re-included source file. I wondered if, somehow, an invalid cache hit was happening, but nothing seemed wrong. Header guards are not rocket science.

I wondered if include paths were somehow inconsistent and if Coverity was resolving the #include directive to a different file; perhaps an empty file or one with a different header guard macro. No such luck. I didn't get to blame the customer for having multiple copies of a file with inconsistent contents in different locations.

It got worse. It turned out there wasn't a single file that was problematic. On different runs, content would be missing for different source files.

I started asking about the customer build environment. That was when we got the first indication of where to look that was actually on the right path. The customer was building source code hosted on a network storage device. I wondered about some kind of weird filesystem corruption that caused an empty file to sometimes be served instead of the actual file contents. I hypothesized improbable bugs in which an empty file was only produced with the file system APIs that Coverity used and not with the ones that MSVC used. I asked the customer to verify that their network storage device had all updates applied.

I tried replicating the problem in house using network storage devices available within the company. No luck of course.

I eventually had to resort to sending the customer builds of Coverity with extra debugging instrumentation to run in their environment. I'm sure everyone reading this can appreciate that, when things get to the point that you are asking customers to run one-off modified builds in their production environment in order to figure out WTF is going wrong with your software, that things are just not going well. I was still focused on the header guard cache, so that is where I added some debug instrumentation. The header guard cache naturally relied on recognition of the same file being re-included and used inodes to do so. So I added logging which, after several iterations of narrowing the logging, much to my surprise, demonstrated that the same inode was being returned for distinct files. My head exploded.

Remember when 640K was enough for everybody? These days, 64 bits isn't enough for an inode. Coverity was using the Win32 GetFileInformationByHandle()<https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfileinformationbyhandle> function to retrieve file inodes (file IDs or indexes in the Win32 documentation). The inode/ID/index information is returned in the BY_HANDLE_FILE_INFORMATION<https://learn.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information> structure; the combination of the dwVolumeSerialNumber, nFileIndexHigh, and nFileIndexLow data members provide for unique identification for any open file.

Pause here and read that last sentence again. For any open file. The documentation isn't particularly explicit about this, but filesystems are free to reuse file index values once the last handle for a file is closed. In this case, the network storage device was apparently mapping its 128 bit inodes to a pool of 64-bit inodes that it recycled as files were closed. Coverity wasn't keeping previously included files open so its cached inodes were effectively dangling pointers.

The fix was simple. We disabled use of inodes to recognize re-included files (on Windows). We could have considered reworking the code to call the newer Win32 GetFileInformationByHandleEx()<https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-getfileinformationbyhandleex> function to obtain a 128 bit file ID via the FILE_ID_INFO<https://learn.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_id_info> structure. I don't know if that would have been any more reliable; this function might also only guarantee stability of a file ID while a file is open; the documentation isn't clear on this. Rewriting the code to keep all files open for the duration of the compilation would have required significant change and would have required raising limits on number of open files.

MSVC didn't hit this problem because it relies on canonical path comparisons rather than filesystem identity to recognize re-included files. If I remember correctly, it is possible to defeat the MSVC #pragma once implementation through use of sym links, hard links, junction points, etc... However, you just end up with double inclusion in that case; something that is much easier to diagnose than mysterious never inclusion.

I was lucky that I was able to reference the MSVC preprocessed output in order to determine that chunks of source code were missing. Comparison with other implementations isn't necessarily an option for general compiler users.

We were also lucky that we were able to just disable the broken inode dependent code. Thankfully, header guards Just WorkTM without need for fancy and complicated mechanisms. #pragma once can't.

Tom.

On Fri, Aug 30, 2024 at 9:30 AM Marcin Jaczewski <marcinjaczewski86_at_[hidden]<mailto:marcinjaczewski86_at_[hidden]>> wrote:
pt., 30 sie 2024 o 01:20 Gašper Ažman via Std-Proposals
<std-proposals_at_[hidden]p.org<mailto:std-proposals_at_[hidden]>> napisał(a):
>
> Darlings,
>
> byte-identical is just plain incorrect. Consider.
>
> [library1/public.hpp]
> #pragma once
> #include "utils.hpp"
>
> [library1/utils.hpp]
> class lib1 {};
>
> library2/public.hpp
> #pragma once
> #include "utils.hpp"
>
> [library2/utils.hpp]
> class lib2 {};
>
> [main.cpp]
> #include "library1/public.hpp"
> #include "library2/public.hpp" # boom, library2/utils.hpp does not get included
>
> same-contents also means same-relative-include-trees. Congratulations.
>

Question is do you know upfront that files are problematic?
Then we could do something like this:
a) we only consider normalized paths
b) use have option to add path mappings that are used in path comparison

Like:

We have folders `libs/fooV1` and `libs/fooV2`. Both folders are symlinks
but by default when compiler load `libs/fooV1/bar.h` and `libs/fooV2/bar.h`
the compiler considers them as separate files even when they are the same file.
Now we add compiler flag `-PI libs/fooV1:libs/fooV2` (similar to `-I`) and now
when compiler load `libs/fooV1/foo.h` he consider it as if he load
`libs/fooV2/bar.h` and thus next loading of `libs/fooV2/bar.h` will be blocked.

And this should be ver dumb process it could be cases when `libs/fooV1/bar.h`
and `libs/fooV2/bar.h` are unrelated but if you make specific maping it will
override even diffrnet files. This could be useful to hacking some broken
includes like:

```
-PI hack/foo/bar.h:libs/fooV2/bar.h
```

and we can in our files do:

```
#include "hack/foo/bar.h"
#include "libs/fooV2/foo.h" // it will ignore `#include "bar.h"`
```

Could mechnism like this work on your build system?

> On Fri, Aug 30, 2024 at 12:15 AM Jeremy Rifkin via Std-Proposals <std-proposals_at_[hidden]<mailto:std-proposals_at_[hidden]>> wrote:
>>
>> > In this very thread there are examples showing why taking only the content into account doesn't work but it was brushed off as "that can be fixed".
>>
>> I'm sorry you feel I have brushed any presented examples off. I have
>> found them all immensely helpful for consideration. It's hard for me
>> to imagine times when you'd want the same include-guarded content
>> included twice, however I found the example of a "main header"
>> compelling. The example of a header that only defines macros you undef
>> is also not impractical.
>>
>> However, there are also compelling examples for filesystem identity.
>> Mainly the multiple mount point issue.
>>
>> I think both can be reasonable, however, I have been trying to
>> understand the most probable failure modes. While I originally
>> proposed a content-based definition, I do think a filesystem-based
>> definition is closer to current semantics and expectations.
>>
>> Jeremy
>>
>> On Thu, Aug 29, 2024 at 5:24 PM Breno Guimarães via Std-Proposals
>> <std-proposals_at_[hidden]g<mailto:std-proposals_at_[hidden]>> wrote:
>> >
>> > To add to that, the whole idea is to standardize standard practice. If the first thing you do is to change spec to something else, then you're not standardizing standard practice, you are adding a new feature that inconveniently clashes with an existing one.
>> >
>> > In this very thread there are examples showing why taking only the content into account doesn't work but it was brushed off as "that can be fixed".
>> >
>> > None of this make sense to me.
>> >
>> >
>> > Em qui., 29 de ago. de 2024 18:59, Tiago Freire via Std-Proposals <std-proposals_at_[hidden]<mailto:std-proposals_at_[hidden]>> escreveu:
>> >>
>> >> Again, hashing content... totally unnecessary.
>> >>
>> >> There's no need to identify "same content" which as far as I can see can be defeated by modifications that don't change the interpretation, like spaces, which although not technically a violation of "same content" it clearly defeats the intent.
>> >>
>> >> An include summons a resource, a pragma once bars that resources from bey re-summonable. That's it. File paths should be more than enough.
>> >>
>> >> I'm unconvinced that the "bad cases" are not just a product of bad build architecture, if done properly a compiler should never be presented with multiple alternatives of the same file. And putting such requirements on compilers puts an unnecessary burden on developers to support a scenario that it is that is arguably bad practice.
>> >>
>> >> The argument is "prgma once" is supported everywhere it is good, we should make it official in the standard, effectively no change to a compiler should occur as a consequence.
>> >> If a change needs to occur, then in fact "your version" of what you mean by "pragma once" is actually "not supported" by all the major compilers.
>> >>
>> >> Current compiler support of "pragma once" and it's usage on cross platform projects have a particular way of dealing with dependencies in mind. That workflow works. It's pointless to have this discussion if you don't understand that flow, and you shouldn't tailor the tool to a workflow that doesn't exist to the detriment of all.
>> >>
>> >>
>> >> ________________________________
>> >> From: Std-Proposals <std-proposals-bounces_at_[hidden]<mailto:std-proposals-bounces_at_[hidden]>> on behalf of Jeremy Rifkin via Std-Proposals <std-proposals_at_[hidden]<mailto:std-proposals_at_[hidden]>>
>> >> Sent: Thursday, August 29, 2024 9:56:18 PM
>> >> To: Tom Honermann <tom_at_honermann.net<mailto:tom_at_[hidden]>>
>> >> Cc: Jeremy Rifkin <rifkin.jer_at_[hidden]<mailto:rifkin.jer_at_[hidden]>>; std-proposals_at_[hidden]<mailto:std-proposals_at_[hidden]> <std-proposals_at_[hidden]<mailto:std-proposals_at_[hidden]>>
>> >> Subject: Re: [std-proposals] Revising #pragma once
>> >>
>> >> Performance should be fine if using a content definition. An implementation can do inode/path checks against files it already knows of, as a fast path. The first time a file is #included it’s just a hash+table lookup to decide whether to continue.
>> >>
>> >> Regarding the filesystem definition vs content definition question, while I think a content-based definition is robust I can see there is FUD about it and also an argument about current practice being a filesystem-based definition. It may just be best to approach this as filesystem uniqueness to the implementation’s ability, with a requirement that symbolic links/hard links are handled. This doesn’t cover the case of multiple mount points, but we’ve discussed that that’s impossible with #pragma once without using contents instead.
>> >>
>> >> Jeremy
>> >>
>> >> On Thu, Aug 29, 2024 at 13:06 Tom Honermann <tom_at_[hidden]<mailto:tom_at_[hidden]>> wrote:
>> >>>
>> >>> On 8/28/24 12:32 AM, Jeremy Rifkin via Std-Proposals wrote:
>> >>>
>> >>> Another question is whether the comparison should be post translation
>> >>> phase 1.
>> >>>
>> >>> I gave this some thought while drafting the proposal. I think it comes
>> >>> down to whether the intent is single inclusion of files or single
>> >>> inclusion of contents.
>> >>>
>> >>> Indeed. The proposal currently favors the "same contents" approach and offers the following wording.
>> >>>
>> >>> A preprocessing directive of the form
>> >>> # pragma once new-line
>> >>> shall cause no subsequent #include directives to perform replacement for a file with text contents identical to this file.
>> >>>
>> >>> The wording will have to define what it means for contents to be identical. Options include:
>> >>>
>> >>> The files must be byte-for-byte identical. This makes source file encoding observable (which I would be strongly against).
>> >>> The files must encode the same character sequence post translation phase 1. This makes comparisons potentially expensive.
>> >>>
>> >>> Note that the "same contents" approach obligates an implementation to consider every previously encountered file for every #include directive. An inode based optimization can help to determine if a file was previously encountered based on identity, but it doesn't help to reduce the costs when a file that was not previously seen is encountered.
>> >>>
>> >>>
>> >>> Tom.
>> >>>
>> >>> Jeremy
>> >>>
>> >>> On Tue, Aug 27, 2024 at 3:39 PM Tom Honermann via Std-Proposals
>> >>> <std-proposals_at_[hidden]<mailto:std-proposals_at_[hidden]>> wrote:
>> >>>
>> >>> On 8/27/24 4:10 PM, Thiago Macieira via Std-Proposals wrote:
>> >>>
>> >>> On Tuesday 27 August 2024 12:35:17 GMT-7 Andrey Semashev via Std-Proposals
>> >>> wrote:
>> >>>
>> >>> The fact that gcc took the approach to compare file contents I consider
>> >>> a poor choice, and not an argument to standardize this implementation.
>> >>>
>> >>> Another question is whether a byte comparison of two files of the same size is
>> >>> expensive for compilers.
>> >>>
>> >>> #once ID doesn't need to compare the entire file.
>> >>>
>> >>> Another question is whether the comparison should be post translation
>> >>> phase 1. In other words, whether differently encoded source files that
>> >>> decode to the same sequence of code points are considered the same file
>> >>> (e.g., a Windows-1252 version and a UTF-8 version). Standard C++ does
>> >>> not currently allow source file encoding to be observable but a #pragma
>> >>> once implementation that only compares bytes would make such differences
>> >>> observable.
>> >>>
>> >>> Tom.
>> >>>
>> >>> --
>> >>> Std-Proposals mailing list
>> >>> Std-Proposals_at_[hidden]<mailto:Std-Proposals_at_[hidden]>
>> >>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>> >>
>> >>
>> >> --
>> >> Std-Proposals mailing list
>> >> Std-Proposals_at_[hidden]<mailto:Std-Proposals_at_[hidden]>
>> >> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>> >
>> > --
>> > Std-Proposals mailing list
>> > Std-Proposals_at_[hidden]<mailto:Std-Proposals_at_[hidden]>
>> > https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>> --
>> Std-Proposals mailing list
>> Std-Proposals_at_[hidden]<mailto:Std-Proposals_at_[hidden]>
>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden].org<mailto:Std-Proposals_at_[hidden]>
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
--
Std-Proposals mailing list
Std-Proposals_at_[hidden]<mailto:Std-Proposals_at_[hidden]>
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2024-09-01 13:18:38