ISOCPP std-proposals List: Re: [std-proposals] Revising #pragma once

From: Mike Reed <mike_at_[hidden]>
Date: Sun, 1 Sep 2024 08:56:52 +0100

Of course one could argue that you went through all that pain *because*
there isn't a standard on #once. If there were then everyone would
implement it the same, devs would use it knowing exactly what to expect and
you wouldn't have gone through all that pain.
The problem is, is it possible to produce such a spec? Does the current
proposal succeed where the previous proposal failed?
If we introduce a new directive "#once" then there is not so much pressure
to be backwards compatible with *all* #pragma once directive implentations
(which is an impossible ask). Clearly, the behaviour of #include and #once
are intimately linked. As already pointed out, #include fetches a
resource, #once blocks that resource from being fetched more than once. So
we simply (ha!) have to specify how the include tree is built...

but with modules on the horizon is it worth it? Maybe, if it looks like
modules are staying on the horizon :/

On Sun, 1 Sept 2024, 06:32 Tom Honermann via Std-Proposals, <
std-proposals_at_[hidden]> wrote:

> On 8/30/24 5:50 AM, Gašper Ažman wrote:
>
> If you knew up-front you wouldn't do it :).
>
> This happens though. There are people who generate code - whole include
> trees - and for plugins they end up looking very very similar or identical.
> And then as the build engineer my users complain that "their build doesn't
> work on the cloud build environment" and I have to somehow *find their bug*.
>
> Think about it. How the hell do you diagnose this silent noninclusion?
>
> In my experience, very carefully, over a long period of time, with much
> hair pulling and lots of WTFs.
>
> I'm going to relate my most memorable experience with this feature below,
> but I'll preface this by saying that the root cause in this case was a bug
> in the implementation. Readers are free to say, yeah, bugs happen, nothing
> to see here. and that is fair. The point is not (just) to relay yet another
> story from the trenches, but to enforce Gašper's point that debugging a
> case where #pragma once fails is about as hard as debugging UB or a
> miscompilation.
>
> I was a Coverity developer for ~10 years working on the Coverity C and C++
> front ends. Coverity is one of a number of tools in the unenviable position
> of having to emulate the behaviors (intended and otherwise) of many other
> compilers. The emulation requirements extend to compiler extensions,
> implementation-defined behavior, and even unspecified behavior. Coverity
> includes multiple C and C++ front end implementations, one of which is
> chosen to use to emulate the behavior of a given compiler. Exact matching
> of front end behavior for all supported compilers is not economically
> feasible, so different behavior becomes observable at times. Sometimes in
> surprising ways.
>
> It was uncommon for me to get directly involved in debugging a problem
> happening in a customer environment. A bug report came in that Coverity was
> failing to successfully compile parts of the customer's code; errors were
> being reported about undeclared names and failed overload resolution. The
> customer was using MSVC which, of course, was having no issues compiling
> their code. The first course of action in such cases is to request
> preprocessed output produced by both the native compiler (MSVC) and
> Coverity's front end (don't get me started about how much harder C++
> modules will make this first step if programmers start actually using C++
> modules, but I digress). The reported errors reproduced easily using the
> Coverity produced preprocessed output. Searching for the undeclared names
> from the error messages in the MSVC preprocessed output made it clear early
> on that large chunks of the source code were simply missing. Thanks to
> #line directives in the MSVC preprocessed output, it wasn't too hard to
> correlate the missing content with a particular source file and a request
> was sent to the customer to share that file, which they did. The file used
> a proper header guard (it did not use #pragma once), so I naturally went
> down the rabbit hole of trying to figure out why the guard macro was
> apparently defined when Coverity processed the file, but not when MSVC did.
> It isn't all that uncommon for Coverity customers to have to implement
> workarounds to address cases where Coverity doesn't exactly emulate some
> aspect of another compiler, so I assumed that the customer had probably
> applied a workaround for some other issue that was somehow now contributing
> to this issue and I started asking those "what did you do?" questions in
> the most polite corporate phrasing I could come up with.
>
> I didn't get to blame the customer for doing something silly. They weren't
> doing anything weird or unusual. I spent lots of time looking at code in
> our front end trying to imagine how we could possibly be processing a
> header guard incorrectly. The front end cached knowledge about header
> guards and used that information to avoid re-tokenizing a re-included
> source file. I wondered if, somehow, an invalid cache hit was happening,
> but nothing seemed wrong. Header guards are not rocket science.
>
> I wondered if include paths were somehow inconsistent and if Coverity was
> resolving the #include directive to a different file; perhaps an empty
> file or one with a different header guard macro. No such luck. I didn't get
> to blame the customer for having multiple copies of a file with
> inconsistent contents in different locations.
>
> It got worse. It turned out there wasn't a single file that was
> problematic. On different runs, content would be missing for different
> source files.
>
> I started asking about the customer build environment. That was when we
> got the first indication of where to look that was actually on the right
> path. The customer was building source code hosted on a network storage
> device. I wondered about some kind of weird filesystem corruption that
> caused an empty file to sometimes be served instead of the actual file
> contents. I hypothesized improbable bugs in which an empty file was only
> produced with the file system APIs that Coverity used and not with the ones
> that MSVC used. I asked the customer to verify that their network storage
> device had all updates applied.
>
> I tried replicating the problem in house using network storage devices
> available within the company. No luck of course.
>
> I eventually had to resort to sending the customer builds of Coverity with
> extra debugging instrumentation to run in their environment. I'm sure
> everyone reading this can appreciate that, when things get to the point
> that you are asking customers to run one-off modified builds in their
> production environment in order to figure out WTF is going wrong with your
> software, that things are just not going well. I was still focused on the
> header guard cache, so that is where I added some debug instrumentation.
> The header guard cache naturally relied on recognition of the same file
> being re-included and used inodes to do so. So I added logging which, after
> several iterations of narrowing the logging, much to my surprise,
> demonstrated that the same inode was being returned for distinct files. My
> head exploded.
>
> Remember when 640K was enough for everybody? These days, 64 bits isn't
> enough for an inode. Coverity was using the Win32
> GetFileInformationByHandle()
> <https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfileinformationbyhandle>
> function to retrieve file inodes (file IDs or indexes in the Win32
> documentation). The inode/ID/index information is returned in the
> BY_HANDLE_FILE_INFORMATION
> <https://learn.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information>
> structure; the combination of the dwVolumeSerialNumber, nFileIndexHigh,
> and nFileIndexLow data members provide for unique identification for any
> open file.
>
> Pause here and read that last sentence again. For any *open* file. The
> documentation isn't particularly explicit about this, but filesystems are
> free to reuse file index values once the last handle for a file is closed.
> In this case, the network storage device was apparently mapping its 128 bit
> inodes to a pool of 64-bit inodes that it recycled as files were closed.
> Coverity wasn't keeping previously included files open so its cached inodes
> were effectively dangling pointers.
>
> The fix was simple. We disabled use of inodes to recognize re-included
> files (on Windows). We could have considered reworking the code to call the
> newer Win32 GetFileInformationByHandleEx()
> <https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-getfileinformationbyhandleex>
> function to obtain a 128 bit file ID via the FILE_ID_INFO
> <https://learn.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_id_info>
> structure. I don't know if that would have been any more reliable; this
> function might also only guarantee stability of a file ID while a file is
> open; the documentation isn't clear on this. Rewriting the code to keep all
> files open for the duration of the compilation would have required
> significant change and would have required raising limits on number of open
> files.
>
> MSVC didn't hit this problem because it relies on canonical path
> comparisons rather than filesystem identity to recognize re-included files.
> If I remember correctly, it is possible to defeat the MSVC #pragma once
> implementation through use of sym links, hard links, junction points,
> etc... However, you just end up with double inclusion in that case;
> something that is much easier to diagnose than mysterious never inclusion.
>
> I was lucky that I was able to reference the MSVC preprocessed output in
> order to determine that chunks of source code were missing. Comparison with
> other implementations isn't necessarily an option for general compiler
> users.
>
> We were also lucky that we were able to just disable the broken inode
> dependent code. Thankfully, header guards Just WorkTM without need for
> fancy and complicated mechanisms. #pragma once can't.
>
> Tom.
>
>
> On Fri, Aug 30, 2024 at 9:30 AM Marcin Jaczewski <
> marcinjaczewski86_at_[hidden]> wrote:
>
>> pt., 30 sie 2024 o 01:20 Gašper Ažman via Std-Proposals
>> <std-proposals_at_[hidden]> napisał(a):
>> >
>> > Darlings,
>> >
>> > byte-identical is just plain incorrect. Consider.
>> >
>> > [library1/public.hpp]
>> > #pragma once
>> > #include "utils.hpp"
>> >
>> > [library1/utils.hpp]
>> > class lib1 {};
>> >
>> > library2/public.hpp
>> > #pragma once
>> > #include "utils.hpp"
>> >
>> > [library2/utils.hpp]
>> > class lib2 {};
>> >
>> > [main.cpp]
>> > #include "library1/public.hpp"
>> > #include "library2/public.hpp" # boom, library2/utils.hpp does not get
>> included
>> >
>> > same-contents also means same-relative-include-trees. Congratulations.
>> >
>>
>> Question is do you know upfront that files are problematic?
>> Then we could do something like this:
>> a) we only consider normalized paths
>> b) use have option to add path mappings that are used in path comparison
>>
>> Like:
>>
>> We have folders `libs/fooV1` and `libs/fooV2`. Both folders are symlinks
>> but by default when compiler load `libs/fooV1/bar.h` and
>> `libs/fooV2/bar.h`
>> the compiler considers them as separate files even when they are the same
>> file.
>> Now we add compiler flag `-PI libs/fooV1:libs/fooV2` (similar to `-I`)
>> and now
>> when compiler load `libs/fooV1/foo.h` he consider it as if he load
>> `libs/fooV2/bar.h` and thus next loading of `libs/fooV2/bar.h` will be
>> blocked.
>>
>> And this should be ver dumb process it could be cases when
>> `libs/fooV1/bar.h`
>> and `libs/fooV2/bar.h` are unrelated but if you make specific maping it
>> will
>> override even diffrnet files. This could be useful to hacking some broken
>> includes like:
>>
>> ```
>> -PI hack/foo/bar.h:libs/fooV2/bar.h
>> ```
>>
>> and we can in our files do:
>>
>> ```
>> #include "hack/foo/bar.h"
>> #include "libs/fooV2/foo.h" // it will ignore `#include "bar.h"`
>> ```
>>
>> Could mechnism like this work on your build system?
>>
>> > On Fri, Aug 30, 2024 at 12:15 AM Jeremy Rifkin via Std-Proposals <
>> std-proposals_at_[hidden]> wrote:
>> >>
>> >> > In this very thread there are examples showing why taking only the
>> content into account doesn't work but it was brushed off as "that can be
>> fixed".
>> >>
>> >> I'm sorry you feel I have brushed any presented examples off. I have
>> >> found them all immensely helpful for consideration. It's hard for me
>> >> to imagine times when you'd want the same include-guarded content
>> >> included twice, however I found the example of a "main header"
>> >> compelling. The example of a header that only defines macros you undef
>> >> is also not impractical.
>> >>
>> >> However, there are also compelling examples for filesystem identity.
>> >> Mainly the multiple mount point issue.
>> >>
>> >> I think both can be reasonable, however, I have been trying to
>> >> understand the most probable failure modes. While I originally
>> >> proposed a content-based definition, I do think a filesystem-based
>> >> definition is closer to current semantics and expectations.
>> >>
>> >> Jeremy
>> >>
>> >> On Thu, Aug 29, 2024 at 5:24 PM Breno Guimarães via Std-Proposals
>> >> <std-proposals_at_[hidden]> wrote:
>> >> >
>> >> > To add to that, the whole idea is to standardize standard practice.
>> If the first thing you do is to change spec to something else, then you're
>> not standardizing standard practice, you are adding a new feature that
>> inconveniently clashes with an existing one.
>> >> >
>> >> > In this very thread there are examples showing why taking only the
>> content into account doesn't work but it was brushed off as "that can be
>> fixed".
>> >> >
>> >> > None of this make sense to me.
>> >> >
>> >> >
>> >> > Em qui., 29 de ago. de 2024 18:59, Tiago Freire via Std-Proposals <
>> std-proposals_at_[hidden]> escreveu:
>> >> >>
>> >> >> Again, hashing content... totally unnecessary.
>> >> >>
>> >> >> There's no need to identify "same content" which as far as I can
>> see can be defeated by modifications that don't change the interpretation,
>> like spaces, which although not technically a violation of "same content"
>> it clearly defeats the intent.
>> >> >>
>> >> >> An include summons a resource, a pragma once bars that resources
>> from bey re-summonable. That's it. File paths should be more than enough.
>> >> >>
>> >> >> I'm unconvinced that the "bad cases" are not just a product of bad
>> build architecture, if done properly a compiler should never be presented
>> with multiple alternatives of the same file. And putting such requirements
>> on compilers puts an unnecessary burden on developers to support a scenario
>> that it is that is arguably bad practice.
>> >> >>
>> >> >> The argument is "prgma once" is supported everywhere it is good, we
>> should make it official in the standard, effectively no change to a
>> compiler should occur as a consequence.
>> >> >> If a change needs to occur, then in fact "your version" of what you
>> mean by "pragma once" is actually "not supported" by all the major
>> compilers.
>> >> >>
>> >> >> Current compiler support of "pragma once" and it's usage on cross
>> platform projects have a particular way of dealing with dependencies in
>> mind. That workflow works. It's pointless to have this discussion if you
>> don't understand that flow, and you shouldn't tailor the tool to a workflow
>> that doesn't exist to the detriment of all.
>> >> >>
>> >> >>
>> >> >> ________________________________
>> >> >> From: Std-Proposals <std-proposals-bounces_at_[hidden]> on
>> behalf of Jeremy Rifkin via Std-Proposals <std-proposals_at_[hidden]
>> >
>> >> >> Sent: Thursday, August 29, 2024 9:56:18 PM
>> >> >> To: Tom Honermann <tom_at_[hidden]>
>> >> >> Cc: Jeremy Rifkin <rifkin.jer_at_[hidden]>;
>> std-proposals_at_[hidden] <std-proposals_at_[hidden]>
>> >> >> Subject: Re: [std-proposals] Revising #pragma once
>> >> >>
>> >> >> Performance should be fine if using a content definition. An
>> implementation can do inode/path checks against files it already knows of,
>> as a fast path. The first time a file is #included it’s just a hash+table
>> lookup to decide whether to continue.
>> >> >>
>> >> >> Regarding the filesystem definition vs content definition question,
>> while I think a content-based definition is robust I can see there is FUD
>> about it and also an argument about current practice being a
>> filesystem-based definition. It may just be best to approach this as
>> filesystem uniqueness to the implementation’s ability, with a requirement
>> that symbolic links/hard links are handled. This doesn’t cover the case of
>> multiple mount points, but we’ve discussed that that’s impossible with
>> #pragma once without using contents instead.
>> >> >>
>> >> >> Jeremy
>> >> >>
>> >> >> On Thu, Aug 29, 2024 at 13:06 Tom Honermann <tom_at_[hidden]>
>> wrote:
>> >> >>>
>> >> >>> On 8/28/24 12:32 AM, Jeremy Rifkin via Std-Proposals wrote:
>> >> >>>
>> >> >>> Another question is whether the comparison should be post
>> translation
>> >> >>> phase 1.
>> >> >>>
>> >> >>> I gave this some thought while drafting the proposal. I think it
>> comes
>> >> >>> down to whether the intent is single inclusion of files or single
>> >> >>> inclusion of contents.
>> >> >>>
>> >> >>> Indeed. The proposal currently favors the "same contents" approach
>> and offers the following wording.
>> >> >>>
>> >> >>> A preprocessing directive of the form
>> >> >>> # pragma once new-line
>> >> >>> shall cause no subsequent #include directives to perform
>> replacement for a file with text contents identical to this file.
>> >> >>>
>> >> >>> The wording will have to define what it means for contents to be
>> identical. Options include:
>> >> >>>
>> >> >>> The files must be byte-for-byte identical. This makes source file
>> encoding observable (which I would be strongly against).
>> >> >>> The files must encode the same character sequence post translation
>> phase 1. This makes comparisons potentially expensive.
>> >> >>>
>> >> >>> Note that the "same contents" approach obligates an implementation
>> to consider every previously encountered file for every #include directive.
>> An inode based optimization can help to determine if a file was previously
>> encountered based on identity, but it doesn't help to reduce the costs when
>> a file that was not previously seen is encountered.
>> >> >>>
>> >> >>>
>> >> >>> Tom.
>> >> >>>
>> >> >>> Jeremy
>> >> >>>
>> >> >>> On Tue, Aug 27, 2024 at 3:39 PM Tom Honermann via Std-Proposals
>> >> >>> <std-proposals_at_[hidden]> wrote:
>> >> >>>
>> >> >>> On 8/27/24 4:10 PM, Thiago Macieira via Std-Proposals wrote:
>> >> >>>
>> >> >>> On Tuesday 27 August 2024 12:35:17 GMT-7 Andrey Semashev via
>> Std-Proposals
>> >> >>> wrote:
>> >> >>>
>> >> >>> The fact that gcc took the approach to compare file contents I
>> consider
>> >> >>> a poor choice, and not an argument to standardize this
>> implementation.
>> >> >>>
>> >> >>> Another question is whether a byte comparison of two files of the
>> same size is
>> >> >>> expensive for compilers.
>> >> >>>
>> >> >>> #once ID doesn't need to compare the entire file.
>> >> >>>
>> >> >>> Another question is whether the comparison should be post
>> translation
>> >> >>> phase 1. In other words, whether differently encoded source files
>> that
>> >> >>> decode to the same sequence of code points are considered the same
>> file
>> >> >>> (e.g., a Windows-1252 version and a UTF-8 version). Standard C++
>> does
>> >> >>> not currently allow source file encoding to be observable but a
>> #pragma
>> >> >>> once implementation that only compares bytes would make such
>> differences
>> >> >>> observable.
>> >> >>>
>> >> >>> Tom.
>> >> >>>
>> >> >>> --
>> >> >>> Std-Proposals mailing list
>> >> >>> Std-Proposals_at_[hidden]
>> >> >>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Std-Proposals mailing list
>> >> >> Std-Proposals_at_[hidden]
>> >> >> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>> >> >
>> >> > --
>> >> > Std-Proposals mailing list
>> >> > Std-Proposals_at_[hidden]
>> >> > https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>> >> --
>> >> Std-Proposals mailing list
>> >> Std-Proposals_at_[hidden]
>> >> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>> >
>> > --
>> > Std-Proposals mailing list
>> > Std-Proposals_at_[hidden]
>> > https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>

Received on 2024-09-01 07:57:07