Date: Thu, 29 Aug 2024 14:56:00 -0500
Performance should be fine with a content-based definition. An implementation
can do inode/path checks against files it already knows of, as a fast path.
The first time a file is #included, it's just a hash plus a table lookup to
decide whether to continue.
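
As a rough illustration of that fast path, here is a minimal C++ sketch. It
is not taken from any compiler. The names OnceRegistry, FileId, and
seen_before are made up for this example, and it assumes the caller has
already obtained a (device, inode) pair and the file contents.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical file identity: (device, inode) as reported by the OS.
using FileId = std::pair<std::uint64_t, std::uint64_t>;

class OnceRegistry {
public:
    // Returns true if a file carrying #pragma once should be skipped,
    // i.e. the same file or identical contents were registered before.
    // Otherwise records the file and returns false.
    bool seen_before(const FileId& id, const std::string& contents) {
        // Fast path: a file already known by identity, no content work.
        if (known_ids_.count(id) != 0) return true;

        // Slow path, taken the first time an identity is encountered:
        // hash the contents and look the hash up in a table.
        const std::size_t h = std::hash<std::string>{}(contents);
        std::vector<std::string>& bucket = by_hash_[h];
        for (const std::string& prior : bucket) {
            if (prior == contents) {    // confirm, since hashes can collide
                known_ids_.insert(id);  // remember this alias for next time
                return true;
            }
        }
        bucket.push_back(contents);
        known_ids_.insert(id);
        return false;
    }

private:
    std::set<FileId> known_ids_;
    std::unordered_map<std::size_t, std::vector<std::string>> by_hash_;
};
```

The sketch stores whole file contents to confirm a hash match. A real
implementation would more likely keep a strong digest, or re-read the
earlier file only when the hashes agree.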
Regarding the filesystem definition vs. content definition question: while I
think a content-based definition is robust, I can see there is FUD about it,
and also an argument that current practice is a filesystem-based
definition. It may be best to specify this as filesystem uniqueness, to the
best of the implementation's ability, with a requirement that symbolic links
and hard links are handled. This doesn't cover the case of multiple mount
points, but we've discussed that handling that case with #pragma once is
impossible without comparing contents.
Jeremy
On Thu, Aug 29, 2024 at 13:06 Tom Honermann <tom_at_[hidden]> wrote:
> On 8/28/24 12:32 AM, Jeremy Rifkin via Std-Proposals wrote:
>
> Another question is whether the comparison should be post translation
> phase 1.
>
> I gave this some thought while drafting the proposal. I think it comes
> down to whether the intent is single inclusion of files or single
> inclusion of contents.
>
> Indeed. The proposal currently favors the "same contents" approach and
> offers the following wording.
>
> A preprocessing directive of the form
> # pragma once new-line
> shall cause no subsequent #include directives to perform replacement for
> a file with *text contents identical to this file*.
>
> The wording will have to define what it means for contents to be
> identical. Options include:
>
> - The files must be byte-for-byte identical. This makes source file
> encoding observable (which I would be strongly against).
> - The files must encode the same character sequence post translation
> phase 1. This makes comparisons potentially expensive.
>
> Note that the "same contents" approach obligates an implementation to
> consider every previously encountered file for every #include directive.
> An inode based optimization can help to determine if a file was previously
> encountered based on identity, but it doesn't help to reduce the costs when
> a file that was not previously seen is encountered.
>
>
> Tom.
>
>
> Jeremy
>
> On Tue, Aug 27, 2024 at 3:39 PM Tom Honermann via Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> On 8/27/24 4:10 PM, Thiago Macieira via Std-Proposals wrote:
>
> On Tuesday 27 August 2024 12:35:17 GMT-7 Andrey Semashev via Std-Proposals
> wrote:
>
> The fact that gcc took the approach to compare file contents I consider
> a poor choice, and not an argument to standardize this implementation.
>
> Another question is whether a byte comparison of two files of the same size is
> expensive for compilers.
>
> #once ID doesn't need to compare the entire file.
>
> Another question is whether the comparison should be post translation
> phase 1. In other words, whether differently encoded source files that
> decode to the same sequence of code points are considered the same file
> (e.g., a Windows-1252 version and a UTF-8 version). Standard C++ does
> not currently allow source file encoding to be observable but a #pragma
> once implementation that only compares bytes would make such differences
> observable.
>
> Tom.
>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
>
Received on 2024-08-29 19:56:13