> On Sat, Apr 30, 2022 at 13:37, Tom Honermann <tom@honermann.net> wrote:
>> The idea is that, when the compiler is producing a BMI, it also records the external environmental properties that were material in guiding how the MIU was parsed. Perhaps it generates a JSON file that reflects what needs to be true for the source file to be parsed again in the same way.
>
> I would like to hear from compiler implementers on this topic. But I suspect that creates a completeness problem that I am not confident can be solved. In other words, you need not only a positive indication of the things that did affect the output BMI, but also a complete mapping of all the things that would not have affected it had they been present. More importantly, this needs to be communicated in an interoperable way to allow the build systems to compare them.
I don't believe there is a completeness problem.
Given an initial configuration of predefined macros (and enabled
attributes, enabled features that impact semantics, present header
files, etc...), there can be only one parse result for a given
source file. That parse result is determined by the presence,
absence, and values of items in that initial configuration. Though
there are an infinite number of initial configurations that can
produce the same parse result, it is only necessary to prove that
a new configuration contains a superset of the salient properties
of the initial one used to construct a BMI.
For the following examples, the lines containing /* included */ indicate lines that were
conditionally included, and /* EXCLUDED */
indicates the converse. Let the initial configuration used to
construct a BMI consist of the following predefined macros.
PM_FOO=1
PM_BAR=2
PM_BAZ=3
PM_NOP=4
Given this example:
#define MACRO1 1
#if !defined(PM_FOO)
/* EXCLUDED */
#endif
#if PM_BAR == 2
/* included */
#endif
#if defined(PM_BAZ)
/* included */
#endif
#if defined(MACRO1)
/* included */
#endif
#if defined(MACRO2)
/* EXCLUDED */
#endif
the salient properties of the initial configuration are:
defined(PM_FOO)
PM_BAR == 2
defined(PM_BAZ)
!defined(MACRO2)
Since PM_NOP was not consulted during the parse, its presence or absence is not relevant. Similarly, though MACRO1 appeared in the controlling expression of one of the selected conditional inclusion directives, it is not interesting because it was unconditionally defined in the source file (its definition has a source location); note that, had its definition been conditionally included, any macro(s) controlling its definition would already have been considered. MACRO2 is the interesting case, since it is neither defined in the initial configuration nor defined when its presence is tested. It must be assumed to potentially be a predefined macro, and its absence is therefore a salient property of the initial configuration.
A compiler that recorded these conditions could even generate a test to determine if a BMI could be reused with a compiler invocation with a different initial configuration. For the example above, it might generate:
#if !(defined(PM_FOO))
#error BMI reuse requires that PM_FOO is defined.
#endif
#if !(PM_BAR == 2)
#error BMI reuse requires that PM_BAR is defined with a value of 2.
#endif
#if !(defined(PM_BAZ))
#error BMI reuse requires that PM_BAZ is defined.
#endif
#if !(!defined(MACRO2))
#error BMI reuse requires that MACRO2 is not defined.
#endif
I acknowledge the desire to avoid invoking the compiler to make
such a determination, so consider this the low QoI approach. But
then again, invoking just the preprocessor may perform
sufficiently well.
This algorithm does produce false negatives in some cases. In the next example, there are two distinct minimal initial configurations that will produce the same parse output: 1) PM_FOO is defined, and 2) PM_BAR is defined. Short-circuiting rules would suggest that, if both macros are defined, the salient property would be that PM_FOO is defined (and PM_BAR is irrelevant).
Thus, though an initial configuration with only PM_FOO defined and another with only PM_BAR defined would produce the same
parse, a BMI produced for the first would be rejected by the
second. I suspect such occurrences would be rare though I'm sure
examples can be found.
#if defined(PM_FOO) || defined(PM_BAR)
/* included */
#endif
The expressions that govern conditional inclusion can be
arbitrarily complex. In practice, relational operators and
elaborate use of disjunction might add considerable complexity.
Macros like __FILE__, __LINE__, __DATE__,
and __TIME__ add more fun, but are
probably only relevant for pathological examples.
I previously stated that this approach could eliminate the need
to consider compiler options by themselves, but that was not
correct. A good example is the MSVC /Zc:char8_t[-]
option. Though that option is associated with a feature test
macro, there is no guarantee that code that is sensitive to use of
the option will be guarded by the macro. In the following example, p will have type const char* or const
char8_t* depending on use of that option. Thus, that
option would need to be reflected as part of the salient
properties of the initial configuration.
const auto *p = u8"";
As you indicate below, include paths and their order are also
important (and a challenge for relocatable builds).
>> Evaluation would be needed to know for sure, but build systems tend to add a lot of compiler options and system configuration macros in my experience. I don't think this approach scales.
>
> There's definitely a fine line here. I don't think it'd be as bad as you fear.
Agreed.

> But you are right that there is an implicit assumption here, which is that build systems will need to create a separation between the flags that are meant to be used for the translation units on the target and the flags that are meant to be used when consuming a module. In the case of CMake, for instance (and I'd like to hear from the CMake folks here), I expect that when creating the target for a library that imports modules, that imported target will not be affected by the target that consumes the module. IOW, the module needs to be parsed with the base toolchain configuration, plus the include directories and compiler definitions defined in the module metadata. Otherwise you would risk having different parsings of the same module in the same build, which would very likely result in incoherent builds in the end.
I agree that usability experience is important.

> The consequence of that line of reasoning is that, for any target consuming modules, it would be an error to introduce flags that could change the format of the BMI at a target level. The good news is that the build system would be able to test for that by comparing the baseline hash for the toolchain with the baseline hash of the target configuration (excluding include directories and compiler definitions), giving the user useful output instead of just having the compiler yell that the BMI is incompatible.
I agree those are separate concerns.

> One important note to qualify here is that I am making a distinction between the *format* of the BMI output and the capability to fully reproduce a BMI file (which, again, is why I'm making the distinction from the coherence hashing the compilers are doing).
>
> I guess another way of putting it is that, with modules in the picture, the build system needs to have a clear distinction between the "toolchain configuration" and the "target configuration". And that means that module-aware build systems will have to acknowledge that making target-specific toolchain configurations is invalid when consuming modules, the same way that making target-specific ABI choices is invalid when consuming libraries that were produced in a different way.
I think that would have a negative impact on the ability to adopt
modules. In particular, it seems to me that it would greatly
reduce the ability to consume a BMI distributed by a compiler
implementor.
Tom.
> daniel

_______________________________________________
SG15 mailing list
SG15@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg15