On 5/1/22 11:55 AM, Daniel Ruoso via SG15 wrote:
On Sat, Apr 30, 2022 at 13:37, Tom Honermann
<tom@honermann.net> wrote:
The idea is that, when the compiler is producing a BMI, it also records the external
environmental properties that were material in guiding how the MIU was parsed. Perhaps
it generates a JSON file that reflects what needs to be true for the source file to be parsed
again in the same way.
I would like to hear from compiler implementers on this topic. But I
suspect that creates a completeness problem that I am not confident
can be solved.

In other words, you need not only a positive indication of the things
that did affect the output bmi, but also a complete mapping of all the
things that would not affect it if they were present.

More importantly, this needs to be communicated in an interoperable
way to allow the build systems to compare them.

I don't believe there is a completeness problem.

Given an initial configuration of predefined macros (and enabled attributes, enabled features that impact semantics, present header files, etc.), there can be only one parse result for a given source file. That parse result is determined by the presence, absence, and values of items in that initial configuration. Though there are an infinite number of initial configurations that can produce the same parse result, it is only necessary to prove that a new configuration contains a superset of the salient properties of the initial one used to construct a BMI.

For the following examples, the lines containing /* included */ indicate lines that were conditionally included, and /* EXCLUDED */ indicates the converse. Let the initial configuration used to construct a BMI consist of the following predefined macros.

PM_FOO=1
PM_BAR=2
PM_BAZ=3
PM_NOP=4

Given this example:

#define MACRO1 1
#if !defined(PM_FOO)
/* included */
#endif
#if PM_BAR == 2
/* included */
#endif
#if defined(PM_BAZ)
/* included */
#endif
#if defined(MACRO1)
/* included */
#endif
#if defined(MACRO2)
/* EXCLUDED */
#endif

the salient properties of the initial configuration are:

!defined(PM_FOO)
PM_BAR == 2
defined(PM_BAZ)
!defined(MACRO2)

Since PM_NOP was not consulted during the parse, its presence or absence is not relevant. Similarly, though MACRO1 was the controlling expression for one of the selected conditional inclusion directives, since it was unconditionally defined in the source file (its definition has a source location), it is not interesting (note that, had its definition been conditionally included, then any macro(s) controlling its definition would have already been considered). MACRO2 is the interesting case since it is neither defined in the initial configuration nor defined when its presence is tested. It must be assumed to potentially be a predefined macro, and its absence is therefore a salient property of the initial configuration.
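A minimal sketch of how a build system might compare a new configuration against the four properties listed above. All names here (Config, SalientProperty, bmi_reusable, example_properties) are hypothetical and only model the idea; no compiler is known to emit its data in this form.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: a configuration maps predefined macro names to values.
using Config = std::map<std::string, long>;

// One recorded property of the configuration that produced the BMI.
struct SalientProperty {
    enum class Kind { NotDefined, Defined, Equals };
    std::string macro;
    Kind kind;
    long value;  // only meaningful for Kind::Equals
};

// Returns true if the property holds for the given configuration.
bool holds(const SalientProperty& p, const Config& cfg) {
    auto it = cfg.find(p.macro);
    switch (p.kind) {
        case SalientProperty::Kind::NotDefined:
            return it == cfg.end();
        case SalientProperty::Kind::Defined:
            return it != cfg.end();
        case SalientProperty::Kind::Equals: {
            // In #if expressions an undefined identifier evaluates as 0.
            long v = (it == cfg.end()) ? 0 : it->second;
            return v == p.value;
        }
    }
    return false;
}

// The BMI is considered reusable only if every recorded property holds.
bool bmi_reusable(const std::vector<SalientProperty>& props, const Config& cfg) {
    for (const auto& p : props)
        if (!holds(p, cfg)) return false;
    return true;
}

// The four salient properties from the example above.
std::vector<SalientProperty> example_properties() {
    using K = SalientProperty::Kind;
    return {
        {"PM_FOO", K::NotDefined, 0},
        {"PM_BAR", K::Equals, 2},
        {"PM_BAZ", K::Defined, 0},
        {"MACRO2", K::NotDefined, 0},
    };
}
```

Macros that were never consulted, such as PM_NOP, do not appear in the list, so a new configuration may add or remove them freely.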

A compiler that recorded these conditions could even generate a test to determine if a BMI could be reused with a compiler invocation with a different initial configuration. For the example above, it might generate:

#if !(!defined(PM_FOO))
  #error BMI reuse requires that PM_FOO is not defined.
#endif
#if !(PM_BAR == 2)
  #error BMI reuse requires that PM_BAR is defined with a value of 2.
#endif
#if !(defined(PM_BAZ))
  #error BMI reuse requires that PM_BAZ is defined.
#endif
#if !(!defined(MACRO2))
  #error BMI reuse requires that MACRO2 is not defined.
#endif

I acknowledge the desire to avoid invoking the compiler to make such a determination, so consider this the low QoI approach. But then again, invoking just the preprocessor may perform sufficiently well.
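As a rough illustration of the generation step, the following sketch turns a list of recorded conditions into a preprocessor-only check of the shape shown above. RecordedCondition and generate_reuse_check are hypothetical names, and the sketch assumes the compiler can supply each condition as source text along with a short description for the diagnostic.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of one condition the parse depended on.
struct RecordedCondition {
    std::string expression;   // must evaluate to true on reuse, e.g. "PM_BAR == 2"
    std::string requirement;  // human-readable text used in the diagnostic
};

// Emits one "#if !(...) / #error / #endif" group per recorded condition.
std::string generate_reuse_check(const std::vector<RecordedCondition>& conds) {
    std::string out;
    for (const auto& c : conds) {
        out += "#if !(" + c.expression + ")\n";
        out += "  #error BMI reuse requires that " + c.requirement + ".\n";
        out += "#endif\n";
    }
    return out;
}
```

Running only the preprocessor over the generated text with the new configuration would then either succeed silently or stop at the first violated requirement.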

This algorithm does produce false negatives in some cases. In this next example, there are two distinct minimal initial configurations that will produce the same parse output: 1) PM_FOO is defined, and 2) PM_BAR is defined. Short-circuiting rules would suggest that, if both macros are defined, the salient properties would be that PM_FOO is defined (and PM_BAR is irrelevant). Thus, though an initial configuration with only PM_FOO defined and another with only PM_BAR defined would produce the same parse, a BMI produced for the first would be rejected by the second. I suspect such occurrences would be rare, though I'm sure examples can be found.

#if defined(PM_FOO) || defined(PM_BAR)
/* included */
#endif
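The false negative can be reproduced with a small model of short-circuit recording. This is only a sketch: Config, Recorded, eval_or, and satisfies are hypothetical names, and a real implementation would record far richer expressions than a single disjunction.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model: the set of macro names defined in a configuration.
using Config = std::set<std::string>;

// Recorded properties: macro name -> whether it was observed to be defined.
using Recorded = std::map<std::string, bool>;

// Evaluates `defined(a) || defined(b)` with short-circuiting and records
// only the macros that were actually consulted.
bool eval_or(const Config& cfg, const std::string& a, const std::string& b,
             Recorded& rec) {
    bool a_defined = cfg.count(a) != 0;
    rec[a] = a_defined;
    if (a_defined) return true;  // b is never consulted
    bool b_defined = cfg.count(b) != 0;
    rec[b] = b_defined;
    return b_defined;
}

// Returns true if the configuration matches every recorded property.
bool satisfies(const Config& cfg, const Recorded& rec) {
    for (const auto& entry : rec)
        if ((cfg.count(entry.first) != 0) != entry.second) return false;
    return true;
}
```

With only PM_FOO defined, the record is just "PM_FOO is defined". A configuration with only PM_BAR defined fails that record even though the directive would select the same lines for both.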

The expressions that govern conditional inclusion can be arbitrarily complex. In practice, relational operators and elaborate use of disjunction might add considerable complexity. Macros like __FILE__, __LINE__, __DATE__, and __TIME__ add more fun, but are probably only relevant for pathological examples.

I previously stated that this approach could eliminate the need to consider compiler options by themselves, but that was not correct. A good example is the MSVC /Zc:char8_t[-] option. Though that option is associated with a feature test macro, there is no guarantee that code that is sensitive to use of the option be guarded by the macro. In the following example, p will conditionally have type const char* or const char8_t* depending on use of that option. Thus, that option would need to be reflected as part of the salient properties of the initial configuration.

const auto *p = u8"";
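To make the dependency visible, the declaration can be paired with compile-time checks. This sketch assumes the implementation defines the __cpp_char8_t feature test macro exactly when char8_t support is enabled; the static_assert lines are added for illustration and are not part of the original example.

```cpp
#include <cassert>
#include <type_traits>

// The declaration itself is not guarded by any macro.
const auto *p = u8"";

// The deduced type still depends on whether char8_t support is enabled.
#if defined(__cpp_char8_t)
static_assert(std::is_same<decltype(p), const char8_t*>::value,
              "with char8_t enabled, p points to const char8_t");
#else
static_assert(std::is_same<decltype(p), const char*>::value,
              "with char8_t disabled, p points to const char");
#endif
```

Because the declaration never tests the macro, a record of consulted macros alone would not capture this dependency.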

As you indicate below, include paths and their order are also important (and a challenge for relocatable builds).


I don't think this approach scales
There's definitely a fine line here. I don't think it'd be as bad as
you're going for.
Evaluation would be needed to know for sure, but build systems tend to add a lot of compiler options and system configuration macros in my experience.

But you are right that there is an implicit assumption here, which is
that the build systems will need to separate the flags that are meant
to apply to the translation units of the target from the flags that
are meant to be used when consuming a module.

In the case of CMake, for instance (and I'd like to hear from the
CMake folks here), I expect that when creating the target for a
library that imports modules, that imported target will not be
affected by the target that consumes the module. IOW, the module needs
to be parsed with the base toolchain configuration, plus include
directories and compiler definitions defined in the module metadata.
Otherwise you would risk having different parsings of the same module
in the same build, which would very likely result in incoherent builds
in the end.
Agreed.

The consequence of that line of reasoning is that for any target
consuming modules, it would be an error to introduce flags that could
change the format of the bmi at a target level. The good
news is that the build system would be able to test for that by
comparing the baseline hash for the toolchain with the baseline hash
(excluding include directories and compiler definitions) to give the
user a useful output instead of just having the compiler yell that the
bmi is incompatible.
I agree that usability experience is important.

One important note to qualify here is that I am making a distinction
between the *format* of the bmi output and the capability to fully
reproduce a bmi file (which, again, is why I'm making the distinction
from the coherence hashing the compilers are doing).
I agree those are separate concerns.

I guess another way of putting it is that with modules in the picture,
the build system needs to have a clear distinction between what is the
"toolchain configuration" and what is the "target configuration". And
that means that module-aware build systems will have to acknowledge
that making target-specific toolchain configurations will be invalid
when consuming modules, the same way that doing target-specific ABI
choices is invalid when consuming libraries that were produced in a
different way.

I think that would have a negative impact on the ability to adopt modules. In particular, it seems to me that it would greatly reduce the ability to consume a BMI distributed from a compiler implementor.

Tom.


daniel