On 4/30/22 12:40 PM, Daniel Ruoso wrote:
On Fri, Apr 29, 2022, 17:13 Tom Honermann <tom@honermann.net> wrote:
I'm not sure how to interpret your response here.
I think we're not talking about the same thing. What you're describing
is what in the paper I described as the "hashing" mechanism that
compilers are implementing. They do what you're describing, however
you would end up with a unique hash per module, meaning that we would
have to fully preprocess the module interface unit prior to deciding
if that binary interface file is usable in this context or not.

No, that is not what I'm suggesting. Let me try again.

The idea is that, when the compiler produces a BMI, it also records the external environmental properties that were material in guiding how the MIU was parsed. Perhaps it generates a JSON file that reflects what needs to be true for the source file to be parsed again in the same way. As indicated, this would include predefined macros (those predefined by the compiler, possibly dependent on compiler options, plus those specified on the command line, but not macros defined or undefined in the MIU). This would be a minimal subset of all external factors (e.g., macro FOO must be defined, macro BAR must not be defined, macro BAZ must have a value of 3, etc.).

Given that minimal subset, a build system can check whether the "baseline" configuration satisfies it. If it does, the BMI can be reused; if it does not, a new BMI is needed. As desired, no preprocessing or other analysis of the MIU would be required by the build system.
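
To make that concrete, here is a rough sketch of the check a build system might perform. The record layout (the "defined", "undefined", and "values" keys) and the function name are assumptions made up for illustration; no compiler emits this format today.

```python
# Hypothetical record emitted alongside a BMI. The field names are
# illustrative assumptions, not any compiler's actual output format.
requirements = {
    "defined": ["FOO"],      # macro FOO must be defined
    "undefined": ["BAR"],    # macro BAR must not be defined
    "values": {"BAZ": "3"},  # macro BAZ must have a value of 3
}


def bmi_is_reusable(requirements, baseline_macros):
    """Return True if the baseline configuration satisfies the record.

    baseline_macros maps every macro defined in the baseline
    configuration (predefined or given on the command line) to its
    value as a string.
    """
    for name in requirements.get("defined", []):
        if name not in baseline_macros:
            return False
    for name in requirements.get("undefined", []):
        if name in baseline_macros:
            return False
    for name, value in requirements.get("values", {}).items():
        if baseline_macros.get(name) != value:
            return False
    return True
```

Macros that the record does not mention are ignored, so a baseline that defines additional, unrelated macros still matches.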

The goal of this paper is to allow this step to be skipped, by having
a "baseline" configuration for the compiler as it is being used in a
given target, at which point any bmi matching that identifier can be
presumed to be compatible without ever having to read the module
interface file. That identifier would then be communicated in the
module metadata as it was distributed with the library.

I don't think this approach scales. For example:

  1. The identifier will change in response to command line options that may have no impact on whether a BMI could be reused. For example, whether the _M_FP_STRICT macro is predefined depends on whether /fp:strict is passed on the MSVC command line. Since a MIU *could* be dependent on that macro being defined, use of that option would require the identifier to change. All BMIs built without a consistent use of that option would then have to be rejected, regardless of whether they are actually dependent on it. Note that this option does not affect how code is parsed (unless the code is dependent on the associated macro); it only affects code generation. This is a case where the identifier approach would unnecessarily discard compatible BMIs.
  2. Any macros defined on the command line would have to impact the identifier because a MIU *could* be dependent on them. Thus adding a -DZIGGY_ZIGGY_ZOO option to the command line would invalidate all existing BMIs despite it being exceedingly likely that none of them are actually dependent on such a macro.
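
The second case can be sketched as follows. Both functions are hypothetical stand-ins: config_identifier() assumes the identifier is a hash over every macro in the configuration, and satisfies() assumes the BMI carries a record of only the macros the MIU actually depends on. Neither reflects an existing tool.

```python
import hashlib


def config_identifier(macros):
    """Hypothetical identifier: a hash over every macro in the configuration."""
    canonical = "\n".join(f"{name}={value}" for name, value in sorted(macros.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()


def satisfies(required, macros):
    """Check only the macros the MIU depends on.

    required maps a macro name to the value it must have, or to None
    if the macro must not be defined.
    """
    return all(macros.get(name) == value for name, value in required.items())


# Configuration the BMI was built with, and the same configuration
# with an unrelated -DZIGGY_ZIGGY_ZOO added to the command line.
built_with = {"__cplusplus": "202002L"}
current = dict(built_with, ZIGGY_ZIGGY_ZOO="1")

# Assumed for illustration: the MIU depends only on __cplusplus.
required = {"__cplusplus": "202002L"}

identifier_matches = config_identifier(built_with) == config_identifier(current)
requirements_match = satisfies(required, current)
```

Here identifier_matches is False, so the identifier approach discards the BMI, while requirements_match is True, so the requirements approach reuses it.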

This means the build system would even be able to short-circuit
dependency analysis from that point forward and just consume that BMI
file as is.

That would include pre-built bmi files for the standard library
modules, or for module header units.

Yes, that is the goal. As indicated above though, I think a solution that reflects a MIU's actual requirements for a consistent parse is required to achieve that in practice.

Tom.


daniel