
Re: P2581R0: Specifying the Interoperability of Binary Module Interface Files

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 2 May 2022 00:24:52 -0400
On 5/1/22 11:55 AM, Daniel Ruoso via SG15 wrote:
> On Sat., Apr. 30, 2022 at 13:37, Tom Honermann
> <tom_at_[hidden]> wrote:
>> The idea is that, when the compiler is producing a BMI, it also records the external
>> environmental properties that were material in guiding how the MIU was parsed. Perhaps
>> it generates a JSON file that reflects what needs to be true for the source file to be parsed
>> again in the same way.
> I would like to hear from compiler implementers on this topic. But I
> suspect that creates a completeness problem that I am not confident
> can be solved.
>
> In other words, you need not only a positive indication of the things
> that did affect the output bmi, but also a complete mapping of all the
> things that would not affect it if they were present.
>
> More importantly, this needs to be communicated in an interoperable
> way to allow the build systems to compare them.

I don't believe there is a completeness problem.

Given an initial configuration of predefined macros (and enabled
attributes, enabled features that impact semantics, present header
files, etc...), there can be only one parse result for a given source
file. That parse result is determined by the presence, absence, and
values of items in that initial configuration. Though infinitely many
initial configurations can produce the same parse result, it is only
necessary to prove that a new configuration satisfies the salient
properties of the initial configuration used to construct the BMI.

For the following examples, the lines containing /* included */ indicate
lines that were conditionally included, and /* EXCLUDED */ indicates the
converse. Let the initial configuration used to construct a BMI consist
of the following predefined macros.

    PM_FOO=1
    PM_BAR=2
    PM_BAZ=3
    PM_NOP=4

Given this example:

    #define MACRO1 1
    #if !defined(PM_FOO)
    /* included */
    #endif
    #if PM_BAR == 2
    /* included */
    #endif
    #if defined(PM_BAZ)
    /* included */
    #endif
    #if defined(MACRO1)
    /* included */
    #endif
    #if defined(MACRO2)
    /* EXCLUDED */
    #endif

the salient properties of the initial configuration are:

    !defined(PM_FOO)
    PM_BAR == 2
    defined(PM_BAZ)
    !defined(MACRO2)

Since PM_NOP was not consulted during the parse, its presence or absence
is not relevant. Similarly, though MACRO1 appeared in the controlling
expression of one of the selected conditional inclusion directives, since
it was unconditionally defined in the source file (its definition has a
source location), it is not interesting (note that, had its definition
been conditionally included, then any macro(s) controlling its definition
would have already been considered). MACRO2 is the interesting case since
it is neither defined in the initial configuration, nor defined when its
presence is tested. It must be assumed to potentially be a predefined
macro, and its absence is therefore a salient property of the initial
configuration.
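
As a small sketch of that parenthetical remark, suppose the initial
configuration had also predefined a hypothetical macro PM_OPT (PM_OPT
and this fragment are illustrative only, not part of the example above).
MACRO1's definition is then itself conditionally included, and the
condition controlling it, defined(PM_OPT), is already recorded as a
salient property when the first directive is evaluated; MACRO1 itself
still need not be recorded.

    #if defined(PM_OPT)   /* defined(PM_OPT) becomes a salient property here */
    #define MACRO1 1      /* conditional definition, but with a source location */
    #endif
    #if defined(MACRO1)   /* selected because PM_OPT was predefined */
    /* included */
    #endif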

A compiler that recorded these conditions could even generate a test to
determine if a BMI could be reused with a compiler invocation with a
different initial configuration. For the example above, it might generate:

    #if !(!defined(PM_FOO))
       #error BMI reuse requires that PM_FOO is not defined.
    #endif
    #if !(PM_BAR == 2)
       #error BMI reuse requires that PM_BAR is defined with a value of 2.
    #endif
    #if !(defined(PM_BAZ))
       #error BMI reuse requires that PM_BAZ is defined.
    #endif
    #if !(!defined(MACRO2))
       #error BMI reuse requires that MACRO2 is not defined.
    #endif

I acknowledge the desire to avoid invoking the compiler to make such a
determination, so consider this the low QoI approach. But then again,
invoking just the preprocessor may perform sufficiently well.

This algorithm does produce false negatives in some cases. In the next
example, there are two distinct minimal initial configurations that will
produce the same parse output: (1) PM_FOO is defined, and (2) PM_BAR is
defined. Short-circuiting rules would suggest that, if both macros are
defined, the salient property would be that PM_FOO is defined (and
PM_BAR is irrelevant). Thus, though an initial configuration with only
PM_FOO defined and another with only PM_BAR defined would produce the
same parse, a BMI produced for the first would be rejected by the
second. I suspect such occurrences would be rare, though I'm sure
examples can be found.

    #if defined(PM_FOO) || defined(PM_BAR)
    /* included */
    #endif
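
A sketch of how the corresponding reuse check might look if the compiler
recorded the entire controlling expression rather than only the branch
selected by short-circuit evaluation (an assumption on my part, not part
of the algorithm described above); such a check would accept either
minimal configuration:

    #if !(defined(PM_FOO) || defined(PM_BAR))
       #error BMI reuse requires that PM_FOO or PM_BAR is defined.
    #endif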

The expressions that govern conditional inclusion can be arbitrarily
complex. In practice, relational operators and elaborate use of
disjunction might add considerable complexity. Macros like __FILE__,
__LINE__, __DATE__, and __TIME__ add more fun, but are probably only
relevant for pathological examples.

I previously stated that this approach could eliminate the need to
consider compiler options by themselves, but that was not correct. A
good example is the MSVC /Zc:char8_t[-]
<https://docs.microsoft.com/en-us/cpp/build/reference/zc-char8-t?view=msvc-170>
option. Though that option is associated with a feature test macro,
there is no guarantee that code that is sensitive to use of the option
will be guarded by that macro. In the following example, p will have
type const char* or const char8_t* depending on use of that option.
Thus, that option would need to be reflected as part of the salient
properties of the initial configuration.

    const auto *p = u8"";
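
For concreteness, a minimal self-checking sketch of the two parse
results (the __cpp_char8_t feature test macro is used here only to make
the sketch verify itself; as noted above, real code need not be guarded
this way):

    #include <type_traits>

    const auto *p = u8"";

    #if defined(__cpp_char8_t)
    /* char8_t enabled (e.g., MSVC /Zc:char8_t, the C++20 default) */
    static_assert(std::is_same_v<decltype(p), const char8_t *>);
    #else
    /* char8_t disabled (e.g., MSVC /Zc:char8_t-) */
    static_assert(std::is_same_v<decltype(p), const char *>);
    #endif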

As you indicate below, include paths and their order are also important
(and a challenge for relocatable builds).
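
(As a tiny illustration with hypothetical directories: if both vendor/
and project/ provide a config.h, then -Ivendor -Iproject and -Iproject
-Ivendor resolve the directive below to different files, and thus
potentially to different parses.)

    #include "config.h"   /* which config.h is found depends on the
                             include search order */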

>
>> I don't think this approach scales
> There's definitely a fine line here. I don't think it'd be as bad as
> you're going for.
Evaluation would be needed to know for sure, but build systems tend to
add a lot of compiler options and system configuration macros in my
experience.
>
> But you are right that there is an implicit assumption here, which is
> that the build systems will need to create a separation between the
> flags that are meant to be used for the translation units on the
> target and the flags that are meant to be used when consuming a
> module.
>
> In the case of CMake, for instance (and I'd like to hear from the
> CMake folks here), I expect that when creating the target for a
> library that imports modules, that imported target will not be
> affected by the target that consumes the module. IOW, the module needs
> to be parsed with the base toolchain configuration, plus include
> directories and compiler definitions defined in the module metadata.
> Otherwise you would risk having different parsings of the same module
> in the same build, which would very likely result in incoherent builds
> in the end.
Agreed.
>
> The consequence of that line of reasoning is that for any target
> consuming modules, it would be an error to introduce flags that could
> change the format of the bmi at a target level. The good news is that
> the build system would be able to test for that by comparing the
> baseline hash for the toolchain with the baseline hash (excluding
> include directories and compiler definitions) to give the user a
> useful output instead of just having the compiler yell that the bmi is
> incompatible.
I agree that usability experience is important.
>
> One important note to qualify here is that I am making a distinction
> between the *format* of the bmi output and the capability to fully
> reproduce a bmi file (which, again, is why I'm making the distinction
> from the coherence hashing the compilers are doing).
I agree those are separate concerns.
>
> I guess another way of putting it is that with modules in the picture,
> the build system needs to have a clear distinction between what is the
> "toolchain configuration" and what is the "target configuration". And
> that means that module-aware build systems will have to acknowledge
> that making target-specific toolchain configurations will be invalid
> when consuming modules, the same way that doing target-specific ABI
> choices is invalid when consuming libraries that were produced in a
> different way.

I think that would have a negative impact on the ability to adopt
modules. In particular, it seems to me that it would greatly reduce the
ability to consume a BMI distributed by a compiler implementor.

Tom.

>
> daniel
> _______________________________________________
> SG15 mailing list
> SG15_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg15

Received on 2022-05-02 04:24:55