ISOCPP SG15 List: Re: unimported implementation partitions

From: Nathan Sidwell <nathan_at_[hidden]>
Date: Wed, 8 Jun 2022 15:31:28 -0400

On 6/8/22 14:59, Mathias Stearn via SG15 wrote:
>
>
> On Wed, Jun 8, 2022 at 7:23 PM Nathan Sidwell via SG15
> <sg15_at_[hidden]> wrote:
>
> Q1) Is there a use case for implementation partitions (a non-interface
> partition) that are not imported in any module unit? How would that
> differ from a regular implementation partition?
>
>
> I don't know if this really qualifies as a "use case", but I think a lot
> of people will misunderstand implementation partitions and make all
> implementation units partitions with unique names, except possibly the
> implementation unit for things declared in the primary interface unit.
> There are two things that aren't obvious: 1) there is an
> asymmetry between interface and implementation units without a partition
> name, you can only have 1 of the former, and I think people will assume
> that you can also only have 1 "primary implementation unit" (I know it
> took a while to get that notion out of my head...) 2) implementation
> partitions are a bit weird and not something you often need. They can
> almost always be replaced with an interface partition that doesn't
> export anything. And the only reason for the “almost” there is because
> we artificially require the PMI to transitively reexport all interface
> units even if they don’t export anything.
>
>
>
> We require all interface partitions be [transitively] imported in the
> primary interface. I don't think we require implementation partitions
> be imported at least once in a program.
>
> Q2) Is there a use case for a program to include a module interface that
> is not imported in any other TU (and has no implementation units)?
>
>
> Yes. I can imagine lumping many modules into a single library. My main
> executable may import all of them somewhere, however I may want to have
> separate unit test binaries for modules A and B that only import their
> own module, but link against a lib that has both. And for the sake of
> argument, let’s assume that both modules have some self-registering
> objects in dynamic init, and it is important that they both run in both
> unit tests.

If you're linking against such a lib, then that lib will cause the
initializers to run (from its own initialization). But there's no need
for the modules to arrange for their initializers to be run at startup in
their own right. (Have I understood your example correctly?)

>
>
> The reason I ask is that there is an ABI issue with the global
> initializer function needed for P1874. I'm going to describe it in ELF
> terms, but I imagine the same choices appear in other ABIs.
>
> P1874 is solved by having each module primary interface, and all
> partitions, emit an idempotent initialization function that (a) calls
> the init fn of all imports and then (b) performs all dynamic inits of
> namespace scope.
>
>
> That sounds like it will result in roughly one function call per import
> statement in the program prior to main. How ABI resilient does this need
> to be? Could the PMI collect up all imports from all of its interface
> units and imported partitions to remove duplicates, or do you need to
> support replacing a single interface partition without recompiling the
> PMI? Is a TU allowed to omit calling the initializer for one of its
> imports A if it can see that another import B also has an interface
> dependency on A, or do you need to be prepared for B to be replaced with
> a version that doesn’t import A transitively? If some leaf TU doesn’t
> have any dynamic init and says so in its BMI, do importers still need to
> call its init function in case it is replaced? If not can this be
> transitively done for any TU where all interface dependencies have no
> dynamic init?
>
> Also, is there any way that this could instead be written in some
> metadata tables showing the direct dependencies and resolved by the
> linker into a single ordered list of either flat per-TU or per-object
> initializers for each DSO? That seems a bit better to me than trying to
> do it pessimistically via function calls.

Ah, well, if we want to go modifying linker technology then yes, but
that's not a thing we (those implementing modules in Clang & GCC) wanted
to mandate. Thus we cannot rely on the linker to topologically sort the
set of module initializations -- it doesn't know the import graph you
mention.

Thus we're left with the exciting set of different ways static
initialization is already available. There's a .init_array section,
which is filled with pointers to initialization functions. That array
is assembled by the static linker, and ordering is determined by the
order of object files (and archives) on the link line. An object file
gets to place as many pointers as it likes. (Other
initialize-by-concatenating-snippets schemes exist; they're all morally
the same, and I'm ignoring extensions like initializer priority.)
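
For concreteness, one way to see this mechanism from user code is the
GCC/Clang `constructor` function attribute, which on typical ELF targets
results in a pointer to the function being placed in .init_array (older
toolchain configurations may use .ctors instead). This is only a sketch
to illustrate the mechanism, not what a module implementation emits, and
the names `startup_calls`, `tu_initializer` and `startup_call_count` are
made up for the example:

```cpp
// Sketch only: relies on the GCC/Clang-specific constructor attribute.

// Constant-initialized, so it is safe to modify before main() starts.
int startup_calls = 0;

// Hypothetical per-object-file initializer. The attribute asks the
// toolchain to arrange for it to be run once during startup, before main().
__attribute__((constructor))
static void tu_initializer() {
    ++startup_calls;
}

// Lets other code observe whether the startup initializer has run.
int startup_call_count() { return startup_calls; }
```

An object file may contain any number of such functions; each one
contributes its own pointer.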

So, what we have is that any TU that imports things gets to emit an
initialization function that calls the initialization functions of its
imports, and then performs its own dynamic initializations.

Any module interface unit does the same, except that this function may
be called more than once (at most by each of its importers, plus
maybe-once via .init_array -- that's my question), so it needs an
idempotency bool to make sure the initialization is done exactly once.
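
A rough sketch of that scheme, with ordinary C++ functions standing in
for the compiler-generated initializers. All names here (`init_log`,
`init_module_A`, `init_module_B`, `init_importing_tu`) are hypothetical,
and the real functions have ABI-defined mangled names; this is not the
code either compiler emits:

```cpp
#include <string>
#include <vector>

// Records the order in which "dynamic initializations" run (illustration only).
std::vector<std::string> init_log;

// Stand-in for module A's initializer. A imports nothing.
void init_module_A() {
    static bool done = false;      // idempotency bool
    if (done) return;
    done = true;
    init_log.push_back("A");       // A's own dynamic initializations
}

// Stand-in for module B's initializer. B imports A.
void init_module_B() {
    static bool done = false;      // idempotency bool
    if (done) return;
    done = true;
    init_module_A();               // first: initializers of imports
    init_log.push_back("B");       // then: B's own dynamic initializations
}

// Stand-in for the initializer of a non-module TU that imports A and B.
// It is run once via .init_array and never called by anyone else, so it
// needs no idempotency bool.
void init_importing_tu() {
    init_module_A();
    init_module_B();
    init_log.push_back("importing TU");
}
```

Here `init_module_A` is called twice (once directly, once from
`init_module_B`), but A's initializations run only once, and always
before B's.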

There are some optimizations one can do within that framework though,
and you mention one -- if I import A & B, and B imports A, I only need
emit a call to B's initializer. This precludes recompiling a module
interface and not recompiling its importers (because you might have
changed the import graph).
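
Sketched the same way (hypothetical names again, plain functions
standing in for generated code), the importer of both A and B only calls
B's initializer, relying on what B's CMI said about B importing A when
the importer was compiled:

```cpp
#include <string>
#include <vector>

// Records the order in which "dynamic initializations" run (illustration only).
std::vector<std::string> elide_log;

// Stand-in for module A's initializer.
void elide_init_A() {
    static bool done = false;
    if (done) return;
    done = true;
    elide_log.push_back("A");
}

// Stand-in for module B's initializer. B imports A.
void elide_init_B() {
    static bool done = false;
    if (done) return;
    done = true;
    elide_init_A();
    elide_log.push_back("B");
}

// Stand-in for the initializer of a TU that imports both A and B.
// The direct call to A's initializer is elided because B is known to
// import A. If B were later rebuilt without that import and this TU were
// not recompiled, A would never be initialized from here.
void elide_init_importer() {
    elide_init_B();
    elide_log.push_back("importer");
}
```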

The other two I want to implement are:

a) if the initializer fn is empty, don't emit the idempotency code. Just
an empty fn.

b) if the import's initializer fn is known to be empty, don't call it.
This requires that information be in the CMI, and it means one cannot
(in general) add a dynamic initializer to a module interface, without
recompiling all importers of that module. (But you can't do that in
general anyway.)

We have chosen to always emit the initializer fn, even when empty, so
that optimization (b) is optional rather than mandated.
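
As a sketch of how (a) and (b) fit together (hypothetical names, plain
functions standing in for generated code): module E has nothing to
initialize, module F does, and an importer of both is shown with and
without (b) applied.

```cpp
// Counts how many times F's dynamic initialization actually runs.
int f_dynamic_inits = 0;

// (a) E has no dynamic initializers and no imports that need any, so its
// initializer is still emitted, but as an empty function with no
// idempotency bool.
void init_module_E() {}

// F has real work to do, so it keeps the idempotency bool.
void init_module_F() {
    static bool done = false;
    if (done) return;
    done = true;
    ++f_dynamic_inits;
}

// Importer compiled without (b): it calls every import's initializer.
// This always links, because E's initializer exists even though empty.
void init_importer_without_b() {
    init_module_E();
    init_module_F();
}

// Importer compiled with (b): E's CMI said its initializer is empty, so
// the call is dropped. This goes stale if E later gains a dynamic
// initializer and this importer is not recompiled.
void init_importer_with_b() {
    init_module_F();
}
```

Both importers leave the program in the same state; (b) only saves the
call.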

nathan

>
>
>
> In a non-module world a function that performs #b would then arrange to
> be called at startup via .init or similar. In a module-world, we do not
> (need to) do this, as we know it'll be called somewhere by an import.
>
> However, Q1 raises the possibility that an implementation partition may
> not be imported anywhere, so its global initializer fn is never called
> from another global init. We therefore have to arrange for it to be
> called from .init as a regular initializer function. A small
> pessimization.
>
>
> That optimization would only help for implementation partitions that
> have their own dynamic init. That seems likely to be enough of an edge
> case to not be worth microoptimizing for, even if it were valid to do so.
>
>
>
> Q2 raises the same question wrt primary interfaces. If they might never
> be imported, again, their initializer fn would never be called by that
> mechanism. We have to emit it via .init.
>
> nathan
>
> --
> Nathan Sidwell

-- 
Nathan Sidwell

Received on 2022-06-08 19:31:30