sg12: Re: [ub] Draft 2 of Enhanced C/C++ memory and object model

From: Niall Douglas <s_sourceforge_at_[hidden]>
Date: Fri, 29 Mar 2019 10:33:07 +0000

> | But down the line, absolutely yes Modules need to become objects. They
> | would be as-if a giant struct full of data objects and function objects.
> | They would be attached, and detached, exactly as any other object under
> | this proposal. This would effectively implement shared libraries into C++.
> |
> | We then can redefine C++ programs to be a graph of binary Modules to be
> | bootstrapped by a Modules loader. So your .exe program might be a few
> | Kb, and it would simply map into memory all the binary Modules its
> | manifest specifies from the system binary Modules database.
>
> What problems would such a proposal attempt to solve?

The answer is "too many to usefully list". But I can give you my
personal top three, for brevity and clarity.

1. It would solve the shared library problem for C and C++, without
using broken proprietary shared libraries to do it.

zlib is probably the most used third party library dependency in the
world. Everything from OS kernels through to router firmware uses it,
and often not just once: each component tends to link in its own
private copy of zlib.

A few years back, zlib was found to have a critical security
vulnerability affecting everything which used it. Upstream its authors
were able to issue a fix within days. Some weeks and months later, those
software components which include zlib started to incorporate the fixed
edition into their stable builds. Some weeks and months after that, the
next layer of dependency starts to include the fixed dependencies in
their stable builds. And some weeks and months after that, it percolates
down to the next layer.

The problem is that when you add together the update latencies of all
these layers of dependency, a critical bugfix to a deep dependency takes
anywhere from *years* to *never* to reach production. Even today,
there are tens of millions of devices out there running broken zlibs.
Worse, in some cases *parts* of a firmware have a fixed zlib, while other
parts of the exact same firmware have an older, broken zlib.

I picked out zlib purely due to its ubiquity. But the same applies to any
popular fundamental library. How many of us have seen ancient versions
of Boost being mixed in with newer versions of Boost in our work? I know
I have. One team may be good about updating dependencies, others not.

Here's where a database of shared binary Modules changes things. Now
zlib is a shared binary Module uploaded to a common package repository.
All software consuming zlib uses it from the system Modules database (if
the system Modules database is missing zlib, it is fetched). Each
machine executes C++ programs from its system Modules database.

zlib's authors fix a critical showstopper bug, and push the fix to the
common package repository. The next time someone executes a piece of
software using zlib, the new binary Module is used, rather than the old
broken version. In one fell swoop, *all* software using zlib is fixed.
No more waiting for maintainers to update each layer, until the fix
percolates through to all use cases in production.
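
As a rough illustration of the lookup described above, here is a minimal
sketch of a system Modules database that resolves a Module by name at
program start. Every name in it (ModuleVersion, ModuleImage,
ModuleDatabase, publish, resolve) is hypothetical, and the assumption that
a version splits into an ABI number plus a revision is mine. Nothing here
is part of any proposal or existing toolchain.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical version stamp: `abi` changes only on an ABI break,
// `revision` increases with every ABI-compatible fix.
struct ModuleVersion {
    int abi;
    int revision;
};

inline bool operator<(const ModuleVersion& a, const ModuleVersion& b) {
    return a.abi != b.abi ? a.abi < b.abi : a.revision < b.revision;
}

// Stand-in for a binary Module stored in the system database.
struct ModuleImage {
    ModuleVersion version;
    std::string payload;  // placeholder for the real binary contents
};

class ModuleDatabase {
public:
    // Called when the common package repository delivers a new build.
    void publish(const std::string& name, ModuleVersion v, std::string payload) {
        ModuleImage image = {v, std::move(payload)};
        modules_[name][v] = std::move(image);
    }

    // Called by the loader each time a program starts: returns the newest
    // revision with the requested ABI, or nullptr if none is installed
    // (at which point a real loader would fetch it from the repository).
    const ModuleImage* resolve(const std::string& name, int abi) const {
        auto found = modules_.find(name);
        if (found == modules_.end()) return nullptr;
        const ModuleImage* newest = nullptr;
        for (const auto& entry : found->second) {
            // Entries are sorted by (abi, revision), so the last match wins.
            if (entry.first.abi == abi) newest = &entry.second;
        }
        return newest;
    }

private:
    std::map<std::string, std::map<ModuleVersion, ModuleImage>> modules_;
};
```

Because programs ask for ("zlib", ABI 1) rather than embedding a copy,
publishing revision 1.1 changes what every consumer gets on its next
launch, without any of those consumers being rebuilt.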

I appreciate this may seem insanely ambitious. However, compiler vendors
are going to have to build out a database of Modules for their toolchain
in any case. So, they can keep that within the toolchain, or they can
split it off into a standalone component which system vendors can
incorporate into every OS distribution, such that the OS does the
linking and LTO step per system. Either way, it's much the same
implementation effort.

Don't get me wrong here, there remain lots of missing parts. Somebody is
going to have to pay for a common package repository. There are lots of
security and verification issues to solve, never mind versioning and
delivering Module ABI stability. Also, this dream suits LLVM far more
than it suits MSVC or GCC. Indeed, during my time at BlackBerry, we
actively investigated distributing BB10 programs as LLVM bytecode, and
having the device do the final stage of compilation, linking and
optimisation itself. I had a prototype which worked very well. It's
very feasible, even more so with Modules.

All these missing parts are why it's a dream. I dream of the day when C
and C++ are like Python, when I can push a bugfix and not have to rely on
multiple stages of downstream to propagate my bugfix in a timely
fashion. I think it would be a major leap forward in software quality
and reliability for all computing systems.

2. It enables cloud-assisted compilation, which will enable much more
complex C++ programming.

This would help solve the "Boost problem" where some core libraries end
up becoming a tangled nest of interdependency, such that they
effectively become a single monolithic library. The non-core libraries
then become isolates, and really ought to be detached entirely from
Boost into standalone libraries. See
https://github.com/Mike-Devel/boost_dep_graph/blob/master/Boostdep.png
for a graph showing this visually.

This "Boost problem" is not anything to do with Boost, but is actually a
feature of all software development. Some libraries draw one another
closer until they become a monolithic whole. Other libraries "repel"
others, and become ever more standalone. We actually see the same in the
clustering of stars and galaxies, so this is almost certainly a
fundamental property of the universe. All we can do is nudge a library
(Module) in one direction or another.

A common package repository of shared Modules enables the cloud to
precompute various canned combinations of them which are in common use
by C++ users. C++ users then just download those precomputed canned
combinations of Modules and get to work (i.e. "super Modules").

We thus effectively accelerate, through caching, those agglomerations of
libraries which are very often used together. These "precompiled
super Modules" collapse such bundles into a blob with a
simplified outer ABI which is much faster to link against, because
linking up the internal complexity has been precomputed away.
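
The caching idea can be sketched as plain memoisation keyed on the set of
Modules being combined. This is only an illustration under assumptions of
my own: SuperModuleCache, ModuleSet and LinkStep are hypothetical names,
and the link callback stands in for whatever expensive link and LTO step a
real service would run.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

// A combination of Modules. std::set keeps names sorted and unique, so
// {asio, system} and {system, asio} are the same cache key.
using ModuleSet = std::set<std::string>;

// Hypothetical stand-in for the expensive "link these Modules into one
// blob" step; it returns the resulting blob as a string.
using LinkStep = std::function<std::string(const ModuleSet&)>;

class SuperModuleCache {
public:
    explicit SuperModuleCache(LinkStep link) : link_(std::move(link)) {}

    // Returns the precomputed super Module for this combination, running
    // the link step only the first time the combination is requested.
    const std::string& get(const ModuleSet& combination) {
        auto it = cache_.find(combination);
        if (it == cache_.end()) {
            ++links_performed_;
            it = cache_.emplace(combination, link_(combination)).first;
        }
        return it->second;
    }

    int links_performed() const { return links_performed_; }

private:
    LinkStep link_;
    std::map<ModuleSet, std::string> cache_;
    int links_performed_ = 0;
};
```

The first user to ask for a given combination pays for the link. Everyone
after that, in whatever order they list the same Modules, gets the cached
blob.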

3. I think it will increase the importance people give to binary stability.

I know a lot of C++ folk take the view "just recompile everything", and
thus maintaining ABI stability is not seen as important.

However, this just doesn't scale well. You can't recompile the world
forever: at some point available compilation resources run low and start
to impact productivity for even the biggest multinationals. At some
point, you need to draw a line around the libraries which ought not to
change frequently, and from then on maintain their ABI stability
assiduously.

As much as C++20 Modules currently don't do much to help maintain ABI
stability, I think that they would if redistributable binary Modules
became a thing, in the same way that DLLs forced people to pay attention
to ABI, such that there are binaries shipped in Windows 10 whose ABI has
not changed one bit since 1996.

I think binary Modules would do a lot of good in helping people think
more clearly about ABI. Not least, an ABI break would fail to link and
LTO on each end user's machine, which would be a great motivator.
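
To show what "fail to link on the end user's machine" could look like,
here is a minimal sketch of a link-time ABI check. It assumes, purely for
illustration, that each binary Module carries a table mapping symbol names
to a textual signature. SymbolTable and abi_mismatches are hypothetical
names, and a real implementation would compare mangled names and type
layouts rather than strings.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical symbol table: symbol name -> description of its signature.
using SymbolTable = std::map<std::string, std::string>;

// Returns the imported symbols which the provider either no longer
// exports or now exports with a different signature. An empty result
// means the consumer can be linked against this provider.
std::vector<std::string> abi_mismatches(const SymbolTable& imports,
                                        const SymbolTable& exports) {
    std::vector<std::string> broken;
    for (const auto& wanted : imports) {
        auto provided = exports.find(wanted.first);
        if (provided == exports.end() || provided->second != wanted.second) {
            broken.push_back(wanted.first);
        }
    }
    return broken;
}
```

A provider may add exports freely, since only the symbols a consumer
actually imports are compared. Removing or changing one of those is what
gets reported, on every machine that tries to load the pair.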

I appreciate this was a long email. I could keep going for many more
pages, which is why I restricted the benefits of system-linked binary
Modules to my personal top three.

Niall

Received on 2019-03-29 11:33:23