sg12: Re: [ub] Draft 2 of Enhanced C/C++ memory and object model

From: Niall Douglas <s_sourceforge_at_[hidden]>
Date: Mon, 1 Apr 2019 10:07:18 +0100

On 31/03/2019 12:45, Florian Weimer wrote:
>> Here's where a database of shared binary Modules changes things. Now
>> zlib is a shared binary Module uploaded to a common package repository.
>> All software consuming zlib uses it from the system Modules database (if
>> the system Modules database is missing zlib, it is fetched). Each
>> machine executes C++ programs from its system Modules database.
>>
>> zlib's authors fix a critical showstopper bug, and push the fix to the
>> common package repository. The next time someone executes a piece of
>> software using zlib, the new binary Module is used, rather than the old
>> broken version. In one fell swoop, *all* software using zlib is fixed.
>> No more waiting for maintainers to update each layer, until the fix
>> percolates through to all use cases in production.
>
> The flip side is that if this version of zlib is incompatible in some
> way (not just in an API/ABI sense, but some application might assume
> incorrectly that the compressed output never changes), then you can't
> update zlib for those parts of the system that are exposed in a way
> that increases risk, without also breaking those applications.
>
> In general, existing module systems therefore do not automatically
> upgrade to newer dependencies. If they support it at all, the practice
> is strongly discouraged. So in your zlib example, many applications
> would still have to be updated to opt in to the newer zlib version.

The choice would remain with the library author, of course.

However, the aversion to depending on external package management
is uniquely a C and C++ phenomenon. In most other languages, you specify
the packages you depend on and the minimum versions you require, and you
release a version of your package. 99.9% of the time, this does not
become a problem: if you break your library for downstream, people
blame you, not downstream, and people fix downstream breakage quickly. I
know I have done, and still do, for my Python packages on PyPI.

The reason it is different for C and C++ is that our shared libraries
export insufficient information for tooling to identify the cause of
downstream breakage. All people know is that memory corruption suddenly
appears, and it's very hard to identify and track down.

I would make the claim that much of this brittleness is due to our
shared libraries having a public interface consisting of an array of
variable-length char strings. If our shared libraries had a public
interface specified in IPR, things would be much better. However, if our
shared libraries were Modules of uncompiled AST, and if all downstream
Modules caused a compilation of the ASTs they depend upon in upstream,
then breakage of downstream dependencies would be as identifiable as
if end users of downstream were using a local copy of the dependency.
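
To make the difference concrete, here is a rough sketch. It is only a
model, not how any real loader or IPR works: the function names and the
signature-string encoding are made up for illustration. It contrasts
binding against a list of exported names with binding against exports
that also carry a type signature.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: today's export table is just a list of names.
// Binding succeeds whenever the name is present, whatever its real type.
bool binds_by_name(const std::vector<std::string>& exported_names,
                   const std::string& name) {
    return std::find(exported_names.begin(), exported_names.end(), name)
           != exported_names.end();
}

// Hypothetical model: each export also records a signature string
// (assumed encoding, e.g. "int(char*, unsigned long)").
// Returns an empty string on success, otherwise a diagnostic that
// tooling could show to the downstream user.
std::string check_typed(const std::map<std::string, std::string>& exports,
                        const std::string& name,
                        const std::string& expected_signature) {
    auto it = exports.find(name);
    if (it == exports.end()) {
        return "missing symbol: " + name;
    }
    if (it->second != expected_signature) {
        return name + ": expected " + expected_signature +
               ", library exports " + it->second;
    }
    return "";
}
```

With the name-only table, a consumer built against an older signature
still binds successfully and the mismatch only shows up later, for
example as memory corruption. With the typed table, the same mismatch
produces a diagnostic naming the symbol and both signatures at bind time.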

Would that be sufficient for C and C++ users to relax, and to not feel
urgency in shipping embedded copies of all their dependencies? In other
words, are C and C++ sufficiently self-checking that this would work in
real-world use conditions?

I don't know the answer to that. But I would say that if the language
doesn't prevent, or at least identify, downstream breakage, then it
ought to be taught how to do so.

> But this is just a minority view. Many programmers really want to
> avoid using system libraries such as zlib and bundle their own copies.
> Maybe this practice would change if they could be certain that they
> would get exactly the version they expect (without any random patches,
> and thus without security fixes). But that conflicts with the goal of
> applying bug fixes.

I think programmers want to know early when things could have become broken.

If they can gain trust that that is so, then there is no reason to
expect that binary Modules in C and C++ would be any different to
packages in other programming languages. Sure, some packages in Python
ship embedded copies of packages available on PyPI, rather than using
the package on PyPI. But it's very much a minority use case. Not at all
common.

Niall

Received on 2019-04-01 11:07:33