sg15: [Tooling] Criticism to existing and thoughts on a standard build system

From: Nagy-Egri Máté Ferenc <nagy-egri.mate_at_[hidden]>
Date: Fri, 20 Apr 2018 17:41:02 +0200

Hi fellow list members!

My name is Máté Nagy-Egri. I am a physics PhD student, an open-source & cross-platform HPC programmer (GPGPU, graphics, cluster-parallel), a Linux sysadmin, and I teach programming to physicists at university. I recently joined the list and, having read through the 4 months of discussion, I'll try to reflect on a few points and add my own problems. Forgive me if it becomes a little too "tactical" towards the end and less "strategic", but I'm still "young" (eager) and sometimes get carried away by the details. The message is long, but I hope there is enough value in it to spawn thoughts/discussion.

I cannot engage in discussing the technical details of writing a C++ language tool, because I know far too little about that. I'll focus on build systems and packages from an end-user perspective, because I feel that is the biggest pain in both teaching C++ and using it.

I am deeply saddened by the fact that CMake became the de facto standard for cross-platform builds of OSS C++ projects. I too loathe the scripting language (I even proposed a way to get rid of it [1]), so allow me to omit the ranting/cursing. I became the 'CMake guru' on campus and everyone bugs me with their issues with it, hence I wrote a CMake tutorial [2] (I didn't find a single good example) to save myself some time.

At university, imagine having a group of freshmen with glowing eyes, and trying to get everyone to build a program on their notebooks (Windows, Linux, Mac). Even using the same IDE (Visual Studio Code is the current choice, with the CMake Tools and MS C++ extensions), it is a horrible experience. We'll shift to teaching in Jupyter via the C++ language kernel, because setting up any environment other than Visual Studio (Install, New C++ project, F7) is horror.

> Gabriel Dos Reis wrote:
>
> 1. Commonly agreed up way to describe C++ components for build systems – no, I am not aiming for a “universal” build system (not a goal)
> 2. Packing for C++ components – this ties into (1) I suspect.
> 3. Tools for migration, especially as we consider modules.

Totally agree with you and with Isabella Muerte on building C++ [3]. I feel the aims are drifting towards having a complete feature set, everyone wants their corner case covered. Freely quoting Isabella: "but then people come and ask, can I build my project using it? Can it do this? Can it do that? If you start making everyone happy, you will gravitate toward implementing an already existing tool." Indeed, if someone wants to get some backward, unorthodox library to build, they already have a tool for that. There's no need to create another one. I too generate C++ headers from symbolic mathematics scripting languages as part of a build via CMake, but I wouldn't expect a sensible tool targeted at C++ to cater to this particular case.

Rust's Cargo imposes very strong restrictions on the layout of files within a project, and I don't think that the current liberalism in compiling C++ code is more useful than harmful to both end-users and tool vendors. Sure, library authors love freedom, but hell breaks loose at other points of the system.

> Robin Rowe wrote:
>
> While not common practice, it can be done with CMake. I write CMake build systems that download and build dependencies.

I believe you are referring to Vcpkg (at least you could be). True, it works, but it's flaky as hell. Not because of Vcpkg itself, but because it's a grandiose community effort in hacking together builds of the most diverse sort. It's undoing the decade-long damage we (the C++ community) have done to ourselves. Packages can break very easily. The aforementioned liberalism in compiling C/C++ code creates an insane amount of work for package maintainers at Vcpkg, Conan, Linux distros, etc. Getting the plplot package to be Vcpkg conformant was 5 days of work for "the CMake expert". It is ridiculous to spend 5 days only on getting a dependency to build.

> Thoughts on 3 year release cadence and tooling

Indeed, 3 years between updates of ISO standard tooling would be too long, but because the tools are many, there's no way to sync up with all of them.

> Thoughts on enumerating the missing pieces

I too think that work on the build systems part should begin with creating a table of "features" like:

- easy to learn
- easy to master
- fast
- has GUI front-end
- portable
- extensible
- multi-language
- language-aware
- tool-aware
- (HW) resource-aware
- imperative/declarative
- etc.

with each column being an existing tool, to see which feature set is still free for a new tool to tackle (should SG15 want to create one, and I think it should).

==============================

My take on this issue is creating one that is all of the above, and here is roughly what I mean by it.

From all of the build systems I have come across and looked at (GNU Make, NMake, MSBuild, Ninja, CMake, QMake, Buck, Meson, Waf, SCons, Cargo, psake, Fake, Cake) some vague idea of an ideal build system has begun to take shape in my mind. Not an ultimate tool, but a very good one. Each of them does something very well, and other things not so well.

Here are some things existing tools do well:

- Ninja has good/decent foundations in terms of vocabulary (what constitutes a build "step"), and is very smart to favor execution speed over human readability.
- MSBuild and Cake are very smart in defining an (extension) API that is usable enough to define even the first supported language through it, just as all later languages/tools will have to.
- MSBuild is smart in not having a script language of its own, but using a "stateless" format (XML). Tooling is extremely simple to write for it. (A GUI front-end can be generated from the schemas, to name only one example.)
- MSBuild was smart to choose XML, because it is highly extensible without having to touch the build tool itself.
- Buck and Meson are smart to implement shortcuts/short-circuits into their execution to accelerate incremental builds (symbol tables to decide re-linking, hashing object files)
- CMake is smart in providing a nice hook for IDEs to query information about the build.
- NMake (maybe GNU Make, not sure) is smart in having explicit batch-mode support for tools that are capable of such a thing (one tool invocation executes multiple build steps: cl.exe, copy.exe). Without it, installation of large projects like Qt and Boost takes unnecessarily long.
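
To make the incremental-build shortcut mentioned for Buck and Meson a bit more concrete, here is a minimal sketch of the general idea of content hashing: a step is skipped when the digests of its inputs match those recorded by the previous run. This is not how either tool is implemented; the stamp-file layout and the function names `needs_rebuild` and `record_build` are assumptions made up for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def current_digests(inputs):
    """Map each input path to the digest of its current contents."""
    return {str(p): file_digest(p) for p in inputs}


def needs_rebuild(inputs, stamp):
    """True if there is no stamp yet or any input's content has changed."""
    stamp = Path(stamp)
    if not stamp.exists():
        return True
    recorded = json.loads(stamp.read_text())
    return recorded != current_digests(inputs)


def record_build(inputs, stamp):
    """Store the input digests after a successful (hypothetical) build step."""
    Path(stamp).write_text(json.dumps(current_digests(inputs)))
```

Unlike a timestamp comparison, rewriting a file with identical contents does not trigger a rebuild here, which is the kind of short-circuit the bullet refers to.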

Some things they don't do well:

- CMake (besides being horrible to use) is not extensible. As a physicist, I would very much like to see LaTeX be a first-class CMake citizen (among others), not just UseLATEX.cmake [4], but that requires implementing it inside CMake, which in turn requires a deep understanding of CMake and how the generators work. Cooking up LaTeX support for MSBuild, however, only requires understanding the schema of MSBuild (still not easy), and if I provide one, MSBuild can execute it.
- DSLs (Cake, Fake, psake) are nice, but there really should be no build script. I don't want to debug a build script. If we all agree that C++ is unviable for describing a build process (there would be merits to it, such as reusing debugging tools to debug the build script, but it's lunacy), and we also agree that C++ should stop relying on other languages (CMake, Python) to get 'stuff' done, then the build definition should use something well known. W3C recommendations seem like a good choice.
- Outside Buck, no tool really cares about the amount of resources a given build step takes. Cloning a Git repo could be done in parallel with building on all cores, as the two really don't interfere that much. They use different resources, which the given tools could advertise about themselves (see later).
- Not many take the environment into account. Very few tools provide reproducible builds. In most, environment variables can alter the behavior of build tools. The build system should spawn empty environments, tools should advertise all such "inputs" (not just source files) that have an effect on their behavior, and the build system should capture this information. Once the entirety of the build environment is captured, it becomes very easy to get distributed builds right.
- It should be trivial for end users to query the tweaks and controls that customize the behavior of a given build tool, and also those that customize a library. Managing the binary compatible set of flags should be easy.
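
The point about empty environments and advertised "inputs" could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, and the idea that a tool advertises a set of environment variable names is my reading of the bullet above, not an existing API. The step is spawned with exactly the declared variables, and the same information is hashed into a fingerprint that a build system could compare across machines.

```python
import hashlib
import json
import subprocess


def declared_environment(declared_names, ambient):
    """Keep only the variables a tool has advertised as inputs (hypothetical)."""
    return {name: ambient[name] for name in sorted(declared_names) if name in ambient}


def step_fingerprint(argv, env, input_digests):
    """Hash everything assumed to influence a step: command, environment, inputs."""
    record = {"argv": list(argv), "env": env, "inputs": input_digests}
    text = json.dumps(record, sort_keys=True)  # sort_keys makes dict order irrelevant
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_step(argv, env):
    """Spawn the tool with exactly the captured environment, nothing inherited."""
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
```

Two machines that compute the same fingerprint for a step would, under these assumptions, be expected to produce the same output. In practice some platforms may need a few system variables to be declared as well before a tool will start at all.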

<essence>

The way I imagine such a tool is that it defines a vocabulary (schema) for task execution which tools can depend on. Instead of jumping straight to the schema of vcxproj, for example (which describes how to invoke cl.exe and link.exe to create a C++ program), languages can provide another vocabulary that defines the compilation model of the language (source/header, object files, pre-compiled headers, modules, you name it). And last but not least, tool vendors provide XML files that instruct the build system how the tools map to the language compilation model. In short, when the user specifies that they want to build this source file in this language standard, that means the build system has to invoke this exe with that set of command-line options. If a given toolset can implement Buck/Meson-like shortcuts in the build, that is an implementation detail.

</essence>

So long as I do not wish to invoke tool-specific voodoo, I should only have to care about the language semantics.
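
A rough sketch of what this layering might look like, with plain Python dictionaries standing in for the XML files. Everything here is hypothetical: the names `CXX_MODEL`, `TOOL_MAPPINGS` and `command_line`, the single "compile" action and its parameters are assumptions for illustration only. The command-line flags themselves are the usual ones for g++ and cl.exe.

```python
# Language layer (hypothetical): the compilation model names each action
# and the parameters a user has to supply for it.
CXX_MODEL = {
    "compile": {"standard", "source", "object"},
}

# Vendor layer (hypothetical): each toolset maps a model action to an
# invocation template, standing in for the XML a vendor would ship.
TOOL_MAPPINGS = {
    "gcc": {"compile": ["g++", "-std={standard}", "-c", "{source}", "-o", "{object}"]},
    "msvc": {"compile": ["cl.exe", "/std:{standard}", "/c", "{source}", "/Fo{object}"]},
}


def command_line(tool, action, **params):
    """Translate a language-level request into one tool's argument list."""
    required = CXX_MODEL[action]
    if set(params) != required:
        raise ValueError(f"{action} needs exactly: {sorted(required)}")
    return [part.format(**params) for part in TOOL_MAPPINGS[tool][action]]
```

The user-facing request stays the same for every toolset, for example `command_line("gcc", "compile", standard="c++17", source="main.cpp", object="main.o")`, and only the vendor-supplied mapping decides which executable and options are used.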

The build system is OSS, hosted on GitHub (or wherever), and could probably be part of OS package repos/SDKs. The commonly agreed upon compilation model of C++ is part of the IS (ISO C++) and ships with my compiler. (Even without such a build system, an easily parseable, commonly agreed upon workflow can be useful for tool vendors.) The XML matching the two is shipped with the compilers.

With proper and strict schema support, XML parsing for large projects can be accelerated by storing it in EXI [5], which also supports JSON if people like that better: textual format when submitting into VCS, binary (EXI) format for incremental builds. Ultimately, though, I really don't want to author any makefile, but would rather use a GUI/IDE for that and have most of it autogenerated.

(Such work could drive standardization of an XML/JSON/tree datatype in the STL.)

If the basic execution schema is defined right, depending on packages shouldn't be too hard for end users. Just as C++ has its own schema of a compilation model, Git/Conan/etc. can have their own, specifying repos/branches or servers/recipes/binaries.

Being a GPGPU developer, I often deal with language extensions, sometimes having their own separate toolsets. Think not just ISO C++11 vs. GNU C++11, but OpenMP extensions and their versions, CUDA extensions and their versions, SYCL support, C++ AMP extensions... I don't expect an ISO tool to cater to all the dialects of C++, but if the extension API were forgiving to such tool vendors, that would be nice. These extensions and tools have their own compilation models, more often than not just adding on top of C++; if they could just provide another schema that extends ISO C++, their integration would be seamless for the most part.

Schema for:
- ISO C++
- OpenMP-C++ extends ISO C++

and

- Clang/G++/MSVC can advertise implementing both, with given invocations.
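
As a small sketch of how such an "extends" relationship could be resolved, the snippet below walks from a schema to the schemas it builds on and collects the options a tool advertises for each layer. The table layout, the schema identifiers and the `basic-cxx` tool are made up for illustration; the OpenMP switches shown (`-fopenmp`, `/openmp`) are the usual ones for these compilers.

```python
# Hypothetical schema registry: each schema names the schema it extends.
SCHEMAS = {
    "iso-c++": None,
    "openmp-c++": "iso-c++",
}

# Hypothetical vendor advertisements: the options a tool adds per schema layer.
ADVERTISED = {
    "g++": {"iso-c++": ["-std=c++17"], "openmp-c++": ["-fopenmp"]},
    "clang++": {"iso-c++": ["-std=c++17"], "openmp-c++": ["-fopenmp"]},
    "cl.exe": {"iso-c++": ["/std:c++17"], "openmp-c++": ["/openmp"]},
    "basic-cxx": {"iso-c++": ["-std=c++17"]},  # made-up tool without OpenMP
}


def schema_chain(schema):
    """List a schema and everything it extends, base schema first."""
    chain = []
    while schema is not None:
        chain.append(schema)
        schema = SCHEMAS[schema]
    return list(reversed(chain))


def flags_for(tool, schema):
    """Collect the options a tool advertises for every layer of a schema."""
    advertised = ADVERTISED[tool]
    flags = []
    for layer in schema_chain(schema):
        if layer not in advertised:
            raise LookupError(f"{tool} does not advertise {layer}")
        flags.extend(advertised[layer])
    return flags
```

A tool that has not advertised an extension is rejected up front, before any build step runs, instead of failing somewhere in the middle of a build.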

Also, once a compilation model for C++ has been agreed upon and is stored in a processable format (JSON/XML/EXI), it takes minimal effort to provide an interface inside the STL for invoking the compiler. That would enable reliable run-time code generation without external tooling, something that could much later drive run-time reflection, and there are many other uses for this feature.

Nothing prohibits this model from working as a one-shot build system like most, or as a build service like Buck.

I am not sure whether it would be smart for such a tool to have an idea of the following notions:

- Building/Compiling
- Linking
- Testing
- Benchmarking
- Packaging
- Deploying
- Installing
- Publishing
- Fetching (the opposite of Publishing)

that is, whether the community would find it useful to have "canonical build target flavors" for these sorts of tasks, or whether this should remain outside the scope of such a build tool.

[1]: https://cmake.org/pipermail/cmake/2015-July/061225.html
[2]: https://github.com/Wigner-GPU-Lab/Teaching/tree/master/CMake
[3]: https://www.youtube.com/watch?v=7THzO-D0ta4
[4]: https://github.com/kmorel/UseLATEX
[5]: http://www.w3.org/TR/exi/

Received on 2018-04-20 17:50:21