Writing a large build system with GNU make

By David Röthlisberger. Comments welcome at david@rothlis.net.

Published 24 Jun 2013. This article is Creative Commons licensed.

This is a summary of techniques for implementing a large build system, written in plain GNU make, for a Unix-based C or C++ codebase. By “large” I mean a build system that coordinates the building of many different “projects” or “modules”; this is sometimes called a “meta build system”.

Provide the standard targets

all (and make it the default target)
check
clean
install

See “Standard Targets for Users” in the GNU make manual.

There is also the dist target to package up your source code for distribution. Even if you don’t want to distribute your source code, this can be useful if you want to make binary packages using RPM or similar packaging systems, which usually take a source tarball as their input.
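As a rough sketch, the standard targets might be declared like this. The program name myprog, the source file main.c and the test script tests/run.sh are hypothetical, chosen only for illustration:

```make
# Hypothetical names: myprog, main.c, tests/run.sh.
# Recipe lines must begin with a tab character.
prefix ?= /usr/local

.PHONY: all check clean install

# "all" is the first rule in the file, so it is the default goal.
all: myprog

# main.o is built from main.c by make's built-in implicit rule.
myprog: main.o
	$(CC) $(LDFLAGS) $^ -o $@

check: all
	./tests/run.sh

clean:
	rm -f myprog main.o

install: all
	install -d $(DESTDIR)$(prefix)/bin
	install -m 755 myprog $(DESTDIR)$(prefix)/bin
```

Declaring the targets as .PHONY means they still run even if a file named "clean" or "check" happens to exist.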

Standard environment variables

Recipes that compile C or C++ code must obey the environment (or make) variables CC, CXX, CPPFLAGS, CFLAGS, CXXFLAGS. Recipes that invoke the linker must obey LDFLAGS. These variables are for the user to customise. See “Variables Used by Implicit Rules” in the GNU make manual, and “Variables for Specifying Commands” in the GNU Coding Standards.

Why bother? This makes it trivial for a developer to use tools like distcc or colorgcc, or to experiment with different compiler options or even a different compiler like clang or its static analysis tools.

The install recipe must obey the environment (or make) variables prefix and DESTDIR. This allows developers to control where they install the built binaries on their own systems, and to package them with RPM and the like.
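A minimal sketch of recipes that obey these variables, assuming a hypothetical program myprog built from main.c and util.c. The variables project_cppflags and project_cflags are names I made up for the flags that the project itself requires:

```make
# Hypothetical project: one program "myprog" built from main.c and util.c.
# Recipe lines must begin with a tab character.

# Defaults that the user can override from the environment or command line.
prefix ?= /usr/local
bindir  = $(prefix)/bin

# Flags the project itself needs live in separate (hypothetical) variables,
# so that overriding CFLAGS or CPPFLAGS never removes them.
project_cppflags = -I.
project_cflags   = -Wall

myprog: main.o util.o
	$(CC) $(LDFLAGS) $^ -o $@

%.o: %.c
	$(CC) $(project_cppflags) $(CPPFLAGS) $(project_cflags) $(CFLAGS) -c $< -o $@

.PHONY: install
install: myprog
	install -d $(DESTDIR)$(bindir)
	install -m 755 myprog $(DESTDIR)$(bindir)
```

With rules like these, "make CC=clang CFLAGS='-O0 -g'" switches compiler and options without editing the makefile, and "make install DESTDIR=/tmp/stage prefix=/usr" stages the binary under /tmp/stage/usr/bin, which is what packaging tools expect.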

Fine-grained dependencies

Use “non-recursive make” so that a single make process can have a global, fine-grained view of dependencies across all of the “projects” or “modules” in your codebase. For example, if you change a shared library’s .cpp file you want make to re-compile the corresponding .o file and re-link the .so, but you don’t want make to rebuild other programs and libraries that use that library — unless that library’s interface (i.e. a header file) changes.

The technique is described in Peter Miller’s 1997 paper “Recursive Make Considered Harmful”. See also Emile van Bergen’s implementation notes.

You can often simplify Emile’s technique, if each of your sub-projects has approximately the same layout, by using static pattern rules and templates in the root makefile to eliminate boilerplate from the sub-project makefile fragments.
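Here is one way such a template could look. It is a sketch, not Emile's implementation. It assumes hypothetical modules mylib1 and mylib2, each with a flat layout where the .cpp files in the directory build a shared library named after the directory, and each containing at least one .cpp file:

```make
# Root makefile (sketch). The module names and the layout
# "<dir>/*.cpp builds <dir>/<dir>.so" are assumptions.
# Recipe lines must begin with a tab character.
modules := mylib1 mylib2

.PHONY: all
all: $(foreach m,$(modules),$(m)/$(m).so)

# Template expanded once per module; $(1) is the module directory.
# "$$" defers expansion until $(eval) parses the generated text.
define module_template
$(1)_srcs := $$(wildcard $(1)/*.cpp)
$(1)_objs := $$($(1)_srcs:.cpp=.o)

$(1)/$(1).so: $$($(1)_objs)
	$$(CXX) -shared $$(LDFLAGS) $$^ -o $$@

# Static pattern rule, e.g. mylib1/a.o: mylib1/a.cpp
$$($(1)_objs): %.o: %.cpp
	$$(CXX) $$(CPPFLAGS) -fPIC $$(CXXFLAGS) -c $$< -o $$@
endef

$(foreach m,$(modules),$(eval $(call module_template,$(m))))
```

Because every target and prerequisite is written relative to the top-level directory, a single make process sees the whole dependency graph.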

This might seem like a lot of hassle, but note that automake doesn’t solve this problem for you either; “recursive automake” suffers from the same problems as “recursive make”, and there is far less information available about how to write a non-recursive autotools-based build system.

Neither Peter nor Emile addresses the issue of rebuilding a binary or shared library when a source file is removed; this is handled by the technique described below in the section “Clean output”.

Automatic dependencies on C and C++ header files

make will need to know the header files, not just the .c and .cpp source files, that each .o file depends on. You can generate this dependency information automatically, using the same technique that automake uses, as documented by Paul D. Smith (the GNU make maintainer).
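A simplified variant of the idea, assuming GCC or Clang (both support the -MMD and -MP options). This is not the exact set of rules from Paul's write-up, and the source file names are hypothetical:

```make
# Sketch assuming GCC or Clang; the source list is hypothetical.
# Recipe lines must begin with a tab character.
srcs := main.c util.c
objs := $(srcs:.c=.o)
deps := $(objs:.o=.d)

myprog: $(objs)
	$(CC) $(LDFLAGS) $^ -o $@

# -MMD writes foo.d next to foo.o as a side effect of compiling, listing the
# headers that foo.c included. -MP adds a dummy target for each header so
# that deleting a header does not break the build.
%.o: %.c
	$(CC) $(CPPFLAGS) $(CFLAGS) -MMD -MP -c $< -o $@

# Include whatever .d files exist; the leading "-" ignores missing ones.
-include $(deps)
```

On the first build there are no .d files, which is fine: every .o file is missing and will be compiled anyway, producing the dependency files for the next run.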

Use implicit rules

Put all your clever logic in implicit rules:

%.o: %.cpp
    ...

to keep the rest of the makefile (the part that specifies what to build, not how) as simple as possible. Customise the behaviour for specific targets by specifying variables used by the implicit rule (but don’t use CFLAGS etc. which are for the user to customise). See “Target-specific Variable Values” in the GNU make manual.
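For instance, a target-specific variable can adjust a single object file without touching the implicit rule. The variable name target_cppflags, the file names and the PARSER_TRACE macro below are all hypothetical:

```make
# Hypothetical names: target_cppflags, main.cpp, parser.cpp, PARSER_TRACE.
# Recipe lines must begin with a tab character.
objs := main.o parser.o

myprog: $(objs)
	$(CXX) $(LDFLAGS) $^ -o $@

# The one implicit rule that knows how to compile.
%.o: %.cpp
	$(CXX) $(target_cppflags) $(CPPFLAGS) $(CXXFLAGS) -c $< -o $@

# Target-specific value: only parser.o is compiled with this define.
parser.o: target_cppflags := -DPARSER_TRACE=1
```

CPPFLAGS and CXXFLAGS stay untouched, so the user can still override them from the command line.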

Dependencies on third-party packages: pkg-config

If you depend on third-party libraries, don’t build them using the same top-level makefile that you use to build your own project, even if you need to maintain your own patches to those libraries. Use a separate top-level makefile to patch and build all the third-party packages you require. This makefile can use recursive make to invoke each third-party library’s own build system.

Then use a packaging system like RPM to deploy these packages to developers’ machines. You’ll save a lot of developer time by not having to rebuild those third-party packages each time you build your own project.

The makefiles for your main codebase should use pkg-config to find the flags required to compile and link code that depends on third-party libraries. Let’s say that your package A depends on a third-party package B, which in turn depends on C. Linking libB.so requires -lC; linking libA.so requires -lB -lC. pkg-config makes it so that A doesn’t need to know about C: libA.so’s link line becomes $(LD) `pkg-config --libs B` ...

pkg-config is one tool in the autotools ecosystem that is uncontroversially good. Use it.
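In a makefile this could look roughly as follows, continuing the A, B, C example above. It assumes that a B.pc file is installed where pkg-config can find it; the variable names B_cflags and B_libs and the file a.cpp are made up:

```make
# "B" is the third-party package from the example above; a B.pc file is
# assumed to be installed where pkg-config can find it.
# Recipe lines must begin with a tab character.

# ":=" runs pkg-config once, when the makefile is read.
B_cflags := $(shell pkg-config --cflags B)
B_libs   := $(shell pkg-config --libs B)

libA.so: a.o
	$(CXX) -shared $(LDFLAGS) $^ $(B_libs) -o $@

a.o: a.cpp
	$(CXX) $(B_cflags) $(CPPFLAGS) -fPIC $(CXXFLAGS) -c $< -o $@
```

Whether C appears on the link line is decided by B's .pc file, not by A's makefile.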

Detect changes in environment

Record the exact compilation command line (including CFLAGS etc.) in a file and add it as a dependency of the target built by that command line, so that when the environment changes, everything that depends on it is rebuilt.

The technique is described by DJB and used in git’s Makefile.
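A sketch in the spirit of git's Makefile, not a copy of it. The file name compile-flags.txt and the variable compile_cmd are hypothetical, and the recipe assumes that the flags contain no single quotes:

```make
# Hypothetical names: compile-flags.txt, compile_cmd, main.c, util.c.
# Assumes the flags contain no single-quote characters.
# Recipe lines must begin with a tab character.
objs := main.o util.o
compile_cmd = $(CC) $(CPPFLAGS) $(CFLAGS)

myprog: $(objs)
	$(CC) $(LDFLAGS) $^ -o $@

# Every object depends on the recorded flags as well as on its source.
%.o: %.c compile-flags.txt
	$(compile_cmd) -c $< -o $@

# This rule runs on every build, but rewrites the file (and so updates its
# timestamp) only when the recorded command differs from the current one.
compile-flags.txt: FORCE
	@echo '$(compile_cmd)' | cmp -s - $@ || echo '$(compile_cmd)' > $@

.PHONY: FORCE
FORCE:
```

Running "make CFLAGS=-O2" after a plain "make" rewrites compile-flags.txt, which in turn makes every object file out of date.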

Clean output

Automake generates makefiles with “silent rules”. So does CMake.

You can implement this in your own makefiles the same way automake does: disable command echoing and add an echo line at the beginning of each recipe. This echo line can check the value of the environment (or make) variable V, if you want to behave exactly like automake.
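One possible shape for this, simpler than what automake generates. The helper variables Q and say are names I invented:

```make
# V=1 on the command line or in the environment shows the full commands.
# The helper variables Q and say are hypothetical names.
# Recipe lines must begin with a tab character.
ifeq ($(V),1)
  Q :=
  say = @:
else
  Q := @
  say = @echo
endif

%.o: %.c
	$(say) "  CC      $@"
	$(Q)$(CC) $(CPPFLAGS) $(CFLAGS) -c $< -o $@
```

By default each compilation prints a single short line; "make V=1" drops the short line and lets make echo the real command instead.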

Alternatively, you can build on the technique from the previous section (“Detect changes in environment”): save the compilation command into an executable file, and have your make recipe execute that file. Without disabling make’s command echoing, the output would look something like this:

mylib1/compile a.o
mylib1/compile b.o
mylib1/link mylib1.so
mylib2/compile x.o
...

If one of those steps fails, you can copy the line of output and paste it into your shell to re-run just that step; add “-v” to show the full compilation command. This idea comes from the “redo” build system.

Separate build directory

Automake and CMake both allow the user to choose a build directory separate from the source directory. This allows you to build separate “debug” and “release” versions from the same source directory, or to cross-compile to several different targets, without the builds trampling on each other.

GNU make’s solution is VPATH. You write the makefile rules as you normally would:

target.o: source.c
	$(CC) $(CPPFLAGS) $(CFLAGS) -c $< -o $@

and, if VPATH is set and source.c isn’t found in the current (build) directory, make will search for source.c in VPATH’s list of directories. See “Searching Directories for Prerequisites” in the GNU make manual.

VPATH has limitations:

Ambiguities arise when two source files in different directories (both in the VPATH) have the same name, but this shouldn’t be a problem if your makefiles are non-recursive (so all targets and prerequisites specify the full pathname relative to the top-level makefile, and your VPATH only contains the single top-level source directory).

Recipes can’t assume that any necessary subdirectories exist (in the build directory).

C preprocessor double-quote includes (#include "config.h") that refer to generated files (which will be in the build directory) won’t find those files unless the recipe uses the preprocessor’s -iquote option to add the build location to the preprocessor’s search path.
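A sketch that works around the last two limitations, assuming GCC or Clang (for -iquote) and a build directory that is a subdirectory of the source directory. The variable name srcdir and the source paths are hypothetical:

```make
# Run from the build directory, for example:
#   mkdir build && cd build && make -f ../Makefile
# The variable name srcdir and the path lib/util.c are assumptions.
# Recipe lines must begin with a tab character.
srcdir ?= ..
VPATH   = $(srcdir)

objs := main.o lib/util.o

myprog: $(objs)
	$(CC) $(LDFLAGS) $^ -o $@

# mkdir -p creates lib/ in the build directory on demand. "-iquote ." lets
# #include "config.h" find a generated header in the build directory.
%.o: %.c
	mkdir -p $(@D)
	$(CC) -iquote . $(CPPFLAGS) $(CFLAGS) -c $< -o $@
```

Here $< expands to the path where make actually found the source file (for example ../lib/util.c), while $@ stays relative to the build directory.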

Instead of implementing build directory support in your makefiles, consider leaving it in your users’ hands. Users can overlay a writable build directory over a read-only source directory using a union filesystem. There are several implementations of union filesystems: aufs, funionfs, unionfs-fuse. At least one of them is likely to be available in your Linux distribution. Choose an implementation that supports copy on write (modifying a file from the read-only branch will create a corresponding file in the read-write branch that hides the read-only file) and that allows you to modify the source files in the read-only branch via their original location (outside of the union filesystem) while the union filesystem is mounted.

Building for multiple systems

Each architecture that you want to build on has its own toolchain. These toolchains can have different names for the compiler, linker, etc., or take different command-line arguments. The autotools’ solution is a configure script that, when run by the user, generates a makefile with the appropriate compiler and linker names and options hard-coded. Autoconf also allows you to cross-compile for a host system with a different architecture than the build system, by specifying --host when you run the configure script.

If this is the only reason you need autoconf, you can instead have your makefile include an architecture-specific makefile fragment that defines CC and other architecture-specific variables and implicit rules. This idea is from Plan 9’s mk build tool: see section 2 of “Maintaining Files on Plan 9 with Mk”.
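As an illustration, the main makefile might select a fragment like this. The mk/ directory, the fragment names and the cross-compiler names are hypothetical:

```make
# Hypothetical layout: one fragment per architecture under mk/, for example
# mk/x86_64.mk or mk/arm.mk, each defining CC, AR and so on.
# Recipe lines must begin with a tab character.
ARCH ?= $(shell uname -m)
include mk/$(ARCH).mk

# A fragment such as mk/arm.mk might contain only:
#   CC = arm-linux-gnueabihf-gcc
#   AR = arm-linux-gnueabihf-ar

myprog: main.o
	$(CC) $(LDFLAGS) $^ -o $@
```

Cross-compiling then becomes "make ARCH=arm", and a value of CC given on the command line still takes precedence over the fragment.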

Building on different operating systems (Linux, BSD, other Unixes…) sometimes requires changes to the source code: Some functions might not be available and need to be emulated, or a given function behaves in incompatible ways across operating systems. Autoconf’s solution is to generate a “config.h” file full of #define directives, and your source code uses #ifdef conditionals to provide different implementations for different operating systems. Many codebases don’t need this feature of autoconf; I’ve seen large projects that only use autoconf because automake requires it.

In “The Practice of Programming”, Kernighan & Pike recommend that you use the intersection of different systems’ features, not the union of features, to avoid conditional compilation. If you do have to deal with differences between systems, hide those differences behind an interface, and implement that interface in source files that are separate from the rest of your codebase.

Note that you can use autoconf without automake (but not vice-versa). Note also that you can use autoconf to generate a config.h file, or a makefile fragment that you include from your main makefile, without having to create Makefile from Makefile.in. See git’s Makefile, which you can choose to use without running configure, and config.mak.in.

libtool

Libtool is one of the least understood parts of the autotools (at least by me). It is also the one that causes the most trouble, perhaps because of the wrapper scripts it generates that force you to learn the libtool way instead of the way you already know of running and debugging executables. Like most GNU tools, libtool has a decent manual but, as with the other autotools, the additional complexity simply isn’t necessary for many projects.

As far as I can tell, libtool offers the following functionality:

Libtool hardcodes the “runpath” in your shared object files and generates wrapper scripts for your executables so that you can run, for example, unit tests from the build location as well as running the complete program from the final installed location. Instead, you could just set the runpath yourself in your makefile’s recipes (if you have to; it might be better to install into the system’s standard locations), and use LD_LIBRARY_PATH to run unit tests from the build location. See ld-linux(8), and “RPATH, RUNPATH, and dynamic linking” by W. Trevor King.

Libtool figures out the right compiler & linker commands to generate a shared library. Instead, put the specific commands in the implicit rules in your system-specific makefile fragments (see the previous section, “Building for multiple systems”). This is something you only need to figure out once for each system you want to target.

Libtool supports systems that don’t support shared libraries, falling back to static libraries. Many of us don’t need to support such systems.

Libtool provides a “simple library version number abstraction”. It still requires a lot of discipline and work from the library developers — maybe just as much work as maintaining library versioning without libtool.
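For the first point above, setting the runpath by hand might look like the sketch below. It assumes the GNU linker on Linux; libfoo.so and test_foo are hypothetical names. The --enable-new-dtags option asks for RUNPATH rather than RPATH, so that LD_LIBRARY_PATH takes precedence when running from the build location:

```make
# Hypothetical names: libfoo.so, test_foo. Assumes GNU ld on Linux.
# Recipe lines must begin with a tab character.
prefix ?= /usr/local
libdir  = $(prefix)/lib

# Record the installed library location in the executable's runpath.
test_foo: test_foo.o libfoo.so
	$(CC) $(LDFLAGS) -Wl,--enable-new-dtags,-rpath,$(libdir) test_foo.o -L. -lfoo -o $@

# Run the test against the freshly built library in the build directory,
# keeping any LD_LIBRARY_PATH the user already had.
.PHONY: check
check: test_foo
	LD_LIBRARY_PATH=.$${LD_LIBRARY_PATH:+:$$LD_LIBRARY_PATH} ./test_foo
```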

Enable sensible `make` defaults

Set .DELETE_ON_ERROR in your top-level makefile (and in every sub-makefile if you’re using recursive make). See “Errors” in the GNU make manual.

Consider setting .ONESHELL too. It allows you to write multi-line recipes without explicitly escaping newlines; see “Using One Shell” in the GNU make manual. To enable .ONESHELL in an already-written makefile you’ll have to audit the error handling in each and every recipe. .ONESHELL requires GNU make 3.82 or newer.
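A short sketch of both settings together. The version.h target and the VERSION file are hypothetical, and the -e shell flag is my choice for making a multi-line recipe stop at the first failing command:

```make
# Delete a target if its recipe fails partway through, so that a truncated
# file is never mistaken for an up-to-date one.
.DELETE_ON_ERROR:

# Run each recipe in a single shell (GNU make 3.82 or newer). The -e flag
# makes the shell stop at the first failing command.
.ONESHELL:
.SHELLFLAGS := -ec

# Hypothetical target showing a multi-line recipe without backslashes.
# Recipe lines must begin with a tab character.
version.h: VERSION
	v=$$(cat VERSION)
	echo "#define VERSION \"$$v\"" > $@
```

Without .ONESHELL the two recipe lines would run in separate shells and the variable v would be empty on the second line.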

Documenting uses of obscure `make` features

Static pattern rules and templates are sure to confuse your not-expert-at-gnu-make colleagues. I like to document them with a comment showing one of the real rules that they expand to:

# e.g. builtin/checkout.o: builtin/checkout.c
$(BUILTIN_OBJS): %.o: %.c

Unit tests

Consider writing unit tests for your build system. Think about this seriously before you dismiss it out of hand! All it takes is a couple of dummy sub-modules built using your build system’s common makefiles, plus some shell scripts to test common and not-so-common situations like: Each expected target is rebuilt (and other targets aren’t) when a source file changes; the target is rebuilt if one of its sources no longer exists; etc.

If you’re well-versed in shell scripting this should barely take one day, and the payoffs are huge. You want to have confidence in your build system, or you’ll lose time to running make clean “just in case…”.