Helpful programming tools

Case studies

As the modern technology landscape becomes more and more complex, the generalist programmer needs to master more (and more disparate) tools, languages and techniques. I believe that our tools could do more to ease the learning curve.

Bash completion

The bash shell provides a mechanism for programming its tab-completion (zsh does too). The bash-completion package provides tab-completion programs for the common GNU utilities, including make:

    $ make c<TAB>
    check  clean

Pressing tab runs the bash-completion script for make, which invokes make --print-data-base to parse the makefile and print its rules; the bash-completion script then parses this output to determine the available targets.
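To make the division of labour concrete, here is a rough Python sketch of the "parse make's output" half. The real bash-completion script does this with shell tools; the function names and the SAMPLE text below are my own illustration, and SAMPLE is only a hand-written approximation of what make --print-data-base prints.

```python
import re

# A rule line: a name at the left margin followed by ':' (but not ':=').
# Names starting with '.' or containing '%' are left out, since those are
# special targets and pattern rules rather than things you would type.
TARGET_LINE = re.compile(r"^([^#\s:%.][^\s:%=]*):(?!=)")

# Hand-written approximation of the database dump (an assumption, not real output).
SAMPLE = """\
# Variables
CC = cc

# Files

# Not a target:
.c.o:
#  Builtin rule

check: all
#  Phony target (prerequisite of .PHONY).
\t./run-tests

# Not a target:
Makefile:
#  Implicit rule search has been done.

clean:
\trm -f *.o

all: prog
"""


def targets_from_database(database: str) -> list[str]:
    """Collect target names from the '# Files' section of the dump."""
    targets = set()
    in_files = False
    skip_next = False
    for line in database.splitlines():
        if line.startswith("# Files"):
            in_files = True
            continue
        if not in_files:
            continue
        if line.startswith("# Not a target"):
            skip_next = True  # the next rule line is not a real target
            continue
        if not line or line[0] in "#\t":
            continue  # blank lines, comments and recipe lines
        match = TARGET_LINE.match(line)
        if match and not skip_next:
            targets.add(match.group(1))
        skip_next = False
    return sorted(targets)


def complete(prefix: str, database: str) -> list[str]:
    """Return the targets that start with what the user has typed so far."""
    return [t for t in targets_from_database(database) if t.startswith(prefix)]


print(complete("c", SAMPLE))
```

Note how little of this code knows anything about makefile syntax: all it relies on is the layout of the dump, which is exactly why that layout needs to stay stable.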

It also knows about make’s command-line options, by a similar process (invoking make --help and parsing the output):

    $ make --d<TAB>
    --debug  --directory  --dry-run
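The option side can be sketched the same way. This is a simplified Python illustration, not the actual bash-completion code: the regular expression and function names are my own, and HELP_TEXT is an abridged, hand-typed stand-in for the output of make --help.

```python
import re

# A long option: two dashes followed by letters, digits and single dashes.
LONG_OPTION = re.compile(r"--[A-Za-z0-9][A-Za-z0-9-]*")

# Abridged stand-in for `make --help` output (an assumption for illustration).
HELP_TEXT = """\
Usage: make [options] [target] ...
Options:
  -C DIRECTORY, --directory=DIRECTORY
                              Change to DIRECTORY before doing anything.
  -d                          Print lots of debugging information.
  --debug[=FLAGS]             Print various types of debugging information.
  -n, --just-print, --dry-run, --recon
                              Don't actually run any recipe; just print them.
"""


def long_options(help_text: str) -> list[str]:
    """Scrape every distinct long option out of free-form help text."""
    return sorted(set(LONG_OPTION.findall(help_text)))


def complete_option(prefix: str, help_text: str) -> list[str]:
    """Return the long options that start with the typed prefix."""
    return [opt for opt in long_options(help_text) if opt.startswith(prefix)]


print(complete_option("--d", HELP_TEXT))
```

Scraping like this is fragile: a description that happened to mention "--something" would be picked up as an option, and nothing here says whether an option takes an argument or what type it is.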

Other tools, such as git, provide their own bash-completion scripts.

Lessons learned:

  1. Bash-completion doesn’t know how to parse makefiles; it leaves that job to make.
  2. Bash-completion lets make report its own command-line options.
  3. Tools that parse other files need to provide the parse tree in a machine-readable form.
  4. Information such as the parse tree just mentioned, which once may have been thought of as only a debugging aid, now needs to be in a stable format: it’s part of the tool’s public API.
  5. Help information, such as the output of make --help, also needs to be in a stable format. (Even better if it were in a machine-readable format that specifies the types of each option’s arguments.)

Clang

As an example of point 3 above, clang (a C/C++/Objective-C compiler) makes its parser available as a separate library. The goals of the clang project include: to “support diverse clients (refactoring, static analysis, code generation, etc)” and “allow tight integration with IDEs”.

C++ in particular is so difficult to parse that many development tools either cheat by pretending it’s C and making educated guesses, or implement their own parser that simply doesn’t handle all of C++’s many edge cases. This is certainly the case for the widely used code navigation tools cscope and Exuberant Ctags.

Although clang says it is “production quality”, the tools taking advantage of its modular architecture have yet to emerge. The wish-list on clang’s website includes “a tool to generate code documentation”, and “implement better versions of existing tools [such as] distcc, the delta testcase reduction tool, and the ‘indent’ source reformatting tool.” Watch this space.

Go

The Go programming language’s standard library includes packages for scanning and parsing Go source files. I see this as an encouraging trend: Language authors realise that tooling is just as important as the capabilities of the language itself.

Go specifies a style guide for source code, and ships with the gofmt utility to enforce it. Enforcing standard formatting and indentation allows tools to make certain assumptions to avoid using a full-blown parser — see my treatment of diff, below.

gofmt can also perform expression rewriting and simplification. For example, running gofmt -r 'fn1(a, b(c, d)) -> fn1(a + c + d)' on the first snippet below produces the second:

    s := fn1("hello", fn2("wor" + "ld", "\n"))
    fmt.Printf(s)

    s := fn1("hello" + ("wor" + "ld") + "\n")
    fmt.Printf(s)

Unfortunately gofmt only works on single expressions, not on statements (so you can’t manipulate the s on the left of the := and the expression on the right, at the same time), and certainly not on multiple statements at a time. This limits its usefulness.

Enter gofix. As Go is still a somewhat experimental language, its standard library is evolving; so Go provides the gofix utility to update programs that use old APIs. gofix -? reports that it knows 16 transformations. One example: “Adapt 3-result calls to net.LookupHost to use 2-result form.”

Extending gofix with your own transformations involves writing a plugin that manipulates the abstract syntax tree using the ast package from Go’s standard library. The “net.LookupHost” transformation is implemented in 25 lines of code, but other transformations can be arbitrarily complex.

Go also ships with a doxygen-style documentation extractor (godoc) and its standard library includes an automated testing framework.

I love Go’s “codewalk” style of documentation — though it’s only used on the Go website for a couple of examples.

diff

The GNU version of the standard diff utility can show “section headings” indicating the section containing the differing lines. In the case of C-like code, the section is the enclosing function:

    $ diff -u --show-c-function discount/main.c{.old,}
    --- discount/main.c.old 2011-01-29 11:00:54.000000000 +0100
    +++ discount/main.c     2011-02-23 10:07:40.000000000 +0000
    @@ -125,6 +125,7 @@ main(int argc, char **argv)
         int version = 0;
         int with_html5 = 0;
         int use_mkd_line = 0;
    +    char *extra_footnote_prefix = 0;
         char *urlflags = 0;
         char *text = 0;
         char *ofile = 0;

diff works out the section heading with a simple regular expression: it picks the nearest preceding line that sits hard against the left margin. For C++ code this also finds the name of the class if the differing line is inside a class definition, as long as you adhere to certain formatting rules: any access specifiers (like public:) must be indented by at least one space, and namespace blocks shouldn’t increase the indentation level of their contents.
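The heuristic is simple enough to sketch in a few lines of Python. This is my own illustration rather than diff's implementation: the pattern is modelled on what I understand to be the default for --show-c-function (a line starting with a letter, a dollar sign or an underscore), and the C++ source is made up.

```python
import re

# Assumed default pattern: a letter, '$' or '_' in the first column.
C_FUNCTION = re.compile(r"^[A-Za-z$_]")

# Made-up C++ source, formatted the way the heuristic likes it.
SOURCE = """\
namespace app {

class Widget
{
 public:
    void draw()
    {
        render();
    }
};

}
""".splitlines()


def section_heading(lines, changed_index, pattern=C_FUNCTION):
    """Walk backwards from the changed line to the nearest matching line."""
    for i in range(changed_index - 1, -1, -1):
        if pattern.search(lines[i]):
            return lines[i].rstrip()
    return None


# Line 7 is the call to render(); the access specifier is indented by one space.
print(section_heading(SOURCE, 7))

# The same file with "public:" hard against the left margin.
unindented = [line.strip() if line.strip() == "public:" else line for line in SOURCE]
print(section_heading(unindented, 7))
```

With the indented access specifier the heading is the class line; once public: touches the left margin it becomes the heading instead, which tells you nothing. That is the whole reason for the formatting rules above.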

You can supply diff with custom regular expressions for determining section headings (with --show-function-line=regexp), but as far as I know the only way to enable specific regular expressions for specific types of files is to write a wrapper around diff.

git’s diff and log --patch commands include such a header in their output by default (git grep can too, with --show-function). Furthermore, the regular expression to determine the section header is configurable for different file extensions, with default patterns for a dozen languages. See the gitattributes man page.

This feature of git grep, in particular, can serve as a poor man’s “find all callers/users of this symbol”. (And these days I rarely use grep or diff except via git.)

Lessons learned:

  1. Consistent formatting helps the tools out. But enforcement is difficult: in no commercial codebase I have seen, and in few open-source ones, are the chosen guidelines consistently enforced.
  2. If you mandate a certain formatting for your project, consider providing a tool to automatically check or even reformat accordingly (cf. Go’s gofmt).

Makefiles

Makefiles provide a concise and powerful way to express dependencies between source files and the programs generated from them. This conciseness is a good thing for experts, but a significant barrier for everyone else (and I would argue that few developers are truly make experts — most leave the build infrastructure to someone else, or know just enough to hack in a few changes).

Imagine a tool that automatically generates the following annotations for a makefile (shown here as # notes beneath each snippet):

    all:
    # The first target is the default target.

    CFLAGS = -g -O2 -Wall
    LDFLAGS =
    # CFLAGS and LDFLAGS are for users to override from the
    # command-line. [manual] [gnu conventions]

    ALL_CFLAGS = $(CFLAGS) -I.

    BUILT_INS = $(patsubst builtin/%.o,git-%, $(BUILTIN_OBJS))
    # $(patsubst pattern,replacement,text) [manual]
    # BUILT_INS = git-checkout git-help

    BUILTIN_OBJS = \
        builtin/checkout.o \
        builtin/help.o

    $(BUILTIN_OBJS): %.o: %.c
    # Static pattern rule [manual]. e.g.,
    #   builtin/checkout.o: builtin/checkout.c
    #   ^ (target)          ^ (prerequisites)
            $(CC) -c \
                -o $*.o \
                $(ALL_CFLAGS) \
                $(EXTRA_CPPFLAGS) \
                $<
    # Recipe [manual].
    # CC is the program for compiling C programs (default 'cc') [manual]
    # $* is the part of the file name that matched the '%' in the
    #   target pattern, e.g. builtin/checkout [manual]
    # $< is the name of the first prerequisite,
    #   e.g. builtin/checkout.c [manual]

    builtin/help.o: EXTRA_CPPFLAGS = \
        '-DGIT_HTML_PATH="$(htmldir)"'
    # Target-specific variable [manual]

    git-%: %.o GIT-LDFLAGS libgit.a
    # Pattern rule [manual]. e.g.,
    #   git-checkout: checkout.o GIT-LDFLAGS libgit.a
    #   ^ (target)    ^ (prerequisites)
            $(CC) -o $@ \
                $(ALL_CFLAGS) \
                $(LDFLAGS) \
                $(filter %.o,$^) \
                libgit.a
    # Recipe [manual].
    # $@ is the file name of the target, e.g. git-checkout [manual]
    # $(filter pattern,text) [manual]
    # $^ is the names of all the prerequisites [manual]

    install: all
            for p in $(BUILT_INS); do \
                ln "$(BINDIR)/git" \
                   "$(BINDIR)/$$p"; \
            done
    # Recipe [manual]. The shell sees:
    #   for p in git-checkout git-help; do \
    #       ln "/usr/local/bin/git" \
    #          "/usr/local/bin/$p"; \
    #   done
    # [Using variables in recipes]

    all: $(BUILT_INS)
    # Add prerequisites to existing target [manual]
    # BUILT_INS = git-checkout git-help

(Makefile snippets taken from git’s makefile and simplified.)

In implementing such a tool, it would be best if make itself did the parsing and variable expansion — if it provided a queryable way to relate specific line and column ranges within the makefile to the corresponding semantic information (similar to what clang’s libraries provide for C++ code).

You would be able to control which annotations are displayed — you’d quickly learn to recognise targets, prerequisites and recipes, and the verbose annotations would just get in the way. But you’d still appreciate help on the more obscure features of make.

Note also that simple syntax highlighting can help to avoid common classes of errors, such as those due to make’s variable expansion: $BINDIR means the value of variable B followed by the text INDIR (whereas in shell it means the value of variable BINDIR).
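To see why $BINDIR goes wrong, here is a toy Python model of make's reference syntax. It is a deliberately tiny sketch of my own, not make's algorithm: it handles only $$, $(NAME), ${NAME} and single-character references, and it ignores functions, nested references and variable names containing punctuation.

```python
import re

# '$$' is a literal dollar; '$(NAME)' and '${NAME}' name a variable;
# any other '$x' refers to the one-character variable 'x'.
REFERENCE = re.compile(r"\$(?:\$|\((\w+)\)|\{(\w+)\}|(.))")


def expand(text: str, variables: dict[str, str]) -> str:
    """Expand variable references the way this toy model of make sees them."""
    def replace(match):
        if match.group(0) == "$$":
            return "$"  # escaped dollar, passed through to the shell
        name = match.group(1) or match.group(2) or match.group(3)
        return variables.get(name, "")  # undefined variables expand to nothing

    return REFERENCE.sub(replace, text)


variables = {"BINDIR": "/usr/local/bin"}

# Correct: parenthesised references, and $$ to pass a literal $ to the shell.
print(expand('ln "$(BINDIR)/git" "$(BINDIR)/$$p"', variables))

# The classic mistake: this is the variable B followed by the text INDIR.
print(expand("$BINDIR/git", variables))
```

The first call yields the command the shell should see; the second silently loses the directory, because B is undefined and expands to nothing. A syntax highlighter that colours only $B as a variable reference makes that mistake visible before you run anything.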

Ruby on Rails

The source code for the Ruby on Rails web application framework includes the following snippet:

    [ 'SERVER_NAME', 'SERVER_PROTOCOL', 'HTTP_ACCEPT', 'HTTP_USER_AGENT' ].each do |env|
      define_method(env.sub(/^HTTP_/n, '').downcase) do
        @env[env]
      end
    end

This defines the methods server_name, server_protocol, accept, and user_agent, but in such a way that you’ll never find the definitions in the source code when you need to.

Rails calls itself a framework “optimized for programmer happiness”, yet many programmers will not be happy using a tool they don’t thoroughly understand. When the tool is an open-source framework, I want to use my editor’s code navigation to quickly find the definition of the framework code I’m calling or extending, then the code that it calls in turn, and so on.

The above example is a totally gratuitous use of meta-programming, but in other cases Rails does use meta-programming for legitimate reasons. Whatever the reason, the difficulty of finding the source code for a particular piece of functionality is an obstacle to understanding the framework. Maybe the Ruby interpreter/VM could keep track of the source location of each method definition, in a way that editors could query.

Compiler error messages

In a C++ inheritance hierarchy like the one in the accompanying figure [1], where the middle class has a virtual base class and that base class has no default constructor, the most-derived class must always explicitly call the virtual base’s constructor.

Though the reason makes sense, I’d probably forget this within a week. In a recent refactoring I removed EventProducer’s default constructor so that no-one could ever forget to inject [2] the EventDispatcher it required (the best API being the one that’s impossible to use incorrectly). Even with each class explicitly calling its direct base’s constructor, gcc complained:

    FakePowerManager.cpp: In constructor 'FakePowerManager::FakePowerManager(EventDispatcher)':
    FakePowerManager.cpp:33:34: error: no matching function for call to 'EventProducer::EventProducer()'
    FakePowerManager.cpp:33:34: note: candidates are:
    EventProducer.h:13:5: note: EventProducer::EventProducer(EventDispatcher)
    EventProducer.h:13:5: note:   candidate expects 1 argument, 0 provided
    EventProducer.h:11:8: note: EventProducer::EventProducer(const EventProducer&)
    EventProducer.h:11:8: note:   candidate expects 1 argument, 0 provided

Clang is famous for its helpful error messages. Let’s see how it fares:

    FakePowerManager.cpp:32:5: error: constructor for 'FakePowerManager' must explicitly initialize the base class 'EventProducer' which does not have a default constructor
        FakePowerManager(EventDispatcher dispatcher)
        ^
    In file included from FakePowerManager.cpp:10:
    In file included from ./PowerManager.h:7:
    ./EventProducer.h:11:8: note: 'EventProducer' declared here
    struct EventProducer
           ^
    1 error generated.

Certainly much more readable, but it’s still missing the crucial explanation: “because EventProducer is a virtual base.” Without that, I might have (hypothetically, of course) wasted an hour wondering what’s going on. After all, the most-derived class is correctly calling its direct base’s constructor, which in turn is correctly calling its virtual base’s constructor — or so it would seem from inspecting the code.

C++ has so many corner cases that even an expert will be occasionally baffled. I envisage development environments with integrated help on the programming language; in the meantime, compilers can do a little bit to help.

1: Note that I am not endorsing such use of inheritance; I am merely presenting it as a representative example of what the working programmer might encounter.

2: Speaking of dependency injection: Don’t get me started on the failure of our tools to recognise well-known design patterns.

Smalltalk

No treatment of development tools would be complete without mentioning Smalltalk. Even if you never get to use Smalltalk on a real-world project, you owe it to yourself to check it out over a weekend, just to see what development tools could be like. (For starters I suggest the one-click installer provided by Seaside, a Smalltalk web-application framework.)

It’s easier to show these tools than to write about them, so watch this 3-minute video showcasing the Smalltalk code browser, method finder, and debugger: