As the modern technology landscape becomes more and more complex, the generalist programmer needs to master more (and more disparate) tools, languages and techniques. I believe that our tools could do more to ease the learning curve.
The bash shell provides a mechanism for programming its tab-completion
(zsh does too). The bash-completion package provides tab-completion
programs for the common GNU utilities, including make:
Pressing tab runs the bash-completion script for make,
which invokes make --print-data-base to parse the makefile and print
its rules; the bash-completion script then parses this output to
determine the available targets.
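To make that parsing step concrete, here is a minimal sketch in Go. It is not the bash-completion script itself (which is written in shell and handles many more cases). The sampleDB text is a hand-written, heavily shortened imitation of what make --print-data-base prints, and targets is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleDB is a hand-written, shortened imitation of the database that
// `make --print-data-base` prints; real output is far longer.
const sampleDB = `# Files

# Not a target:
.c.o:
#  Builtin rule

all: git
#  Phony target (prerequisite of .PHONY).

git: git.o
#  Implicit rule search has not been done.
	$(CC) -o $@ $^
`

// targets returns the rule names found in db, skipping any rule that is
// immediately preceded by a "# Not a target" marker.
func targets(db string) []string {
	var found []string
	skipNext := false
	for _, line := range strings.Split(db, "\n") {
		switch {
		case strings.HasPrefix(line, "# Not a target"):
			skipNext = true
		case line == "" || line[0] == '#' || line[0] == '\t':
			// Blank lines, comments and recipe lines are not rules.
		default:
			name, _, isRule := strings.Cut(line, ":")
			// Ignore variable assignments and pattern rules.
			if isRule && !skipNext && !strings.ContainsAny(name, "=%") {
				found = append(found, name)
			}
			skipNext = false
		}
	}
	return found
}

func main() {
	fmt.Println(targets(sampleDB)) // prints [all git]
}
```

Even this toy version depends on details of the output layout, such as the marker comment and the tab-indented recipes, which is why a stable (or machine-readable) format matters to anyone writing completion scripts.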
It also knows about make’s command-line options, by a similar process
(invoking make --help and parsing the output):
Other tools, such as git, provide their own bash-completion scripts.
Lessons learned:
1. make.
2. make report its own command-line options.
3. make --help, also needs to be in a stable format. (Even better if it were in a machine-readable format that specifies the types of each option’s arguments.)

As an example of point 3 above, clang (a C/C++/Objective-C compiler) makes its parser available as a separate library. The goals of the clang project include: to “support diverse clients (refactoring, static analysis, code generation, etc)” and “allow tight integration with IDEs”.
C++ in particular is so difficult to parse that many development tools either cheat by pretending it’s C and making educated guesses, or implement their own parser that simply doesn’t handle all of C++’s many edge cases: This is certainly the case for the widely-used code navigation tools cscope and exuberant ctags.
Although clang says it is “production quality”, the tools taking advantage of its modular architecture have yet to emerge. The wish-list on clang’s website includes “a tool to generate code documentation”, and “implement better versions of existing tools [such as] distcc, the delta testcase reduction tool, and the ‘indent’ source reformatting tool.” Watch this space.
The Go programming language’s standard library includes packages for scanning and parsing Go source files. I see this as an encouraging trend: Language authors realise that tooling is just as important as the capabilities of the language itself.
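As a rough illustration of how little code a tool needs when the language ships its own parser, the sketch below uses the standard go/parser, go/ast and go/token packages to list the functions declared in a source file. The source text and the funcNames helper are made up for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a made-up source file used only for illustration.
const src = `package demo

func Add(a, b int) int { return a + b }

func helper() {}
`

// funcNames parses Go source text and returns the names of its
// top-level function declarations, in source order.
func funcNames(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	names, err := funcNames(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(names) // prints [Add helper]
}
```

A code navigation tool built this way gets the same view of the program as the compiler's front end, rather than relying on guesses.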
Go specifies a style guide for source code, and
ships with the gofmt utility to
enforce it. Enforcing standard formatting and indentation allows tools to
make certain assumptions to avoid using a full-blown parser — see my
treatment of diff, below.
gofmt can also perform expression rewriting and simplification. For
example, running gofmt -r 'fn1(a, b(c, d)) -> fn1(a + c + d)' on the
code below left, produces the code on the right:
Unfortunately gofmt only works on single expressions, not on statements
(so you can’t manipulate the s on the left of the := and the
expression on the right, at the same time), and certainly not on multiple
statements at a time. This limits its usefulness.
Enter gofix.
As Go is still a somewhat experimental language, its standard library is
evolving; so Go provides the gofix utility to update programs that use
old APIs. gofix -? reports that it knows 16 transformations. One
example: “Adapt 3-result calls to net.LookupHost to use 2-result form.”
Extending gofix with your own transformations involves writing a plugin
that manipulates the abstract syntax tree using the ast package from
Go’s standard library. The “net.LookupHost” transformation is
implemented in 25 lines of code,
but other transformations can be arbitrarily complex.
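The following is a minimal sketch of that kind of syntax-tree manipulation. It is not gofix's actual plugin interface. It only uses the standard go/parser, go/ast and go/format packages to rename a call. The names oldapi.Fetch and newapi.Get are hypothetical, as is the renameCall helper.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
)

// renameCall rewrites every call of oldPkg.oldFn in src to newPkg.newFn
// and returns the reformatted source plus the number of calls changed.
// It matches on names only: it does not check that oldPkg really is an
// imported package, and it does not update the import list.
func renameCall(src, oldPkg, oldFn, newPkg, newFn string) (string, int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, parser.ParseComments)
	if err != nil {
		return "", 0, err
	}
	changed := 0
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		pkg, ok := sel.X.(*ast.Ident)
		if ok && pkg.Name == oldPkg && sel.Sel.Name == oldFn {
			pkg.Name = newPkg
			sel.Sel.Name = newFn
			changed++
		}
		return true
	})
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", 0, err
	}
	return buf.String(), changed, nil
}

// legacy is a made-up input that calls the hypothetical old API.
const legacy = `package demo

func run() {
	oldapi.Fetch("x")
}
`

func main() {
	out, n, err := renameCall(legacy, "oldapi", "Fetch", "newapi", "Get")
	if err != nil {
		fmt.Println("rewrite failed:", err)
		return
	}
	fmt.Printf("rewrote %d call(s):\n%s", n, out)
}
```

Because the rewrite works on the syntax tree, it can span whole statements, which is exactly what the expression-only rewriting in gofmt cannot do.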
Go also ships with a doxygen-style documentation extractor (godoc) and its standard library includes an
automated testing framework.
I love Go’s “codewalk” style of documentation — though it’s only used on the Go website for a couple of examples.
The GNU version of the standard diff utility can show “section
headings”
indicating the section containing the differing lines. In the case of
C-like code, the section is the enclosing function:
diff works out the section heading with a simple regular expression: The
first preceding line that is hard against the left margin. For C++ code
this also finds the name of the class if the differing line is inside a
class definition — as long as you adhere to certain formatting rules:
Any access specifiers (like public:) must be indented at least by one
space, and namespace blocks shouldn’t increase the indentation level
of their contents.
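The heuristic and its formatting rules can be sketched in a few lines of Go. This is a toy model, not diff's implementation: the pattern below (a line starting with a letter, dollar sign or underscore) is my approximation of diff's default for C-like code, and sectionHeading is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// heading approximates the default "section heading" pattern: a line
// that starts in column zero with a letter, '$' or '_'.
var heading = regexp.MustCompile(`^[A-Za-z$_]`)

// sectionHeading returns the nearest line before index changed that
// matches the heading pattern, or "" if there is none.
func sectionHeading(lines []string, changed int) string {
	for i := changed - 1; i >= 0; i-- {
		if heading.MatchString(lines[i]) {
			return lines[i]
		}
	}
	return ""
}

func main() {
	// Access specifier indented by one space: the class line wins.
	indented := []string{
		"class Widget {",
		" public:",
		"    void draw();",
		"};",
	}
	// Access specifier hard against the margin: it hides the class line.
	flush := []string{
		"class Widget {",
		"public:",
		"    void draw();",
		"};",
	}
	fmt.Println(sectionHeading(indented, 2)) // prints class Widget {
	fmt.Println(sectionHeading(flush, 2))    // prints public:
}
```

The second case shows why the access specifier needs that one space of indentation: otherwise it is the nearest line against the margin, and the heading loses the class name.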
You can supply diff with custom regular expressions for determining
section headings (with --show-function-line=regexp), but as
far as I know the only way to enable specific regular expressions for
specific types of files is to write a wrapper around diff.
git’s diff and log --patch commands include such a header in their
output by default (git grep can too, with --show-function).
Furthermore, the regular expression to determine the section header is
configurable for different file extensions, with default patterns for a
dozen languages. See the gitattributes man page.
This feature of git grep, in particular, can serve as a poor man’s
“find all callers/users of this symbol”. (And these days I rarely use
grep or diff except via git.)
Lessons learned:
gofmt).

Makefiles provide a concise and powerful way to express dependencies
between source files and the programs generated from them. This
conciseness is a good thing for experts, but a significant barrier for
everyone else (and I would argue that few developers are truly make
experts — most leave the build infrastructure to someone else, or know
just enough to hack in a few changes).
Imagine a tool that automatically generates the following annotations for a makefile:
(Makefile snippets taken from git’s makefile and simplified.)
In implementing such a tool, it would be best if make itself did the
parsing and variable expansion — if it provided a queryable way to
relate specific line and column ranges within the makefile to the
corresponding semantic information (similar to what clang’s libraries
provide for C++ code).
You would be able to control which annotations are displayed — you’d
quickly learn to recognise targets, prerequisites and recipes, and the
verbose annotations would just get in the way. But you’d still appreciate
help on the more obscure features of make.
Note also that simple syntax highlighting can help to avoid common
classes of errors, such as those due to make’s variable expansion:
$BINDIR
means the value of variable B followed by the text INDIR (whereas in
shell it means the value of variable BINDIR).
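To see the difference in action, here is a deliberately simplified model of make's variable references, written in Go. It is a sketch under stated assumptions: expandMake is a hypothetical helper that handles only $(NAME), ${NAME}, $$ and single-character $X references, with no functions, nested references or automatic variables.

```go
package main

import (
	"fmt"
	"strings"
)

// expandMake expands variable references in text the way make does in
// the simplest cases: "$(NAME)" and "${NAME}" name a whole variable,
// "$$" is a literal dollar sign, and "$X" refers to the ONE-character
// variable X. Undefined variables expand to the empty string.
func expandMake(text string, vars map[string]string) string {
	var out strings.Builder
	for i := 0; i < len(text); i++ {
		c := text[i]
		if c != '$' || i+1 >= len(text) {
			out.WriteByte(c)
			continue
		}
		next := text[i+1]
		switch next {
		case '$':
			out.WriteByte('$')
			i++
		case '(', '{':
			closer := byte(')')
			if next == '{' {
				closer = '}'
			}
			end := strings.IndexByte(text[i+2:], closer)
			if end < 0 {
				// Unterminated reference: copy the rest verbatim.
				out.WriteString(text[i:])
				return out.String()
			}
			out.WriteString(vars[text[i+2:i+2+end]])
			i += 2 + end
		default:
			out.WriteString(vars[string(next)])
			i++
		}
	}
	return out.String()
}

func main() {
	vars := map[string]string{"BINDIR": "/usr/bin"}
	fmt.Println(expandMake("$(BINDIR)/git", vars)) // prints /usr/bin/git
	fmt.Println(expandMake("$BINDIR/git", vars))   // prints INDIR/git
}
```

In the second call the variable B is undefined, so it silently expands to nothing and only INDIR/git remains. A highlighter that colours just the $B would make the mistake visible at a glance.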
The source code for the Ruby on Rails web application framework includes the following snippet:
This defines the methods server_name, server_protocol, accept, and
user_agent, but in such a way that you’ll never find the definitions
in the source code when you need to.
Rails calls itself a framework “optimized for programmer happiness”, yet many programmers will not be happy using a tool they don’t thoroughly understand. When the tool is an open-source framework, I want to use my editor’s code navigation to quickly find the definition of the framework code I’m calling or extending, the code that it calls in turn, and so on.
The above example is a totally gratuitous use of meta-programming, but in other cases Rails does use meta-programming for legitimate reasons. Whatever the reason, the difficulty of finding the source code for a particular piece of functionality is an obstacle to understanding the framework. Maybe the Ruby interpreter/VM could keep track of the source location of each method definition, in a way that editors could query.
In a C++ inheritance hierarchy as shown on the right,¹ where the middle class has a virtual base class and that base class has no default constructor, the most-derived class must always explicitly call the virtual base’s constructor.
Though the reason makes sense I’d probably forget this within a week. In a recent refactoring I removed EventProducer’s default constructor so that no-one could ever forget to inject² the EventDispatcher it required (the best API being the one that’s impossible to use incorrectly). Even with each class explicitly calling its direct base’s constructor, gcc complained:
Clang is famous for its helpful error messages. Let’s see how it fares:
Certainly much more readable, but it’s still missing the crucial explanation: “because EventProducer is a virtual base.” Without that, I might have (hypothetically, of course) wasted an hour wondering what’s going on. After all, the most-derived class is correctly calling its direct base’s constructor, which in turn is correctly calling its virtual base’s constructor — or so it would seem from inspecting the code.
C++ has so many corner cases that even an expert will be occasionally baffled. I envisage development environments with integrated help on the programming language; in the meantime, compilers can do a little bit to help.
1: Note that I am not endorsing such use of inheritance; I am merely presenting it as a representative example of what the working programmer might encounter.
2: Speaking of dependency injection: Don’t get me started on the failure of our tools to recognise well-known design patterns.
No treatment of development tools would be complete without mentioning Smalltalk. Even if you never get to use Smalltalk on a real-world project, you owe it to yourself to check it out over a weekend, just to see what development tools could be like. (For starters I suggest the one-click installer provided by Seaside, a Smalltalk web-application framework.)
It’s easier to show these tools than to write about them, so watch this 3-minute video showcasing the Smalltalk code browser, method finder, and debugger: