Have you ever read a book with a “How to read this book” chapter somewhere early in the table of contents? This post explores how revision control can address the same problem for source code and other documents of similar structure and complexity.

Unlike a book, source code can rarely be read serially from end to end. Literate programming works only for sufficiently short programs. But the days when program listings were printed on paper and read offline are long gone. Today, a programmer diving into a new project can do so in an IDE, easily jumping between declarations, definitions and points of use, or taking advantage of tooltips that show the documentation of the symbol under the cursor. That is all very good, but in my view comprehending complex code would still benefit from a well-thought-out reading guide. And revision control can play a central role here.

As stated above, code is not linear. It consists of various pieces of functionality, features, etc., some of which are cross-cutting. That is, the fragments making them up are not localized in a single place but can be scattered across multiple files with no explicit and easily discoverable connections between them. One way to make those connections explicit is to add special comments. Another way is to rely on revision control.

Let’s take the most recent version R of the source code and remove some feature F from it, deleting all related code. We will arrive at another version P1. In a sense P1 is a precursor of R, since R can be obtained from P1 via the opposite (constructive rather than destructive) transformation. With F absent, we can probably simplify the code somewhat if its design contained elements that were meant to enable F but have now become superfluous. That will lead us to a new revision P2.

Continuing in that way we will arrive at an empty program E [1]. By reversing that history (and assuming that informative commit messages were written describing the constructive flow of changes) we will have one version of the reading guide for the revision R! Other variants of reading guides will correspond to other sequences of “destroying” the full program.
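The procedure above can be sketched in a few lines of Python. This is only a toy model under a strong assumption: a revision is represented as a set of feature names rather than as real source code, and the names `Step` and `reading_guide` are made up for this illustration. It shows the destructive history being recorded and then reversed into a constructive sequence of steps with commit-like messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One constructive step of a reading guide (hypothetical structure)."""
    message: str        # commit-like message describing the constructive change
    before: frozenset   # features present before the step
    after: frozenset    # features present after the step


def reading_guide(full, removal_order):
    """Remove features from `full` one by one, then reverse that history.

    If `removal_order` does not cover every feature, the guide starts
    from a non-empty reference revision instead of the empty program.
    """
    current = frozenset(full)
    destructive = []
    for feature in removal_order:
        if feature not in current:
            raise ValueError(f"feature {feature!r} is not in the current revision")
        smaller = current - {feature}
        destructive.append((current, feature, smaller))
        current = smaller
    # Reversing the destructive history gives the constructive reading order.
    return [
        Step(message=f"Add {feature}", before=smaller, after=larger)
        for larger, feature, smaller in reversed(destructive)
    ]


# R contains three features; we "destroy" it down to the empty program E.
R = {"core", "logging", "caching"}
guide = reading_guide(R, ["caching", "logging", "core"])
messages = [step.message for step in guide]
```

A different `removal_order` yields a different reading guide for the same R, which corresponds to the other variants mentioned above.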

Usually, a revision-controlled system only contains the history of how it evolved during development into its current state [2]. For large and sophisticated systems that path is rarely the optimal one: most likely it contains trial and error, undoing or redoing of solutions that proved unsustainable under new requirements, etc. In some cases the volume of information in the evolution history of such a system can greatly exceed that of its most recent version. Though that can have some positive aspects, e.g. substantiating the solutions that ended up in the most recent version, it may be of little help in understanding the latter.

The proposed approach asks for a revision control system that allows adding extra paths between any two existing revisions. There must be named revisions and named paths.
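As a rough illustration of that requirement, here is a minimal in-memory sketch of such a store. It does not describe any existing revision control system; the class `RevisionGraph` and its methods are hypothetical, and snapshots are plain strings for brevity. The point is only that a path is a named, ordered list of named revisions, so several paths can connect the same two endpoints.

```python
class RevisionGraph:
    """Toy store of named revisions and named paths (hypothetical API)."""

    def __init__(self):
        self.revisions = {}  # revision name -> content snapshot
        self.paths = {}      # path name -> ordered list of revision names

    def add_revision(self, name, snapshot):
        if name in self.revisions:
            raise ValueError(f"revision {name!r} already exists")
        self.revisions[name] = snapshot

    def add_path(self, name, revision_names):
        """Register a named path through already existing revisions."""
        if name in self.paths:
            raise ValueError(f"path {name!r} already exists")
        if len(revision_names) < 2:
            raise ValueError("a path needs at least two revisions")
        missing = [r for r in revision_names if r not in self.revisions]
        if missing:
            raise KeyError(f"unknown revisions: {missing}")
        self.paths[name] = list(revision_names)

    def paths_between(self, start, end):
        """Names of all paths that lead from `start` to `end`."""
        return sorted(
            name for name, revs in self.paths.items()
            if revs[0] == start and revs[-1] == end
        )


graph = RevisionGraph()
for rev_name in ["E", "wip-1", "P2", "P1", "R"]:
    graph.add_revision(rev_name, snapshot=f"contents of {rev_name}")

# The path the project actually took during development ...
graph.add_path("development", ["E", "wip-1", "R"])
# ... and an extra path added afterwards purely as a reading guide.
graph.add_path("reading-guide", ["E", "P2", "P1", "R"])

available = graph.paths_between("E", "R")
```

Here `available` lists both paths between E and R, so a reader can choose the reading guide over the development history.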

Of course, creating those additional paths is extra work. But the more frequently the code is read, the sooner that effort will pay off.

Notes

1. Alternatively, we can stop at another reference revision.

2. History rewriting is practised only on a small scale, at the PR level, when reviewers ask contributors to polish a PR so that it is easier to review.