Literate Programming

October 22, 2015 | coding literate

I’ve always been interested in literate programming, perhaps because I enjoy both writing and coding. In truth, though, I suspect it is something of a mirage: writing is difficult, coding is difficult, the two domains do not map onto each other cleanly, and asking someone to do both at once may be more than is reasonable. This is probably why people who can write well about coding are so beloved; the intersection in that Venn diagram is not overly spacious.

Code as Novel

Consider a metaphor: a large piece of code is a novel. Yes, yes, an imperfect metaphor – the fact that you misremembered the description of a character you introduced in Chapter 1 doesn’t usually cause a book to spontaneously catch fire and burn – but let’s go with it anyway. So, what we’re looking for is a format that allows us to write a critique and analysis of the novel in a way that is not overly arduous to read.

What are some methods for generating the critical analysis of our novel?

Code => Documentation

There are a large number of code-to-documentation converters; the canonical one is JavaDoc, but I really like the output from tools like Marginalia.

I’ve often wondered whether, ultimately, JavaDoc hasn’t made things worse for the state of software documentation. With little or no effort it produces reams of documentation that are substantial enough by volume that you can convince yourself that you have documentation. It’s junk documentation, though, in many cases.

Also, JavaDoc wants to print out methods in its own ordering, and this really makes a hash of any thematic consistency that you might be trying to build with your comments … unless you were crazy enough to pick method names that alphabetized the way you wanted, I guess. I can’t imagine a shorter road to insanity, though. That’s why I prefer Marginalia, and to my eye the Marginalia documentation I’ve seen is much better. I suspect this is because it doesn’t slice the comments up by method and spit them out like a blender, but instead presents them in the same order as in the source file.

Going with our metaphor, JavaDoc would be a simple listing of the characters and when they were introduced. Marginalia would be Cliff’s notes.

Documentation => Code

One level up, we have various schemes that allow documentation to be interspersed with source code. Two examples are Literate Haskell and Docco. These systems have much in common with the previous section; the difference is whether you have code and need to escape to write comments (e.g., JavaDoc, Marginalia), or have documentation and need to escape to write code (e.g., Docco and Literate Haskell). These systems extract code from documentation on a file-by-file basis. So it is easy to document the code, but on the other hand, the documentation must follow the flow of the code.
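
To make "escaping for code" concrete: in Literate Haskell's bird-track style, code lines start with `>` and everything else is prose. Here is a minimal sketch, in Python, of pulling the code back out. The function name `extract_bird_tracks` is made up for this example, and the sketch ignores the `\begin{code}` style entirely.

```python
def extract_bird_tracks(text):
    """Return only the code from a bird-track literate file.

    Lines starting with '>' are code. Every other line is prose and is
    replaced by a blank line, so line numbers in the extracted code still
    match the literate source.
    """
    out = []
    for line in text.splitlines():
        if line.startswith("> "):
            out.append(line[2:])      # drop the marker and its space
        elif line.startswith(">"):
            out.append(line[1:])      # marker with no following space
        else:
            out.append("")            # prose becomes a blank line
    return "\n".join(out)


sample = """This module greets the world.

> main :: IO ()
> main = putStrLn "hello"
"""
print(extract_bird_tracks(sample))
```

Note that the extracted code comes out in exactly the order it appears in the file, which is the limitation described above.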

You can imagine, though, that it would be useful to talk about code in a way that is distinct from the ordering that makes sense in the code. Following our metaphor for a bit, you can see where it might be useful to talk about overall themes in the novel, and reference a number of different chapters to show commonalities, without having to present those themes in the same order that they came up in the novel proper. That’s where Knuth’s WEB idea comes in… basically, it erases any one-to-one mapping between documentation and source code, and allows the documentation to incrementally specify code in disparate locations. It ends up looking something like this (a noweb example borrowed from elsewhere):

The purpose of wc is to count lines, words, and/or characters in a list of files. The number of lines in a file is ......../more explanations/

Here, then, is an overview of the file wc.c that is defined by the noweb program wc.nw:
    <<*>>=
    <<Header files to include>>
    <<Definitions>>
    <<Global variables>>
    <<Functions>>
    <<The main program>>
    @

We must include the standard I/O definitions, since we want to send formatted output
to stdout and stderr.
    <<Header files to include>>=
    #include <stdio.h>
    @

This is a full blown critical analysis of our metaphorical novel, referencing page and passage as necessary to tie into the arc of the analysis.
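
The chunk mechanism in the noweb example above can be sketched in a few lines of Python. This is a toy "tangle" step written for illustration, not noweb's actual implementation: `parse_chunks` and `tangle` are names invented here, it assumes chunk markers start in column one, and it does no checking for undefined or cyclic references.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")       # <<name>>= opens a chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")   # <<name>> alone on a line


def parse_chunks(source):
    """Collect named code chunks; a repeated name appends to the same chunk."""
    chunks, current = {}, None
    for line in source.splitlines():
        match = CHUNK_DEF.match(line)
        if match:
            current = chunks.setdefault(match.group(1), [])
        elif line.startswith("@"):
            current = None              # back to documentation
        elif current is not None:
            current.append(line)
    return chunks


def tangle(chunks, name="*"):
    """Expand a chunk, recursively replacing references with their bodies."""
    out = []
    for line in chunks[name]:
        match = CHUNK_REF.match(line)
        if match:
            indent, ref = match.groups()
            out.extend(indent + inner for inner in tangle(chunks, ref))
        else:
            out.append(line)
    return out


wc_nw = """\
The documentation can introduce chunks in any order it likes.

<<Header files to include>>=
#include <stdio.h>
@

<<*>>=
<<Header files to include>>
<<The main program>>
@

<<The main program>>=
int main(void) { return 0; }
@
"""
print("\n".join(tangle(parse_chunks(wc_nw))))
```

The point is that the order of chunks in the document is independent of the order of the generated code; only the root chunk `*` fixes the final layout.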

I’ve played with a variety of WEB-like solutions, but I find them unsatisfying for day-to-day use. Fundamentally, tool support is weak. You can trick Emacs into dealing with literate forms (with mmm-mode and the like), but it’s not ideal; you still have to integrate the documentation-to-source translation into your build steps, and for everything from compilation errors to debugging you have to make sure that your documentation => code mapping works (or else do a bunch of extra work to back-reference things).

Documentation <= Code

So is there another way to do our full blown critical analysis?

I’m in the process of writing a fair-sized program in Go, and since I’m not super experienced in the language or the problem domain, I’m documenting a bit more than usual. Literate programming is a decent fit here, but I’ve inverted the WEB model: instead of documentation being converted to code via a utility, the documentation lives in separate files that pull in the appropriate code segments.

An example would help explain this. The documentation is in Markdown with annotations, and is run through pandoc. Code is brought into the document by running any span marked with an .execute attribute through the shell. For example, a block of documentation:

# Parsing

We use the [goparsec](https://godoc.org/github.com/prataprc/goparsec)
library to provide parsing of the various constructs. Note that there are
several sets of parsing rules:

* patterns
* internal commands
* user-typed commands

All of the parsers generate a base `Token` type that looks like this:

  ```go
  `./plinv.py parser/parser.go def Token`{.execute}
  ```

At documentation time, plinv.py will look in parser/parser.go, extract the Token definition, and insert it into the file right before it is fed into pandoc.
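
The substitution step can be approximated with a small preprocessor. The sketch below is an assumption about how such a step might work, not the actual tooling used here: `expand_execute_spans` and `run_in_shell` are invented names, and the regular expression only handles the simple inline form shown above.

```python
import re
import subprocess

# Matches `some command`{.execute}
EXECUTE_SPAN = re.compile(r"`([^`]+)`\{\.execute\}")


def run_in_shell(command):
    """Run a command through the shell and return its standard output."""
    result = subprocess.run(command, shell=True, check=True,
                            capture_output=True, text=True)
    return result.stdout


def expand_execute_spans(markdown, run=run_in_shell):
    """Replace each `command`{.execute} span with the command's output."""
    return EXECUTE_SPAN.sub(
        lambda match: run(match.group(1)).rstrip("\n"), markdown)


doc = "The base type:\n\n`./plinv.py parser/parser.go def Token`{.execute}\n"
# A stand-in runner so the example does not depend on any real script.
print(expand_execute_spans(doc, run=lambda cmd: f"// output of: {cmd}\n"))
```

The expanded text would then be handed to pandoc as ordinary Markdown. Passing the runner in as a parameter keeps the substitution logic testable without touching the shell.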

With respect to the tools, this scheme flows decently. For me, at least, documentation is a different brain mode, and it’s helpful to have it separated out like this. I can be writing documentation in one file, go to coding in another, and just have to make sure the references are clean between the two.

There are some things that aren’t ideal. My current extractor is overly simple, depending on particularities of formatting and heavily oriented towards string matches; I suspect that for production you’d want to have something that was more intelligent about the language.
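
To give a sense of what "string matches" means, here is a rough sketch of a purely textual extractor for a Go type definition. It is not the real plinv.py; the function name `extract_type_def` and the fields of the sample `Token` struct are invented for illustration.

```python
def extract_type_def(go_source, name):
    """Return the text of `type <name> ...` from Go source, or None.

    Purely textual: find the line that starts the declaration, then copy
    lines until the braces balance. Braces inside strings or comments,
    or a declaration that is indented, will confuse it.
    """
    prefix = f"type {name} "
    collected, depth, started = [], 0, False
    for line in go_source.splitlines():
        if not started:
            if not line.startswith(prefix):
                continue
            started = True
        collected.append(line)
        depth += line.count("{") - line.count("}")
        if depth <= 0:
            break
    return "\n".join(collected) if started else None


# Hypothetical file contents; the struct fields are made up.
go_source = """\
package parser

type Token struct {
    Kind  string
    Value string
}

func Parse(s string) []Token { return nil }
"""
print(extract_type_def(go_source, "Token"))
```

The fragility is easy to see: anything that breaks the formatting assumptions breaks the extraction, which is why a language-aware parser would be the better choice for production.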

It would be nice to have back linkage, as well; that is, I should be able to read the documentation and hit a link to go back to the appropriate code. It should in theory be possible to build an index while building the documentation (since we know where we’re pulling the code from, after all), but I haven’t gotten to that level of detail.
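
One possible shape for that index, sketched under the same textual-matching assumptions: record the line range of each extracted definition and emit a `path:first-last` string that an editor or HTML generator could turn into a link. The names `locate` and `index_entry` and the sample source are hypothetical.

```python
def locate(source, start_prefix):
    """Return (first_line, last_line), 1-based, of a brace-delimited block."""
    depth, first = 0, None
    for number, line in enumerate(source.splitlines(), start=1):
        if first is None:
            if not line.startswith(start_prefix):
                continue
            first = number
        depth += line.count("{") - line.count("}")
        if depth <= 0:
            return first, number
    return None


def index_entry(path, source, start_prefix):
    """Format one back-link entry as 'path:first-last'."""
    span = locate(source, start_prefix)
    if span is None:
        raise LookupError(f"{start_prefix!r} not found in {path}")
    return f"{path}:{span[0]}-{span[1]}"


# Hypothetical file contents, used only to exercise the functions.
source = "package parser\n\ntype Token struct {\n    Kind string\n}\n"
print(index_entry("parser/parser.go", source, "type Token "))
```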

Nonetheless, I think overall it’s not a bad way to approach literate programming.
