Comparing Lists of Works Cited with Regular Expressions

XKCD 208: a programmer swings in on a rope to save the day with regular expressions
Obligatory XKCD (#208, released under a Creative Commons Attribution-NonCommercial 2.5 License)

As part of a recent project, I had to compare my list of works cited in a chapter with a co-author's list. In the five and a half years since we started, one co-author had dropped out, versions of the files had been confused among the different people on the project, some forthcoming publications had appeared or changed venues, and the first entries had been written long before we had a style guide. It was very important to make sure that every work cited appeared in the bibliography, and to shorten the chapter as much as possible by removing entries for works that were never cited. Since the two lists contained 150 and 180 entries, many of which fill several lines in print, comparing them by hand was going to be a tedious task. And so I turned to the powerful arts of a dead tongue which I had not invoked since I learned it from German and Indian adepts in a distant land: the language of shell scripting and regular expressions.

