Charlie Harvey

Hack: Simply Scheme as a single pdf with wget and pdftk

There’s an introductory Scheme programming book called Simply Scheme that is available online for free. I wanted to brush up on my Scheme recently as part of writing my own implementation of the language in Haskell. However, the book is only available as html or as pdfs of the individual chapters, which is a little inconvenient for me. What I wanted was one big pdf that I could read on an Android tablet. Here’s how to get that pdf using wget, perl and pdftk.

First we need to get the contents page, which is simple enough using wget:

~/Desktop/simplyscheme$ wget http://www.cs.berkeley.edu/~bh/ss-toc2.html

Now the idea is to download the individual pdfs. We use the little-known -o argument for grep to print only the strings that match "ssch..\.pdf" (that is, the names of the pdf files) rather than the whole matching line. We pipe that to perl -pe (evaluate some code for each line and print the result). With perl we insert "wget http://www.cs.berkeley.edu/~bh/pdf/" at the beginning of each line, turning every file name into a complete wget command. Then we pass the output to sort -u to make sure we don’t get any duplicates. Finally we pass the result to sh, which runs each line just as if we had typed it at the prompt:

~/Desktop/simplyscheme$ grep -o "ssch..\.pdf" ss-toc2.html | perl -pe 's{^}{wget http://www.cs.berkeley.edu/~bh/pdf/}' | sort -u | sh

At this stage we have all the pdfs as individual files. Now we can use pdftk to smoosh them together into one big pdf:

~/Desktop/simplyscheme$ pdftk ssch*.pdf cat output simply_scheme.pdf
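The chapters end up in the right order because the shell expands ssch*.pdf in sorted order, and the two-character chapter part of each name (the ".." in the grep pattern) means that sorting by name matches reading order. A quick sketch to check this, using empty placeholder files in a temporary directory (the file names here are examples that fit the pattern, not a listing of the real downloads):

```shell
# Create empty placeholder files, deliberately out of order.
tmpdir=$(mktemp -d)
touch "$tmpdir/ssch10.pdf" "$tmpdir/ssch02.pdf" "$tmpdir/ssch01.pdf"

# The glob expands in sorted order, which is the order pdftk receives the files in.
order=$(cd "$tmpdir" && echo ssch*.pdf)
echo "$order"

# Clean up the placeholders.
rm -r "$tmpdir"
```

If the chapter numbers were not zero-padded (ssch2.pdf alongside ssch10.pdf, say), the glob would put chapter 10 before chapter 2 and the files would need to be listed explicitly.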

So now I’ve got a single pdf for reading on my tablet :-) This is a good example of how the ability that UNIX gives you to combine small sharp tools allows more power than even the most complex GUI could provide. Being able to work in ways that the designer didn’t foresee is the measure of a powerful or generally useful tool. Of course, learning how to wield and combine your tools takes more time than pushing buttons on a GUI. There are always tradeoffs between power and obviousness, and other factors like convenience and speed.

