tl;dr GCHQ/NSA seem to keep encrypted data but throw away spam. I made a tool that lets you hide your encrypted data in spam.
In recent weeks it has emerged that GCHQ and the NSA have been spying not only on their political allies, but on the population in general. Many of us are shocked, if unsurprised, by this revelation. If you feel there is nothing you can do, please head over to the EFF site or Restore the Fourth. If you want to opt out of surveillance as much as possible, you can learn how at prism-break.org.
Details of the Tempora programme run by GCHQ, in the country where I live, point to the fact that the authorities throw away low-value traffic like spam and BitTorrent using a technique called MVR (Massive Volume Reduction). I have also read that the NSA may retain encrypted data almost indefinitely. So, clearly, the best way to send your encrypted data is inside the stuff that gets thrown away during MVR.
Stegospam is a semi-serious toy steganography tool that lets you do just that. It uses spam as a carrier for your important data. Now GCHQ will either have to retain all the spam or risk encrypted data getting through the net.
I should start by saying that stegospam is probably not secure against serious security people (like the NSA). At best it obfuscates your traffic; it does not anonymize or secure it. If you need those capabilities, use Tor and GPG. You shouldn’t rely on stegospam to hide your traffic: the source is available, but nobody has subjected it to any testing, and even the author (me!) can think of several attacks against it that would probably work. In fact, if you have something that needs to stay secret, don’t put it on the internet. Or talk about it on the phone. Talk face to face, far away from buildings and other people. You get the idea.
OK, consider your emptor well and truly caveat-ed. So how does this thing work then?
How it works
Steganography is the art of embedding hidden data inside a carrier file, such as an image or video, so that it can be extracted later. Stegospam uses spam as the carrier, and the parity (odd or even) of the lengths of the words in each sentence as the encoding mechanism. Sound baffling? I’ll explain.
There are a couple of scripts included in the project download. The first one is called import_corpus.pl. A corpus is a fancy word for a body of text. There is nothing that looks more like spam than spam, right? So we start off with part of the spam corpus from the Enron case. You pipe this into import_corpus.pl, which builds an SQLite database from it. Here is what to type:
$ wget http://www.aueb.gr/users/ion/data/enron-spam/preprocessed/enron1.tar.gz
$ tar xvzf enron1.tar.gz
$ cat enron1/spam/* | grep -v '^Subject:' > corpus
$ ./import_corpus.pl < corpus
If that worked, you should have a corpus.sqlite with a bunch of spam in it. Congratulations. Let’s take a quick look inside the database.
$ sqlite3 corpus.sqlite3
SQLite version 3.7.16.2 2013-04-12 11:52:43
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .mode line
sqlite> SELECT * FROM sentences LIMIT 1;
id = 1
sentence = Introducing doctor-formulated hgh human growth hormone-also called hgh is referred to in medical science as the master hormone.
length = 127
parity_map = 10011010010000110101
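OK, so our first sentence is some nonsense about HGH. The length field is simply the sentence’s length in characters (127). The interesting thing here is the parity_map field. It is derived by assigning a 0 to each even-length word and a 1 to each odd-length word. So, as "Introducing" has an odd number of letters (11), it gets a 1. Judging by this example, hyphens and punctuation separate words: "doctor-formulated" counts as two words, which is why 18 space-separated words produce a 20-bit map. Computer scientists might see where this is going.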
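To make the rule concrete, here is a minimal Python sketch of the parity map. Stegospam itself is written in Perl and this is not its code. The word-splitting rule (runs of ASCII letters and digits, with everything else acting as a separator) is my assumption, chosen because it reproduces the length and parity_map shown above.

```python
import re

# Assumption: a "word" is a run of ASCII letters or digits; hyphens, commas,
# colons and full stops all act as separators.
WORD = re.compile(r"[A-Za-z0-9]+")


def parity_map(sentence: str) -> str:
    """Return '1' for each odd-length word and '0' for each even-length word."""
    return "".join(str(len(word) % 2) for word in WORD.findall(sentence))


example = ("Introducing doctor-formulated hgh human growth hormone-also called "
           "hgh is referred to in medical science as the master hormone.")
print(len(example))         # 127, the length column
print(parity_map(example))  # 10011010010000110101, the parity_map column
```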
Now that we have our database set up, we can start using the other script, stegospam.pl. It has two modes: -e enspamifies the data piped to it, and -d despamifies it again. Let’s look at how enspamification works first.
When stegospam.pl is called with -e, it first takes whatever you pipe to it and converts each byte into eight 1s and 0s. So 'h' becomes '01101000', and the newline that echo appends becomes '00001010'. This produces a lot of 0s and 1s: 'hello steganography' (plus its trailing newline) comes out as '0110100001100101011011000110110001101111001000000111001101110100011001010110011101100001011011100110111101100111011100100110000101110000011010000111100100001010'. We have to split that string into chunks of roughly the same size as the parity maps of our sentences. Now, we can’t make all the chunks the same length, or that would be easy to detect when doing MVR! So we pick random(ish) chunk lengths. Actually, they are distributed so that more chunks have a medium length within our range than are either short or long. You can set the minimum and maximum 'sentence' lengths, and how the lengths are distributed between them, in the script itself.
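A rough Python sketch of the two steps just described: turning bytes into bits (most significant bit first, which matches the 'h' example) and cutting the bits into chunks whose lengths favour the middle of the range. random.triangular is my stand-in for whatever distribution the real script uses, and the default bounds of 5 and 25 bits are made-up values.

```python
import random
from typing import List, Optional


def to_bits(data: bytes) -> str:
    """Render each byte as eight binary digits, most significant bit first."""
    return "".join(format(byte, "08b") for byte in data)


def chunk_bits(bits: str, min_len: int = 5, max_len: int = 25,
               rng: Optional[random.Random] = None) -> List[str]:
    """Split bits into chunks of random length between min_len and max_len.

    random.triangular's mode defaults to the midpoint of the range, so medium
    lengths are drawn more often than very short or very long ones. Only the
    final chunk may be shorter than min_len: it takes whatever bits are left.
    min_len must be at least 1 or the loop would never advance.
    """
    rng = rng or random.Random()
    chunks = []
    pos = 0
    while pos < len(bits):
        length = round(rng.triangular(min_len, max_len))
        chunks.append(bits[pos:pos + length])
        pos += length
    return chunks


print(to_bits(b"h\n"))  # 0110100000001010
print(chunk_bits(to_bits(b"hello"), rng=random.Random(1)))
```

Each chunk then has to be matched by some sentence in the corpus; this sketch does nothing about chunks that have no match.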
Once we have broken the long string of bits into chunks, we look up, for each chunk, a random sentence whose parity_map matches it exactly. We squish these sentences together and get a nice spam message. Here I am trying "hello" a couple of times:
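The lookup step might look something like this in Python, assuming a sentences table shaped like the one shown earlier (sentence, length, parity_map). ORDER BY RANDOM() provides the random pick. The function name enspamify_chunks is hypothetical, and this sketch simply gives up with an error when no sentence matches a chunk; a practical encoder would need to retry with a different chunk length.

```python
import sqlite3
from typing import Iterable


def enspamify_chunks(conn: sqlite3.Connection, chunks: Iterable[str]) -> str:
    """Replace each chunk of bits with a random corpus sentence having that parity map."""
    sentences = []
    for chunk in chunks:
        row = conn.execute(
            "SELECT sentence FROM sentences WHERE parity_map = ? "
            "ORDER BY RANDOM() LIMIT 1",
            (chunk,),
        ).fetchone()
        if row is None:
            # No sentence has exactly this pattern of odd/even word lengths.
            raise LookupError(f"no sentence in the corpus has parity map {chunk}")
        sentences.append(row[0])
    # Squish the sentences together into one spam message.
    return " ".join(sentences)
```

Under the word rule assumed in the parity-map sketch, the first message below is what you get from the chunks 011010, 0001100, 101011, 011000, 110110, 001101 and 111, which together spell out the 40 bits of "hello".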
$ echo -n hello | ./stegospam.pl -e
Products not available in all states. We discovered an unclaimed sum of 24. Sowas habe ich noch nie gesehen. Archpastor fosterlings governs graminifolious aeromarine mutt. Steve marketing team kzl 789 56. Info re movress below:http://www. Get yours today.
$ echo -n hello | ./stegospam.pl -e
Products not available in all states. Diocesan,graduate episcopate,anselmo edwardine. It is not just about saving. Hello,i have visited www. 00 90 pi lls cai llis-348. Yours free with your first appointment. Just enter the pharmma now.
Despamification and encryption
Now that you understand enspamification, it is pretty obvious that we can reverse the process to extract our original message from the spam. We use stegospam.pl's -d option to do our despamification. Let’s try it with one of the messages from the previous example:
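Here is a minimal Python sketch of despamification, using the same assumed word rule as the parity-map sketch (runs of ASCII letters and digits). Because the message is just sentences joined by spaces, the sketch does not need to find sentence boundaries at all: concatenating the parity of every word in order gives the same bits as concatenating each sentence’s parity map. The function name despamify is my own.

```python
import re

# Same assumed word rule as the parity-map sketch: runs of letters or digits.
WORD = re.compile(r"[A-Za-z0-9]+")


def despamify(spam: str) -> bytes:
    """Recover the hidden bytes from the odd/even lengths of the words in spam."""
    bits = "".join(str(len(word) % 2) for word in WORD.findall(spam))
    if len(bits) % 8:
        raise ValueError(f"{len(bits)} bits is not a whole number of bytes")
    # Read the bits back eight at a time, most significant bit first.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


spam = ("Products not available in all states. We discovered an unclaimed sum "
        "of 24. Sowas habe ich noch nie gesehen. Archpastor fosterlings governs "
        "graminifolious aeromarine mutt. Steve marketing team kzl 789 56. Info re "
        "movress below:http://www. Get yours today.")
print(despamify(spam))  # b'hello'
```

The second "hello" message above also decodes to b'hello' with this sketch, which suggests (but does not prove) that the real tool likewise counts numbers as words and splits on punctuation.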
$ echo -n 'Products not available in all states. We discovered an unclaimed sum of 24. Sowas habe ich noch nie gesehen. Archpastor fosterlings governs graminifolious aeromarine mutt. Steve marketing team kzl 789 56. Info re movress below:http://www. Get yours today.' | ./stegospam.pl -d
Because stegospam is just a UNIX filter, you can enspamify and despamify in the same pipeline, like this:
$ echo 'hello steganography' | ./stegospam.pl -e | ./stegospam.pl -d
This sort of message is not in any sense secure. An attacker could work out how we were hiding our data and extract it quite easily, so we want to combine it with proper encryption. I like to use gpg for this. Here is what I would type to encrypt and enspamify a message in one go, and put the result in a file called email:
$ echo not wittingly | gpg -r email@example.com -e | ./stegospam.pl -e > email
To get the text back out of email, James would type:
$ ./stegospam.pl -d < email | gpg -d
You need a passphrase to unlock the secret key for
user: "James Clapper <email@example.com>"
2048-bit RSA key, ID A2E3BFD4, created 2012-07-10
gpg: encrypted with 2048-bit RSA key, ID A2E3BFD4, created 2012-07-10
"James Clapper <email@example.com>"
This was my first attempt at anything steganographic; it was conceived one day and written in an evening a few days later. I learned that indexing the fields you query in SQLite tables makes lookups orders of magnitude faster, and that inserts can be sped up by turning off the synchronous pragma (PRAGMA synchronous = OFF). I am now playing with other corpuses. The King James Bible works quite well!
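For the curious, here is roughly what those two speed-ups look like with Python’s built-in sqlite3 module. The table layout mirrors the one shown earlier, but the index name and helper functions are hypothetical, not taken from import_corpus.pl. Note that synchronous = OFF trades durability for speed: a power cut mid-import can corrupt the database, which is tolerable for a corpus you can rebuild.

```python
import sqlite3
from typing import Iterable, List, Tuple


def open_corpus(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Don't wait for each write to reach the disk: much faster bulk inserts.
    conn.execute("PRAGMA synchronous = OFF")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sentences ("
        "id INTEGER PRIMARY KEY, sentence TEXT, length INTEGER, parity_map TEXT)"
    )
    # Without this index every parity_map lookup is a full table scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_sentences_parity_map "
        "ON sentences (parity_map)"
    )
    return conn


def bulk_insert(conn: sqlite3.Connection,
                rows: Iterable[Tuple[str, int, str]]) -> None:
    # One transaction for the whole batch instead of one per row.
    with conn:
        conn.executemany(
            "INSERT INTO sentences (sentence, length, parity_map) VALUES (?, ?, ?)",
            rows,
        )


def lookup_plan(conn: sqlite3.Connection) -> List[str]:
    """Return SQLite's query-plan details for a parity_map lookup."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT sentence FROM sentences WHERE parity_map = ?",
        ("011010",),
    )
    return [row[-1] for row in plan]
```

If the plan mentions idx_sentences_parity_map, the lookup is an index search rather than a scan of every sentence.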