Charlie Harvey

Stegospam: Hiding messages in spam for fun and mischief

tl;dr GCHQ/NSA seem to keep encrypted data but throw away spam. I made a tool that lets you hide your encrypted data in spam.

In recent weeks it has emerged that GCHQ and NSA have been spying not only on their political allies, but on the population in general. Many of us are shocked, if unsurprised, by this revelation. If you feel there is nothing you can do, please head over to the EFF site or Restore the Fourth. If you want to opt out of surveillance as much as possible, you can learn how at

Details of the Tempora programme being operated by GCHQ, in the country where I live, point to the fact that the authorities throw away low-value traffic like spam and bittorrent using a technique called MVR (for Massive Volume Reduction). I have also read that NSA may retain encrypted data almost indefinitely. So, clearly the best way to send your encrypted data is within the stuff that gets thrown away during MVR.

Stegospam is a semi-serious toy steganography tool that lets you do just that. It uses spam as a carrier for your important data. Now GCHQ will either have to retain all the spam or risk encrypted data getting through the net.

Enter Stegospam

I should start my intro by saying that stegospam is probably not secure against serious security people (like the NSA). At best it obfuscates your traffic. It does not anonymize or secure it. If you need those capabilities, use Tor and GPG. You shouldn’t rely on stegospam hiding your traffic; the source is available, but nobody has subjected it to any testing, and even the author (me!) can think of several attacks against it that would probably work. In fact, if you have something that needs to stay secret, don’t put it on the internet. Or talk on the phone about it. Talk face to face far away from buildings and other people. You get the idea.

OK, consider your emptor well and truly caveat-ed. So how does this thing work then?

How it works

Steganography is the art of embedding hidden data inside a carrier file, such as an image or video, and later extracting that data. Stegospam uses spam as the carrier, and the parity (odd or even) of the lengths of the words in each sentence as the encoding mechanism. Sound baffling? I’ll explain.

There are a couple of scripts included in the project download. The first one is called A corpus is a fancy word for a body of text. There is nothing that looks more like spam than spam, right? So we start off with a corpus made from some of the Enron spam corpus. You pipe this through to, which constructs a SQLite database with it. Here is what to type:

```
$ wget
$ tar xvzf enron1.tar.gz
$ cat enron1/spam/* | grep -v '^Subject:' > corpus
$ < corpus
```
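The corpus-building step could look roughly like the Python sketch below. It is not the script from the download: the sentence rule (split after '.', '!' or '?' followed by whitespace), the word rule used for parity_map and the name build_corpus are my assumptions. The table layout copies the columns visible in the sqlite3 session that follows.

```python
import re
import sqlite3

# Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def parity_map(sentence):
    """'1' for each odd-length word, '0' for each even-length word."""
    # Assumption: words are runs of letters/digits, so hyphens and punctuation separate them.
    return "".join("1" if len(word) % 2 else "0" for word in re.findall(r"\w+", sentence))


def build_corpus(text, db_path="corpus.sqlite"):
    """Split text into sentences and store each one with its length and parity map."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sentences (
               id INTEGER PRIMARY KEY,
               sentence TEXT NOT NULL,
               length INTEGER NOT NULL,
               parity_map TEXT NOT NULL)"""
    )
    rows = []
    for raw in SENTENCE_END.split(text):
        sentence = " ".join(raw.split())  # collapse line breaks and runs of spaces
        bits = parity_map(sentence)
        if bits:  # skip fragments that contain no words at all
            rows.append((sentence, len(sentence), bits))
    with conn:  # one transaction for the whole batch of inserts
        conn.executemany(
            "INSERT INTO sentences (sentence, length, parity_map) VALUES (?, ?, ?)", rows
        )
    return conn
```

To use it as a filter like the original, you would call build_corpus(sys.stdin.read()) and feed the corpus file on standard input. Putting all the inserts in a single transaction matters a lot for speed in SQLite, since otherwise every row is committed on its own.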

If that worked you should have a corpus.sqlite with a bunch of spam in it. Congratulations. Let’s take a quick look inside the database:

```
$ sqlite3 corpus.sqlite3
SQLite version 2013-04-12 11:52:43
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .mode line
sqlite> SELECT * FROM sentences LIMIT 1;
        id = 1
  sentence = Introducing doctor-formulated hgh human growth hormone-also called hgh is referred to in medical science as the master hormone.
    length = 127
parity_map = 10011010010000110101
```

OK, so our first sentence is some nonsense about HGH. The interesting thing here is the parity_map field. It is derived by assigning, in order, a 0 to each even-length word and a 1 to each odd-length word. So, as "Introducing" has an odd number of letters in it, it gets a 1. Computer scientists might see where this is going.
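A minimal Python sketch of the parity rule follows. The tokenising rule is my assumption rather than something taken from the script; I chose it because it reproduces the stored parity_map for the HGH sentence exactly. That only works if hyphenated words like doctor-formulated count as two words and the full stop on "hormone." is ignored, giving 20 words and 20 bits.

```python
import re


def parity_map(sentence):
    """One digit per word: '1' if the word's length is odd, '0' if it is even."""
    # Assumption: a word is a run of letters, digits or underscores, so "doctor-formulated"
    # counts as two words and the full stop in "hormone." is not counted.
    return "".join("1" if len(word) % 2 else "0" for word in re.findall(r"\w+", sentence))


sentence = ("Introducing doctor-formulated hgh human growth hormone-also called hgh "
            "is referred to in medical science as the master hormone.")

# Show the word-by-word working, then the whole map
for word in re.findall(r"\w+", sentence):
    print(f"{word:12} {len(word):2} -> {len(word) % 2}")
print(parity_map(sentence))  # 10011010010000110101
```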

Now that we have our database all set up, we can start using the other script, It has two modes: -e enspamifies the data piped to it and -d despamifies it again. Let’s look at how enspamification works first of all.

When is called with -e, it first takes whatever you pipe to it and converts it into 1s and 0s, eight bits per byte. So 'h' followed by a newline becomes '0110100000001010' (01101000 for 'h', 00001010 for the newline). This produces a lot of 0s and 1s: 'hello steganography', again with its trailing newline, comes out as '0110100001100101011011000110110001101111001000000111001101110100011001010110011101100001011011100110111101100111011100100110000101110000011010000111100100001010'. We have to split that string into chunks that are about the same size as our sentences' parity maps, that is, their word counts. Now, we can’t make all the chunks the same length or that would be easy to detect when doing MVR! So we pick random(ish) chunk lengths. Actually, they are distributed so that more chunks are of a medium length in our range than are either short or long. You can set the maximum and minimum 'sentence' lengths, and how the lengths are distributed between them, within the script itself.
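Here is one way the bit conversion and chunking could be sketched in Python. MIN_WORDS, MAX_WORDS and chunk_bits are hypothetical names and values, not the script's. random.triangular is my stand-in for "more medium-length chunks than short or long ones", since its density peaks in the middle of the range.

```python
import random

# Hypothetical chunk-size bounds, measured in words (= bits) per sentence.
MIN_WORDS = 4
MAX_WORDS = 9


def to_bits(data):
    """Render bytes as a string of '0'/'1', eight bits per byte, most significant bit first."""
    return "".join(format(byte, "08b") for byte in data)


def chunk_bits(bits, min_len=MIN_WORDS, max_len=MAX_WORDS, rng=None):
    """Cut bits into pieces whose lengths cluster around the middle of [min_len, max_len]."""
    if rng is None:
        rng = random.Random()
    chunks, pos = [], 0
    while pos < len(bits):
        # triangular() with no mode argument peaks at the midpoint of the range
        size = round(rng.triangular(min_len, max_len))
        chunks.append(bits[pos:pos + size])  # the final chunk may come out shorter
        pos += size
    return chunks


print(to_bits(b"h\n"))  # 0110100000001010
print(chunk_bits(to_bits(b"hello steganography\n")))
```

Because the last chunk is just whatever is left over, it can be shorter than MIN_WORDS; the corpus then needs a short sentence whose parity map matches it.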

Now that we have broken the long string of bits into chunks, we look up a random sentence whose parity_map matches the 1s and 0s of each chunk. We squish these sentences together and get a nice spam message. Here is me trying "hello" a couple of times:

```
$ echo -n hello | ./ -e
Products not available in all states. We discovered an unclaimed sum of 24. Sowas habe ich noch nie gesehen. Archpastor fosterlings governs graminifolious aeromarine mutt. Steve marketing team kzl 789 56. Info re movress below:http://www. Get yours today.
$ echo -n hello | ./ -e
Products not available in all states. Diocesan,graduate episcopate,anselmo edwardine. It is not just about saving. Hello,i have visited www. 00 90 pi lls cai llis-348. Yours free with your first appointment. Just enter the pharmma now.
```
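The lookup step might be sketched like this, assuming the sentences table from the sqlite3 session earlier. pick_sentence and enspamify_chunks are hypothetical names, and raising LookupError when no sentence matches a chunk is my choice; I don't know how the real script handles that case (picking a different chunk length and retrying would be another option).

```python
import sqlite3


def pick_sentence(conn, bits):
    """Return a random corpus sentence whose parity_map equals bits."""
    row = conn.execute(
        "SELECT sentence FROM sentences WHERE parity_map = ? ORDER BY RANDOM() LIMIT 1",
        (bits,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no sentence in the corpus has parity map {bits}")
    return row[0]


def enspamify_chunks(conn, chunks):
    """Squish one matching sentence per chunk together into a spam message."""
    return " ".join(pick_sentence(conn, chunk) for chunk in chunks)


if __name__ == "__main__":
    # A toy corpus using two sentences from the first "hello" example above.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sentences "
        "(id INTEGER PRIMARY KEY, sentence TEXT, length INTEGER, parity_map TEXT)"
    )
    conn.executemany(
        "INSERT INTO sentences (sentence, length, parity_map) VALUES (?, ?, ?)",
        [("Products not available in all states.", 37, "011010"),
         ("Get yours today.", 16, "111")],
    )
    print(enspamify_chunks(conn, ["011010", "111"]))
    # Products not available in all states. Get yours today.
```

ORDER BY RANDOM() LIMIT 1 is the simplest way to have SQLite choose one matching row at random, which is why the same input gives different spam on each run.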

Despamification and encryption

Now that you understand enspamification, it is pretty obvious that we can reverse the process to extract our original message from the spam. We use's -d option to do our despamification. Let’s try it with one of the strings from the previous example:

```
$ echo -n Products not available in all states. We discovered an unclaimed sum of 24. Sowas habe ich noch nie gesehen. Archpastor fosterlings governs graminifolious aeromarine mutt. Steve marketing team kzl 789 56. Info re movress below:http://www. Get yours today. | ./ -d
hello
```
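In this scheme decoding shouldn't need the database at all: split the spam back into sentences, compute each sentence's parity map, concatenate the bits and repack them eight at a time. The Python sketch below assumes a hypothetical sentence rule (split after '.', '!' or '?' followed by whitespace) and word rule (runs of letters and digits). With those rules it turns both "hello" spam messages from the enspamification example back into b'hello'.

```python
import re

# Assumptions (not taken from the real script): sentences end at '.', '!' or '?'
# followed by whitespace, and words are runs of letters, digits or underscores.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def parity_map(sentence):
    return "".join("1" if len(word) % 2 else "0" for word in re.findall(r"\w+", sentence))


def from_bits(bits):
    """Pack a '0'/'1' string back into bytes, eight bits at a time."""
    if len(bits) % 8:
        raise ValueError(f"{len(bits)} bits is not a whole number of bytes; damaged spam?")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def despamify(spam):
    """Concatenate the parity maps of every sentence and decode the result."""
    sentences = SENTENCE_END.split(spam.strip())
    return from_bits("".join(parity_map(s) for s in sentences))


spam = ("Products not available in all states. We discovered an unclaimed sum of 24. "
        "Sowas habe ich noch nie gesehen. Archpastor fosterlings governs graminifolious "
        "aeromarine mutt. Steve marketing team kzl 789 56. Info re movress below:http://www. "
        "Get yours today.")
print(despamify(spam))  # b'hello'
```

The chunk boundaries chosen during enspamification don't matter here, because the sentences' parity maps simply concatenate back into the original bit string.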

Because stegospam is just a UNIX filter, you can enspamify and despamify in the same pipeline. Like this:

```
$ echo 'hello steganography' | ./ -e | ./ -d
hello steganography
```

This sort of message is not in any sense secure. An attacker could work out how we were hiding our data and extract it quite easily. We want to combine it with proper encryption. I like to use gpg for this. Here is what I would type to encrypt and enspamify a message in one go and put it in a file called email:

```
$ echo not wittingly | gpg -r -e | ./ -e > email
```

To get the text out of email, James would type:

```
$ ./ -d < email | gpg -d

You need a passphrase to unlock the secret key for
user: "James Clapper <>"
2048-bit RSA key, ID A2E3BFD4, created 2012-07-10

gpg: encrypted with 2048-bit RSA key, ID A2E3BFD4, created 2012-07-10
      "James Clapper <>"
not wittingly
```

Wrapping up

This was my first attempt to do anything steganographic; it was conceived one day and written in an evening a few days later. I learned that having indexes on fields in your SQLite tables makes lookups on them orders of magnitude faster, and that inserting can be sped up by turning off the synchronous pragma (PRAGMA synchronous = OFF). I am now playing with using other corpuses. The King James Bible works quite well!
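Both speed-ups are one-liners with Python's sqlite3 module. This is a sketch against an in-memory copy of the sentences table, not the original script, and it assumes the pragma in question is PRAGMA synchronous = OFF: SQLite then stops waiting for each write to reach the disk, which makes bulk loading much faster but can corrupt the database if the machine crashes mid-build (a fair trade for a corpus you can rebuild). The index name idx_parity_map is hypothetical. EXPLAIN QUERY PLAN lets you check that parity_map lookups actually use the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # point this at the real corpus file in practice
conn.execute(
    "CREATE TABLE IF NOT EXISTS sentences "
    "(id INTEGER PRIMARY KEY, sentence TEXT, length INTEGER, parity_map TEXT)"
)

# Don't wait for the OS to confirm each write hit the disk (fast, but not crash-safe).
conn.execute("PRAGMA synchronous = OFF")

# Every enspamify lookup filters on parity_map, so index that column.
conn.execute("CREATE INDEX IF NOT EXISTS idx_parity_map ON sentences (parity_map)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT sentence FROM sentences WHERE parity_map = '011010'"
).fetchall()
for row in plan:
    print(row[-1])  # the detail column should mention USING INDEX idx_parity_map
```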

