Steganography with Brainfuck

12th Nov 2011

Steganography is a name given for a set of techniques where the targeted message is hidden inside other data, most often in the context of an image in some form. Common forms of steganography include deviations in the level of individual pixels, or changes to the low-level noise components of the image. In this article, I'll look at a technique that produces an image which is also a program, which can be run to produce the targeted message.

In order to keep the examples here simple, I've sought a form of program that has a simple instruction set. Many processors have a simple instruction set: the 6502 has around 150 unique codes in its architecture, which can be reduced further with careful selection of instructions. However, having written about Brainfuck before, it seems the logical choice: eight instructions, each with a unique code, and all other codes are ignored during execution.

The program to be encoded, then, is as follows:

Message before encoding >>++++++++[<+++[<+++++>-]>-]<<[>+>+>+>+>+<<<<<-]++++++++[>->--> --->---->-----<<<<<-]<++++[>++++++++<-]>>>>>>++++.<<<.>+++++.<< <.>>+++++.++.-.---.>.<<+++++++++.<.>>--.<------.<.>>.+++++.<<.> +.>------.>.<<<.>>>-.<+.<-.>-.<++++.>>---.<<----.>.>++++.<<-. The monkey is in the dishwasher

Encoding program code into pixels

In order for the program to be converted to an image, the first step is to take the program as a stream of numbers. In Brainfuck, the eight operators can be treated as eight numbers to be inserted into a stream:

+	0x2B	-	0x2D
[	0x5B	]	0x5D
<	0x3C	>	0x3E
.	0x2E	,	0x2C

Table 1: Character values for the Brainfuck operators

Hiding these values within image data would be difficult if the data were full-colour (24- or 32-bit), since the colour component would need to correspond exactly at a given point in the image. Instead of using full-colour images, it makes more sense to use palette-based images, where an 8-bit palette index refers to a table of colours defined beforehand.

A simple example of palette data would be a graded greyscale image of sixteen shades, like the one below:

Lena in greyscale — Figure 1: Lena, in sixteen-shade greyscale

If this image is produced in a standard fashion, a palette of sixteen entries and 240 blanks is saved alongside the image. By expanding the palette entries used to the maximum allotted number of 256, it's possible to use the extra entries for hiding information:

	B	C	D	E
0
1
2	+	,	-	.
3		<		>
4
5	[		]
6
7
8
9
A
B
C
D
E
F

Table 2: Sixteen-shade greyscale palette, with Brainfuck operators indicated

As can be seen above, each row of the palette is the same shade of grey, except where the Brainfuck operators map onto the palette. By changing arbitrary pixels in the image, it's possible to replace any given #2-grey pixel with, for example, a + operator without changing the row of the palette used. Shifting pixel values along the palette row in this fashion allows us to change the value without changing the look of the image.

Before applying this concept to the example image in Figure 1, it's important to consider the type of image needed for this to work.

Uncompressed images: the BMP format

The major issue presented by image formats is that of compression: in order to make the image file size smaller, various methods of compression can be used to translate areas of similarity in the image to a much smaller amount of data than that represented by the raw image. Of course, when this is done the resultant compressed data bears no superficial relation to the raw image data; attempting to encode a Brainfuck program into the compressed data stream is beyond the scope of this article.

Instead, I'll be focussing on the raw image data itself; storage of the raw data can be achieved through an uncompressed data format such as TIFF. Examples of uncompressed 8-bit palette image formats include BMP and PCX; the former will be used here, due to its prevalence as a major image file format. When used as an uncompressed 8-bit format, the structure of a BMP file is as follows:

Field	Size (bytes)	Value
Signature	2	"BM"
File size	4	Size of the full BMP
Reserved	4	0x00000000
Data offset	4	Offset of the pixel array
BITMAPINFOHEADER
Header size	4	0x00000028
Image width	4	Width of the image in pixels
Image height	4	Height of the image in pixels
Colour planes	2	0x0001
Color depth	2	0x0008
Compression method	4	0x00000000
Uncompressed image size	4	Width * height * bytes-per-pixel
Horizontal DPM	4	Pixels per metre, horizontal
Vertical DPM	4	Pixels per metre, vertical
Colours used	4	Number of unique colours
Important colours	4	Number of important colours
Colour table (256 entries)
Colour	4	RGBA value
Pixel data, 1 byte per pixel

Table 3: BMP file structure (all numeric fields are little-endian)

The end result of this process is to produce a bitmap file that can be run as a Brainfuck program; to that end, we must be careful that the header data and colour table don't contain Brainfuck operators that may corrupt the initial state of the program. This is, fortunately, relatively easy: all that is needed is to pick an image width and height that don't contain an operator within the byte values, and the rest should automatically fall into place.

One other quirk of the BMP format is the order in which pixel data is stored. Most image formats treat the first row as the top row, in accordance with computer graphics principles; BMP lines up with mathematical graphing principles, and treats the bottom row as the first.

Position of the first row — Figure 2: Position of the rows in a BMP

This means that the Brainfuck operators need to be encoded into the image starting at the bottom, running left-right, then moving up to the next row. By examining each operator in the program in turn, and finding the next appropriate pixel (of the same row in the palette) to replace, the following result is obtained.

Encoded Lena — Figure 3: Lena, encoded with the targeted message, at 400% zoom

Once the appropriate pixels have been replaced with versions shifted along the palette row, the palette can be amended to set the Brainfuck operators to the same colour as the rest of the row in which they reside. Doing this, and producing an uncompressed bitmap of the result, provides the following.

Original Lena — Figure 4: The original image, and the image encoded with the target message

Final encoded Lena — Figure 4: The original image, and the image encoded with the target message

Running the bitmap image, shown on the right, through a Brainfuck interpreter results in the targeted message being revealed.

PHP Brainfuck interpreter <?php list($c, $i) = split('!', file_get_contents($_SERVER['argv'][1])); $c = preg_replace('#[^\[\]\<\>,.+-]#', "", $c); $p = $v = 0; eval(strtr($c, array( "]" => "}", "[" => 'while($m[$p]){', "+" => '$m[$p]++;', "-" => '$m[$p]--;', ">" => '$p++;', "<" => '$p--;', "," => 'if(strlen($i)>$v)$m[$p]=ord($i[$v++]);', "." => 'echo chr($m[$p]);' ))); ?> Sample result $ php -f brainfuck.php http://imrannazar.com/content/img/bf-bmp-final.bmp The monkey is in the dishwasher$

Caveats and improvements

The process outlined above is simple enough that it can be automated: given a sixteen-step greyscale image, seek from the bottom left for a pixel in the desired palette row, replace the pixel and move forward. This, however, isn't ideal for steganographic purposes; the Brainfuck operators leave a unique signature in a bitmap of otherwise uniform numbers. There are two simple ways to increase the level of obfuscation:

Colour: Instead of using a greyscale image, which offers easy colour matching, it's possible to use a coloured palette with any 8-bit image as the source. This involves replacing any existing instances of the operator characters with a close colour match, then further colour matching can be used to fill in the program, hiding the signal in an increased level of noise.
Compression: Noise levels can be increased further by compressing the image once it has been encoded, hiding any relation between the bitmap data and any Brainfuck program contained within. This increases the complexity of both the encoding and decoding algorithms, since a compression step must be employed around the transmission of the image, but it allows the images to be transmitted in a format that avoids suspicion.

Imran Nazar <tf@imrannazar.com>, Nov 2011.