This article is part three of a series exploring the concepts and implementation of JPEG decoding; four parts are currently available, and others are expected to follow.
Download the code: http://imrannazar.com/content/files/jpegparse.zip
In the previous part, I mentioned that most JPEG files employ an encoding technique on top of the image compression, in an attempt to remove any trace of redundant information from the image. The technique used by the most common JPEG encoding is an adaptation of one seen throughout the world of data compression, known as Huffman coding, so it's useful to explore in detail the structure and implementation of a Huffman decoder.
Because Huffman coding is the last thing performed by a JPEG encoder when saving an image file, it needs to be the first thing done by our decoder. This can be achieved in two ways:
- Reconstitution: The full image scan is decoded from its Huffman-coded state into a temporary placeholder, and further processing is performed on the temporary copy.
- On-the-fly: The encoded image scan is processed one code at a time, and each JPEG block is handled when enough information is available.
This article takes the second approach, saving memory at the cost of some processing time; full reconstitution can be implemented in a very similar fashion using the code built below.
The Huffman algorithm
The concept behind Huffman coding and other entropy-based schemes is similar to the concept behind the substitution cipher: each unique character in the input is transformed into a unique output character. The simplest example is the Caesar substitution, which can be represented in tabular form as follows:
An improvement on the standard substitution cipher can be made by noting the relative frequency of characters in the input, and designing a table that assigns shorter codes to the more frequent characters and longer codes to the rarer ones. Taking a look at the frequency of letters in the above example, with their ASCII representations included, we can produce a table of increasing unique codes such as the following:
Character | ASCII | Frequency | Code |
---|---|---|---|
Space | 00100000 | 7 | 00 |
a | 01100001 | 5 | 01 |
e | 01100101 | 4 | 100 |
i | 01101001 | 3 | 1010 |
s | 01110011 | 3 | 1011 |
h | 01101000 | 2 | 11000 |
p | 01110000 | 2 | 11001 |
r | 01110010 | 2 | 11010 |
C | 01000011 | 1 | 110110 |
T | 01010100 | 1 | 110111 |
c | 01100011 | 1 | 111000 |
f | 01100110 | 1 | 111001 |
l | 01101100 | 1 | 111010 |
m | 01101101 | 1 | 111011 |
n | 01101110 | 1 | 111100 |
o | 01101111 | 1 | 111101 |
x | 01110111 | 1 | 111110 |
Substituting these codes for the characters in the original text, it can be seen how the encoded data is much smaller than the original.
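The saving can be checked with a quick calculation from the table: the 37 characters of the input occupy 296 bits in 8-bit ASCII, but only 144 bits once Huffman-coded. A short sketch of that arithmetic (`encodedBits` is an illustrative helper, not part of the article's code):

```cpp
// Total bits needed to encode the input, given the frequency of each
// character and the length of the code assigned to it. The values in
// the usage below are copied straight from the table above.
int encodedBits(const int *freqs, const int *lengths, int n) {
    int bits = 0;
    for (int i = 0; i < n; i++)
        bits += freqs[i] * lengths[i];  // each occurrence costs one code
    return bits;
}
```

With the frequencies from the table and the Huffman code lengths, this yields 144 bits; substituting a uniform length of 8 for every character gives the 296-bit ASCII figure, so the encoded data is less than half the original size.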
The main disadvantage of the Huffman coding method is that the table of codes needs to be stored alongside the compressed data: in the above example, the encoded bit-string would be meaningless without the corresponding frequency table. The table of codes and their corresponding characters can be recorded in full, but there is a more space-efficient way to save the codes, if attention is paid to the pattern of their occurrence. Two things are of note here: firstly, the codes increase in length; secondly, within a group of the same length, the codes are sequential. This means the code table can be written down as:
A careful eye on the codes themselves can yield further improvements on how much space it takes to record the encoding table. If we take a look at the codes in conjunction with the list of code lengths above, we can start counting as follows.
In every case, when the requisite number of codes has been counted for the given code length, all that is needed is to double the counter and continue for the next code length. In other words, there is no need to record the "starting at" part of the code lengths list above, since it can be inferred by starting at zero. The final code list therefore looks as follows.
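The counting procedure above can be sketched in code: start the counter at zero, assign sequential codes within each length, and double the counter when moving to the next length. A minimal canonical-code generator along these lines (`buildCodes` is an illustrative name, not part of the article's code):

```cpp
#include <string>
#include <vector>

// Generate canonical Huffman codes from a counts-per-length list:
// counts[len] is how many codes exist of length 'len' (1 to 16).
std::vector<std::string> buildCodes(const int counts[17]) {
    std::vector<std::string> codes;
    unsigned int code = 0;
    for (int length = 1; length <= 16; length++) {
        for (int i = 0; i < counts[length]; i++) {
            // Render this code as a binary string of 'length' bits
            std::string s;
            for (int b = length - 1; b >= 0; b--)
                s += ((code >> b) & 1) ? '1' : '0';
            codes.push_back(s);
            code++;  // codes within a length are sequential
        }
        code <<= 1;  // double the counter for the next code length
    }
    return codes;
}
```

Feeding in the counts from the example table (two codes of length 2, one of length 3, two of length 4, three of length 5, nine of length 6) reproduces the code column exactly, from 00 through 111110.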
The JPEG DHT segment
A JPEG file's Huffman tables are recorded in precisely this manner: a list of how many codes are present of a given length (between 1 and 16), followed by the meanings of the codes in order. This information is held in the file's "Define Huffman Table" (DHT) segments, of which there can be up to 32, according to the JPEG standard.
As seen above, data encoded by the Huffman algorithm ends up recorded as a series of codes wedged together in a bit-stream; this also applies to the image scan in a JPEG file. A simple routine for reading codes from the bit stream may look like this:
Pseudocode for reading a Huffman-coded value
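A minimal sketch of such a routine, assuming the codes are stored in a map keyed by (length, code) pairs and a hypothetical `BitReader` supplies the scan bits one at a time (both names are illustrative, not from the article's actual code):

```cpp
#include <map>
#include <utility>

// Illustrative bit source: serves bits most-significant-first from a
// byte buffer (an assumption; the real scan reader comes later in
// the series).
struct BitReader {
    const unsigned char *data;
    int pos;  // bit position from the start of the buffer
    BitReader(const unsigned char *d) : data(d), pos(0) {}
    int getBit() {
        int bit = (data[pos >> 3] >> (7 - (pos & 7))) & 1;
        pos++;
        return bit;
    }
};

typedef std::map<std::pair<int, int>, unsigned char> HuffMap;

// Read bits one at a time, appending each to a candidate code, until
// the accumulated (length, code) pair is found in the Huffman map.
unsigned char readCode(BitReader &in, const HuffMap &m) {
    int code = 0, length = 0;
    while (length < 16) {
        code = (code << 1) | in.getBit();
        length++;
        HuffMap::const_iterator it = m.find(std::make_pair(length, code));
        if (it != m.end())
            return it->second;
    }
    return 0;  // no code matched within 16 bits: invalid stream
}
```

Because no Huffman code is a prefix of another, the first match found is the only possible one, so the routine can return as soon as the lookup succeeds.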
In order to facilitate this algorithm, the Huffman codes should be stored in a way that allows us to determine if a code is in the map at a given length. The canonical way to represent a Huffman code list is as a binary tree, where the sequence of branches defines the code and the depth of the tree tells us how long the code is. The C++ STL abstracts this out for us, into the map construct.
Since there are up to 32 possible Huffman tables that can be defined in a JPEG file, our implementation will require 32 maps to be available. It's also worth defining at this point how the DHT segment handler will be called by the parseSeg method developed in the previous part of this series.
jpeg.h: DHT data definition
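As a sketch of what such a definition might contain (the member names here are illustrative guesses, not necessarily those used in the series' actual jpeg.h):

```cpp
#include <map>
#include <utility>

class JPEG {
public:
    // One map per possible table, keyed by (code length, code value)
    // and yielding the byte that the code stands for; the article
    // allows for up to 32 tables.
    std::map<std::pair<int, int>, unsigned char> huffData[32];

    // Handler for a DHT segment: builds one or more of the maps above
    void huffmanDecode(int len, unsigned char *seg);
};
```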
jpeg.cpp: Passing control to the segment handler
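The dispatch might be sketched as follows; the surrounding structure is an assumption about the parseSeg method from part two, though 0xC4 is the DHT marker byte defined by the JPEG standard:

```cpp
#include <cstdio>

// Illustrative stand-in for the DHT handler shown in the next listing
static void huffmanDecode(int len, unsigned char *seg) {
    std::printf("DHT segment of %d bytes\n", len);
    (void)seg;
}

// Sketch of the marker dispatch inside parseSeg: when the segment
// marker identifies a DHT segment, hand its contents to the handler.
int parseSeg(unsigned char marker, int len, unsigned char *seg) {
    switch (marker) {
        case 0xC4:  // DHT: Define Huffman Table
            huffmanDecode(len, seg);
            return 1;
        // ... other markers (SOI, DQT, SOS, ...) handled elsewhere ...
        default:
            return 0;
    }
}
```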
jpeg.cpp: DHT segment handler, builds a Huffman map
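A handler along these lines might be sketched as follows, assuming the segment layout described above: a table ID byte, sixteen bytes giving the number of codes at each length, then the symbol for each code in order. For simplicity, this sketch reads only the first table in the segment:

```cpp
#include <map>
#include <utility>

typedef std::map<std::pair<int, int>, unsigned char> HuffMap;

// Build a Huffman map from DHT segment data: counts-per-length are
// turned back into canonical codes by the doubling procedure above,
// and each code is paired with its symbol.
void buildHuffmanTable(const unsigned char *seg, HuffMap &table) {
    const unsigned char *counts = seg + 1;   // seg[0] is the table ID
    const unsigned char *symbols = seg + 17; // symbols follow the counts

    int code = 0, k = 0;
    for (int length = 1; length <= 16; length++) {
        for (int i = 0; i < counts[length - 1]; i++) {
            // Canonical assignment: sequential codes within a length
            table[std::make_pair(length, code)] = symbols[k++];
            code++;
        }
        code <<= 1;  // double the counter for the next code length
    }
}
```

Keying the map by a (length, code) pair rather than the code alone matters: the codes 0 of length 2 and 00 of length 2 versus 0 of length 1 have the same numeric value, and only the length disambiguates them during decoding.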
As with the previous part, the JPEG class can be instantiated with a filename; if this is done, the above code will produce output along the following lines:
Next time: Reading the bitstream
Once the Huffman maps have been built for a JPEG file, the image scan can be decoded for further processing. In the next part, I'll take a look at the Huffman decoding of the scan, in the wider context of reading blocks from the image and examining the process through which they are transformed.
Imran Nazar <tf@imrannazar.com>, Feb 2013.
Article dated: 24th Feb 2013