Back to Let's Build a JPEG Decoder
This article is part four of a series exploring the concepts and implementation of JPEG decoding; four parts are currently available, and others are expected to follow.
Download the code: http://imrannazar.com/content/files/jpegparse.zip
Previously, I discussed the Huffman compression algorithm as implemented by JPEG, and a mechanism by which JPEG encoders pick the substitution values used for the compression. The image itself, in a baseline entropy-coded JPEG file, is stored in one "scan" as a stream of Huffman codes; the codes are of a variable length, and are not necessarily at even byte boundaries.
For this reason, it's a requirement that some kind of queue be introduced between the bytes of the file, and the values as seen by the JPEG decoder: this allows bits to be pushed onto the queue from the bytes of the file, and pulled out in varying amounts by the decoder. A single routine can thus be made the central routing point for any requests for variable-width symbols.
A queue of things, when used in this fashion, generally has two modes of operation: either there are enough things in the queue for the request to deal with, or additional reading has to be done before there are enough things in the queue to handle the request. In the case of a bit-level queue for reading from a file, as we need here, the number of bits requested is the determining factor. In the first example below, a queue holding five bits is asked to return the first three, and is able to handle the request without a problem.
If a request for four bits then comes in, the queue doesn't contain enough bits to handle the request, and must first fetch another full byte from the JPEG file (shown here in red).
Implementing the bit queue
A complication arises when trying to code an implementation of the bit queue as shown above: the output end of the queue is at the left, and if one contiguous value is used to store the queue, the left is the most-significant end with the highest-value bits. If we demarcate the queue as being 32 positions long, and define one 32-bit value as holding its entire contents, the above scenarios break down as follows:
- Pulling three bits
- The request is shorter than the queue length of 5, no reading from file happens.
- The output value is the top three bits of the queue, shifted down (32 - 3) bits to become the bottom three bits of an output value;
- The queue is shifted left three bits, and left with length of 2.
- Pulling another four bits
- The request is longer than the queue length of 2: a byte is read in, and shifted up (24 - 2) bits to join the queue;
- The queue is now 10 bits long, no more bytes need reading;
- The output value is the top four bits of the queue, shifted down (32 - 4);
- The queue is shifted left four bits, and left with length of 6.
At all times during the queue's operation, the next available bits are at the high-value end of the contiguous queue variable. An implementation of this may look like the following.
jpeg.h: Definition of bit-resolution file reader
class JPEG { private:// Read a number of bits from fileu16 readBits(int); };
jpeg.cpp: Bit-resolution file reader
u16 JPEG::readBits(int len) {// The number of bits left in the queue, and the value representedstatic int queueLen = 0; static u32 queueVal = 0; u8 readByte; u16 output; if (len > queueLen) { do {// Read a byte in, shift it up to join the queuereadByte = fgetc(fp); queueVal = queueVal | (readByte << (24 - queueLen)); queueLen += 8; } while (len > queueLen); }// Shift the requested number of bytes down to the other endoutput = ((queueVal >> (32 - len)) & ((1 << len) - 1)); queueLen -= len; queueVal <<= len; return output; }
The JPEG SOF segment
As mentioned in the previous part of this series, a JPEG file can define up to 32 Huffman code tables, each in their own DHT
segment. A JPEG file holds the data corresponding to the image itself in a "frame", denoted by a "Start of Frame" segment header. The SOF header contains a part of the information required to decode the frame, and is structured according to the following table.
Field | Value | Size (bytes) |
---|---|---|
Precision (the number of pixels in a JPEG block) | 8 | 1 |
Image height | Up to 65535 | 2 |
Image width | Up to 65535 | 2 |
Components | Number of colour components | 1 |
For each component (in a YUV-colour file, three) | ||
ID | Identifier for later use | 1 |
Sampling resolution | For later examination | 1 |
Quantisation table | For later examination | 1 |
As can be seen in the above table, some of the fields involve operations that we have not yet examined (the sampling resolution and the quantisation table for each component). For the purposes of completing the SOF segment handler, we can hold onto this information for later use.
A "frame" consists of the SOF header, and a number of "scans"; as the name implies, each scan is a pass over the full rectangle of the image. In an interlaced JPEG file for example, multiple scans would be present for the single frame, each of them having more resolution than the last; in a progressive JPEG file, there is just one scan containing all the information for the image. Since this series is concerned with building a decoder for progressive JPEG files, we'll focus on dealing with a single scan in the frame.
It turns out that the information required by Huffman decoding, in particular which of the DHT
tables to use, is defined by each scan in the frame instead of by the frame itself. We'll look at the scan-level information in more detail in the next part of this series; for now, it's sufficient to decouple our representation of a colour component in the image from the SOF
data, and define the exact metadata for a component later.
Structure definitions
With the information detailing the structure of an SOF
header, it becomes relatively simple to build a segment parser to plug into our existing code. The only complication arises from the fact that multi-byte values in JPEG are stored in big-endian format, which may not necessarily be the host format for large integers. It's useful to have a set of definitions for transparently handling big-endian values, which is presented below.
byteswap.h: Big-endian value handling macros
/** * Let's Build a JPEG Decoder * Big-endian value handling macros * Imran Nazar, May 2013 */#ifndef __BYTESWAP_H_ #define __BYTESWAP_H_ #if __SYS_BIG_ENDIAN == 1 # define htoms(x) (x) # define htoml(x) (x) # define mtohs(x) (x) # define mtohl(x) (x) #else # define htoms(x) (((x)>>8)|((x)<<8)) # define htoml(x) (((x)<<24)|(((x)&0xFF00)<<8)|(((x)&0xFF0000)>>8)|((x)>>24)) # define mtohs(x) (((x)>>8)|((x)<<8)) # define mtohl(x) (((x)<<24)|(((x)&0xFF00)<<8)|(((x)&0xFF0000)>>8)|((x)>>24)) #endif #endif//__BYTESWAP_H_
jpeg.h: SOF header structures
// Prevent padding bytes from creeping into structures#define PACKED __attribute__((packed)) class JPEG { private:// Information in the SOF headerstruct PACKED { u8 precision; u16 height; u16 width; u8 component_count; } sofHead; typedef struct PACKED { u8 id; u8 sampling; u8 q_table; } sofComponent;// Internal information about a colour componenttypedef struct PACKED { u8 id;// There is likely to be more data here...} Component;// The set of colour components in the imagestd::vectorcomponents; // The SOF segment handlerint SOF(); };
jpeg.cpp: Passing control to the segment handler
int JPEG::parseSeg() { ... switch (id) {// The SOF segment defines the components and resolution // of the JPEG frame for a baseline Huffman-coded imagecase 0xFFC0: size = READ_WORD() - 2; if (SOF() != size) { printf("Unexpected end of SOF segment\n"); return JPEG_SEG_ERR; } break; ... } return JPEG_SEG_OK; }
jpeg.cpp: SOF segment handling
int JPEG::SOF() { int ctr = 0, i; fread(&sofHead, sizeof(sofHead), 1, fp); ctr += sizeof(sofHead); sofHead.width = mtohs(sofHead.width); sofHead.height = mtohs(sofHead.height); printf("Image resolution: %dx%d\n", sofHead.width, sofHead.height); for (i = 0; i < sofHead.component_count; i++) { sofComponent s; fread(&s, sizeof(sofComponent), 1, fp); ctr += sizeof(sofComponent); Component c; c.id = s.id; components.push_back(c); } return ctr; }
Next time: The Minimum Coded Unit
As mentioned above, the image frame in a progressive JPEG file is encoded as a scan, composed of a series of blocks; depending on the sampling resolution of the components in the image, these blocks can be larger than the 8x8-pixel base block size of the JPEG algorithm. In the next part of this series, I'll examine the relationship between these larger units and the colour components of the image.
Imran Nazar <tf@imrannazar.com>, May 2013.