Let's Build a JPEG Decoder: File Structure

20th Jan 2013

This article is part two of a series exploring the concepts and implementation of JPEG decoding; four parts are currently available, and others are expected to follow.

Download the code: http://imrannazar.com/content/files/jpegparse.zip

In the previous part, I gave a brief overview of the techniques used by JPEG to compress an image. Before examining the detailed implementation of those techniques, it's useful to look at the overall structure of a JPEG file, for two reasons:

Some of the processes used by the encoder employ tables of values, which are stored alongside the image information itself, so it's sensible to retrieve these into memory before they're needed.
We'll also need somewhere to put the implementation of the image decompression algorithm, and having a framework in place for that facilitates this.

The implementation developed in this series of articles will be written in C++, but the constructs can be transplanted to a language of your choice with little additional complexity.

Types of JPEG image

It should be stated at this juncture that the implementation developed here will only apply to one common subset of all the possible types of JPEG image. Firstly, there are four types of compression supported by the standard:

Baseline: The most common compression type, where all the image information is contained in one series of 8x8 blocks.
Extended Sequential: Most used in medical imaging, this type allows for more levels per pixel.
Progressive: Information from the frequency domain is written in a series of scans, the most important values for each block coming first in the file. This allows the whole image to be rendered at a low resolution, and the details filled in as the image downloads.
Lossless: A rare encoding based on a predicted difference between the target pixel and its surroundings.

Further, there are two forms of encoding that are applied on top of the image compression, to further compress the file data:

Huffman-based entropy: The image is made into a bit-stream, with the most common values encoded as short (2- or 3-bit) stream entries, and less common values recorded as longer strings of bits.
Arithmetic: A formerly patented encoding method, where the image data is represented as a series of probabilities of the values occurring, and combined into one fractional number.

This series will implement an entropy-coded baseline JPEG decoder.

File segments

A JPEG file is made up of segments of varying length, each of which starts with a "marker" to denote which kind of segment it is. There are 254 possible types of segment, but only a few are found in the type of image we'll be decoding:

Name	Short Name	Marker	Description	Length (bytes)
Start of Image	SOI	FF D8	Delimits the start of the file	2
Define Quantisation Table	DQT	FF DB	Values used by the decoder	69
Define Huffman Table	DHT	FF C4	Values used by the decompressor	Variable
Start of Frame	SOF	FF C0	Information for an entropy-coded baseline frame	10
Start of Scan	SOS	FF DA	Encoded and compressed image bitstream	Variable
End of Image	EOI	FF D9	Delimits the end of the file	2

Table 1: Segments present in an entropy-coded baseline JPEG file

Most of the different types of segment have a "length" value just after the marker, which denotes how long the segment is in bytes (including the length value); this can be used to skip over segments that a decoder doesn't know about. There are three exceptions to this general rule:

SOI and EOI: Because these are more delimiters than markers, they consist only of the marker value.
SOS: The scan is a bitstream, and automatically "ends" when the image is fully coded. As such, there is no length written into the file for an SOS segment. There are two strategies for dealing with this: either we can assume the rest of the file is part of the scan, or we can read through the file looking for markers which would denote the start of a new segment.

For this article, I'll assume the rest of the file is part of a scan if we run into an SOS segment, and skip straight to the EOI.

Implementation: Listing the segments in a JPEG file

As a first step, it makes sense to write a program to open a JPEG file, and run through it looking for segment markers. The structure of such a program can be expanded upon with implementation for processing of the different kinds of segments, and the mechanism for skipping over segments given their size can be used later to skip over the parts of the file which are non-essential to the decoding process.

Since the sizes of values in a JPEG file are specified in absolute terms of number of bytes, it's a good idea to abstract the basic integer types into types which refer to size. For this, we'll use a short header file.

inttypes.h: Architecture-independent integer size definitions #ifndef __INTTYPES_H_ #define __INTTYPES_H_ typedef unsigned char u8; typedef unsigned short u16; typedef unsigned int u32; typedef signed char s8; typedef signed short s16; typedef signed int s32; #endif//__INTTYPES_H_

The above file is set up for 32-bit compilation, but can be adapted if 64- or 16-bit code is required. The advantage of this is that references to integers in the JPEG decoder implementation itself can be agnostic of architecture, and simply refer to u16 and other types defined here.

With these abstractions in place, the implementation of a segment listing is quite simple. Since we'll be building the decoding functionality into a class, it's worth defining the class itself at this time.

jpeg.h: JPEG decoder class definition #ifndef __JPEG_H_ #define __JPEG_H_ #include "inttypes.h" #include <string> #include <vector> #include <map> #include <stdio.h> // Macro to read a 16-bit word from file #define READ_WORD() ((fgetc(fp) << 8) | fgetc(fp)) // Segment parsing error codes #define JPEG_SEG_ERR 0 #define JPEG_SEG_OK 1 #define JPEG_SEG_EOF -1 class JPEG { private: // Names of the possible segments std::string segNames[64]; // The file to be read from, opened by constructor FILE *fp; // Segment parsing dispatcher int parseSeg(); public: // Construct a JPEG object given a filename JPEG(std::string); }; #endif//__JPEG_H_ jpeg.cpp: JPEG segment listing implementation #include "jpeg.h" #include <stdlib.h> #include <string.h> #include <math.h> //------------------------------------------------------------------------- // Function: Parse JPEG file segment (parseSeg) // Purpose: Retrieves 16-bit block ID from file, shows name int JPEG::parseSeg() { if (!fp) { printf("File failed to open.\n"); return JPEG_SEG_ERR; } u32 fpos = ftell(fp); u16 id = READ_WORD(), size; if (id < 0xFFC0) { printf("Segment ID expected, not found.\n"); return JPEG_SEG_ERR; } printf( "Found segment at file position %d: %s\n", fpos, segNames[id-0xFFC0].c_str()); switch (id) { // The SOI and EOI segments are the only ones not to have // a length, and are always a fixed two bytes long; do // nothing to advance the file position case 0xFFD9: return JPEG_SEG_EOF; case 0xFFD8: break; // An SOS segment has a length determined only by the // length of the bitstream; for now, assume it's the rest // of the file less the two-byte EOI segment case 0xFFDA: fseek(fp, -2, SEEK_END); break; // Any other segment has a length specified at its start, // so skip over that many bytes of file default: size = READ_WORD(); fseek(fp, size-2, SEEK_CUR); break; } return JPEG_SEG_OK; } //------------------------------------------------------------------------- // Function: Array initialisation (constructor) // Purpose: Fill in arrays used by the decoder, decode a file // Parameters: filename (string) - File to decode JPEG::JPEG(std::string filename) { // Debug messages used by parseSeg to tell us which segment we're at segNames[0x00] = std::string("Baseline DCT; Huffman"); segNames[0x01] = std::string("Extended sequential DCT; Huffman"); segNames[0x02] = std::string("Progressive DCT; Huffman"); segNames[0x03] = std::string("Spatial lossless; Huffman"); segNames[0x04] = std::string("Huffman table"); segNames[0x05] = std::string("Differential sequential DCT; Huffman"); segNames[0x06] = std::string("Differential progressive DCT; Huffman"); segNames[0x07] = std::string("Differential spatial; Huffman"); segNames[0x08] = std::string("[Reserved: JPEG extension]"); segNames[0x09] = std::string("Extended sequential DCT; Arithmetic"); segNames[0x0A] = std::string("Progressive DCT; Arithmetic"); segNames[0x0B] = std::string("Spatial lossless; Arithmetic"); segNames[0x0C] = std::string("Arithmetic coding conditioning"); segNames[0x0D] = std::string("Differential sequential DCT; Arithmetic"); segNames[0x0E] = std::string("Differential progressive DCT; Arithmetic"); segNames[0x0F] = std::string("Differential spatial; Arithmetic"); segNames[0x10] = std::string("Restart"); segNames[0x11] = std::string("Restart"); segNames[0x12] = std::string("Restart"); segNames[0x13] = std::string("Restart"); segNames[0x14] = std::string("Restart"); segNames[0x15] = std::string("Restart"); segNames[0x16] = std::string("Restart"); segNames[0x17] = std::string("Restart"); segNames[0x18] = std::string("Start of image"); segNames[0x19] = std::string("End of image"); segNames[0x1A] = std::string("Start of scan"); segNames[0x1B] = std::string("Quantisation table"); segNames[0x1C] = std::string("Number of lines"); segNames[0x1D] = std::string("Restart interval"); segNames[0x1E] = std::string("Hierarchical progression"); segNames[0x1F] = std::string("Expand reference components"); segNames[0x20] = std::string("JFIF header"); segNames[0x21] = std::string("[Reserved: application extension]"); segNames[0x22] = std::string("[Reserved: application extension]"); segNames[0x23] = std::string("[Reserved: application extension]"); segNames[0x24] = std::string("[Reserved: application extension]"); segNames[0x25] = std::string("[Reserved: application extension]"); segNames[0x26] = std::string("[Reserved: application extension]"); segNames[0x27] = std::string("[Reserved: application extension]"); segNames[0x28] = std::string("[Reserved: application extension]"); segNames[0x29] = std::string("[Reserved: application extension]"); segNames[0x2A] = std::string("[Reserved: application extension]"); segNames[0x2B] = std::string("[Reserved: application extension]"); segNames[0x2C] = std::string("[Reserved: application extension]"); segNames[0x2D] = std::string("[Reserved: application extension]"); segNames[0x2E] = std::string("[Reserved: application extension]"); segNames[0x2F] = std::string("[Reserved: application extension]"); segNames[0x30] = std::string("[Reserved: JPEG extension]"); segNames[0x31] = std::string("[Reserved: JPEG extension]"); segNames[0x32] = std::string("[Reserved: JPEG extension]"); segNames[0x33] = std::string("[Reserved: JPEG extension]"); segNames[0x34] = std::string("[Reserved: JPEG extension]"); segNames[0x35] = std::string("[Reserved: JPEG extension]"); segNames[0x36] = std::string("[Reserved: JPEG extension]"); segNames[0x37] = std::string("[Reserved: JPEG extension]"); segNames[0x38] = std::string("[Reserved: JPEG extension]"); segNames[0x39] = std::string("[Reserved: JPEG extension]"); segNames[0x3A] = std::string("[Reserved: JPEG extension]"); segNames[0x3B] = std::string("[Reserved: JPEG extension]"); segNames[0x3C] = std::string("[Reserved: JPEG extension]"); segNames[0x3D] = std::string("[Reserved: JPEG extension]"); segNames[0x3E] = std::string("Comment"); segNames[0x3F] = std::string("[Invalid]"); // Open the requested file, keep parsing blocks until we run // out of file, then close it. fp = fopen(filename.c_str(), "rb"); if (fp) { while(parseSeg() == JPEG_SEG_OK); fclose(fp); } else { perror("JPEG"); } }

When constructed with a file, an object of this JPEG class will provide output similar to the following.

Output of JPEG segment listing Found segment at file position 0: Start of image Found segment at file position 2: JFIF header Found segment at file position 20: Quantisation table Found segment at file position 89: Quantisation table Found segment at file position 158: Baseline DCT; Huffman Found segment at file position 177: Huffman table Found segment at file position 208: Huffman table Found segment at file position 289: Huffman table Found segment at file position 318: Huffman table Found segment at file position 371: Start of scan Found segment at file position 32675: End of image

Next time: Examining the encoding of a scan

As can be seen above, the "scan" constitutes the majority of an entropy-coded baseline JPEG; since the entirety of the image data is encoded within the scan, this makes sense. Entropy coding is based on the Huffman compression algorithm, so in the next article I'll examine the parts of a JPEG file that provide the information needed to decode the scan from a bitstream into something usable for further processing.

Imran Nazar <tf@imrannazar.com>, Jan 2013.