Let's Build a JPEG Decoder: File Structure

Display mode

Back to Let's Build a JPEG Decoder

This article is part two of a series exploring the concepts and implementation of JPEG decoding; four parts are currently available, and others are expected to follow.

Download the code: http://imrannazar.com/content/files/jpegparse.zip

In the previous part, I gave a brief overview of the techniques used by JPEG to compress an image. Before examining the detailed implementation of those techniques, it's useful to look at the overall structure of a JPEG file, for two reasons:

The implementation developed in this series of articles will be written in C++, but the constructs can be transplanted to a language of your choice with little additional complexity.

Types of JPEG image

It should be stated at this juncture that the implementation developed here will only apply to one common subset of all the possible types of JPEG image. Firstly, there are four types of compression supported by the standard:

Further, there are two forms of encoding that are applied on top of the image compression, to further compress the file data:

This series will implement an entropy-coded baseline JPEG decoder.

File segments

A JPEG file is made up of segments of varying length, each of which starts with a "marker" to denote which kind of segment it is. There are 254 possible types of segment, but only a few are found in the type of image we'll be decoding:

NameShort NameMarkerDescriptionLength (bytes)
Start of ImageSOIFF D8Delimits the start of the file2
Define Quantisation TableDQTFF DBValues used by the decoder69
Define Huffman TableDHTFF C4Values used by the decompressorVariable
Start of FrameSOFFF C0Information for an entropy-coded baseline frame10
Start of ScanSOSFF DAEncoded and compressed image bitstreamVariable
End of ImageEOIFF D9Delimits the end of the file2
Table 1: Segments present in an entropy-coded baseline JPEG file

Most of the different types of segment have a "length" value just after the marker, which denotes how long the segment is in bytes (including the length value); this can be used to skip over segments that a decoder doesn't know about. There are three exceptions to this general rule:

For this article, I'll assume the rest of the file is part of a scan if we run into an SOS segment, and skip straight to the EOI.

Implementation: Listing the segments in a JPEG file

As a first step, it makes sense to write a program to open a JPEG file, and run through it looking for segment markers. The structure of such a program can be expanded upon with implementation for processing of the different kinds of segments, and the mechanism for skipping over segments given their size can be used later to skip over the parts of the file which are non-essential to the decoding process.

Since the sizes of values in a JPEG file are specified in absolute terms of number of bytes, it's a good idea to abstract the basic integer types into types which refer to size. For this, we'll use a short header file.

inttypes.h: Architecture-independent integer size definitions

#ifndef __INTTYPES_H_
#define __INTTYPES_H_

typedef unsigned char  u8;
typedef unsigned short u16;
typedef unsigned int   u32;

typedef signed char  s8;
typedef signed short s16;
typedef signed int   s32;

#endif//__INTTYPES_H_

The above file is set up for 32-bit compilation, but can be adapted if 64- or 16-bit code is required. The advantage of this is that references to integers in the JPEG decoder implementation itself can be agnostic of architecture, and simply refer to u16 and other types defined here.

With these abstractions in place, the implementation of a segment listing is quite simple. Since we'll be building the decoding functionality into a class, it's worth defining the class itself at this time.

jpeg.h: JPEG decoder class definition

#ifndef __JPEG_H_
#define __JPEG_H_

#include "inttypes.h"
#include <string>
#include <vector>
#include <map>
#include <stdio.h>

// Macro to read a 16-bit word from file
#define READ_WORD() ((fgetc(fp) << 8) | fgetc(fp))

// Segment parsing error codes
#define JPEG_SEG_ERR  0
#define JPEG_SEG_OK   1
#define JPEG_SEG_EOF -1

class JPEG {
    private:
        // Names of the possible segments
        std::string segNames[64];

        // The file to be read from, opened by constructor
        FILE *fp;

        // Segment parsing dispatcher
        int parseSeg();

    public:
        // Construct a JPEG object given a filename
        JPEG(std::string);
};

#endif//__JPEG_H_

jpeg.cpp: JPEG segment listing implementation

#include "jpeg.h"
#include <stdlib.h>
#include <string.h>
#include <math.h>

//-------------------------------------------------------------------------
// Function: Parse JPEG file segment (parseSeg)
// Purpose: Retrieves 16-bit block ID from file, shows name

int JPEG::parseSeg()
{
    if (!fp) {
        printf("File failed to open.\n");
        return JPEG_SEG_ERR;
    }

    u32 fpos = ftell(fp);
    u16 id = READ_WORD(), size;
    if (id < 0xFFC0)
    {   
        printf("Segment ID expected, not found.\n");
        return JPEG_SEG_ERR;
    }

    printf(
        "Found segment at file position %d: %s\n",
        fpos, segNames[id-0xFFC0].c_str());

    switch (id) {
        // The SOI and EOI segments are the only ones not to have
        // a length, and are always a fixed two bytes long; do
        // nothing to advance the file position
        case 0xFFD9:
            return JPEG_SEG_EOF;
        case 0xFFD8:
            break;

        // An SOS segment has a length determined only by the
        // length of the bitstream; for now, assume it's the rest
        // of the file less the two-byte EOI segment
        case 0xFFDA:
            fseek(fp, -2, SEEK_END);
            break;

        // Any other segment has a length specified at its start,
        // so skip over that many bytes of file
        default:
            size = READ_WORD();
            fseek(fp, size-2, SEEK_CUR);
            break;
    }

    return JPEG_SEG_OK;
}

//-------------------------------------------------------------------------
// Function: Array initialisation (constructor)
// Purpose: Fill in arrays used by the decoder, decode a file
// Parameters: filename (string) - File to decode

JPEG::JPEG(std::string filename)
{   
    // Debug messages used by parseSeg to tell us which segment we're at
    segNames[0x00] = std::string("Baseline DCT; Huffman");
    segNames[0x01] = std::string("Extended sequential DCT; Huffman");
    segNames[0x02] = std::string("Progressive DCT; Huffman");
    segNames[0x03] = std::string("Spatial lossless; Huffman");
    segNames[0x04] = std::string("Huffman table");
    segNames[0x05] = std::string("Differential sequential DCT; Huffman");
    segNames[0x06] = std::string("Differential progressive DCT; Huffman");
    segNames[0x07] = std::string("Differential spatial; Huffman");
    segNames[0x08] = std::string("[Reserved: JPEG extension]");
    segNames[0x09] = std::string("Extended sequential DCT; Arithmetic");
    segNames[0x0A] = std::string("Progressive DCT; Arithmetic");
    segNames[0x0B] = std::string("Spatial lossless; Arithmetic");
    segNames[0x0C] = std::string("Arithmetic coding conditioning");
    segNames[0x0D] = std::string("Differential sequential DCT; Arithmetic");
    segNames[0x0E] = std::string("Differential progressive DCT; Arithmetic");
    segNames[0x0F] = std::string("Differential spatial; Arithmetic");
    segNames[0x10] = std::string("Restart");
    segNames[0x11] = std::string("Restart");
    segNames[0x12] = std::string("Restart");
    segNames[0x13] = std::string("Restart");
    segNames[0x14] = std::string("Restart");
    segNames[0x15] = std::string("Restart");
    segNames[0x16] = std::string("Restart");
    segNames[0x17] = std::string("Restart");
    segNames[0x18] = std::string("Start of image");
    segNames[0x19] = std::string("End of image");
    segNames[0x1A] = std::string("Start of scan");
    segNames[0x1B] = std::string("Quantisation table");
    segNames[0x1C] = std::string("Number of lines");
    segNames[0x1D] = std::string("Restart interval");
    segNames[0x1E] = std::string("Hierarchical progression");
    segNames[0x1F] = std::string("Expand reference components");
    segNames[0x20] = std::string("JFIF header");
    segNames[0x21] = std::string("[Reserved: application extension]");
    segNames[0x22] = std::string("[Reserved: application extension]");
    segNames[0x23] = std::string("[Reserved: application extension]");
    segNames[0x24] = std::string("[Reserved: application extension]");
    segNames[0x25] = std::string("[Reserved: application extension]");
    segNames[0x26] = std::string("[Reserved: application extension]");
    segNames[0x27] = std::string("[Reserved: application extension]");
    segNames[0x28] = std::string("[Reserved: application extension]");
    segNames[0x29] = std::string("[Reserved: application extension]");
    segNames[0x2A] = std::string("[Reserved: application extension]");
    segNames[0x2B] = std::string("[Reserved: application extension]");
    segNames[0x2C] = std::string("[Reserved: application extension]");
    segNames[0x2D] = std::string("[Reserved: application extension]");
    segNames[0x2E] = std::string("[Reserved: application extension]");
    segNames[0x2F] = std::string("[Reserved: application extension]");
    segNames[0x30] = std::string("[Reserved: JPEG extension]");
    segNames[0x31] = std::string("[Reserved: JPEG extension]");
    segNames[0x32] = std::string("[Reserved: JPEG extension]");
    segNames[0x33] = std::string("[Reserved: JPEG extension]");
    segNames[0x34] = std::string("[Reserved: JPEG extension]");
    segNames[0x35] = std::string("[Reserved: JPEG extension]");
    segNames[0x36] = std::string("[Reserved: JPEG extension]");
    segNames[0x37] = std::string("[Reserved: JPEG extension]");
    segNames[0x38] = std::string("[Reserved: JPEG extension]");
    segNames[0x39] = std::string("[Reserved: JPEG extension]");
    segNames[0x3A] = std::string("[Reserved: JPEG extension]");
    segNames[0x3B] = std::string("[Reserved: JPEG extension]");
    segNames[0x3C] = std::string("[Reserved: JPEG extension]");
    segNames[0x3D] = std::string("[Reserved: JPEG extension]");
    segNames[0x3E] = std::string("Comment");
    segNames[0x3F] = std::string("[Invalid]");

    // Open the requested file, keep parsing blocks until we run
    // out of file, then close it.
    fp = fopen(filename.c_str(), "rb");
    if (fp) {
        while(parseSeg() == JPEG_SEG_OK);
        fclose(fp);
    }
    else {
        perror("JPEG");
    }
}

When constructed with a file, an object of this JPEG class will provide output similar to the following.

Output of JPEG segment listing

Found segment at file position 0: Start of image
Found segment at file position 2: JFIF header
Found segment at file position 20: Quantisation table
Found segment at file position 89: Quantisation table
Found segment at file position 158: Baseline DCT; Huffman
Found segment at file position 177: Huffman table
Found segment at file position 208: Huffman table
Found segment at file position 289: Huffman table
Found segment at file position 318: Huffman table
Found segment at file position 371: Start of scan
Found segment at file position 32675: End of image

Next time: Examining the encoding of a scan

As can be seen above, the "scan" constitutes the majority of an entropy-coded baseline JPEG; since the entirety of the image data is encoded within the scan, this makes sense. Entropy coding is based on the Huffman compression algorithm, so in the next article I'll examine the parts of a JPEG file that provide the information needed to decode the scan from a bitstream into something usable for further processing.

Imran Nazar <tf@imrannazar.com>, Jan 2013.