Many file formats contain data in a packed manner: a series of values which encode information, placed into the file one after another. As an example, TrueType font files encode information about the font represented in the file, such as the name of the font and how many characters are contained within the font face. As another example, Windows BMP files encode the dimensions and formatting of the image within the format.

The Need: Structures in C/C++

In most instances, the information in a file is encoded as a series of structures, grouping related information such that a programming interface can retrieve them easily. In C and C++, a special type is set aside for just such a reason: the struct.

BMP header structure, in C

typedef struct { unsigned char r; /* Red */ unsigned char g; /* Green */ unsigned char b; /* Blue */ unsigned char reserved; } RGBQUAD;

The above is the C representation of one palette entry in a Windows BMP. Once this structure has been defined as a type with typedef, using it is very simple:

Using a structure, in C

RGBQUAD palette[256]; fread(palette, sizeof(RGBQUAD), 256, file_handle); printf("Colour #0 is %02X%02X%02X.\n", palette[0].r, palette[0].g, palette[0].b);

It can be seen above that the data contained within a struct can be accessed in much the same way as methods can be accessed within a class, in C++ or any other object-oriented programming language. Indeed, in C++ the keywords class and struct are equivalent, and mean much the same thing.

The Problem: Structures in PHP

When using languages such as PHP, a problem arises: PHP does not support a native struct type. Further, since PHP is a loosely-typed language, it's not possible to read data from an encoded file format directly into PHP and manipulate it. This can be alleviated by using PHP's class keyword to build a class containing the structure members:

A PHP class with structure members

class RGBQUAD { // Byte size of the structure const SIZE = 16; // Structure members public $r; public $g; public $b; public $reserved; // Initialise members given packed data public function __construct($data) { if($data) { list($this->r, $this->g, $this->b, $this->reserved) = unpack('v4', $data); } } }

This representation of the structure makes use of the unpack function, to take a string of binary data and load it into the class members. This can be used in a similar fashion to the C representation:

Using a structure, in PHP

$palette = array(); for($i=0; $i<256; $i++) $palette[$i] = new RGBQUAD(fread($file_handle, RGBQUAD::SIZE)); printf("Colour #0 is %02X%02X%02X.\n", $palette[0]->r, $palette[0]->g, $palette[0]->b);

This approach has distinct advantages over the C struct type. In particular, the PHP implementation is a class, and can contain methods other than the simple constructor; furthermore, complex types can be contained within the structure in a way that C cannot accomplish.

Advanced Usage: TrueType File Format

An example of complex usage of the structure pattern is in parsing of the TrueType file format. A TrueType file defines a vector or bitmap font face, and contains a series of data tables: a table of names, a table of Windows-specific font information, and tables of glyph definitions. Also contained in the file structure is a header defining the table "directory", which allows a parser to find these tables within the file.

A TrueType file begins with information about the version of the TrueType specification, followed by the table directory. This can be represented in PHP in a simple manner:

TrueType file header and table directory, in PHP

// File header. This contains the number of tables in the TTF. class ttfHeader { const SIZE = 12; public $majorVersion; public $minorVersion; public $tableCount; public $searchRange; public $entrySelector; public $rangeShift; public $tableDirectory; public function __construct($file) { $this->tableDirectory = array(); $header = fread($file, self::SIZE); list($t, $this->majorVersion, $this->minorVersion, $this->tableCount, $this->searchRange, $this->entrySelector, $this->rangeShift) = unpack('n*', $header); for($i=0; $i<$this->tableCount; $i++) { $this->tableDirectory[$i] = new ttfTableDirectoryEntry(fread($file, ttfTableDirectoryEntry::SIZE)); } } } // Table directory. Describes the location and size of a tables in the TTF. class ttfTableDirectoryEntry { const SIZE = 16; public $tag; public $checksum; public $offset; public $length; public function __construct($data) { list($t, $tag, $this->checksum, $this->offset, $this->length) = unpack('N*', $data); // Build a string tag from the numeric value $this->tag = chr(($tag>>24)&255). chr(($tag>>16)&255). chr(($tag>> 8)&255). chr(($tag>> 0)&255); } }

This example shows how it's possible for a structure to contain more information than the sum of its members. In the case of the ttfHeader, the constructor can pull in the structure of an entry in the table directory, and build its own array to represent the directory.

All in all, the Structure pattern makes it possible for PHP to represent structures in a similar fashion to C structs; it also allows PHP to be more versatile, and represent more complex information related to the data, which would normally have to be held outside the structure.

© Imran Nazar <tf@oopsilon.com>, 2008

Article dated: 20th Jul 2008

Get the RSS feed