Parsing the DIME Message Format

Talking to web interfaces with SOAP has been made very simple in PHP, thanks to the inclusion of a SOAP module in the standard PHP 5 build. Having the functionality of a SOAP client built-in means it's very quick and easy to call an interface through either the raw RPC or the provided WSDL, which can return XML or plain text values.

The trouble starts to come in when the interface wishes to return a complicated result: a file that isn't XML or plain text, or a series of files in the same return package. There are various encodings used in the transfer of complex SOAP results; one of the more common is Direct Internet Message Encapsulation, or DIME.

The DIME Format

DIME is essentially a wrapper over MIME, allowing multiple MIME parts to be sent in one package. The format was developed by Microsoft as a draft standard, and was adopted by a good number of SOAP interfaces before the official data standard was drawn up. The concept of the format is very simple: a series of files, either XML or binary data, with a short header on each file.

Each part can be marked as the first and/or last part of the message, and the type of data it contains can also be marked as XML or binary. Being a Microsoft standard, the definition of the header for each part involves bitfields and binary fiddling. There's also scope provided for extensions to the DIME format, but none of these were ever defined, so you're unlikely to find any messages with options filled in.

Field            Length     Description
Version          5 bits     DIME format version (always 1)
First Record     1 bit      Set if this is the first part in the message
Last Record      1 bit      Set if this is the last part in the message
Chunk Record     1 bit      This file is broken into chunked parts
Type Format      4 bits     Type of file in the part (1 for binary data, 2 for XML)
Reserved         4 bits     (Classic Microsoft)
The following fields are big-endian numbers
Options Length   2 bytes    Length of the "options" field
ID Length        2 bytes    Length of the "ID" or "name" field
Type Length      2 bytes    Length of the "type" field
Data Length      4 bytes    Size of the included file
The following fields are variable-length, and padded to the next 4-byte boundary
Options          variable   Part-specific option data, if any is defined (safely answer "no")
ID               variable   Name of the file/part
Type             variable   If typeformat is 1: MIME type of the part data
                            If typeformat is 2: URI of the DTD file for the XML enclosed
Data             variable   The file
Table 1: DIME part header

As you can see, it's possible for a DIME message to contain only one part, by marking it as both the first and the last. Each part follows directly on from the last, so it's easy enough to run through a DIME message in a loop, working out the position of the next part by adding the 12-byte header and the padded sizes of the four sections to the position of the current one.
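As a rough illustration of the fixed header in Table 1, the 12 bytes can also be decoded in one go with PHP's unpack(). This is only a sketch: the function name dime_parse_header and the sample header bytes are made up for the example, and are not part of any DIME library.

```php
<?php
// Hypothetical helper: decode the fixed 12-byte header of one DIME part,
// starting at byte offset $pos of the message string.
function dime_parse_header($input, $pos = 0)
{
    if (strlen($input) < $pos + 12) {
        throw new InvalidArgumentException('Truncated DIME header');
    }

    // C = unsigned byte, n = 16-bit big-endian, N = 32-bit big-endian
    $h = unpack('Cflags/Ctnf/noptions/nid/ntype/Ndata', substr($input, $pos, 12));

    return array(
        'version'     => ($h['flags'] >> 3) & 31, // top five bits
        'first'       => ($h['flags'] >> 2) & 1,
        'last'        => ($h['flags'] >> 1) & 1,
        'chunked'     => $h['flags'] & 1,
        'type_format' => ($h['tnf'] >> 4) & 15,   // top four bits of byte 2
        'lengths'     => array(
            'options' => $h['options'],
            'id'      => $h['id'],
            'type'    => $h['type'],
            'data'    => $h['data'],
        ),
    );
}

// Made-up sample: version 1, first and last part, not chunked, type format 1,
// no options, 4-byte ID, 9-byte type, 20 bytes of data.
$header = "\x0E\x10\x00\x00\x00\x04\x00\x09\x00\x00\x00\x14";
$h = dime_parse_header($header);
```

The lengths come back unpadded, exactly as stored in the header; the padding still has to be added when stepping over each section.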

One complication of the format is that each section in a part (options, ID, type, data) is padded so that it takes up a multiple of 4 bytes; this is generally done by filling the gap with zero bytes. For example, if the type of the file was given as "text/html", you'd end up with the following in the message:

Padded "type" field

74 65 78 74 2F 68 74 6D 6C 00 00 00 -- text/html

The first nine bytes above are the field data itself, defined by the header as 9 bytes long. The next multiple of 4 from there is 12, so three bytes of padding are added to push the field to a 4-byte boundary; these bytes are not counted as part of the field data.
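The padding arithmetic can be sketched in a couple of lines. The helper names dime_padded_length and dime_pad are invented for this example; the rounding trick itself is just the usual "add 3, clear the low two bits".

```php
<?php
// Hypothetical helper: round a field length up to the next multiple of 4.
function dime_padded_length($length)
{
    return ($length + 3) & ~3;
}

// Hypothetical helper: append zero bytes so the field ends on a 4-byte boundary.
function dime_pad($field)
{
    return str_pad($field, dime_padded_length(strlen($field)), "\0");
}

echo bin2hex(dime_pad('text/html')), "\n";
// prints 746578742f68746d6c000000
```

A field whose length is already a multiple of 4 (including an empty one) gets no padding at all.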

Reading DIME messages in PHP

Using the structure pattern in PHP, it's quite a simple endeavour to build a class capable of reading in DIME messages and extracting the parts. The basis of this is the DIMERecord structure.

DIMERecord: Structure holding information about a DIME part

class DIMERecord
{
    public $version;
    public $first;
    public $last;
    public $chunked;
    public $type_format;
    public $options;
    public $id;
    public $type;
    public $data;
}

Filling in this structure can be done from another class, acting as the DIME parser itself. It's this class which holds the array of DIMERecords referencing the parts.

DIME: Building the record array

class DIME
{
    const TYPE_BINARY = 1;
    const TYPE_XML = 2;

    public $records;

    function __construct($input)
    {
        $this->records = array();
        $pos = 0;
        $length = strlen($input);

        // Every part needs at least its 12-byte header; this also
        // copes with an empty input string
        while($pos + 12 <= $length)
        {
            $r = new DIMERecord;

            // Shift out bitfields for the first fields
            $b = ord($input[$pos++]);
            $r->version     = ($b>>3) & 31;
            $r->first       = ($b>>2) & 1;
            $r->last        = ($b>>1) & 1;
            $r->chunked     = $b & 1;
            $r->type_format = (ord($input[$pos++]) >> 4) & 15;

            // Fetch big-endian lengths
            $lengths = array();
            $lengths['options']  = ord($input[$pos++]) << 8;
            $lengths['options'] |= ord($input[$pos++]);

            $lengths['id']  = ord($input[$pos++]) << 8;
            $lengths['id'] |= ord($input[$pos++]);

            $lengths['type']  = ord($input[$pos++]) << 8;
            $lengths['type'] |= ord($input[$pos++]);

            $lengths['data']  = ord($input[$pos++]) << 24;
            $lengths['data'] |= (ord($input[$pos++]) << 16);
            $lengths['data'] |= (ord($input[$pos++]) << 8);
            $lengths['data'] |= ord($input[$pos++]);

            // Read in padded data; the array keys match the
            // DIMERecord property names
            foreach($lengths as $lk => $lv)
            {
                $r->$lk = substr($input, $pos, $lv);
                $pos += $lv;

                // Skip padding up to the next 4-byte boundary
                if($lv & 3)
                    $pos += (4-($lv & 3));
            }

            $this->records[] = $r;
        }
    }
}
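To try the parser without a live SOAP service, it can help to build a small DIME message by hand. The following is a minimal sketch of the reverse operation using pack(); the function name dime_build_record, its parameter order and the sample part are all assumptions made for this example, and it never writes an options field.

```php
<?php
// Hypothetical test helper: serialise one DIME part with no options field.
function dime_build_record($id, $type, $data, $first, $last, $chunked = false, $type_format = 1)
{
    $flags = (1 << 3)               // version 1 in the top five bits
           | ($first   ? 4 : 0)
           | ($last    ? 2 : 0)
           | ($chunked ? 1 : 0);

    // Header: flags, type format in the top four bits, then the
    // options/ID/type lengths (16-bit) and data length (32-bit), big-endian
    $out = pack('CCnnnN', $flags, $type_format << 4, 0,
                strlen($id), strlen($type), strlen($data));

    // ID, type and data follow, each zero-padded to a multiple of 4 bytes
    foreach (array($id, $type, $data) as $field) {
        $out .= str_pad($field, (strlen($field) + 3) & ~3, "\0");
    }

    return $out;
}

// A single-part message: marked as both first and last
$message = dime_build_record('part1', 'text/plain', 'Hello, DIME', true, true);
echo strlen($message), "\n";
// prints 44
```

Passing $message to the DIME constructor above should then give a single record whose id is "part1" and whose data is "Hello, DIME".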

Chunking: Breaking up files across parts

The DIME standard also allows a file to be broken up across multiple parts, in case the client or server can't buffer a big file and work out its size for the header all at once. Parsing a chunked file from its parts involves checking the "chunked" bit for the current part and for the part before it, and acting on the combination:

This part chunked?   Previous part chunked?   Action
No                   No                       This is a normal file; save
Yes                  No                       This is the first chunk part; start a data buffer
Yes                  Yes                      This is a continuation chunk; append to the data buffer
No                   Yes                      This is the last chunk part; append to the data buffer and save
Table 2: Actions taken for a chunked part

The type and id for the file are taken from the first chunk; any chunks after that have these fields left empty (zero length), so their type and id have to be ignored. Implementing chunking involves extending the parser's constructor, so that it holds a series of files as well as a series of records.

DIMEFile: Unchunking files

class DIMEFile
{
    public $type_format;
    public $type;
    public $id;
    public $data;
}

class DIME
{
    const TYPE_BINARY = 1;
    const TYPE_XML = 2;

    public $records;
    public $files;

    function __construct($input)
    {
        $this->records = array();
        $this->files = array();
        $pos = 0;
        $length = strlen($input);

        // Break out parts from the message string; every part
        // needs at least its 12-byte header
        while($pos + 12 <= $length)
        {
            $r = new DIMERecord;

            // Shift out bitfields for the first fields
            $b = ord($input[$pos++]);
            $r->version     = ($b>>3) & 31;
            $r->first       = ($b>>2) & 1;
            $r->last        = ($b>>1) & 1;
            $r->chunked     = $b & 1;
            $r->type_format = (ord($input[$pos++]) >> 4) & 15;

            // Fetch big-endian lengths
            $lengths = array();
            $lengths['options']  = ord($input[$pos++]) << 8;
            $lengths['options'] |= ord($input[$pos++]);

            $lengths['id']  = ord($input[$pos++]) << 8;
            $lengths['id'] |= ord($input[$pos++]);

            $lengths['type']  = ord($input[$pos++]) << 8;
            $lengths['type'] |= ord($input[$pos++]);

            $lengths['data']  = ord($input[$pos++]) << 24;
            $lengths['data'] |= (ord($input[$pos++]) << 16);
            $lengths['data'] |= (ord($input[$pos++]) << 8);
            $lengths['data'] |= ord($input[$pos++]);

            // Read in padded data; the array keys match the
            // DIMERecord property names
            foreach($lengths as $lk => $lv)
            {
                $r->$lk = substr($input, $pos, $lv);
                $pos += $lv;

                // Skip padding up to the next 4-byte boundary
                if($lv & 3)
                    $pos += (4-($lv & 3));
            }

            $this->records[] = $r;
        }

        // Unchunk records into files, as required
        $previous_chunk = 0;
        foreach($this->records as $r)
        {
            if(!$r->chunked)
            {
                if(!$previous_chunk)
                {
                    // Normal part
                    $f = new DIMEFile;
                    $f->type_format = $r->type_format;
                    $f->type        = $r->type;
                    $f->id          = $r->id;
                    $f->data        = $r->data;

                    $this->files[] = $f;
                }
                else
                {
                    // Final chunk: finish the buffer and save the file
                    $f->data .= $r->data;
                    $this->files[] = $f;
                }
            }
            else
            {
                if(!$previous_chunk)
                {
                    // First chunk: type and id come from this part only
                    $f = new DIMEFile;
                    $f->type_format = $r->type_format;
                    $f->type        = $r->type;
                    $f->id          = $r->id;
                    $f->data        = $r->data;
                }
                else
                {
                    // Continuation
                    $f->data .= $r->data;
                }
            }
            $previous_chunk = $r->chunked;
        }
    }
}

Example: Requesting a Jasper report

The JasperServer reporting service uses SOAP to allow requests for reports, and a DIME-encoded message to return the status message XML and the report itself as one result. The details for our example JasperServer are as follows:

WSDL URI     http://localhost:8080/jasperserver/services/repository?wsdl
Namespace    http://www.jaspersoft.com/namespaces/php
Request      runReport
Report URI   /reports/inventory_list
Table 3: SOAP access details for an example JasperServer

Using these access details, and passing them through PHP's native SOAP client, it's a simple matter to retrieve the DIME-encoded return message.

SOAP client code to retrieve the report

$request =
'<?xml version="1.0" encoding="UTF-8"?>
<request operationName="runReport" locale="en">
 <argument name="RUN_OUTPUT_FORMAT">XLS</argument>
 <argument name="USE_DIME_ATTACHMENTS"><![CDATA[1]]></argument>
 <resourceDescriptor name="" wsType="reportUnit" uriString="/reports/inventory_list" isNew="false">
  <label></label>
 </resourceDescriptor>
</request>';

$c = new SoapClient(
    'http://localhost:8080/jasperserver/services/repository?wsdl',
    array('trace' => true));

try
{
    $c->__soapCall(
        'runReport',
        array('request' => $request),
        array('namespace' => 'http://www.jaspersoft.com/namespaces/php'));
}
catch(SoapFault $cf)
{
    // A DIME-encoded message has no text, generating an exception
    // Parse out the traced response, and get the file from there
    // Response should be one XML file, and one binary

    $dp = new DIME($c->__getLastResponse());
    foreach($dp->files as $f)
    {
        if($f->type_format == DIME::TYPE_BINARY)
        {
            header('Content-type: '.$f->type);
            header('Content-disposition: attachment; filename="'.$f->id.'"');
            echo $f->data;
        }
    }
}

That's how the DIME parser introduced here can be used to pull data out of a DIME-encoded SOAP response. As the sample invocation shows, all that's needed is to make a new DIME object from the message string, and check the array of files that's generated as a result.

Imran Nazar <tf@imrannazar.com>, Jul 2009