Talking to web interfaces with SOAP has been made very simple in PHP, thanks to the inclusion of a SOAP module in the standard PHP 5 build. Having the functionality of a SOAP client built-in means it's very quick and easy to call an interface through either the raw RPC or the provided WSDL, which can return XML or plain text values.
The trouble starts when the interface wishes to return a complicated result: a file that isn't XML or plain text, or a series of files in the same return package. There are various encodings used in the transfer of complex SOAP results; one of the more common is Direct Internet Message Encapsulation, or DIME.
The DIME Format
DIME is essentially a wrapper over MIME, allowing multiple MIME parts to be sent in one package. The format was developed by Microsoft as a draft standard, and was adopted by a good number of SOAP interfaces before the official data standard was drawn up. The concept of the format is very simple: a series of files, either XML or binary data, with a short header on each file.
Each part can be marked as the first and/or last part of the message, and the type of data it contains can also be marked as XML or binary. Being a Microsoft standard, the definition of the header for each part involves bitfields and binary fiddling. There's also scope provided for extensions to the DIME format, but none of these were ever defined, so you're unlikely to find any messages with options filled in.
Field | Length | Description |
---|---|---|
Version | 5 bits | DIME format version (always 1) |
First Record | 1 bit | Set if this is the first part in the message |
Last Record | 1 bit | Set if this is the last part in the message |
Chunk Record | 1 bit | Set if this file continues in a following part (chunked) |
Type Format | 4 bits | How the Type field is expressed: 1 for a MIME type (binary data), 2 for a URI (XML) |
Reserved | 4 bits | (Classic Microsoft) |
The following fields are big-endian numbers | | |
Options Length | 2 bytes | Length of the "options" field |
ID Length | 2 bytes | Length of the "ID" or "name" field |
Type Length | 2 bytes | Length of the "type" field |
Data Length | 4 bytes | Size of the included file |
The following fields are variable-length, and padded to the next 4-byte boundary | | |
Options | Variable | Part-specific option data, if any is defined (safely answer "no") |
ID | Variable | Name of the file/part |
Type | Variable | If Type Format is 1: MIME type of the part data. If Type Format is 2: URI identifying the type of the enclosed XML |
Data | Variable | The file |
As you can see, it's possible for a DIME message to contain only one part, by marking it as both the first and the last. Each part follows directly on from the last, so it's easy enough to run through a DIME message in a loop: the next part starts at the current position, plus the 12-byte header, plus the sizes of the four sections (options, ID, type, data), each rounded up as described next.
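The fixed 12-byte header can be decoded in one go with PHP's `unpack()`. What follows is a minimal sketch rather than a definitive implementation; the function name `dime_read_header()` and the array layout it returns are assumptions made for illustration.

```php
<?php
// Sketch: decode the fixed 12-byte DIME record header.
// dime_read_header() is a hypothetical helper name, not part of any library.
function dime_read_header(string $input, int $pos = 0): array
{
    $raw = substr($input, $pos, 12);
    if (strlen($raw) < 12) {
        throw new InvalidArgumentException('Truncated DIME header');
    }
    // C = unsigned byte, n = 16-bit big-endian, N = 32-bit big-endian
    $h = unpack('Cflags/Ctypebyte/noptions/nid/ntype/Ndata', $raw);
    return array(
        'version'     => ($h['flags'] >> 3) & 31,
        'first'       => (bool)(($h['flags'] >> 2) & 1),
        'last'        => (bool)(($h['flags'] >> 1) & 1),
        'chunked'     => (bool)($h['flags'] & 1),
        'type_format' => ($h['typebyte'] >> 4) & 15,
        'lengths'     => array(
            'options' => $h['options'],
            'id'      => $h['id'],
            'type'    => $h['type'],
            'data'    => $h['data'],
        ),
    );
}

// A hand-made header: version 1, first and last bits set, type format 1,
// no options, 4-byte ID, 9-byte type, 5-byte data.
$header = "\x0E\x10" . pack('nnnN', 0, 4, 9, 5);
$h = dime_read_header($header);
echo $h['version'], ' ', $h['lengths']['data'], "\n"; // prints "1 5"
```

The named keys in the `unpack()` format string keep the bit-shifting confined to the first two bytes; the length fields arrive already converted from big-endian.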
One complication of the format is that each section in a part (options, ID, type, data) is padded, so that it takes up a multiple of 4 bytes; this is generally done by filling the gap with zero (0x00) bytes. For example, if the type of the file was given as "text/html", you'd end up with the following in the message:
Padded "type" field
74 65 78 74 2F 68 74 6D 6C 00 00 00 -- text/html
The first nine bytes above (74 through 6C) are the field data itself, defined by the header as 9 bytes long. The next multiple of 4 from there is 12, so three bytes of padding are added to push the field to a 4-byte boundary; these bytes are not counted as part of the field data.
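The padding rule comes down to rounding a length up to the next multiple of 4. The sketch below shows one way to write that, along with the total size of a record; both function names are hypothetical and chosen only for this example.

```php
<?php
// Round a field length up to the next multiple of 4, as DIME padding requires.
// dime_padded_length() is a hypothetical helper name.
function dime_padded_length(int $length): int
{
    return ($length + 3) & ~3;
}

// Total size of one record: the 12-byte header plus the four padded fields.
// $lengths is assumed to hold the keys 'options', 'id', 'type' and 'data'.
function dime_record_size(array $lengths): int
{
    $size = 12;
    foreach (array('options', 'id', 'type', 'data') as $field) {
        $size += dime_padded_length($lengths[$field]);
    }
    return $size;
}

echo dime_padded_length(9), "\n"; // 12: "text/html" plus three padding bytes
echo dime_padded_length(8), "\n"; // 8: already aligned, no padding needed
```

Adding the result of `dime_record_size()` to the current offset gives the position of the next part in the message.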
Reading DIME messages in PHP
Using the structure pattern in PHP, it's quite a simple endeavour to build a class capable of reading in DIME messages and extracting the parts. The basis of this is the DIMERecord structure.
DIMERecord: Structure holding information about a DIME part
    class DIMERecord
    {
        public $version;
        public $first;
        public $last;
        public $chunked;
        public $type_format;
        public $options;
        public $id;
        public $type;
        public $data;
    }
Filling in this structure can be done from another class, acting as the DIME parser itself. It's this class which holds the array of DIMERecords referencing the parts.
DIME: Building the record array
    class DIME
    {
        const TYPE_BINARY = 1;
        const TYPE_XML = 2;

        public $records;

        function __construct($input)
        {
            $this->records = array();
            $pos = 0;

            do {
                $r = new DIMERecord;

                // Shift out bitfields for the first fields
                $b = ord($input[$pos++]);
                $r->version = ($b>>3) & 31;
                $r->first = ($b>>2) & 1;
                $r->last = ($b>>1) & 1;
                $r->chunked = $b & 1;
                $r->type_format = (ord($input[$pos++]) >> 4) & 15;

                // Fetch big-endian lengths
                $lengths = array();
                $lengths['options'] = ord($input[$pos++]) << 8;
                $lengths['options'] |= ord($input[$pos++]);

                $lengths['id'] = ord($input[$pos++]) << 8;
                $lengths['id'] |= ord($input[$pos++]);

                $lengths['type'] = ord($input[$pos++]) << 8;
                $lengths['type'] |= ord($input[$pos++]);

                $lengths['data'] = ord($input[$pos++]) << 24;
                $lengths['data'] |= (ord($input[$pos++]) << 16);
                $lengths['data'] |= (ord($input[$pos++]) << 8);
                $lengths['data'] |= ord($input[$pos++]);

                // Read in padded data; each key doubles as the record property name
                foreach($lengths as $lk => $lv) {
                    $r->$lk = substr($input, $pos, $lv);
                    $pos += $lv;
                    if($lv & 3) $pos += (4-($lv & 3));
                }

                $this->records[] = $r;
            } while($pos < strlen($input));
        }
    }
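To try the parser without a live SOAP service, a test message can be put together by hand. The following is a rough sketch of the reverse operation for a single unchunked record with no options; `dime_build_record()` is a hypothetical helper written for this example.

```php
<?php
// Sketch of the reverse operation: assemble one unchunked DIME record by hand.
// dime_build_record() is a hypothetical helper, handy for generating test input.
function dime_build_record(string $id, string $type, string $data,
                           int $typeFormat = 1, bool $first = true, bool $last = true): string
{
    // Pad a field with zero bytes up to the next 4-byte boundary
    $pad = function (string $s): string {
        return $s . str_repeat("\0", (4 - (strlen($s) & 3)) & 3);
    };

    // Version 1 in the top five bits, then first/last bits; chunk bit left clear
    $flags = (1 << 3) | ($first ? 4 : 0) | ($last ? 2 : 0);

    // Options length is 0; the other lengths are the unpadded field sizes
    $header = chr($flags) . chr(($typeFormat & 15) << 4)
            . pack('nnnN', 0, strlen($id), strlen($type), strlen($data));

    return $header . $pad($id) . $pad($type) . $pad($data);
}

$message = dime_build_record('a.txt', 'text/plain', 'hello');
echo strlen($message), "\n";                  // 40
echo bin2hex(substr($message, 0, 12)), "\n";  // 0e1000000005000a00000005
```

Passing `$message` to `new DIME($message)` with the class above should produce a single record whose `data` is `hello` and whose `id` is `a.txt`.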
Chunking: Breaking up files across parts
The DIME standard also allows a file to be broken up across multiple parts, in case the sender doesn't know the full size of a big file up front, or can't buffer it all in order to fill out a single header. Parsing a chunked file from its parts involves checking the "chunked" bit for the part being checked and for the part before it, and making a decision based on the two values:
This part chunked? | Previous part chunked? | Action |
---|---|---|
No | No | This is a normal file; save |
Yes | No | This is the first chunk part; start a data buffer |
Yes | Yes | This is a continuation chunk; append to the data buffer |
No | Yes | This is the last chunk part; append to the data buffer and save |
The type and ID for the file are taken from the first chunk; any chunks after that carry empty (zero-length) type and ID fields, which should be ignored. Implementing chunking involves extending the parser function, so that it holds a series of files as well as a series of records.
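Stripped of the header parsing, the decision table reduces to a small loop. This is a simplified sketch that works on plain arrays instead of parsed records; the array shape (`chunked`, `data`) and the function name are assumptions for illustration. "Previous part chunked" is tracked here by whether a buffer is currently open.

```php
<?php
// Sketch of the decision table: join chunked parts back into whole payloads.
// Each input record is a plain array with 'chunked' (bool) and 'data' (string);
// this shape and the name dime_join_chunks() are assumptions for illustration.
function dime_join_chunks(array $records): array
{
    $files = array();
    $buffer = null; // non-null while a chunked file is being assembled

    foreach ($records as $r) {
        if ($buffer === null) {
            if ($r['chunked']) {
                $buffer = $r['data'];   // first chunk: start a data buffer
            } else {
                $files[] = $r['data'];  // normal file: save as-is
            }
        } else {
            $buffer .= $r['data'];      // continuation or last chunk: append
            if (!$r['chunked']) {
                $files[] = $buffer;     // last chunk: save and close the buffer
                $buffer = null;
            }
        }
    }
    return $files;
}

$parts = array(
    array('chunked' => false, 'data' => 'A'),
    array('chunked' => true,  'data' => 'B1'),
    array('chunked' => true,  'data' => 'B2'),
    array('chunked' => false, 'data' => 'B3'),
);
print_r(dime_join_chunks($parts)); // two entries: "A" and "B1B2B3"
```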
DIMEFile: Unchunking files
    class DIMEFile
    {
        public $type_format;
        public $type;
        public $id;
        public $data;
    }

    class DIME
    {
        const TYPE_BINARY = 1;
        const TYPE_XML = 2;

        public $records;
        public $files;

        function __construct($input)
        {
            $this->records = array();
            $this->files = array();
            $pos = 0;

            // Break out parts from the message string
            do {
                $r = new DIMERecord;

                // Shift out bitfields for the first fields
                $b = ord($input[$pos++]);
                $r->version = ($b>>3) & 31;
                $r->first = ($b>>2) & 1;
                $r->last = ($b>>1) & 1;
                $r->chunked = $b & 1;
                $r->type_format = (ord($input[$pos++]) >> 4) & 15;

                // Fetch big-endian lengths
                $lengths = array();
                $lengths['options'] = ord($input[$pos++]) << 8;
                $lengths['options'] |= ord($input[$pos++]);

                $lengths['id'] = ord($input[$pos++]) << 8;
                $lengths['id'] |= ord($input[$pos++]);

                $lengths['type'] = ord($input[$pos++]) << 8;
                $lengths['type'] |= ord($input[$pos++]);

                $lengths['data'] = ord($input[$pos++]) << 24;
                $lengths['data'] |= (ord($input[$pos++]) << 16);
                $lengths['data'] |= (ord($input[$pos++]) << 8);
                $lengths['data'] |= ord($input[$pos++]);

                // Read in padded data
                foreach($lengths as $lk => $lv) {
                    $r->$lk = substr($input, $pos, $lv);
                    $pos += $lv;
                    if($lv & 3) $pos += (4-($lv & 3));
                }

                $this->records[] = $r;
            } while($pos < strlen($input));

            // Unchunk records into files, as required
            $previous_chunk = 0;
            foreach($this->records as $r) {
                if(!$r->chunked) {
                    if(!$previous_chunk) {
                        // Normal part
                        $f = new DIMEFile;
                        $f->type_format = $r->type_format;
                        $f->type = $r->type;
                        $f->id = $r->id;
                        $f->data = $r->data;
                        $this->files[] = $f;
                    } else {
                        // Final chunk
                        $f->data .= $r->data;
                        $this->files[] = $f;
                    }
                } else {
                    if(!$previous_chunk) {
                        // First chunk: type and id come from here
                        $f = new DIMEFile;
                        $f->type_format = $r->type_format;
                        $f->type = $r->type;
                        $f->id = $r->id;
                        $f->data = $r->data;
                    } else {
                        // Continuation
                        $f->data .= $r->data;
                    }
                }
                $previous_chunk = $r->chunked;
            }
        }
    }
Example: Requesting a Jasper report
The JasperServer reporting service uses SOAP to allow requests for reports, and a DIME-encoded message to return the status message XML and the report itself as one result. The details for our example JasperServer are as follows:
Setting | Value |
---|---|
WSDL URI | http://localhost:8080/jasperserver/services/repository?wsdl |
Namespace | http://www.jaspersoft.com/namespaces/php |
Request | runReport |
Report URI | /reports/inventory_list |
Using these access details, and passing them through PHP's native SOAP client, it's a simple matter to retrieve the DIME-encoded return message.
SOAP client code to retrieve the report
    $request = '<?xml version="1.0" encoding="UTF-8"?>
    <request operationName="runReport" locale="en">
        <argument name="RUN_OUTPUT_FORMAT">XLS</argument>
        <argument name="USE_DIME_ATTACHMENTS"><![CDATA[1]]></argument>
        <resourceDescriptor name="" wsType="reportUnit"
            uriString="/reports/inventory_list" isNew="false">
            <label></label>
        </resourceDescriptor>
    </request>';

    $c = new SoapClient(
        'http://localhost:8080/jasperserver/services/repository?wsdl',
        array('trace' => true));

    try {
        $c->__soapCall(
            'runReport',
            array('request' => $request),
            array('namespace' => 'http://www.jaspersoft.com/namespaces/php'));
    } catch(SoapFault $cf) {
        // A DIME-encoded message has no text, generating an exception
        // Parse out the traced response, and get the file from there
        // Response should be one XML file, and one binary
        $dp = new DIME($c->__getLastResponse());
        foreach($dp->files as $f) {
            if($f->type_format == DIME::TYPE_BINARY) {
                header('Content-type: '.$f->type);
                header('Content-disposition: attachment; filename="'.$f->id.'"');
                echo $f->data;
            }
        }
    }
That's how you can use the DIME parser I've introduced here, to pull data out of a DIME-encoded SOAP response. As can be seen from the sample invocation here, all that's needed is to make a new DIME object from the message string, and check the array of files that's generated as a result.
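One caveat with the invocation above: the catch block treats every SoapFault as a DIME response, so a genuine fault (a refused connection, say) would also be handed to the parser. A cheap sanity check on the first byte can tell the two apart. This is only a sketch, and `dime_looks_valid()` is a hypothetical name; it checks nothing beyond the version number and the "first record" bit.

```php
<?php
// Sketch: cheap sanity check before handing a response body to the parser.
// dime_looks_valid() is a hypothetical helper name.
function dime_looks_valid(string $body): bool
{
    if (strlen($body) < 12) {
        return false; // shorter than a single record header
    }
    $b = ord($body[0]);
    $version = ($b >> 3) & 31;
    $first   = ($b >> 2) & 1;
    return $version === 1 && $first === 1; // version 1, "first record" bit set
}

var_dump(dime_looks_valid("\x0E\x10" . str_repeat("\0", 10)));        // bool(true)
var_dump(dime_looks_valid('<?xml version="1.0"?><soap:Envelope/>'));  // bool(false)
```

In the catch block, the parser would then only be constructed when `dime_looks_valid($c->__getLastResponse())` returns true, with the SoapFault rethrown otherwise.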
Imran Nazar <tf@imrannazar.com>, Jul 2009