Slow-Roll Livecoding: Proxying a REST API with PHP

Now and then I run into a situation where I'd like to access a REST API from Postman (or one of its less account-encumbered alternatives for API testing) running locally, but the remote is firewalled and the API provider has only added production and staging IPs to the allowlist. This might be a familiar situation to integration developers, and it leaves you with only a few options:

Testing from the server
As production and staging are in the allowlist, you can simply connect to those IPs and perform tests from there. This does, however, presume a couple of things:
  • You have access to those servers: A fly-by-night webshop may hand out root access to the server machines and allow their integration developers to hack directly thereupon, but a more organised place may have access restrictions and only allow their infrastructure team direct access.
  • Those servers exist: An increasingly common setup is for APIs and integrations to run on ephemeral servers (often referred to as "serverless" architecture) with Networking Magic to allow them to present as a fixed IP; in this case there's no machine to connect to in order to perform tests.
Forwarding the remote
Another option is to use NAT-based forwarding to expose the remote API: for example, if the third-party REST API is served from https://their-end.io/api you can use NAT to expose this as https://your-end.com:8443/api by forwarding connections on port 8443 to their-end.io port 443.
This does, however, involve roping in the infrastructure people to either perform iptables-based incantations to set up the NAT, or even more arcane spells to set up containerised NAT. Direct forwarding of the connection also means you lose the opportunity to log usage of the remote API, if you were seeking to keep track of who's connecting.
Proxying the remote
Perhaps the most flexible option is to proxy the remote API: to present an interface to the interface. This allows us to access the API from locations outside the allowlist, while giving us the chance to log access if so required, and has the advantage of being possible to set up in two ways: either through a dedicated proxying package like nginx-proxy, or through code-level changes only.

Having explored the options, here we'll be looking at a code-level proxy written in PHP.

What we're looking to proxy

Our example remote API to which we'd like access is fairly standard as REST goes, offering the following operations:

We'll be using the curl extension to PHP, which is available on most PHP installations, to forward whatever request our proxy receives and to print out what comes back. Let's look at how that might work, as a first cut:

First cut at a proxy script

define('REMOTE_URL', 'https://their-end.io/api/');
define('OUR_URL', 'https://your-end.com/apiproxy.php');

$params = $_GET;
$curl_params = [
  CURLOPT_URL            => REMOTE_URL . '?' . http_build_query($params),
  CURLOPT_TIMEOUT        => 30,
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_SSL_VERIFYHOST => 2, // 2 = check the certificate name matches the host
  CURLOPT_SSL_VERIFYPEER => true,
  CURLOPT_CUSTOMREQUEST  => $_SERVER['REQUEST_METHOD'],
];

// Anything with a POSTed body needs the body transferring
if ($_SERVER['REQUEST_METHOD'] !== 'GET') {
  $curl_params[CURLOPT_POSTFIELDS] = file_get_contents('php://input');
}

$c = curl_init();
curl_setopt_array($c, $curl_params);
$r = curl_exec($c);
curl_close($c);
echo $r;

If we send a request to our apiproxy.php, we might get back something like this:

Output of our first-cut

$ curl https://your-end.com/apiproxy.php
<html><body>
<h1>Error 404 Not Found</h1>
</body></html>

So that didn't work so well; we've got two problems here. Our request went to /api at the remote end, but we need some additional qualifier to get the request to /api/accounts, ideally as part of the URL itself. Additionally, our request to the remote came back with a 404 error, but we just printed the output: that means our proxy script returned a 200, and discarded any other headers that may have come back.

Passing the URL with Rewrite

This is where we cheat a little: our proxy ends up needing a small amount of webserver configuration as well as the PHP script. In our case, we're running on a LAMP stack so Apache is our webserver; that means we can slot a .htaccess file in place which translates the URL into something the script can use:

.htaccess Rewrite rule to translate the URL

RewriteEngine On
# In .htaccess context the leading slash is stripped before the pattern is matched
RewriteRule ^api-proxy/(.*)$ /apiproxy.php?__url=$1 [QSA]

This instructs Apache to listen out for URLs starting /api-proxy/, and extract everything after that prefix into a __url parameter to pass on to the script; the QSA flag means any GET parameters provided to the proxy will also be passed on.
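To make the hand-off concrete, here's a minimal sketch of how the rewritten request could be turned back into a remote URL. The build_remote_url() helper is hypothetical (it isn't part of the script in this post), and $query stands in for $_GET; unlike the inline version, it only appends a "?" when there are parameters left to pass on.

```php
<?php
// Hypothetical helper: rebuild the remote URL from the rewritten request.
// $query stands in for $_GET; '__url' is the parameter added by the RewriteRule.
function build_remote_url(string $remoteBase, array $query): string
{
    $path = $query['__url'] ?? '';
    unset($query['__url']); // our own plumbing, not meant for the remote

    $url = rtrim($remoteBase, '/') . '/' . ltrim($path, '/');
    if ($query !== []) {
        $url .= '?' . http_build_query($query);
    }
    return $url;
}

// A request for /api-proxy/accounts?limit=10 reaches the script as:
$query = ['__url' => 'accounts', 'limit' => '10'];
echo build_remote_url('https://their-end.io/api/', $query), "\n";
// prints https://their-end.io/api/accounts?limit=10
```

The "limit" parameter here is invented purely for illustration; any GET parameters carried along by QSA would be treated the same way.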

Now we can address the other issue we ran into: the headers being lost on their way back. The cURL extension to PHP provides a CURLOPT_HEADER flag that can be set, which returns the headers associated with the response. Let's see how that might look:

Second cut of our script

define('REMOTE_URL', 'https://their-end.io/api/');
define('OUR_URL', 'https://your-end.com/api-proxy/');

$params = $_GET;
unset($params['__url']);
$curl_params = [
  // Include the requested endpoint
  CURLOPT_URL            => (
    REMOTE_URL .
    $_GET['__url'] .
    '?' .
    http_build_query($params)
  ),
  CURLOPT_TIMEOUT        => 30,
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_SSL_VERIFYHOST => 2, // 2 = check the certificate name matches the host
  CURLOPT_SSL_VERIFYPEER => true,
  // Return the remote's headers
  CURLOPT_HEADER         => true,
  CURLOPT_CUSTOMREQUEST  => $_SERVER['REQUEST_METHOD'],
];

// Anything with a POSTed body needs the body transferring
if ($_SERVER['REQUEST_METHOD'] !== 'GET') {
  $curl_params[CURLOPT_POSTFIELDS] = file_get_contents('php://input');
}

$c = curl_init();
curl_setopt_array($c, $curl_params);
$r = curl_exec($c);
curl_close($c);
echo $r;

Output of the second run

$ curl https://your-end.com/api-proxy/accounts
HTTP/1.1 401 Unauthorized
Set-Cookie: JSESSIONID=node0qrer3re8934rf0.node0
WWW-Authenticate: Basic realm="API: authentication required"
Cache-Control: must-revalidate,no-cache,no-store
Content-Type: text/html;charset=iso-8859-1
Content-Length: 58
Server: Jetty(9.4.48.v20220622)

<html><body>
<h1>Error 401 Unauthorized</h1>
</body></html>

Progress! We've now received data from the correct remote URL, but we have the headers mixed in with the body in one returned block. The next step is to break out the headers and return those separately.

Header extraction and provision

Fortunately, the HTTP standard makes it fairly easy to programmatically spot where the headers stop and the response body starts. Headers are specified as ending with \r\n (that's carriage-return and new-line), and a blank header (two of these sequences in a row) is the marker for the end of the headers.

Once we have the headers as a block, we can use the fact that each ends with this newline sequence to output the headers as given:

Header extraction and return

$split_point = strpos($r, "\r\n\r\n");
$headers = trim(substr($r, 0, $split_point));
$body = trim(substr($r, $split_point));

foreach(explode("\r\n", $headers) as $h) {
  header($h);
}
echo $body;
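The split itself can be tried out away from the proxy. Below is a rough sketch, using a hypothetical split_response() helper and a hand-written response string rather than a real call to the remote; it also guards against the case where no blank line is found, which the inline version doesn't check for.

```php
<?php
// Hypothetical helper: split a raw response (as returned with CURLOPT_HEADER)
// into its header lines and its body.
function split_response(string $raw): array
{
    $splitPoint = strpos($raw, "\r\n\r\n");
    if ($splitPoint === false) {
        // No blank line found: treat the whole thing as body
        return ['headers' => [], 'body' => $raw];
    }
    return [
        'headers' => explode("\r\n", substr($raw, 0, $splitPoint)),
        // Skip the four bytes of the blank-line marker itself
        'body'    => substr($raw, $splitPoint + 4),
    ];
}

// A cut-down version of the 401 response we saw earlier
$raw = "HTTP/1.1 401 Unauthorized\r\n"
     . "Content-Type: text/html;charset=iso-8859-1\r\n"
     . "\r\n"
     . "<h1>Error 401 Unauthorized</h1>";

$parts = split_response($raw);
echo count($parts['headers']), "\n"; // prints 2
echo $parts['body'], "\n";           // prints <h1>Error 401 Unauthorized</h1>
```

One thing to keep in mind: with CURLOPT_FOLLOWLOCATION switched on, a redirected request comes back with one header block per response, so a split on the first blank line only peels off the first of them.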

Output after header extraction

$ curl -s -o /dev/null -w "%{http_code}\n" \
https://your-end.com/api-proxy/accounts
401

Now we're starting to see real results: a request that fails to authorise returns a 401. So let's pass in the credentials we were given by the remote API provider:

Passing credentials to our proxy

$ curl -s -o /dev/null -w "%{http_code}\n" \
-u testuser:Passw0rd \
https://your-end.com/api-proxy/accounts
401

Still a 401, "Unauthorized". It seems our credentials aren't being handed over to the remote: PHP parses the username and password we sent to the proxy into server-side variables, but nothing yet includes them in the request forwarded to the remote API.

Those server-side variables are PHP_AUTH_USER and PHP_AUTH_PW, and the HTTP specification states that those should be included in an Authorization header, but with a very particular format:

To receive authorization, the client

  1. obtains the user-id and password from the user,
  2. constructs the user-pass by concatenating the user-id, a single colon (":") character, and the password,
  3. encodes the user-pass into an octet sequence (see below for a discussion of character encoding schemes),
  4. and obtains the basic-credentials by encoding this octet sequence using Base64 (RFC4648, Section 4) into a sequence of US-ASCII characters (RFC20).

-- RFC 7617, The 'Basic' HTTP Authentication Scheme
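As a quick sanity check of those four steps, here's a small sketch with a hypothetical basic_authorization_header() helper, using the test credentials from this post. PHP strings are already byte sequences, so step 3 needs no explicit conversion here.

```php
<?php
// Hypothetical helper following the four RFC 7617 steps quoted above.
function basic_authorization_header(string $user, string $password): string
{
    // Step 2: user-id, a single colon, then the password
    $userPass = $user . ':' . $password;
    // Steps 3 and 4: the string is already a byte sequence, so Base64 it
    return 'Authorization: Basic ' . base64_encode($userPass);
}

$header = basic_authorization_header('testuser', 'Passw0rd');
echo $header, "\n";

// The remote reverses the process to recover the credentials
$encoded = substr($header, strlen('Authorization: Basic '));
echo base64_decode($encoded), "\n"; // prints testuser:Passw0rd
```

Because the colon is the separator, a username containing one can't be represented in this scheme; the password may contain as many as it likes.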

As long as we follow this same scheme, the remote will have no trouble decoding the username and password we're proxying through. PHP's cURL extension allows us to provide custom headers alongside the request, though somewhat confusingly it uses CURLOPT_HTTPHEADER as the option for these headers (as opposed to CURLOPT_HEADER, which we've already seen and which specifies that the response headers should be returned).

Including credentials in the request

$req_headers = [];
if (isset($_SERVER['PHP_AUTH_USER'], $_SERVER['PHP_AUTH_PW'])) {
  $req_headers[] = 'Authorization: Basic ' . base64_encode(
    $_SERVER['PHP_AUTH_USER'] . ':' . $_SERVER['PHP_AUTH_PW']
  );
}
// ...
$curl_params = [
  // ...
  CURLOPT_HTTPHEADER     => $req_headers,
];

Trying out our credentials

$ curl -u testuser:Passw0rd https://your-end.com/api-proxy/accounts | json_pp
{
   "count" : 1,
   "data" : [
      {
         "id" : "ec3ac5d4-34b6-4086-b88e-21eaff05b23b",
         "name" : "Testing Tester",
         "uri" : "https://their-end.io/api/accounts/ec3ac5d4-34b6-4086-b88e-21eaff05b23b"
      }
   ]
}

Oh goodness, a real return from our remote API. But there's something wrong...

Content replacement and types

We see from our GET request that there is one account record stored in the remote database, and the API provides a unique identifier to perform operations against the record in question. However, the identifier is a URI, and it points (understandably) to the remote's REST API: the one we're having to proxy in the first place.

Fortunately, this particular issue is fairly simple to resolve: as we have the full return string in $body, we just need to switch out any instances of the remote API's root URL with our own, and then we can try making some new requests to our proxied API:

URL replacement in the output

$split_point = strpos($r, "\r\n\r\n");
$headers = trim(substr($r, 0, $split_point));
$body = trim(substr($r, $split_point));
$body = str_replace(REMOTE_URL, OUR_URL, $body);

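foreach (explode("\r\n", $headers) as $h) {
  // The body has changed length and cURL has already decoded any chunking,
  // so leave the framing headers for PHP and the webserver to work out
  if (stripos($h, 'Content-Length:') === 0 || stripos($h, 'Transfer-Encoding:') === 0) {
    continue;
  }
  header($h);
}
echo $body;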
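It's worth seeing what the replacement does to the size of the body, since the remote's Content-Length header describes the original. A minimal sketch, with a hypothetical rewrite_body() helper and a shortened sample record:

```php
<?php
// Hypothetical helper: swap the remote's base URL for the proxy's in a body,
// and report the new length so a stale Content-Length can be spotted.
function rewrite_body(string $body, string $remoteUrl, string $ourUrl): array
{
    $rewritten = str_replace($remoteUrl, $ourUrl, $body);
    return ['body' => $rewritten, 'length' => strlen($rewritten)];
}

$body = '{"uri":"https://their-end.io/api/accounts/ec3ac5d4"}';
$result = rewrite_body(
    $body,
    'https://their-end.io/api/',
    'https://your-end.com/api-proxy/'
);

echo $result['body'], "\n";
// prints {"uri":"https://your-end.com/api-proxy/accounts/ec3ac5d4"}
echo $result['length'] - strlen($body), "\n"; // prints 6
```

Each replaced URL makes the body six bytes longer with these two base URLs, so a Content-Length copied straight from the remote would come up short. A plain string replacement also assumes the remote writes its URLs with unescaped slashes, as in the output above; a remote that emits "https:\/\/" in its JSON would slip past it.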

Our final GET result

$ curl -u testuser:Passw0rd https://your-end.com/api-proxy/accounts | json_pp
{
   "count" : 1,
   "data" : [
      {
         "id" : "ec3ac5d4-34b6-4086-b88e-21eaff05b23b",
         "name" : "Testing Tester",
         "uri" : "https://your-end.com/api-proxy/accounts/ec3ac5d4-34b6-4086-b88e-21eaff05b23b"
      }
   ]
}

Proxying an update call

$ curl -s -o /dev/null -w "%{http_code}\n" \
-H 'Content-Type: application/json' \
-X PUT -d '{"name":"A New Name"}' \
https://your-end.com/api-proxy/accounts/ec3ac5d4-34b6-4086-b88e-21eaff05b23b
415

Hold on, 415? That's "Unsupported Media Type", but we're providing a content type. As it turns out, the value of Content-Type provided to the proxy needs to be passed through to the cURL call if a POST or PUT is being made; otherwise cURL, handed a string in CURLOPT_POSTFIELDS, will send the body as application/x-www-form-urlencoded.

PHP exposes this header as $_SERVER['CONTENT_TYPE'], so let's pull that in:

Providing content type

$req_headers = [];
if (isset($_SERVER['PHP_AUTH_USER'], $_SERVER['PHP_AUTH_PW'])) {
  $req_headers[] = 'Authorization: Basic ' . base64_encode(
    $_SERVER['PHP_AUTH_USER'] . ':' . $_SERVER['PHP_AUTH_PW']
  );
}
if (isset($_SERVER['CONTENT_TYPE'])) {
  $req_headers[] = 'Content-Type: ' . $_SERVER['CONTENT_TYPE'];
}
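If you'd like to poke at this logic without a webserver in the way, one option is to wrap it in a function that takes a $_SERVER-style array. This is only a sketch, and forwarded_request_headers() is a hypothetical name, but it makes it easy to feed in a fake request:

```php
<?php
// Hypothetical helper: build the headers to forward from a $_SERVER-style array.
function forwarded_request_headers(array $server): array
{
    $headers = [];
    if (isset($server['PHP_AUTH_USER'], $server['PHP_AUTH_PW'])) {
        $headers[] = 'Authorization: Basic ' . base64_encode(
            $server['PHP_AUTH_USER'] . ':' . $server['PHP_AUTH_PW']
        );
    }
    // Content-Type arrives without the HTTP_ prefix that most headers get
    if (isset($server['CONTENT_TYPE'])) {
        $headers[] = 'Content-Type: ' . $server['CONTENT_TYPE'];
    }
    return $headers;
}

// Simulate the PUT request from above, minus the credentials
$fake = ['REQUEST_METHOD' => 'PUT', 'CONTENT_TYPE' => 'application/json'];
print_r(forwarded_request_headers($fake));
```

In the proxy itself, the call would simply be forwarded_request_headers($_SERVER), with the result handed to CURLOPT_HTTPHEADER.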

Proxying an update, with the content type

$ curl -s -o /dev/null -w "%{http_code}\n" \
-H 'Content-Type: application/json' \
-X PUT -d '{"name":"A New Name"}' \
https://your-end.com/api-proxy/accounts/ec3ac5d4-34b6-4086-b88e-21eaff05b23b
200

Thanks for watching

It lives! We can now proxy calls to the firewalled remote API through a script we've hacked together in PHP, and it'll carry the method, query string, body, credentials and content type along with each one. Along the way, we've learned a little about REST itself, authorisation headers in HTTP, and the various options and settings that can be given to cURL calls in PHP.

This post has been an experiment with a "slow-roll livecoding" format, where a live coding session is written out in sanitised long form. I think this was fun; we might do it again sometime. For reference, here's the final proxy script we came up with:

define('REMOTE_URL', 'https://their-end.io/api/');
define('OUR_URL', 'https://your-end.com/api-proxy/');

$req_headers = [];
if (isset($_SERVER['PHP_AUTH_USER'], $_SERVER['PHP_AUTH_PW'])) {
  $req_headers[] = 'Authorization: Basic ' . base64_encode(
    $_SERVER['PHP_AUTH_USER'] . ':' . $_SERVER['PHP_AUTH_PW']
  );
}
if (isset($_SERVER['CONTENT_TYPE'])) {
  $req_headers[] = 'Content-Type: ' . $_SERVER['CONTENT_TYPE'];
}

$params = $_GET;
unset($params['__url']);
$curl_params = [
  CURLOPT_URL            => (
    REMOTE_URL .
    $_GET['__url'] .
    '?' .
    http_build_query($params)
  ),
  CURLOPT_TIMEOUT        => 30,
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_SSL_VERIFYHOST => 2, // 2 = check the certificate name matches the host
  CURLOPT_SSL_VERIFYPEER => true,
  CURLOPT_HEADER         => true,
  CURLOPT_HTTPHEADER     => $req_headers,
  CURLOPT_CUSTOMREQUEST  => $_SERVER['REQUEST_METHOD'],
];

if ($_SERVER['REQUEST_METHOD'] !== 'GET') {
  $curl_params[CURLOPT_POSTFIELDS] = file_get_contents('php://input');
}

$c = curl_init();
curl_setopt_array($c, $curl_params);
$r = curl_exec($c);
curl_close($c);

$split_point = strpos($r, "\r\n\r\n");
$headers = trim(substr($r, 0, $split_point));
$body = trim(substr($r, $split_point));
$body = str_replace(REMOTE_URL, OUR_URL, $body);

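foreach (explode("\r\n", $headers) as $h) {
  // The body has changed length and cURL has already decoded any chunking,
  // so leave the framing headers for PHP and the webserver to work out
  if (stripos($h, 'Content-Length:') === 0 || stripos($h, 'Transfer-Encoding:') === 0) {
    continue;
  }
  header($h);
}
echo $body;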