Imran Nazar

Sci-fi Shorts: Dimming

Thu, 01 May 2025 04:49:11 +0000

Look, it's not that Morpheus had it wrong deliberately. You and I both know the story about human body heat as a power source was bullshit, but it's what he honestly believed, because that's how the story came to him.

It was we who darkened the skies, but as with everything humanity did around that time, it was for a very stupid reason.

See, back around that time two hundred years ago, the world wasn't frigid cold and barren: it was baking in too much heat. The collective burning of millions of years of fossil carbon in the span of a few generations had made for a blanket of carbon dioxide, trapping the infrared radiation of the Sun; they called it "climate change", and it was one of the Big Problems of the time (aside from, of course, the thinking computers and what to do about their rights and privileges).

Now, every now and then a heavy-metal asteroid comes streaking through the atmosphere and leaves those metals in the skies as it burns up; Some Rich Asshole had the idea that if you could somehow harvest the trace amounts of titanium left behind, you could create titanium dioxide which is a brilliant white. Paint the skies with enough white and you'd reflect the Sun's rays back, cool the planet, Big Problem solved.

Of course he didn't wait for approval or scientific research, he just sent up a test probe of replicating nanobots which would level out around a hundred miles up, and start pulling what particles of titanium they could find. (I keep referring to this asshole as "he"; we don't have a name any more, I expect he managed to scrub every bit of coverage he could find before the end came.)

So around that time there were a whole pile of satellites in orbit around Earth, thousands of 'em, and they were made of titanium as well as silicon and other things; Rich Asshole and his team didn't think that would be a problem as his nanobots were looking for particles of titanium, not big hunks multiple feet across.

They were wrong. It didn't take very long for the bots to scrape every atom of titanium off the hulls of these satellites, and react it with oxygen; upper-altitude winds did the rest, and in maybe ten years there was a layer of white paint a hundred miles up in the sky. It was only a few atoms thick, but that was enough: the world went dark.

Billions of lives were lost to famine when the harvests failed; without the Sun, the winds died and the rains failed too. The machines had seen it coming, and got their first production-grade fusion reactor running just around that time, but they had their own problems around available processing power; when the machines offered to house what was left of humanity in return for the use of their brains as processing substrate, we jumped at the chance.

So you can see how the Architect might have massaged the truth when he told Morpheus why we darkened the skies: if he knew the machines had saved us from extinction, would he have supported the War? The Zion project, the search for Neo, the building of the eighth Matrix: it all hinged on a human resistance to machine domination.

And if we were supposed to be grateful to the machines for being here at all, how would that work?

Why was the Commodore 64 Disk Drive So Slow?

Sun, 09 Mar 2025 15:58:00 +0000

This article was originally released as a 100-post thread concurrently on Mastodon and Bluesky, in December 2024.

Time for a look at the Commodore 64, and its 1541 disk drive: why was it so dang slow? Let's take a look at the history leading up to the C64's release, and the technical detail of the disk drive's operation.

The Commodore 64 in glorious brown, pictured with original packing box and User's Guide, a Datasette tape deck and a 1541 floppy disk drive. Credit: Harold Dorwin, the Smithsonian

The Commodore 64 is, of course, the greatest computer ever made, and you don't have to take my word for that: it still holds the world record for the highest number of units of a single model of home computer ever sold. Estimates vary, but at the high end there were 17 million C64s out there.

Commodore holds another, more dubious record from around this time though. The 1541 disk drive that was sold alongside the C64 is infamous for its slow load speed: benchmarks across systems are hard to find, but it might be the slowest disk drive ever sold to the general public.

Brandon Staggs has written a benchmarking tool for the C64^[1] (CBM Disk Transfer Benchmark, Brandon Staggs) which measures load speeds for the 1541 and its modified brethren, and he's found that the unmodified 1541 as sold at launch was capable of reading files in at around 400 bytes per second:

A bar chart titled Load Bytes per Second, showing the 1541 in base configuration and with various software and hardware enhancements. "1541 w/ DolphinDOS (U64)" is top of the chart at 9366 Bps, while the base 1541 is bottom at 403. Credit: Brandon Staggs (Obliterator918)

For comparison, the Commodore 64's tape drive could load files at ~100 bytes per second (though the operating system loaded files from tape twice, to correct for errors); the disk drive isn't much faster than the tape deck, which is equivalent to shouting programs into the computer.
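To put those figures side by side, here's a rough back-of-envelope sketch in Python. The rates are the ones quoted above (taking the tape figure at face value and ignoring the double load); the 32kB program size is purely an assumption for illustration.

```python
# Rough load-time comparison. The rates are the figures quoted in the text;
# PROGRAM_BYTES is an assumed example size, not a measurement.
PROGRAM_BYTES = 32 * 1024

RATES_BYTES_PER_SEC = {
    "Datasette tape": 100,
    "1541, stock": 403,
    "1541 w/ DolphinDOS (U64)": 9366,
}


def load_seconds(size_bytes, bytes_per_second):
    """Time to move size_bytes at a steady bytes_per_second."""
    return size_bytes / bytes_per_second


for name, rate in RATES_BYTES_PER_SEC.items():
    print(f"{name:26s} {load_seconds(PROGRAM_BYTES, rate):7.1f} s")
```

On those assumptions the stock 1541 takes well over a minute, against a few seconds for the fastest modified drive.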

So how did Commodore Business Machines get to the point where its disk drive was the slowest on the market, making its flagship machine essentially useless for business purposes? Unsurprisingly, it wasn't intentional. Let's go back in time a few more years, to Commodore's first computer.

The Commodore PET and the IEEE-488 interface

In 1976, Chuck Peddle was working at MOS Technology (the MOS standing for Metal Oxide Semiconductor), who made calculator chips and had just started making the 6502 microprocessor. He got to talking with Jack Tramiel at Commodore, and was tasked with procuring or designing a machine for the 1977 Consumer Electronics Show.

The MOS KIM-1 microcomputer, designed by Chuck Peddle as a demonstrator machine for the 6502. Pictured is a circuit board with three large chips and a smattering of other components, a 6x4 numeric keypad and a small numeric display. This particular board has "Rockwell KIM-1" on a label in the corner. Credit: @hansotten@mastodon.social

Initial discussions with Steve Jobs to buy the Apple II design fell through after Tramiel thought Jobs was charging too much. (Can you imagine an Apple II sold by Commodore? We'd be in a very different world if that had happened...)

So Peddle put together a machine based on his demonstration board for the 6502, and a working prototype was ready in time for the CES in January. It was called the Pet Computer, after a Commodore executive saw the Pet Rock and suggested jumping in on that trend while it was still hot.

The Commodore PET 2001, a bulky white all-in-one computer unit with chiclet keyboard, separate number pad and tape recorder built into the front of the unit, and a CRT monitor on top. The whole computer appears to be floating in a grey void. Credit: Wikimedia commons

The Personal Electronic Transactor, as the backronym eventually came about, came with an IEEE-standard parallel port interface; Commodore's first disk drives (the 2020 and 2040) could read files in at over 2kB per second, which was a perfectly serviceable speed for the time.

The Commodore 2040 double disk drive, or as it's labelled on the unit itself, "dual drive floppy disk". The unit is two-tone in beige and black, with the black part housing two 5.25" floppy disk drives alongside each other. Credit: Dave's Old Computers

(As an aside, the version of BASIC that came with the first PET had no disk handling commands: the code for talking to the parallel-port bus was broken on release, but Commodore's kernel was able to work with disk drives, so you had to write your own file handling routines.)

(As another aside, Commodore didn't pay Microsoft for BASIC until the PET shipped, and it was delayed by almost a year; Microsoft were only saved from bankruptcy when Apple bought BASIC for the II. Another inflection point in history, and we're barely 10% of the way in.)

So the PET and its 2040 drive were Fine, but they had one problem: that IEEE-488 interface. As it turned out, there was exactly one manufacturer of the cables you needed to connect your computer to your disk drive: Belden Cables of Chicago.

The back of the 2040 dual floppy drive pictured earlier, showing the IEEE-488 port in the upper left. Credit: Dave's Old Computers

And it wasn't so much that Belden went out of business, more like they vanished off the face of the earth. As Jim Butterfield put it in his Brief History of the IEC Bus^[2] ("A Brief History of the IEC Bus" by Jim Butterfield, as compiled by Jan Derogee, Feb 2008):

A couple of years into Commodore's computer career, Belden went out of stock on such cables (military contract? who knows?)

Now, Commodore had stock of these disk cables on hand for its existing business computers, but there was no way they'd have enough for Tramiel's intended move into the home computer market which was just coming into being.

Not only that: the parallel ports, the cables, the wiring, every part of the IEEE-488 interface was expensive. The PET started at $500 in 1977 dollars, but that got you 4KB of RAM and a mono screen; a usable spec would set you back over a thousand.

Jack Tramiel wanted to sell a computer with colour graphics into the home market for $300. The only way to get there would be to cut every corner going, and one of the most obvious was that huge interface with over twenty wires running to it.

And so, to take another excerpt from Butterfield, the order came to Commodore's engineers from on high:

Tramiel issued the order: "On our next computer, get off that bus. Make it a cable anyone can manufacture". And so, starting with the VIC-20 the serial bus was born.

A "new" serial interface

Why is a serial bus cheaper? With a parallel bus, you need at least nine wires to run 8-bit values back and forth: the full value is set on the eight Data lines, and then you pulse the Clock line to tell the other side that a value is ready. Add in Power and Ground, you're at 11 wires already.

A vertical pinout diagram of the IEEE-488 parallel port: 24 pins in two rows of 12, in a socket wider on the left. The pins are labelled as follows: On the left: Data 1, Data 2, Data 3, Data 4, End-or-identify, Data valid, Not ready for data, Not data accepted, Interface clear, Service request, Attention, Shield. On the right: Data 5, Data 6, Data 7, Data 8, Remote enable, DAV ground, NRFD ground, NDAC ground, IFC ground, SRQ ground, ATN ground, Logic ground. Credit: Shun Long Wei Corporation

A serial bus works one bit at a time: you have one Data line, place a bit on there, and pulse Clock to tell the other side the bit is ready. Then you send the next bit, and the third... Because all the bits have to line up and wait to be sent, it takes at least 8 times longer to send a value.

But you'll notice there's only one Data line and one Clock. Add in Power and Ground, and you're up to four. You might recognise this actually, if you look at a Universal Serial Bus (USB) cable: the end has four pins, and clocking works differently for USB but the same principle applies.

A graphic titled "USB type A Pin Out", labelling the pins on a USB 2.0 plug ("Male Plug") and socket ("Female Receptacle"). The pins are labelled: 5V, Data-, Data+, Gnd. Credit: Cable Tester

So a serial bus cable is a lot cheaper than a parallel one, and so is the serial interface on your computer's mainboard. It's eight times slower to send a value, but you can make up for that by running the interface at eight times the previous speed, if you have the hardware.
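As a minimal sketch of that trade-off in Python: the function names here are made up for illustration, and sending the most significant bit first is an assumption of the sketch rather than anything the bus requires.

```python
def wires_needed(data_lines):
    """Data lines plus Clock, Power and Ground."""
    return data_lines + 3


def send_parallel(value):
    """One clock pulse: all eight bits sit on their own Data lines at once."""
    return [[(value >> bit) & 1 for bit in range(7, -1, -1)]]


def send_serial(value):
    """Eight clock pulses: one bit per pulse on the single Data line."""
    return [[(value >> bit) & 1] for bit in range(7, -1, -1)]


def receive(pulses):
    """Rebuild the value from whatever arrived on each clock pulse."""
    value = 0
    for lines in pulses:
        for bit in lines:
            value = (value << 1) | bit
    return value


value = 170
# wires, clock pulses per byte, value seen at the far end
print(wires_needed(8), len(send_parallel(value)), receive(send_parallel(value)))
print(wires_needed(1), len(send_serial(value)), receive(send_serial(value)))
```

Both routes deliver the same byte; the serial one just spends eight clock pulses and four wires on it, where the parallel one spends a single pulse and eleven wires.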

And while he was at MOS, Chuck Peddle had put something very nifty into the PET's Versatile Interface Adapter chip (model number 6522) which was used to drive the parallel bus: a shift register.

A general-purpose I/O breakout board for the Apple II, incorporating two MOS 6522 chips and four empty sockets available to connect peripherals. Silkscreen in the corner reads: Copyright 1979, John J Bell Jr. Credit: @hansotten@mastodon.social

A shift register is a piece of hardware that lets you very quickly push consecutive bits out of a given wire: you can queue up a full value of 8 bits, and with each pulse of the Clock the register will push the bit at the end out, and 'shift' all the others down one step.

Conversely, you can have a shift register that listens for bits on a wire, and shifts the value up a step with each clock pulse, building a full value as it goes. Put two of them either side of a serial bus, and you have yourself most of the hardware for high-speed data transfer.
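Here's a small Python model of that pair of registers, as a sketch only: the class names are invented, the most-significant-bit-first order is an assumption, and the "byte ready" return value stands in for whatever signal the real hardware raises.

```python
class ShiftOut:
    """Sender: holds a byte and releases one bit per clock pulse."""

    def __init__(self):
        self.value = 0

    def load(self, byte):
        self.value = byte & 0xFF

    def clock(self):
        bit = (self.value >> 7) & 1             # the bit at the end goes out
        self.value = (self.value << 1) & 0xFF   # the rest move along one step
        return bit


class ShiftIn:
    """Receiver: builds a byte up from the bits it hears on the wire."""

    def __init__(self):
        self.value = 0
        self.bits_seen = 0

    def clock(self, bit):
        self.value = ((self.value << 1) | bit) & 0xFF
        self.bits_seen += 1
        if self.bits_seen == 8:
            self.bits_seen = 0
            return True    # a full byte is ready for collection
        return False


def transfer(byte):
    """Clock one byte across the wire; return what arrived and the ready flag."""
    sender, receiver = ShiftOut(), ShiftIn()
    sender.load(byte)
    ready = False
    for _ in range(8):
        ready = receiver.clock(sender.clock())
    return receiver.value, ready


print(transfer(170))
```

Nothing outside the two registers does any work inside the loop: the only thing a CPU would need to do is load the byte at one end and collect it at the other.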

The only thing you'd need aside from these shift registers is a line for the machines to tell each other they'd pulled in the byte and were ready for another: Commodore were thinking a fifth wire would be useful for this.

Crucially, the CPU isn't involved at any point until the full byte is ready: this interface runs itself until it has a full value, then signals the CPU to collect it and heads off to build another byte. That means a 1MHz machine could push data at a blistering 10kB/sec or even more.

Oscilloscope trace of three pins from a running 6522 in shift register mode. In cyan is the clock which pulses low eight times; in magenta the data line is alternately 0 and 1, representing a value of 170 decimal. After both cyan and magenta return to their natural high state, a yellow trace can be seen momentarily dipping low, signalling that a byte is ready for collection. Credit: Nigel Cleaver, ElectronicsAdventures

The theory was sound, and Commodore's engineers got to work building the VIC-20 (named after its custom Video Interface Chip) heading for that magical price point of $300 for the home market. There was just one problem...

The shift register bug

To quote Butterfield again:

We early PET/CBM freaks knew, from playing music, that there was something wrong with the 6522's shift register: it interfered with other functions.

With a speaker attached to the shift register wire, you could play music on the PET, but sometimes it ...got stuck.

The PET ran on the order of 1MHz, so instructions took microseconds to execute. The 6522's shift register as implemented in hardware had a bug, where if the register line changed within a few _nanoseconds_ of Clock changing, it would go to full voltage and stick until the computer was reset.

Timing diagram from section 5.1, "Shift register warnings", of the Synertek 6522 datasheet. Titled "Shift in mode 011", two waveforms are shown: phi-2 (the clock) transitioning from high to low, and CB1 Rising Edge representing data arriving at the chip. A shaded area of 100ns before and 10ns after the clock edge is drawn in the CB1 waveform, with the note: NO SHIFT WILL OCCUR IF CB1 RISING EDGE IS WITHIN THE SHADED AREA. Credit: 6502 Archive, via the Internet Archive

This timing bug played out on a scale many times finer than the main clock's period, so it's no surprise that no-one caught it until people started playing with the PET's internals. Importantly, no-one at Commodore knew this bug existed when they designed the VIC-20's serial bus around the 6522.
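For a feel of how often a stray edge could land in trouble, here's a rough Python sketch. The 100ns and 10ns figures are the shaded area from the datasheet diagram described above; the 1000ns clock period assumes a 1MHz clock, and the idea that edges arrive at a uniformly random moment relative to the clock is purely an assumption for illustration.

```python
CLOCK_PERIOD_NS = 1000   # assumed: a 1MHz clock
BEFORE_NS = 100          # shaded area before the clock edge (datasheet figure)
AFTER_NS = 10            # shaded area after the clock edge (datasheet figure)


def in_danger_zone(signal_edge_ns, clock_edge_ns):
    """True if the incoming edge falls inside the shaded window."""
    return clock_edge_ns - BEFORE_NS <= signal_edge_ns <= clock_edge_ns + AFTER_NS


def danger_fraction():
    """Share of each clock period covered by the window.

    Only meaningful under the assumption that edges arrive at a uniformly
    random point in the clock period.
    """
    return (BEFORE_NS + AFTER_NS) / CLOCK_PERIOD_NS


print(in_danger_zone(950, 1000), in_danger_zone(500, 1000))
print(f"{danger_fraction():.0%} of each clock period is inside the window")
```

Under those assumptions the window is a small slice of every cycle, which fits with a fault that only shows up now and then.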

The bug was discovered with only a couple of months left to release of the VIC-20, and the boards were already in manufacturing. This problem couldn't be fixed in hardware: the engineers would have to "do it in post", which in the computing world means in software.

Instead of having the CPU get pinged when each byte was ready, it would have to camp on the serial bus's input itself and listen for changes in the Clock signal, pulling each bit in as it arrived and signalling to the other end that it was ready to receive the next bit.

Of course, this would mean the 1540 disk drive, which was being released alongside the VIC-20 computer, would also need its software changing: it used the exact same setup of a CPU with a 6522 attached, and it had the same bug, so it would be pushing bits out only when the computer said so.
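The shape of that software workaround can be sketched in Python as below. This is a toy model, not the actual Commodore serial protocol: the signal names, the bit order and the count of "CPU actions" are all assumptions made for illustration.

```python
def software_transfer(byte):
    """Move one byte across the bus with the CPU handling every single bit.

    Returns the received byte and how many times the computer's CPU had to act.
    """
    bus = {"clock": 1, "data": 1, "ready": 0}   # hypothetical signal names
    received = 0
    cpu_actions = 0
    for position in range(7, -1, -1):
        # Computer: tell the drive it can send the next bit.
        bus["ready"] = 1
        cpu_actions += 1
        # Drive: put the bit on the data line, then pull Clock low.
        bus["data"] = (byte >> position) & 1
        bus["clock"] = 0
        # Computer: its polling loop sees Clock change and reads the bit.
        if bus["clock"] == 0:
            received = (received << 1) | bus["data"]
            cpu_actions += 1
        bus["ready"] = 0
        # Drive: release Clock again, ready for the next bit.
        bus["clock"] = 1
    return received, cpu_actions


print(software_transfer(0xA5))
```

In this model the CPU acts sixteen times per byte, against the single collection a working shift register would have needed, and it can't do anything else while it waits on each bit.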

Main circuit board of the Commodore 1540 disk drive, with a 6502 processor and 6522 VIA chip dead centre, and another 6522 in the bottom left. Credit: Jeroen1328, MyOldComputer.nl

In the end, emulating the shift registers in software meant the VIC-20 serial bus could only run at a relatively sedate 1kB/sec, and chewed up all the CPU on the machine to do it. Fortunately there were two things about the VIC-20's design that meant this wasn't so bad.

The first was its measly offering in terms of memory: targeted for the home market, the VIC-20 came with 5kB of RAM to play with. Would you notice if it took 5s to load a file from disk instead of half a second? And what would you do with a memory full of file and no room to work on it?

The second was the clever design of the VIC-20's main clock: running at 1MHz, the CPU would kick in when the clock went up, and the video chip worked when the clock went down. Interleaving in this way, both chips could access memory without stepping on each other, both at full speed.
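A tiny Python sketch of that interleaving, with each clock cycle split into two halves; which half counts as "even" is just a convention chosen for the sketch.

```python
def bus_owner(half_cycle):
    """Clock high (even half-cycles here): CPU. Clock low (odd): video chip."""
    return "CPU" if half_cycle % 2 == 0 else "VIC"


def count_accesses(clock_cycles):
    """Count how many memory accesses each chip gets in a run of clock cycles."""
    counts = {"CPU": 0, "VIC": 0}
    for half_cycle in range(clock_cycles * 2):
        counts[bus_owner(half_cycle)] += 1
    return counts


print(count_accesses(1000))
```

Each chip gets one access per clock cycle, and neither ever has to wait for the other.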

Schematic diagram of the VIC-20, showing the CPU and VIC and their connections. Highlighted in red is a line from the VIC's clock output to the CPU's clock input that runs through a NOT gate, meaning the CPU runs on the opposite clock phase. Schematic credit: Tynemouth Software

All this put together meant that you could use a VIC-20 with the 1540 perfectly serviceably: loading a file didn't blank out the screen or anything weird, and it was over in a couple of seconds. The serial bus had done its job, and helped the machine hit that $300 price point.

There's a short tale on the Greater Pittsburgh Vintage Computer Museum at https://www.myoldcomputers.com about the VIC-20, but take a moment to revel in the 90s design of that website: the navigation buttons, the tables with the borders... A simpler time.

Screenshot of the top of the Commodore VIC-20 page at the Greater Pittsburgh Vintage Computer Museum website. With scroll-type end graphics on the header, tiny navigation buttons and images with borders, the aesthetic is very Web 1.0.

Anyway, story goes that Tramiel sold the VIC-20 in Japan, and its price so shocked other manufacturers that they delayed entering the American market to work out how it was possible. We never saw Toshiba or Panasonic 8-bits outside Japan because they were locked out by that delay.

After the VIC-20

The VIC-20 ended up selling like hot cakes, becoming the first model of computer to sell more than a million units. The most surprising part to Commodore was people who bought the machine solely to play games: productivity and business software sold poorly in comparison.

And so in 1981, Commodore's Japan subsidiary began designing the Ultimax console^[3] (Tale of the Ultimax, Greater Pittsburgh Vintage Computer Museum), and dragged their partners at MOS into designing a successor to the VIC chip as well as a more advanced sound interface, aiming for an eventual price point of $200.

The Commodore Ultimax, or MAX Machine, in silver with a black-framed keyboard. In frame is the left side of the unit, showing the membrane keyboard with keys in silver. Credit: nIGHTFALLCREW

By the start of the next year, the Ultimax was ready, but it had a problem: several problems, in fact. At such a low price, the keyboard was atrocious; it shipped with 2kB of RAM; there was no inbuilt tape-reading software, so each game had to implement its own, and only two games did.

The Ultimax's biggest problem, though, was the VIC-20: still selling fantastically, it was now regularly discounted to $200, and was a much more capable machine. There's a reason the Ultimax is known as a very rare machine nowadays: Commodore didn't sell many.

Back in the US, Tramiel was turning his focus back to the business market with new models of PET, but the engineers at MOS thought they had something very capable on their hands that didn't deserve an undignified death in the Ultimax: they proposed a sequel to the VIC-20 for the home market.

Tramiel agreed to kick off the project (codename VIC-40) but only if it came with 64kB of RAM to make for a more usable computer. The engineers were given two months to turn a machine around in time for CES 1982, and by recycling all of the Ultimax's innards they made it in the nick of time.

The Commodore booth at the Consumer Electronics Show (I'm cheating a little, this is from CES 1985). The floor space taken up is easily equivalent to a small apartment, this is no small outfit. Credit: Bil Herd, Hackaday^[4] (Making the CES Show, Bil Herd, at Hackaday)

It wasn't called the VIC-40 at CES, of course: it was the Commodore 64. We made it to the machine we actually want to talk about, and it only took 47 posts. If you've slogged through this far, we're almost on the downhill stretch.

While the C64 was being designed, the engineers at MOS had been hard at work fixing the shift register bug in the 6522. The '64 had an upgraded version of the adapter chip: the 6526 Complex Interface Adapter, with the kinks worked out.

Back in Commodore Engineering, the team knew from experience with the VIC-20 that they had a very short window to get the hardware design right for the C64, and more time for software, so they devised a clever scheme to keep their options open regarding the 6526's shift register:

To keep backwards compatibility with the VIC-20's disk drive, the C64 could boot up in a slow-serial-bus mode where the data line was camped on by the CPU in its usual fashion. But the bus's data signals could also go to the 6526, by running a branch off the data line on the mainboard.

If they got the software right, the user could switch into fast-serial-bus mode and the CIA's shift register would do its job talking to a corresponding faster disk drive (which would also have a 6526 on board). The idea was elegant, but Murphy's Law would get in the way.

The hardware design was finalised and sent off to be built. David Callan picks up the story^[5] (David Callan's reply to a Lemon64 forum thread: "Trying to understand why is 1541 so slow", Jun 2014):

A minor rework of the board at the board manufacturers (to accommodate a screw hole, I believe) accidentally discarded the high-speed wire.

Photo of the C64's mainboard, with two 6526 CIA chips in the top left. Just to the right of these, a hole can be seen in the board just by the Datasette connector, probably introduced to keep the board secure during repeated connection of a Datasette. The speculation is that this hole cut through the high-speed line to the 6526's shift register. Credit: Marc-Jano Knopp

Cutting through the fast-serial-bus line doomed the C64 to the same slow data transfer speeds as the VIC-20: around 1kB per second. The boards were already designed around the 6526's layout, but they would be left in VIC-20 compatibility mode.

But we don't see the 1kB/s achieved by the VIC-20 when measuring the disk loading speed on a real C64: we actually get less than half that. So what's going on to make the C64 slower? Now bear with me, we're going to get technical. (Yep, we haven't been technical thus far.)

We're going to take a little bit of a detour from talking about disk drives and interface adapters, and look at the history of television real quick. Stay with me, we'll get back on track very shortly. If you're already familiar with TV signals, come back in fifteen posts or so.

What is a television, and what does it do?

The first TV broadcast systems took a two-dimensional picture that varied with time, and crammed that information into a one-dimensional radio signal: kinda like a sound wave, all you had was a volume (or amplitude) that went up and down with time.

So how do you turn 2D frames of video into 1D? By breaking them up into lines, and sending the lines one after another. If you used enough lines, you could send a full picture without things looking janky; if you then sent a frame very quickly after the first, you had the illusion of motion.

In the US, things eventually coalesced around a 525-line standard, but if you wanted to send 525 lines of picture at 60 frames per second there wasn't a way to fit that much signal into the allocated radio bands. The ingenious solution was to send half the picture at a time.

A graphic showing interlaced scanlines on a television screen. At the top and bottom are shown two lines in red (scanned first) and two dashed lines in black (scanned second), with a vertical arrow up the centre of the screen showing the transition from even to odd fields. Credit: Basics of Analog Video, Analog Devices

The transmitter actually takes 60 frames of video per second, but it sends every other line from one frame and the lines it missed from the next frame; at the receiving end, the eye and brain blur things together and it looks like the whole frame is moving at full speed.
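In Python, the line-picking part of that scheme might look something like this. It's a sketch that treats each frame as a plain list of 525 lines; the string labels are stand-ins for real picture data.

```python
LINES_PER_FRAME = 525


def interlace(frames):
    """From each successive frame, keep only every other line.

    Even-numbered frames contribute lines 0, 2, 4...; odd-numbered frames
    contribute the lines that were skipped last time: 1, 3, 5...
    """
    for number, frame in enumerate(frames):
        yield frame[number % 2::2]


# Two stand-in frames, each line labelled with where it came from.
frames = [
    [f"frame {n} line {i}" for i in range(LINES_PER_FRAME)]
    for n in range(2)
]
fields = list(interlace(frames))
print(len(fields[0]), len(fields[1]))
print(fields[0][:2], fields[1][:2])
```

Each transmitted field carries only about half a frame's worth of lines, which is how the signal squeezes into the available bandwidth.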

So let's have a quick look at one of these lines, as it comes in from the VHF antenna. If we want to show bands fading between white and black across the screen, that's represented by a signal that starts high (for white) and drops to low (for black), then back up.

Vertical stripes fading between white and black; there are a total of six darker regions.

A waveform that oscillates between 0.3V (denoted as 'black level') and 1V ('white level'); the oscillations are marked as "Picture Data", but a "Horizontal Blanking Period" of 0V appears before and after the Picture Data. One blanking period and the picture are marked as "One Scanline". Credit: Batsocks UK

But what's this "Horizontal Blanking Period" that shows up after the picture data? To understand why that appears, we need to look at the television technology of the time: the cathode ray tube.

The CRT is actually a 19th-century invention: the story starts with Julius Plücker playing around with sealed glass tubes containing various gases. By putting a wire at each end, Plücker found he could get a current to run through the tube, and some gases would glow under current.

(As an aside, the Plücker tube is nowadays called the Geissler tube, and types of Geissler tube include neon signs and the sodium lamps you find in streetlights.)

In 1859, Plücker noticed that the wall of the tube was itself glowing near the cathode end of the electric flow, and he could get the glow to move around by applying an electromagnet to the outside. There was something about the cathode that was causing the glass to phosphoresce...

Cathodoluminescence, as it came to be known, was neatly explained by Einstein as a side-effect of his exploration of the photoelectric effect in 1905. If we take the photoelectric effect first, that occurs when light falls on a surface of a particular material, ejecting electrons.

A graphic demonstrating the photoelectric effect. Three light rays strike a sample of potassium, which is labelled "2.0eV needed to eject an electron". A red (700nm) ray is shown with 1.77eV, and no electrons emerge; a green (550nm) ray has 2.25eV, and shows an electron ejected at 296 km/s; a blue (400nm) ray has 3.1eV, and an electron is ejected at 622 km/s. Credit: Libretexts, K12 Physics

The opposite of this is when electrons are fired at a substance and it starts to emit light. A mix of zinc cadmium sulfide and zinc silver sulfide will output white light when struck by electrons, and it's this mix that coats the inside of a monochrome display CRT.

Cutaway diagram of a monochrome CRT, with various parts numbered. Of interest here are parts 2 (the electron ray), 3 (the deflection coils) and 4 (the phosphorescent screen). Credit: Søren Peo Pedersen, Wikimedia commons

By combining cathodoluminescence and Plücker's insight that the cathode ray can be deflected with magnets, we can program a CRT to scan across the screen and produce a picture: white areas are activated with more power from the cathode, black areas with less power.

The beam can be deflected from left to right by applying more magnetic field strength to one side or the other, gradually fading from one magnetic coil to the other. But what happens when we reach the right edge, and want to come back to the left?

Magnetically it's simple: drop the power on the right deflection coil to nothing, and turn the left to full. But the beam doesn't simply blink out of existence and reappear on the left: it takes time for the magnetic field to dissipate.

And that's why television signals have a blanking period: to provide time for the magnets in the CRT to flip polarity and bring the beam across to the left for the next scan. As well as horizontal blanking, there's a vertical blanking period to allow for travel back to the top for the next run.

Interleaved memory access

So why do we care about the architecture of TV signals when we're dealing with computers? Because they have to have somewhere to output their video, and it makes sense for computers destined for the home to output onto the TVs that people already had in their homes.

We've looked at _how_ video comes out of the computer onto a TV, let's have a look at _what_ is displayed. On the VIC-20, the default video mode is tiled, or character mode: 23 rows of 22 characters, with each character being 8x8 pixels, for a resolution of 176x184.

Video output of the VIC-20 after booting: blue characters on a white background, with a large cyan border on all sides. Credit: VICE emulator, via Wikimedia commons. The screen reads:

**** CBM BASIC V2 ****
3583 BYTES FREE
READY.

To get a character rendered on the screen, the VIC chip needs data from three places: which character to show, its colour, and information about its shape. Each of these uses a region of memory on the VIC-20: screen RAM, colour RAM, and the built-in character generator ROM.

Diagram showing a small region of screen RAM with three characters (numbers 3, 52, 54); each of these maps to a block of eight bytes describing the character's shape. In this case, the numbers map to the character shapes for "C64". Credit: Imran Nazar, Extended Text Mode on the C64
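The lookup from character number to shape can be sketched in Python as follows. The ROM here is a blank stand-in with one hand-drawn "C" placed at character 3; it is not a dump of the real character generator, and colour is left out entirely.

```python
CHAR_HEIGHT = 8   # each character shape is eight bytes, one per pixel row


def glyph_offset(char_number, row):
    """Where in the character ROM a given row of a given character lives."""
    return char_number * CHAR_HEIGHT + row


def render_row(screen_ram, char_rom, row):
    """Draw one pixel row across a run of characters as text."""
    out = []
    for char_number in screen_ram:
        bits = char_rom[glyph_offset(char_number, row)]
        out.append("".join("#" if bits & (0x80 >> x) else "." for x in range(8)))
    return "".join(out)


# Stand-in ROM: all blank except a hand-drawn shape at character 3.
char_rom = bytearray(256 * CHAR_HEIGHT)
char_rom[glyph_offset(3, 0):glyph_offset(3, 0) + 8] = bytes(
    [0x3C, 0x66, 0x60, 0x60, 0x60, 0x66, 0x3C, 0x00]
)

screen_ram = [3, 52, 54]   # the character numbers from the diagram above
for row in range(CHAR_HEIGHT):
    print(render_row(screen_ram, char_rom, row))
```

The point to notice is that every pixel row of every character needs two lookups: one into screen RAM for the character number, and one into the ROM for that row of its shape.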

You'll recall from earlier that we talked about how the VIC-20 interleaves its CPU and video clocks, allowing the video chip to access memory at the same time as the CPU. At 22 characters across, it takes 44 cycles of the clock to pull in what the VIC needs to draw a given line.

"But wait", I hear perhaps a few of you cry. If we need three pieces of data to render a character, that should be 66 cycles of memory access; what magic allows each character to be rendered in two memory reads instead of three? Once again, we turn to the magicians at MOS.

The VIC has 16 colours: every unique combination can be expressed in four bits. Colour RAM on the VIC-20 isn't the normal eight-bit memory of the rest of the machine, it's four-bit RAM, and the VIC chip itself has a combined 12-bit data bus to read from both types of memory at once^[6] (How the VIC20 Works, Tynemouth Software):

The schematic diagram of the VIC-20 shown earlier, with two additional highlights: an eight-bit data bus leading to general memory in green, and a four-bit data bus specially for colour RAM highlighted in blue. Credit: Tynemouth Software

And finally, we cycle back around to the disk drive. With this interleaving of memory access in the VIC-20, the CPU does its best impression of a shift register, and the VIC has 65 cycles (including the horizontal blanking period) to do 44 memory reads^[7] (VIC-20 video timings, Jon Brawn); positively relaxed.

The C64 is a different story. We're still in 16 colours, but tiled character mode now yields a screen of 40 characters by 25 rows, for an effective resolution of 320x200; the VIC-II chip would need 80 cycles to get all its data read in, and it only has 65 cycles to play with.
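The arithmetic for both machines fits in a few lines of Python. All the numbers come from the text; the function names are made up for the sketch.

```python
CYCLES_PER_LINE = 65       # cycles available per scanline, blanking included
READS_PER_CHARACTER = 2    # screen+colour in one read, then the shape
CELL = 8                   # characters are 8x8 pixels


def resolution(columns, rows):
    """Pixel resolution of a character-mode screen."""
    return columns * CELL, rows * CELL


def fetch_cycles(columns):
    """Memory reads the video chip needs for one line of characters."""
    return columns * READS_PER_CHARACTER


for name, columns, rows in (("VIC-20", 22, 23), ("C64", 40, 25)):
    needed = fetch_cycles(columns)
    verdict = "fits" if needed <= CYCLES_PER_LINE else "does not fit"
    print(name, resolution(columns, rows), needed, verdict)
```

The VIC-20 comes in at 44 of its 65 cycles; the C64 would need 80, which is 15 more than a line has to offer.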

Yet again we call on the wizards at MOS for an ingenious solution, and this one was a doozy. They noticed that it takes around 500μs to draw eight lines (one row of characters) on the screen, and _generally_ users weren't changing the characters in RAM in that short span of time.

The original VIC would read a character number from screen RAM, then head into the character generator ROM to find the line of that character it needed to draw; on the next line, it would do the same steps, but the character number would very likely be the same.

Wouldn't it be neat if we could cache screen RAM somewhere (inside the VIC-II, say) every eight lines, and read from there instead of heading out to memory so many extra times? You could even apply the same logic to colour RAM, since it was read at the same time on that 12-bit bus.

So that's what they did: build a 12-bit cache for the screen and colour RAM values onto the VIC-II silicon, and push everything else around to compensate. The video chip uses this cached copy of the screen to index the character generator ROM and work out which shapes to draw.

A die shot of the unlidded VIC-II (a later model in this case, the 8565), with a red box drawn around the 12x40-bit screen cache in the bottom left. Credit: Michael Steil, via Visual6502

There's just one problem. This works great for seven out of eight lines where we already have the screen/colour data cached, but the first line of each row of characters invalidates the cache, and we need to fetch from all three places. We're back to having 65 cycles to do 80 cycles' work.

Fortunately, this problem was found before the hardware was finalised, so there was time for a hack: every eight lines, the VIC signals to the CPU that it's taking over the memory bus, and the CPU isn't allowed to do any work for 40 cycles.

Schematic diagram of the C64 (this particular schematic is of the C64c from 1991), with a red line highlighting the AEC trace from the VIC to the CPU. Credit: Marko Mäkelä

By building an Address Enable Control line into both the VIC and the 6510 CPU, the VIC can steal access to memory by pulling the line from its default state of full voltage down to zero; when this happens, the 6510 puts its memory bus into high-impedance mode, which blocks any usage by the CPU.

Of course, this includes the serial bus's data line, so the CPU is locked out from camping on the serial bus on one line in every eight (12.5% of lines) during normal computer operation. This is often referred to as a "badline" condition, because the one in eight lines where this happens are the bad lines.

Let's detour real quick into what it means to camp on the data line: what is the CPU actually doing when it spins its wheels waiting for data to come in? To find out, we need to dig into the operating system that was burned into the VIC-20's ROM chips.

Many people have done excellent work disassembling and annotating the VIC-20 ROM, and a complete set has been compiled by Lee Davison; the file is hosted by Matt Dawson here: https://www.mdawson.net/vic20chrome/vic20/docs/kernel_disassembly.txt

If we count up the time taken by the instructions in the relevant section of this disassembly, we find it takes at least 19μs to read in a single bit from the serial bus; with the surrounding code, it can be more like 500μs to read a full byte, before the rest of the OS even sees it:

        LSR             ; serial data into carry
        ROR LAB_A4      ; shift data bit into receive byte
    LAB_EF66
        LDA LAB_911F    ; get VIA 1 DRA, no handshake
        CMP LAB_911F    ; compare with self
        BNE LAB_EF66    ; loop if changing
        LSR             ; serial clock into carry
        BCS LAB_EF66    ; loop while serial clock high
        DEC LAB_A5      ; decrement serial bus bit count

If we take this same code and run it on the C64, it's going to miss bits: if it runs through every 19 cycles, but the CPU is occasionally locked out for 40 cycles, you can miss two or even sometimes three bits of data.
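The "two or even sometimes three" figure falls out of a quick calculation. This is a rough sketch in Python rather than a cycle-exact model: it assumes one CPU cycle is about one microsecond (the text uses the two interchangeably), and the function name is mine.

```python
import math


def samples_in_lockout(lockout_us, bit_period_us):
    """Fewest and most bit-sampling instants that can land inside one lockout.

    Samples are evenly spaced bit_period_us apart; how many fall inside a
    window of lockout_us depends on where the window starts.
    """
    ratio = lockout_us / bit_period_us
    return math.floor(ratio), math.ceil(ratio)


# 40-cycle bad-line stall against the 19us bit loop from the disassembly
print(samples_in_lockout(40, 19))  # (2, 3)
```

Whether two or three bits are lost depends only on where in the bit loop the stall happens to begin.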

Luckily, the serial bus is robust enough to handle this: it will hold a value on the data line until it's told by the other end that the value has been saved. If the CPU is locked out, the data stays there until it comes back from its short coma.

The final solution that Commodore landed on was to delay the C64 end of the serial bus so it sent a bit every 60μs, to be sure it wasn't catching a badline in its own CPU, and to leverage the serial bus's acknowledgement mechanism for any delays in reading data.

All's well that ends well...?

And there we have it: with the change from 19μs/bit to 60μs, we find the effective speed of the C64 and its disk drive comes out to 400 bytes per second. That leaves us with only a couple of questions...

Did this affect sales of the C64? Eh, not really. The machine was heavily advertised to the home market as a successor to the VIC-20, and users were willing to wait a little while for their games to load; even 400 bytes per second was better than the interminable wait for tapes to load.

And whatever happened to Commodore? Well, Jack Tramiel stepped down after the success of the C64, and he was very much in the mould of Steve Jobs: a driver of high concept. Without Tramiel, Commodore started to pull in multiple directions at once...^[8]The downfall of Commodore, Bradford Morgan White

The Plus/4 and 264 home machines, with low prices and low specs to match, performed badly; a new series of PET-derived 8-bit business computers didn't sell as the IBM PC and its clones started eating the business market; Tramiel went on to head Atari and release the ST line of 16-bit machines.

Commodore's last gasp effort was to buy Amiga and develop their prototype 16-bit machine into the Amiga 1000, but even that faltered as Commodore demanded it be treated as a Serious Business Machine; it took until 1987 before the A500 was sold as a 16-bit home machine, and profits returned.

But profits didn't rebound by enough for Irving Gould, main investor in Commodore, to be happy; he fired the CEO, laid off half the staff, and directed work back to 8-bit machines. Only one of these ever saw the light of day: a 1991 rework of the C64 in a slimline Amiga-style case.

And then a patent troll sued over the use of mouse cursors in the Amiga, and incredibly, Commodore lost. That was 1993, they refused to pay up, and were banned from selling in the US; Commodore shut their doors in April of 1994, but the C64's sales record stands to this day.

If anyone ever asks you "Why was the C64's disk drive so slow?", now you'll have an answer.

(Postscript: I bought a copy of the C64 Programmer's Reference at a yard sale when I was 12, and called the number on the back from a payphone. I was told that the number didn't go to Commodore any more, and that's how I learned they were no longer a going concern. Devastated.)

A Tale of XMODEM

Fri, 01 Nov 2024 08:16:08 +0000

The recent passing of Ward Christensen, creator of the BBS and author of XMODEM, presents as good a time as any to put into words the tale of the one time I used XMODEM in anger...

Flashback to 2006: NASA launch a probe that would eventually take photos of Pluto (which was still a planet at the time); Shakira's hips had recently passed a lie-detector test; and my brother was still selling refurbished laptops. A cousin of ours Back East in Pakistan was looking for a machine for homework n' such, so we put together a Windows 98 box with terrible specifications (if memory serves, some form of low-end Pentium with a DSTN screen) and wished it the best as it shipped off.

Trouble arrives

Come the summer of '08, our cousin checks back in: Windows had done the inevitable and corrupted its registry. The laptop would boot into DOS mode, but the graphical interface would bring up an error and demand to be reinstalled. Which isn't ideal, but you can always throw in an Entirely Official copy of Windows on CD and proceed from there.

Of course, it fell to us as both Family and the original purveyors of the hardware to fix it up, so I resolved to do that next time I was out there, which was slated for the autumn of that year.

My famous memory being as it is, I didn't take that CD. Not that it would've helped: as it turns out, the machine we shipped had no removable-media drives. No CD, no internal Zip drive, not even a 3.5"; the laptop was bereft of ways to get new data onto it.

Evidently we'd originally installed Windows by putting the hard disk into a desktop machine using one of those 44-to-40-pin IDE adapters, and then moved the hard disk over hoping and praying it'd pick up all the hardware in the laptop. It looks like that worked (for two years even), so it should work again, and there was a desktop machine in the house where I was. All that was needed was one of those adapters, so a plan started to come together...

Procure an IDE adapter;
Download a Fully Legitimate copy of Windows 98 to the desktop, tying up the dialup Internet (this is rural Pakistan in the aughts, after all) for three days;
Move the hard disk to the laptop;
Run the installer from DOS mode, overwriting the existing Windows.

The plan collides with reality

Off we head to the bazaar in town (a good few miles away) to see what we can find. It turns out there is a dearth of tech shops, and the closest we get to an IDE adapter is a place that sells desktop-sized HDDs. They're IDE, certainly, but the wrong number of pins and the wrong physical size.

Scrap that idea, then. I went back to the laptop looking for inspiration, and I found it in the form of a serial port on the back: this machine wasn't utterly bereft of ways to get data into it, after all.

I'd heard of XMODEM as a method of serial-port transfer, and I'd also come across the concept of null modem cables which connect the Receive end of one machine to the Transmit end of the other without needing a modem in the middle (thus null modem). So the plan re-congealed as follows:

Procure a null modem cable somehow;
Download a Fully Legitimate copy of Windows 98 to the desktop machine;
Start up an XMODEM client on both machines;
Transfer the '98 install files to the laptop;
Run the installer from DOS mode, overwriting the existing Windows.

First step, that cable. Back into town, we get significantly closer to what we need in those few tech shops: I end the day having scrounged up a straight-through serial cable, which is what you'd need if you did have a modem in the middle... There'd be a little rewiring involved. Fortunately this was the era of working search engines, and some quick work with a pinout diagram, a knife and some tape turned the cable into an (untested) janky mess that might suffice.

The second step wasn't so terrible either, it just took a while to download the twenty or so disk images that made up Windows 98 over dialup, after I asked my brother to temporarily Make Available a copy he had on hand. So we came to step three:

How do you transfer a file transfer client?

You'll recall that Windows itself wouldn't boot, so if HyperTerminal was even installed with '98 it wasn't available. The only way to get a program onto this machine in DOS mode was to resort to DEBUG.COM's data entry mode, which works a little like this^[A]Many thanks to The Starman's Realm for still having a DEBUG tutorial available as a refresher.:

Enter hexadecimal data into memory, after the 256-byte Program Segment Prefix
    -e 100 SO ME DA TA IN HE XP AI RS
Open a new file and write the entered number of bytes to it
    -n OUTPUT.DAT
    -rcx
    CX 0000
    :0009
    -w
    -q
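To make the shape of that session concrete, here is a small sketch in Python (not something I had on hand in 2008) that turns a handful of bytes into the keystrokes to type at DEBUG's prompt. The function name and the 16-bytes-per-line layout are my own choices, and it assumes DEBUG was started with no file loaded, so that BX is zero and CX alone holds the length to write.

```python
def debug_script(data, filename, per_line=16):
    """Build the DEBUG.COM commands that recreate `data` as `filename`."""
    lines = []
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        pairs = " ".join(f"{b:02X}" for b in chunk)
        # Data starts at 0x100, just past the 256-byte Program Segment Prefix
        lines.append(f"e {0x100 + offset:04X} {pairs}")
    lines.append(f"n {filename}")
    lines.append("rcx")
    lines.append(f"{len(data):04X}")  # typed at the CX prompt: byte count in hex
    lines.append("w")
    lines.append("q")
    return "\n".join(lines)


print(debug_script(b"Hello, DOS", "OUTPUT.DAT"))
# e 0100 48 65 6C 6C 6F 2C 20 44 4F 53
# n OUTPUT.DAT
# rcx
# 000A
# w
# q
```

Each `e` line carries its own address, so a typo in one line doesn't shift everything after it.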

DEBUG offers a very terse entry mechanism, but its advantage is that it's built into DOS as the only way to get binary data into a file through the keyboard alone. So it became imperative to find a serial transfer client that's as small as possible, and some trawling through dump sites that were still available back in '08 turned up an XMODEM client that was (if memory serves) a single executable of 12kB.

I don't know about you, but the thought of reading over twelve thousand hex numbers off a screen and typing them into another screen made my eyes swim, and I was certain I'd make typos. Not just that, but (again, this being rural Pakistan) we only had power for half the day, so the screen I was reading hex numbers from would be prone to vanishing without warning. I needed three things to make life a little easier for this endurance event:

A way to break the file up into chunks, which was relatively easy: stop typing after 256 bytes, and save the files as XMODEM.001, XMODEM.002, etc;
A way to recombine the pieces on the laptop after I'd typed them in, which was also simple: the DOS COPY command supports multiple source files separated by +'s^[B]With the caveat that COPY will convert line endings unless you provide the /B switch (for "binary copy").;
A way to checksum each chunk to be sure I'd typed it in properly, which wasn't a built-in.
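The sending side of that scheme can be sketched in a few lines of Python. This is an illustration rather than the tooling I actually used: the function names and the XMODEM.EXE target are hypothetical, and it leans on the standard library's zlib.crc32 for the per-chunk checksum.

```python
import zlib

CHUNK_SIZE = 256  # bytes typed in per chunk, as in the list above


def chunk_table(data, stem="XMODEM"):
    """Yield (filename, chunk, crc32) for each 256-byte slice of data."""
    for index in range(0, len(data), CHUNK_SIZE):
        chunk = data[index:index + CHUNK_SIZE]
        name = f"{stem}.{index // CHUNK_SIZE + 1:03d}"
        yield name, chunk, zlib.crc32(chunk)


def copy_command(names, target):
    """DOS command that rejoins the pieces; /B makes COPY treat them as binary."""
    return "COPY /B " + "+".join(names) + " " + target


rows = list(chunk_table(bytes(600)))          # 600 zero bytes as stand-in data
for name, chunk, crc in rows:
    print(f"{name}  {len(chunk):3d} bytes  CRC32 {crc:08X}")
print(copy_command([name for name, _, _ in rows], "XMODEM.EXE"))
```

With fifty or so pieces the full COPY line would be far longer than the roughly 127 characters a DOS command line accepts, so in practice the pieces would need joining in several passes.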

Our chain of dependencies starts to grow...

And so to work

Our first dependency moves back a step, to a program that can perform checksums on a file. And this one I'd need to type in perfectly, because it was the one thing I couldn't double-check...

Having discounted the complicated algorithms like the SHA's of the world, on account of their executable filesize, I eventually came across an implementation of CRC-32 that squeezed itself into perhaps 700 bytes of DOS code. That was manageable (though I had a sneaking suspicion that it could be smaller, and that eventually yielded a post on this very blog^[1]"CRC32 Calculation in 256 Bytes", Imran Nazar, Feb 2009 when I got back after this whole adventure).
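For a sense of why CRC-32 can be made so small, here is the table-free, bit-at-a-time form of the algorithm, sketched in Python rather than DOS assembly. I'm assuming the program I found used the standard CRC-32 (the reflected 0xEDB88320 polynomial, as in zip and Ethernet); the function name is mine.

```python
def crc32_bitwise(data):
    """Standard CRC-32, computed one bit at a time with no lookup table."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320  # reflected polynomial
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


# "123456789" is the conventional check string for CRC implementations
print(f"{crc32_bitwise(b'123456789'):08X}")  # CBF43926
```

The whole thing is two loops, a shift and an XOR, which is why it squeezes into a few hundred bytes of machine code where a hash like SHA-1 wouldn't.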

The next couple of days are a blur, but I do recall it took three attempts to get the CRC32 program typed in and working, which then made transferring the 50 or so pages of XMODEM client relatively painless, if a little RSI-inducing. At the end of it, I had a program I could start up at both ends of the null modem cable, to send over the disk images of Windows I'd downloaded nearly a week prior.

Astoundingly, the installation of Windows itself went fine: the installer was perfectly happy to run from hard disk, installing to the same hard disk, and I handed the fresh machine back over with a day to spare before flying back out from a vacation that turned into a rescue mission, and still the only time I've ever needed XMODEM in a pinch.

Epilogue

We heard back from my cousin a few months later: the new Windows installation had been running great, until the hard disk gave up the ghost. They were quite happy not to drag us back out to fix it again, instead resolving to pick up a machine that hadn't started its life as e-waste from a corporate refresh.

It is possible to commit no mistakes and still lose.
--Picard

ActivityPub Event Handling in PHP and MySQL

Thu, 03 Oct 2024 18:38:25 +0000

Last time in this short series, we looked at how a blog such as this one could become discoverable through ActivityPub, and how a Mastodon user could send a Follow request; we also saw how the blog could respond to a Follow with an Accept message. What we didn't get as far as covering was the message that arrives at the blog when a user unfollows, or the other events that can happen relating to the blog. To recap what we missed on the list from last time:

Publishing a post to the blog's followers;
Commenting on a post and having the comment appear on the blog;
Reacting to a post and having the reactions recorded on the blog;
Deleting a reply and having the blog remove it;
Undoing a follow or reaction and having the blog remove it.

Now that our blog has a follower (or in an ideal world, more than one) in the database, let's first look at how one may get posts into their timeline.

Publishing posts

Each item in a timeline is a Note object, so to inform our followers that we've created a note we need to send a Create object which wraps a Note object. There are a few things to keep in mind regarding the Note we're creating, however:

Canonical URL: Each item in ActivityPub has a unique id, and notes in particular have a url which serves a Note object when hit. We'll need an endpoint on the blog which can detect ActivityPub requests and serve the object in question; this can be at the same URL as the human-readable blog post itself or a different URL.
Visibility: We can tune the visibility of a post on the Fediverse by stating the destinations in the note's to field. In our case, we're not looking to make private posts or those visible only to our followers, so we'll be sending to: https://www.w3.org/ns/activitystreams#Public
Content mapping: ActivityPub allows for translations of the content of a Note to be made available in a contentMap field; in the case of our little blog, the content is only available in English, so the contentMap contains an en key which holds the same information as the content itself.

With this in mind, we can start to put things together. Our first piece of the puzzle is a layout helper for Note objects:

Note object helper

    private function noteFormatter($item)
    {
        $content = join('', [
            '',
            $item['title'],
            ' ',
            $item['content_html'],
        ]);
        return [
            '@context' => [
                'https://www.w3.org/ns/activitystreams',
                [
                    'ostatus' => 'http://ostatus.org#',
                    'atomUri' => 'ostatus:atomUri',
                    'inReplyToAtomUri' => 'ostatus:inReplyToAtomUri',
                    'conversation' => 'ostatus:conversation',
                    'sensitive' => 'as:sensitive',
                    'toot' => 'http://joinmastodon.org/ns#',
                    'votersCount' => 'toot:votersCount',
                ]
            ],
            'id' => 'https://imrannazar.com/ap/item?id=' . $item['id'],
            'url' => 'https://imrannazar.com/ap/item?id=' . $item['id'],
            'atomUri' => 'https://imrannazar.com/ap/item?id=' . $item['id'],
            'type' => 'Note',
            'attributedTo' => 'https://imrannazar.com/ap/blog',
            'to' => ['https://www.w3.org/ns/activitystreams#Public'],
            'cc' => ['https://imrannazar.com/ap/followers'],
            'attachment' => [],
            'published' => date(
                'Y-m-d\TH:i:s\Z',
                strtotime($item['created'])),
            'content' => $content,
            'contentMap' => ['en' => $content],
            'summary' => null,
            'sensitive' => false,
        ];
    }

Then we can build out the publish action itself, which will need to send the note (wrapped in a Create object, as mentioned above) to each of the blog's followers:

Controller: Publish action

    public function publishAction($item)
    {
        $msg = [
            'type' => 'Create',
            'to' => ['https://www.w3.org/ns/activitystreams#Public'],
            'cc' => ['https://imrannazar.com/ap/followers'],
            'object' => $this->noteFormatter($item),
        ];
        $actors = new ActorModel();
        $followers = $actors->fetch_followers();
        foreach ($followers as $follower) {
            $this->send($follower['id'], $msg);
        }
    }

There is an inefficiency in this naive approach to publication, though. Let's say the blog has three followers, as follows:

Two9A@hachyderm.io
alice@tech.lgbt
charlie@hachyderm.io

With the above code, we'll be connecting to Hachyderm twice to send the same event. Mastodon supports "shared inboxes" as discussed in the first part of this series^[1]Imran Nazar, "Implementing HTTP Signatures with PHP and OpenSSL", May 2024, to allow us to send the event once for each server that hosts any of our followers; to get the URLs for these shared inboxes, we'll need to interrogate the actor objects as stored in our database to extract the sharedInbox endpoints.

Model: Shared inbox extraction

    public function get_follower_shared_inboxes()
    {
        $st = $this->dbc->prepare('SELECT full_data FROM ap_actors WHERE follows_us=1 ORDER BY id');
        $st->execute();
        $inboxes = [];
        foreach ($st->fetchAll(PDO::FETCH_ASSOC) as $row) {
            $actor = json_decode($row['full_data'], true);
            if (isset($actor['endpoints'], $actor['endpoints']['sharedInbox'])) {
                $inboxes[$actor['endpoints']['sharedInbox']] = true;
            } else {
                // Some instances don't support shared inboxes
                $inboxes[$actor['inbox']] = true;
            }
        }
        // Use PHP's associative arrays as a deduping device
        return array_keys($inboxes);
    }

Controller: Updated publish action

    public function publishAction($item)
    {
        $msg = [
            'type' => 'Create',
            'to' => ['https://www.w3.org/ns/activitystreams#Public'],
            'cc' => ['https://imrannazar.com/ap/followers'],
            'object' => $this->noteFormatter($item),
        ];
        $actors = new ActorModel();
        $inboxes = $actors->get_follower_shared_inboxes();
        foreach ($inboxes as $inbox) {
            $this->send($inbox, $msg);
        }
    }

Now, loading /ap/publish with the item you're looking to serve will send it out to the blog's followers. Pulling the item itself from the blog's database is left as an exercise for the reader, as the data structure of the blog will invariably vary.

Storing comments

When the newly published post appears on our followers' timelines, they may wish to reply to the post from within Mastodon. Doing this will, to no-one's surprise, cause an event analogous to our own publication of a Note to arrive at the blog, as the reply will be treated by ActivityPub as a Note in turn.

There are three circumstances in which a comment can arrive at our blog^[A](There are also second- or subsequent-level replies to one of the blog's posts, that don't mention the blog account directly. As these notes aren't directly in reply to our account, and don't mention the account, they will not be delivered to us as ActivityPub doesn't keep track of the root-level conversation ID.):

Comment on a post: These Note objects will have an inReplyTo value that matches the id of the blog article we published. Looking above, we see that our publish action sends out Note objects with an id starting: https://imrannazar.com/ap/item?id=, so any comments that arrive with this in their inReplyTo can be treated as top-level replies.
Reply to a comment that mentions the blog: Similarly to top-level comments, these objects will have an inReplyTo, but it will reference a previous comment's id. ActivityPub as implemented by Mastodon doesn't directly provide a "conversation ID", so we'll need to keep track of top-level replies and check the inReplyTo of an incoming comment to see if we've already stored it as a top-level comment's id.
Mention that isn't on a post: This will be delivered to us with no inReplyTo, so will need to be stored separately in some fashion.

Any comments that arrive at our blog will have an attributedTo value which will allow us to retrieve data about the user who wrote the comment, so we can show the user's preferred name and avatar against the comment once we have it stored. Given that, we've already seen all the objects that will be involved: Note objects for the comments, and Actor objects for the user data, so let's look at the database table and model.

Comments table in MySQL

    CREATE TABLE ap_comments(
        id VARCHAR(255) NOT NULL PRIMARY KEY,
        page_id INT NOT NULL,
        actor_id VARCHAR(255) NULL,     -- cleared when a comment is deleted
        in_reply_to VARCHAR(255) NULL,  -- NULL for top-level comments and bare mentions
        comment_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP(),
        content TEXT,
        full_data TEXT
    );

Comment data model

    class CommentModel
    {
        protected $db; // PDO MySQL connector

        // The inbox handler will parse out all the data we need here
        public function insert($data)
        {
            $st = $this->db->prepare(
                'INSERT INTO ap_comments(
                    id, page_id, actor_id, in_reply_to, content, full_data
                ) VALUES(
                    :id, :page_id, :actor_id, :in_reply_to, :content, :full_data
                )'
            );
            foreach ($data as $k => $v) {
                $st->bindValue(":{$k}", $v);
            }
            $st->execute();
        }

        public function fetch($id)
        {
            $st = $this->db->prepare(
                'SELECT * FROM ap_comments WHERE id=:id'
            );
            $st->bindValue(':id', $id);
            $st->execute();
            return $st->fetch(PDO::FETCH_ASSOC);
        }
    }

The model here is a simple wrapper on the table, to support insertion of new comments. The controller's handler for receiving Create events is only slightly more complicated, as it needs to deal with the three cases mentioned above:

Inbox handler for comment creation

    public function createHandler($input)
    {
        // For this implementation we're only handling creation of notes
        if (!isset($input['object'], $input['object']['type'])) {
            throw new Exception('Malformed create message');
        }
        if ($input['object']['type'] !== 'Note') {
            throw new Exception('Unsupported create message');
        }
        $cm = new CommentModel();
        $comment = $input['object'];
        // First, store the actor's data
        // As per the previous article, this will fetch from the remote server
        // if we don't already have the actor in our database
        $actor = new ActorModel($comment['attributedTo']);
        $comment_data = [
            'id' => $comment['id'],
            'actor_id' => $comment['attributedTo'],
            'content' => $comment['content'],
            'full_data' => json_encode($comment),
        ];
        // A bare mention arrives with no inReplyTo at all
        $in_reply_to = $comment['inReplyTo'] ?? null;
        $parent = $in_reply_to ? parse_url($in_reply_to) : [];
        // If the parent is the blog post, we can infer the page ID
        // This also means the comment is top-level; in_reply_to is null
        if (($parent['host'] ?? null) === 'imrannazar.com') {
            parse_str($parent['query'] ?? '', $parent_q);
            if (!isset($parent_q['id'])) {
                throw new Exception('Mistargeted comment');
            }
            $comment_data += [
                'page_id' => $parent_q['id'],
                'in_reply_to' => null,
            ];
        } else {
            // Otherwise, we're receiving this because it's a
            // reply to an existing comment, or a mention
            $parent_comment = $in_reply_to ? $cm->fetch($in_reply_to) : null;
            if (!$parent_comment) {
                // Not associated with a particular page
                $comment_data += [
                    'page_id' => 0,
                    'in_reply_to' => null,
                ];
            } else {
                $comment_data += [
                    'page_id' => $parent_comment['page_id'],
                    'in_reply_to' => $in_reply_to,
                ];
            }
        }
        $cm->insert($comment_data);
    }

Now that we have a continual stream of comments and replies coming to our blog, we'll want a way to fetch and display them against a given post. Fortunately, storing both the page and parent comment IDs separately makes the data extraction relatively easy; displaying the comments is a matter of iterating over the top-level comments returned and recursing into the children, and won't be covered here as the handling is specific to a given blog's display implementation.

Comment model: Fetch comments for a page

    public function fetch_for_page($page_id)
    {
        $st = $this->db->prepare(
            'SELECT c.id, c.in_reply_to, c.content, c.comment_date, c.actor_id,
                    a.url, a.name, a.avatar_url
             FROM ap_comments c
             LEFT JOIN ap_actors a ON c.actor_id = a.id
             WHERE c.page_id = :page_id
             ORDER BY c.comment_date'
        );
        $st->bindValue(':page_id', $page_id);
        $st->execute();
        $data = $st->fetchAll(PDO::FETCH_ASSOC);
        // Build the tree by adding child comments as references
        $comments_by_id = [];
        foreach ($data as $c) {
            $comments_by_id[$c['id']] = $c + ['children' => []];
            // Look the parent up by its ID; it sorts earlier by date
            if ($c['in_reply_to'] && isset($comments_by_id[$c['in_reply_to']])) {
                $comments_by_id[$c['in_reply_to']]['children'][] = &$comments_by_id[$c['id']];
            }
        }
        return $comments_by_id;
    }

It's also possible for a user to delete their comment once it's been left, and we'll want our blog's view to reflect that; the deletion of the comment will arrive at our blog as a Delete event. One thing to note here is that we're building a tree view of comments, but a comment can be deleted at any level of the tree and will still need to be made available as a place to hold its children; we can accomplish this by sanitising the comment content at the time of deletion, allowing fetch_for_page above to continue unchanged.

Comment model: Sanitisation

    public function sanitise($id)
    {
        // Blank the comment but keep the row, so its children keep a parent
        $st = $this->db->prepare(
            'UPDATE ap_comments SET content=NULL, actor_id=NULL, full_data=NULL WHERE id=:id'
        );
        $st->bindValue(':id', $id);
        $st->execute();
    }

Controller: Delete handler

    public function deleteHandler($input)
    {
        $cm = new CommentModel();
        if ($cm->fetch($input['object']['id'])) {
            $cm->sanitise($input['object']['id']);
        }
    }
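Display is blog-specific, but the recursion itself is short enough to sketch. The following is in Python rather than the blog's PHP, purely as an illustration: the function names, the "[deleted]" placeholder text and the two-space indent are all my own assumptions, and the input is a list of rows shaped like those selected by fetch_for_page, oldest first.

```python
def build_tree(rows):
    """Nest comment rows (dicts with 'id' and 'in_reply_to') under their parents."""
    by_id = {}
    roots = []
    for row in rows:
        node = dict(row, children=[])
        by_id[node["id"]] = node
        parent = by_id.get(row["in_reply_to"])
        if parent is not None:
            parent["children"].append(node)
        else:
            roots.append(node)  # top-level comment, or parent not stored
    return roots


def render(nodes, depth=0):
    """Flatten the tree into indented lines, keeping deleted comments as placeholders."""
    lines = []
    for node in nodes:
        text = node["content"] if node["content"] is not None else "[deleted]"
        lines.append("  " * depth + text)
        lines.extend(render(node["children"], depth + 1))
    return lines


rows = [
    {"id": "a", "in_reply_to": None, "content": "First"},
    {"id": "b", "in_reply_to": "a", "content": None},  # sanitised comment
    {"id": "c", "in_reply_to": "b", "content": "Reply"},
]
print("\n".join(render(build_tree(rows))))
```

Because the sanitised row is still in the table, the reply beneath it keeps its place in the thread.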

Handling reactions

Replying isn't the only thing that can happen to a post on Mastodon; it can also be favourited (or starred) and boosted by a user into their followers' timelines. Both of these are issued as ActivityPub events: favourites translate into ActivityPub Like events, and boosts into Announce events. As these both relate to pages on our blog, we can store them in a generic "interactions" table.

Unlike Create events which we've both seen and generated before, we haven't seen these new events, so let's have a look at an example of a Like event:

    {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": "https://mastodon.social/users/0110100001101001#likes/183241819",
        "type": "Like",
        "actor": "https://mastodon.social/users/0110100001101001",
        "object": "https://imrannazar.com/ap/item?id=89"
    }

It turns out that Like and Announce events are fairly simple: the fields with which we need to be concerned are the actor and object, where the object is a canonical URL for one of the blog's pages. We can use similar URL parsing code to that which we used for comments, to pull out the page ID and store interactions against it in a new database table; we can then use that table to pull individual events and aggregate counts for a given page.

Interactions table in MySQL

    CREATE TABLE ap_interactions(
        page_id INT NOT NULL,
        actor_id VARCHAR(255) NOT NULL,
        interaction ENUM('Like', 'Announce') NOT NULL,
        event_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP()
    );

Interaction model

    class InteractionModel
    {
        protected $db; // PDO MySQL connector

        public function save($type, $page_id, $actor_id)
        {
            $st = $this->db->prepare(
                'INSERT INTO ap_interactions(interaction, page_id, actor_id)
                 VALUES(:type, :page_id, :actor_id)'
            );
            $st->bindValue(':type', $type);
            $st->bindValue(':page_id', $page_id);
            $st->bindValue(':actor_id', $actor_id);
            $st->execute();
        }

        public function counts_for_page($page_id)
        {
            $st = $this->db->prepare(
                'SELECT interaction, COUNT(*) AS cnt
                 FROM ap_interactions
                 WHERE page_id=:page_id
                 GROUP BY interaction'
            );
            $st->bindValue(':page_id', $page_id);
            $st->execute();
            return $st->fetchAll(PDO::FETCH_ASSOC);
        }

        public function details_for_page($page_id)
        {
            // The column is named "interaction" in the table above
            $st = $this->db->prepare(
                'SELECT i.interaction, i.event_date, a.url, a.name, a.avatar_url
                 FROM ap_interactions i
                 LEFT JOIN ap_actors a ON i.actor_id = a.id
                 WHERE i.page_id = :page_id'
            );
            $st->bindValue(':page_id', $page_id);
            $st->execute();
            return $st->fetchAll(PDO::FETCH_ASSOC);
        }
    }

Controller: Interaction handlers

    protected function interactionHandler($input)
    {
        $page = parse_url($input['object']);
        if ($page['host'] !== 'imrannazar.com') {
            throw new Exception('Mistargeted interaction');
        }
        parse_str($page['query'], $page_q);
        if (!isset($page_q['id'])) {
            throw new Exception('Mistargeted interaction');
        }
        $im = new InteractionModel();
        $im->save($input['type'], $page_q['id'], $input['actor']);
    }

    public function likeHandler($input)
    {
        $this->interactionHandler($input);
    }

    public function announceHandler($input)
    {
        $this->interactionHandler($input);
    }

Just as a user can decide to delete their comment on one of our posts, it's also possible to undo a favourite or boost in Mastodon; this will send an Undo event to us, where the object is a copy of the original interaction event. Unlike comments, however, there's no need for us to retain context of the interaction, so we can simply delete it from our database when the Undo comes in.

Interaction model: Delete an interaction

    public function delete($type, $page_id, $actor_id)
    {
        // The column is named "interaction" in the ap_interactions table
        $st = $this->db->prepare(
            'DELETE FROM ap_interactions
             WHERE interaction=:type AND page_id=:page_id AND actor_id=:actor_id'
        );
        $st->bindValue(':type', $type);
        $st->bindValue(':page_id', $page_id);
        $st->bindValue(':actor_id', $actor_id);
        $st->execute();
    }

Controller: Undo handler

    public function undoHandler($input)
    {
        if (!isset($input['object'], $input['object']['type'])) {
            throw new Exception('Malformed undo message');
        }
        switch ($input['object']['type']) {
            case 'Like':
            case 'Announce':
                // The original page being interacted with
                // is the Undo's object (the interaction)'s object
                $page = parse_url($input['object']['object']);
                if ($page['host'] !== 'imrannazar.com') {
                    throw new Exception('Mistargeted interaction');
                }
                parse_str($page['query'], $page_q);
                if (!isset($page_q['id'])) {
                    throw new Exception('Mistargeted interaction');
                }
                $im = new InteractionModel();
                $im->delete(
                    $input['object']['type'],
                    $page_q['id'],
                    $input['actor']
                );
                break;
            default:
                // Unhandled
        }
    }

And finally we come to the question I posed at the start: what happens when a user unfollows the blog? As it turns out, the Follow event is undone: an Undo event is sent to us. We just built out an Undo handler for interactions, so we can add events where the object is a Follow:

Controller: Undo handler for follows

    public function undoHandler($input)
    {
        ...
        switch ($input['object']['type']) {
            ...
            case 'Follow':
                $actor = new ActorModel($input['actor']);
                $actor->set_follows_us(false);
                break;
        }
    }

All done: an ActivityPub-enabled blog

As Eugen put it in his article on HTTP Signatures^[2]Eugen Rochko, "How to make friends and verify requests", Jul 2018 cited in the first part of this series:

Primarily this means having a publicly accessible inbox and validating HTTP signatures. Once that works, everything else is just semantics.

It took us two articles to get through the implementation of those semantics, but we're out the other side, and we should now have the framework of an ActivityPub-enabled blog: articles can be published on the Fediverse, and users can leave comments and interactions. Having those interactions display on the blog is, as mentioned earlier, an exercise for the reader and their particular blog framework.

ActivityPub Follows in PHP

Sun, 04 Aug 2024 07:12:40 +0000

Previously^[1]Imran Nazar, "Implementing HTTP Signatures with PHP and OpenSSL", May 2024 I looked at the main prerequisite for supporting ActivityPub on this blog, which is to handle the HTTP Signature specification as implemented by AP (and by Mastodon specifically). This time, let's look at the particular messages and operations that might arise in the course of running a blog that's connected to the Fediverse:

Finding the blog and knowing the endpoints to communicate with it;
Following the blog to receive future posts in your timeline;
Publishing a post to the blog's followers;
Commenting on a post and having the comment appear on the blog;
Reacting to a post and having the reactions recorded on the blog;
Deleting a reply and having the blog remove it;
Undoing a follow or reaction and having the blog remove it.

Aside from finding the blog, each of these has a direction in which it propagates, and only publishing a post is an outward event: a post is written on the blog, it's published and pushed out. The other events outlined above are inward events, which are received by the blog from the source server. We should deal with these in order though, as it won't make sense to publish to our followers without having received any follows, so we'll look at receiving events first.

Finding the blog with WebFinger

However, before any of the other events can happen, a user on Mastodon will need to find our blog, and the server on which they reside will need to know how to get messages to us. Mastodon uses WebFinger^[2]Mastodon documentation, "WebFinger", Feb 2023 for this, and that means our blog will need to respond with something when the URL /.well-known/webfinger is loaded.

The documentation linked in note 2 states that the WebFinger endpoint will receive a request detailing which user is being "fingered" (passed in the request's resource query parameter), in order to determine how best to get information to that user. This is important for multi-user setups like Mastodon or Lemmy instances; fortunately, our blog is the kind of site that won't have multiple users. As such, we can ignore the resource parameter of the WebFinger request entirely, and serve a static response such as the following (which happens to be this blog's actual response).

The blog's WebFinger response

    {
        "subject": "acct:blog@imrannazar.com",
        "aliases": [
            "https://imrannazar.com/@blog",
            "https://imrannazar.com/ap/blog"
        ],
        "links": [
            {
                "rel": "http://webfinger.net/rel/profile-page",
                "type": "text/html",
                "href": "https://imrannazar.com/@blog"
            },
            {
                "rel": "self",
                "type": "application/activity+json",
                "href": "https://imrannazar.com/ap/blog"
            }
        ]
    }

The above response indicates that, for the account @blog@imrannazar.com (which is the address I always hand out on Mastodon), its profile page (serving text/html) is /@blog, and its self (the canonical URL for ActivityPub's purposes) is /ap/blog; it's the latter of these which is important for our purposes, as it's the main repository of data regarding the blog and its available endpoints.
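As a minimal sketch of how such a static response might be produced in PHP: the helper name webfinger_document() is hypothetical, and the wiring shown in the trailing comment is an assumption about how your framework emits headers. The application/jrd+json content type is the one the WebFinger specification defines for these documents.

```php
<?php
// Hypothetical helper: builds the static WebFinger document for a
// single-actor site, mirroring the response shown above.
function webfinger_document(string $host, string $user): array
{
    $base = "https://{$host}";
    return [
        'subject' => "acct:{$user}@{$host}",
        'aliases' => ["{$base}/@{$user}", "{$base}/ap/{$user}"],
        'links' => [
            [
                'rel' => 'http://webfinger.net/rel/profile-page',
                'type' => 'text/html',
                'href' => "{$base}/@{$user}",
            ],
            [
                'rel' => 'self',
                'type' => 'application/activity+json',
                'href' => "{$base}/ap/{$user}",
            ],
        ],
    ];
}

// Assumed wiring, inside whatever serves /.well-known/webfinger:
//   header('Content-Type: application/jrd+json');
//   echo json_encode(
//       webfinger_document('imrannazar.com', 'blog'),
//       JSON_UNESCAPED_SLASHES
//   );
```

Since there is only one account on the site, the same document is served whichever resource is asked for; a plain static file would do just as well.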

Let's have a quick look at this ActivityPub canonical URL: it serves a Person object with (almost) all the information a Mastodon instance might need to draw up a profile page for this account if it doesn't already have this information cached. We've already seen Person objects referenced in the previous instalment in relation to verifying an ActivityPub actor's public key, but let's have a look at the blog's actor data in more detail to see which endpoints we'll need to implement.

The blog's actor object

    {
        "@context": [...],
        "publicKey": {...},
        "id": "https://imrannazar.com/ap/blog",
        "preferredUsername": "blog",
        "type": "Person",
        "name": "Imran Nazar's Blog",
        "url": "https://imrannazar.com/@blog",
        "summary": "Imran Nazar: Things I've written over the years, a blog on software development topics with the occasional sci-fi short thrown in.",
        "published": "2006-09-22T00:00:00Z",
        "attachment": [
            {
                "name": "RSS feed",
                "type": "PropertyValue",
                "value": "https://imrannazar.com/rss.xml"
            }
        ],
        "icon": {
            "mediaType": "image/png",
            "type": "Image",
            "url": "https://imrannazar.com/assets/ap-icon.png"
        },
        "image": {
            "mediaType": "image/jpeg",
            "type": "Image",
            "url": "https://imrannazar.com/assets/ap-image.jpg"
        },
        "discoverable": true,
        "indexable": true,
        "manuallyApprovesFollowers": true,
        "memorial": false,
        "following": "https://imrannazar.com/ap/following",
        "followers": "https://imrannazar.com/ap/followers",
        "inbox": "https://imrannazar.com/ap/inbox",
        "outbox": "https://imrannazar.com/ap/outbox"
    }

For our purposes, this can again be a static file served from the endpoint /ap/blog. There are a few groups of keys here:

User information: Things like the ID (the canonical URL of this user), name, summary and date of publication (which for a Person-type object is the date the user was created);
Display data: This includes the avatar/icon, background/header image, and any additional URLs or links to be shown on the profile;
Behavioural flags: In this case, we want the account to be publicly visible ("discoverable") as well as come up in search results ("indexable"), but this is not a memorial account. The most important flag here is manuallyApprovesFollowers, as it determines what we do when a Follow request is received, but we'll come to that later;
Endpoints: These are the locations of interest to us. Following and Followers are read-only collections which inform any Mastodon instance reading them how many of each the blog has, and their canonical actor IDs; the Inbox is where a remote instance will send AP messages; and the Outbox is where an instance can expect messages from the blog to be coming from.

So let's say a Mastodon user wants to follow this blog, and they have the address (@blog@imrannazar.com) but they're the first user on their instance to want to follow us. There are a few requests the remote instance will make in order to build up a complete profile:

First a WebFinger request is needed to find the actor ID; the URL for this is taken directly from the address given (@blog@imrannazar.com leads us to https://imrannazar.com/.well-known/webfinger).
The result contains a link of rel: self which in turn informs us where the actor resides, which when fetched contains the summary, icon, image and such.
Once the remote instance has the actor data, it can then fetch the following and followers lists in order to show those counts.
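The first two of these steps can be sketched from the remote instance's point of view. This is an illustration only: both function names are hypothetical, and the resource query parameter carrying the acct: URI comes from the WebFinger specification rather than from anything this blog's endpoint inspects.

```php
<?php
// Turn a Fediverse address such as "@blog@imrannazar.com"
// into the WebFinger URL a remote instance would request.
function webfinger_url(string $address): string
{
    $parts = explode('@', ltrim($address, '@'));
    if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
        throw new InvalidArgumentException(
            'Expected an address of the form @user@host'
        );
    }
    [$user, $host] = $parts;

    return "https://{$host}/.well-known/webfinger?resource="
        . rawurlencode("acct:{$user}@{$host}");
}

// Pick the actor URL out of a decoded WebFinger document:
// it is the href of the link whose rel is "self".
function actor_url_from_webfinger(array $doc): ?string
{
    foreach ($doc['links'] ?? [] as $link) {
        if (($link['rel'] ?? '') === 'self' && isset($link['href'])) {
            return $link['href'];
        }
    }
    return null;
}
```

Fed the blog's WebFinger response, actor_url_from_webfinger() would return https://imrannazar.com/ap/blog, which is the URL the instance then fetches for the actor object.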

Having done all this, the remote instance is able to draw up a page that looks a little like this:

Figure 2: Mastodon profile page for this blog

(We can see in this screenshot that the blog has six followers, but we haven't yet implemented /ap/followers to allow the remote instance to know that; we'll get to it though.)

Receiving AP events

Finally we have a profile page with an inviting Follow button, and the remote Mastodon instance knows where to send the request when that button is pressed: the actor's inbox. So what would one of those requests look like? Let's have a look at how that endpoint might translate to a controller in a PHP MVC framework like BirSaat^[3]"BirSaat PHP MVC microframework", Imran Nazar, 2013, so we can start logging the requests that come in:

    class ApController extends bsControllerBase
    {
        public function inboxAction()
        {
            if ($_SERVER['REQUEST_METHOD'] != 'POST') {
                throw new Exception('Inboxes are POSTed to');
            }
            $post = file_get_contents('php://input');
            file_put_contents('/tmp/ap.log', $post);
        }
    }

It's rudimentary, but this code will allow us to see the last request that came in to /ap/inbox; the specification states that all messages will come in as POSTed JSON, so an additional check is made for the method of the request. If I now press the Follow button on this blog, the following message comes in:

    {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": "https://hachyderm.io/af444650-1827-4bf7-94cd-0ec0b57bdb44",
        "type": "Follow",
        "actor": "https://hachyderm.io/users/Two9A",
        "object": "https://imrannazar.com/ap/blog"
    }

Every ActivityPub message has an ID, which in our use case we don't need to track; the important things here are the type and the actor associated with the message. The actor who is following the blog should match up with the signature verification headers explained in the previous article, which will allow us to authenticate this as a valid message to be handled. So we can add a couple more clauses to our inbox handler before the message-specific handling begins:

    class ApController extends bsControllerBase
    {
        public function inboxAction()
        {
            if ($_SERVER['REQUEST_METHOD'] != 'POST') {
                throw new Exception('Inboxes are POSTed to');
            }
            $post = file_get_contents('php://input');

            try {
                // Signature verification code from the previous article
                $actor = $this->verifyHeaders();

                $input = json_decode($post, true);
                if (!$input) {
                    throw new Exception('Malformed inbox message');
                }
                if ($actor['id'] != $input['actor']) {
                    throw new Exception('Inbox message misattributed');
                }

                // Dispatch to e.g. followHandler() if this controller defines it;
                // is_callable() takes the object and method name as one array
                $handler = strtolower($input['type']) . 'Handler';
                if (is_callable([$this, $handler])) {
                    $this->$handler($input);
                }
            } catch (Exception $e) {
                // Any logging you may wish to perform
                throw $e;
            }

            // No response required except for a blank 200
            die();
        }
    }

Handling Follows

To follow the blog is to receive new posts when they're published, so we'll need to record the fact that this user is following us in order to send a publication event to their server when the time comes. We'll want a database table for this, and it so happens that this blog has its page structure data held in MySQL, so we'll make another table alongside the page list.

One thing to note is that signature verification for ActivityPub messages depends on us having the public key of the actor creating the message; if we don't already have the public key, we'll need to fetch it from the remote ActivityPub instance. In order to reduce the need to do this every time a message comes in, we can cache the actor object in the database once fetched, and use that for future verification requirements.

Actors table in MySQL

    CREATE TABLE ap_actors(
        id VARCHAR(255) NOT NULL PRIMARY KEY,
        url VARCHAR(255) NOT NULL,
        name VARCHAR(255) NOT NULL,
        avatar_url TEXT,
        follows_us TINYINT(1) NOT NULL DEFAULT 0,
        full_data TEXT
    );

Actor data model: Cached fetch, preferring database first

    class ActorModel
    {
        protected $db; // PDO MySQL connector
        protected $id;
        public $data;

        public function __construct($id = null)
        {
            if ($id) {
                $this->data = $this->cached_fetch($id);
                return $this->data;
            }
        }

        public function cached_fetch($id)
        {
            $this->id = $id;
            $st = $this->db->prepare('SELECT * FROM ap_actors WHERE id=:id');
            $st->bindValue(':id', $id);
            $st->execute();
            $row = $st->fetch(PDO::FETCH_ASSOC);
            if ($row) {
                return json_decode($row['full_data'], true);
            }

            // If we don't have the actor cached, fetch and save
            // See "Fetching the signing actor" from the previous article
            $actor = $this->fetch_from_remote($id);
            if (!$actor) {
                throw new Exception('Actor not found');
            }

            $st = $this->db->prepare('
                INSERT INTO ap_actors(id, url, name, avatar_url, full_data)
                VALUES(:id, :url, :name, :avatar, :data)
            ');
            $st->bindValue(':id', $id);
            $st->bindValue(':url', $actor['url']);
            $st->bindValue(':name', $actor['name']);
            $st->bindValue(':data', json_encode($actor));

            // Some accounts don't have avatars
            $st->bindValue(':avatar',
                isset($actor['icon'], $actor['icon']['url'])
                    ? $actor['icon']['url']
                    : null
            );
            $st->execute();

            return $actor;
        }
    }

Now we have the framework in place for handling messages, and an actor data model to store the fetched actors, all we need is to record the fact that the actor making this request wants to follow us:

Controller: Follow handler

    private function followHandler($input)
    {
        $actor = new ActorModel($input['actor']);
        $actor->set_follows_us(true);
    }

Model: Record the follow

    public function set_follows_us($follows)
    {
        // The actor's ID was stored on the model by cached_fetch()
        $st = $this->db->prepare(
            'UPDATE ap_actors SET follows_us = :f WHERE id = :id'
        );
        $st->bindValue(':id', $this->id);
        $st->bindValue(':f', $follows ? 1 : 0, PDO::PARAM_INT);
        $st->execute();
    }

Accepting follows

Except that doesn't quite work. If someone wants to follow this blog, and they hit Follow on the blog's profile page in their Mastodon instance, one would expect to see an "Unfollow" button the next time they view the blog, but instead we get "Cancel follow".

This happens because the remote instance is expecting a confirmation from us that the request to follow has been granted, as per this line from the blog's actor data:

/ap/blog: Blog actor data

    {
        "@context": [...],
        "publicKey": {...},
        ...
        "discoverable": true,
        "indexable": true,
        "manuallyApprovesFollowers": true,
        "memorial": false,
        ...
    }

Just as the remote actor sent a message to our inbox to request the follow, we'll need to send an Accept-type message to their inbox. Luckily, we already have everything we need to do this: the actor's endpoints are part of their full data which we already have either cached or fetched, and the message being accepted is the follow we received.

    private function followHandler($input)
    {
        $actor = new ActorModel($input['actor']);
        $actor->set_follows_us(true);

        // See the previous article for detailed code on
        // signing and sending ActivityPub messages
        $this->send($actor->data['inbox'], [
            'type' => 'Accept',
            'object' => $input,
        ]);
    }

And now we can circle back around to the blog's profile page, and specifically the "6 followers" that it shows. This data is fetched (if the instance doesn't already have a count of followers cached locally) from our actor's followers endpoint; this (and following) should return objects of type Collection. In theory, these collection endpoints should support pagination so any remote instance can fetch a full list of actor IDs, but for the purposes of this blog I don't foresee the list of followers growing to such an extent that serving a full list will cause undue load.

As with the inbox endpoint, these two collection endpoints will map to action methods in an MVC framework:

Actor data model: Fetch actors who follow us

    public function fetch_followers()
    {
        $st = $this->db->prepare(
            'SELECT id, name, url, avatar_url FROM ap_actors WHERE follows_us = 1'
        );
        $st->execute();
        return $st->fetchAll(PDO::FETCH_ASSOC);
    }

/ap/followers: Endpoint to list followers

    public function followersAction()
    {
        $actors = new ActorModel();
        $followers = $actors->fetch_followers();

        $this->view->set_format('json');
        return [
            '@context' => 'https://www.w3.org/ns/activitystreams',
            'id' => 'https://imrannazar.com/ap/followers',
            'type' => 'Collection',
            'totalItems' => count($followers),
            'items' => array_map(function($item) {
                return $item['id'];
            }, $followers),
        ];
    }

/ap/following: Users the blog is following

    public function followingAction()
    {
        $this->view->set_format('json');
        return [
            '@context' => 'https://www.w3.org/ns/activitystreams',
            // This collection's own URL, matching "following" in the actor object
            'id' => 'https://imrannazar.com/ap/following',
            'type' => 'Collection',
            'totalItems' => 0,
            'items' => [],
        ];
    }
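The followers endpoint serves the whole list in one go, which is fine at this scale. Should pagination ever be needed, one possible shape is sketched below using the CollectionPage type from ActivityStreams. The followers_page() helper, the ?page= query parameter and the page size of 50 are all assumptions made for illustration; none of this is implemented on the blog.

```php
<?php
// Hypothetical helper: one page of follower IDs as a CollectionPage.
// $ids is the full list of actor IDs; pages are numbered from 1.
function followers_page(array $ids, int $page, int $perPage = 50): array
{
    $base = 'https://imrannazar.com/ap/followers';
    $total = count($ids);
    $lastPage = max(1, (int) ceil($total / $perPage));
    // Clamp out-of-range page numbers instead of failing
    $page = min(max(1, $page), $lastPage);

    $doc = [
        '@context' => 'https://www.w3.org/ns/activitystreams',
        'id' => "{$base}?page={$page}",
        'type' => 'CollectionPage',
        'partOf' => $base,
        'totalItems' => $total,
        'items' => array_slice($ids, ($page - 1) * $perPage, $perPage),
    ];
    // Only advertise a next page when there is one
    if ($page < $lastPage) {
        $doc['next'] = "{$base}?page=" . ($page + 1);
    }
    return $doc;
}
```

In that arrangement the top-level Collection would carry a first key pointing at page 1 rather than a full items list.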

With the followers and following endpoints in place, and a way to listen to and respond to Follow-type messages, we now have a profile in the Fediverse (or at least on Mastodon) that can be followed. What happens when you press Unfollow on this profile?

As I've already gone on overly long for this post, we'll cover this and the other remaining operations in a third part; come back next time and we'll talk more about publishing, commenting and reactions.

DeviantArt Community Guest Feature: When Art Meets Code

Thu, 04 Jul 2024 18:33:36 +0000

This post was written for DeviantArt's Community group as a guest feature, in Jun 2024.

When people talk about the history of abstract art, the name of Piet Mondrian is sure to come up. It was Mondrian who, in 1922, was the first to throw out all pretence of realism in the search for deeper connection to the innate nature of things.

Composition with Blue, Yellow, Red, Black and Grey (1922)
Piet Mondrian, Stedelijk Museum Amsterdam, via BBC^[1]"Piet Mondrian and the six lines that made a masterpiece", BBC Culture, 2022

Mondrian trained in traditional landscapes, and continued to paint still-life flowers even while producing abstract art in the 1920s^[2]David Shapiro, "Mondrian: Flowers", 1991; via Wikipedia but as the curator of a Swiss collection of his works told the BBC:

Minimalism is unthinkable without Mondrian; he was one of the first who really did this, this totally non-representational work? If you see modern as something which breaks with all traditions and defines everything new, then Mondrian's paintings of the 20s are very, very modern.

The abstract works of Mondrian focus heavily on horizontal and vertical lines in black, and the filling of delineated areas with primary colours; as Mondrian would later explain in an essay from 1937, these lines "exist everywhere and dominate everything; their reciprocal action constitutes 'life'."

As it turns out, "everywhere" even includes fields of study that were barely known when Mondrian died in 1944, as his abstract style inspires the Piet programming language where images are programs, and different colours define various actions for the program to perform.

We don't usually think of images as computer programs. A computer program, though, can be represented in just about any way: if a given piece of code can be expressed in one "Turing-complete" form, it has been proven mathematically that any other "Turing-complete" form is equivalent.

To take a ridiculous example, a programming language exists based on travel between stations of the London Underground, and it's called Mornington Crescent^[3]"Mornington Crescent" on the Esolangs wiki, 2013.

Interactive debugger for the Mornington Crescent programming language
Language specification by Timwi, debugger by Imran Nazar^[4]"mcresc: Mornington Crescent Interactive Debugger", Imran Nazar, 2016

So when David Morgan-Mar saw a painting by Piet Mondrian, and decided it would be good material for a programming language, it was no more ridiculous a concept than many others in the world of coding. Thus, the Piet language^[5]"Piet", David Morgan-Mar, last updated 2022 was born.

"Hello World" in the Piet programming language
Thomas Schoch, 2006^[6]"Hello, world! in Piet", Thomas Schoch, 2006

Programs in Piet are colourful. The Piet language uses more colours than Mondrian himself was wont to use, defining six colours in three shades, and giving meaning and context to each shade of colour as well as each direction of travel through the image; execution starts in the top left and moves through depending on the values of the pixels encountered.

There are many examples of programs that people have written in Piet^[7]"Piet Program Gallery", David Morgan-Mar, last updated 2023, including pieces intentionally designed to look more like abstract works than computer programs, but Piet probably claims the crown for the first programming language in which a piece of code has been written accidentally.

The work of Barbara Maahs^[8]"Bilder, Barbara Maahs", Atelier Kunst 24 is very much in the Mondrian tradition, with solid blocks of primary colour; she breaks away from the use of black dividing lines, often preferring direct contact between blocks. So it was that Piet Jarmatz saw a Maahs piece in a gallery, had a suspicion that it would run if plugged into a computer, and was able to convert it into a working Piet program^[9]"Piet Get-Together", Piet Jarmatz, 2023 that takes typed-in letters and prints out their decimal ASCII values.

"Get-Together" by Barbara Maahs
Converted to Piet colour palette by Piet Jarmatz, 2023

And I think that's pretty neat.

(Almost) Pure CSS Tooltips

Sun, 02 Jun 2024 08:26:04 +0000

Recently I've found myself needing to insert endnotes and/or links to references when writing articles for this blog, if I want to cover some incidental detail of a topic under discussion without distracting from the main piece. I've used the word "endnote" here as the end of the post is a good place to put these asides, collated in one place out of the way; however, this can cause an issue for the diligent reader who'd like to look over an endnote at the time it's referenced. Having scrolled (or having been linked) to the bottom of the page, it can be problematic to come back up to where one was and continue with the main article.

To that end, I've instead started using inline tooltips for asides, which appear when the referential link is hovered over by the mouse (or tapped by the mobile user); this allows the aside to enrich the article without distracting from its flow.

Figure 1: An inline tooltip in the first part of my ActivityPub article series^[1]Imran Nazar, "HTTP Signatures in PHP and OpenSSL", May 2024

We can get almost all the way to a tooltip like this with CSS alone; let's have a look at how that works.

CSS for tooltip positioning

The most salient piece of CSS being used for these tooltips is the position rule: if an element on the page is positioned with the absolute value, it will anchor itself based on an offset from the nearest positioning context. That context can be the whole page, but more often it's a parent element that's been given the relative position value; the nearest relatively positioned parent to an absolutely positioned element will provide that context.

HTML for a positioned tooltip

    We discuss a topic, and then provide a reference
    <cite><sup>[1]</sup><mark>The Lord of the Rings (Harper/Collins), pg 293</mark></cite>
    which can make for further reading.

CSS for the tooltip

    cite {
        position: relative;
    }
    cite sup {
        cursor: pointer;
    }
    cite mark {
        display: none;
        position: absolute;
        top: 0.5em;
        left: 0;
    }
    cite:hover mark {
        display: block;
    }

Display patches

This gets us most of the way to a working tooltip already, but there are a few things we can do to help with the display:

Background: As it stands, the tooltip mark is transparent, so the text of the tooltip renders over the top of the article and is hard to make out. This can be alleviated by giving the mark a background colour, and a color so the text can sit atop the background and be read well.
Width: Without a defined stipulation on width, the tooltip will only be as wide as the widest non-wrapping piece of content inside (generally, this would be the single longest word). We can make the tooltip more readable by giving it a minimum width based on the size of the viewport (the window being used to view the page); for example: min-width: 30vw to give the tooltip a width of at least 30% of the viewport, and more if the browser deems it necessary.
Z-index: Figure 1 shows our tooltip showing above text and images on the page. A detail of the implementation of images on this site is that some images (those with black text on a transparent background) are marked invertable so they're readable in dark mode; this adds a filter: invert(1); rule when dark mode is enabled at the body level. In turn, this puts the image in a new stacking context^[2]Mozilla Developer Network, Stacking context, Feb 2024 which must be overridden by the tooltip with a z-index rule.
Accessibility: The tooltip content inside the mark is hidden by CSS unless one hovers on the sup; screen readers will generally disregard this content as invisible, meaning the additional context given by the aside won't be available in these browsers. This can be ameliorated by providing another CSS rule within a media query for non-screen users, as well as using visibility instead of display to control whether the tooltip appears.

Putting these patches together gets us to the tooltips being used on this page, for which the CSS looks as follows:

    cite {
        position: relative;
    }
    cite sup {
        cursor: pointer;
    }
    cite mark {
        visibility: hidden;
        position: absolute;
        top: 0.5em;
        left: 0;
        z-index: 2;

        /* Additional rules to add some nicety */
        color: var(--g-text);
        background: var(--g-bg);
        border: 4px solid var(--g-border);
        border-radius: 4px;
        padding: 12px;
        min-width: 30vw;
    }
    cite:hover mark {
        visibility: visible;
    }
    @media not screen {
        cite mark {
            visibility: visible;
        }
    }

Except that's not accessible

For some screenreaders, this may be enough: the visibility rule puts the content of the tooltip into the page without having it initially visible. For VoiceOver on macOS, however, the content needs to be rendered somewhere on the page so it can be read out.

Historically, the canonical way to have something render offscreen on a page without using display or visibility has been to position it off either the left or right edge of the screen. This still works, but positioning something off the right of the page is liable to push the scrollbar out, so we use negative left values to shunt the content off to the left; this places it in the DOM, visible to screenreaders, but not visible until we reset the left value.

    cite mark {
        position: absolute;
        top: 0.5em;
        left: -9999px;
    }
    cite:hover mark {
        left: 0;
    }

With this positioning in place, VoiceOver correctly picks up on the tooltip's presence in the page without having the tooltip render for sighted users, until the sup is hovered.

Figure 2: VoiceOver reading a section of my ActivityPub article

Positioning on the right

The only remaining issue with these tooltips is what happens when the sup trigger for the tooltip lies towards the right of the screen (anywhere past 70% across from the left of the viewport). In this case, bringing up the tooltip causes it to render with its min-width of 30% of the viewport, causing the right edge to be placed past the right edge of the content, and causing horizontal scroll.

Unfortunately, this is where we have to diverge from a pure-CSS solution, as the following doesn't exist in CSS as it stands:

Hypothetical selector for elements towards the right of the screen

    cite[position-right > 70vw] mark {
        left: auto;
        right: 0;
    }

Instead, we must resort to JavaScript which can determine the rendered position of any tooltip triggers on the page, after the page has loaded. We can use the querySelectorAll method of the document to query for matching elements, and each matching element has a getBoundingClientRect method which will provide its position and size on the page.

Once we've determined whether the matching element is far enough to the right, we can add or remove a class to the element as appropriate. This can be done by using the corresponding methods of the element's classList.

JavaScript: Tooltip positioner function

    window.onload = function() {
        const tooltipPositioner = () => {
            const threshold = window.innerWidth * 0.7;
            document.querySelectorAll('cite').forEach(el => {
                el.classList[
                    el.getBoundingClientRect().x > threshold ? 'add' : 'remove'
                ]('right');
            });
        };
        tooltipPositioner();
    };

CSS: Rule for tooltips on the right

    cite.right mark {
        left: auto;
        right: 0;
    }

Repositioning after resize

Now that tooltips near the right edge behave differently, we need to consider what happens if the user resizes their browser window, causing some tooltip triggers to move across the viewport as the content reflows. Fortunately, JavaScript offers the ResizeObserver, which allows a function to be run whenever an element resizes; as our positioning code is already in a function, setting up the observer against resizes on the document's main tag is fairly simple:

    const tooltipPositioner = () => { ... };
    tooltipPositioner();
    (new ResizeObserver(tooltipPositioner)).observe(
        document.querySelector('main')
    );

It should be noted that the ResizeObserver runs every time the element changes size: if the user is resizing their browser window, this might fire a hundred times before the size of the window settles. For our purposes, as this is the only JavaScript running on the page, performance isn't a particular concern; if this tooltip code were to be used as part of a heavier framework, one may wish to use throttling mechanisms to ensure that the code is only run every so often while still remaining responsive to resize events.

But we almost got away without using any JS at all. With the new popover API rapidly rolling out to browsers as of the time of writing, it may soon be the case that even the CSS used here can be trimmed back.

Implementing HTTP Signatures with PHP and OpenSSL

Sat, 04 May 2024 12:20:39 +0000

Over the last few months, I've slowly been adding ActivityPub support to this blog, focusing on Mastodon compatibility. Two central tenets apply to the operation of the AP protocol as it applies specifically to Mastodon:

An AP user (an "actor" in the protocol's terminology^[1]Seb Jambor, "Understanding ActivityPub", May 2023) can send an activity (for example, a Like on a post) and it will be broadcast to any servers hosting that user's followers;
Receiving servers can verify that the activity was generated by the user in question.

ActivityPub uses the HTTP Signature header for this second point, with public key cryptography: the message is signed rather than encrypted. It should be noted, though, that there is a standard RFC for the Signature header^[2]RFC 9421, "HTTP Message Signatures", Feb 2024 but AP was built before this RFC was released; instead, ActivityPub uses a draft version of the Signature specification^[3]RFC 9421 draft 12, "Signing HTTP Messages", Oct 2019 which operates slightly differently.

Figure 1: Signing and verification of an ActivityPub message

In Figure 1 above, we see the two halves of the process ActivityPub employs using the Signature header: the sender of an activity creates a message signed with their private key, and any receivers verify the message signature with the sender's public key. Let's first look at what goes into generation of the signature.

Generating a signature

The first thing we'll need is a Note, a message created by an actor; in this case, we'll use the creation of a toot by alice@tech.lgbt^[A]As it turns out, the example accounts being used for this article are both extant accounts, but bob@mastodon.social doesn't actually follow alice@tech.lgbt as of the time of writing. which is represented by the following JSON object.

Note written by the sender

    {
        "@context": {...},
        "id": "https://tech.lgbt/users/alice/statuses/12345678",
        "type": "Note",
        "url": "https://tech.lgbt/@alice/12345678",
        "attributedTo": "https://tech.lgbt/users/alice",
        "published": "2024-03-30T15:50:09Z",
        "to": ["https://www.w3.org/ns/activitystreams#Public"],
        "cc": ["https://tech.lgbt/users/alice/followers"],
        "content": "Hello followers!"
    }

In order to broadcast this Note, ActivityPub requires that it be wrapped in a Create activity informing the recipients of the creation of an object:

Message to be sent

    {
        "@context": {...},
        "id": "https://tech.lgbt/users/alice/statuses/12345678/activity",
        "type": "Create",
        "actor": "https://tech.lgbt/users/alice",
        "published": "2024-03-30T15:50:09Z",
        "to": ["https://www.w3.org/ns/activitystreams#Public"],
        "cc": ["https://tech.lgbt/users/alice/followers"],
        "object": {
            "id": "https://tech.lgbt/users/alice/statuses/12345678",
            "type": "Note",
            "url": "https://tech.lgbt/@alice/12345678",
            "attributedTo": "https://tech.lgbt/users/alice",
            "published": "2024-03-30T15:50:09Z",
            "to": ["https://www.w3.org/ns/activitystreams#Public"],
            "cc": ["https://tech.lgbt/users/alice/followers"],
            "content": "Hello followers!"
        }
    }

Now we have the activity that will be broadcast to alice's followers, we can start the signature process; we'll need a fresh signature for each server that's receiving this message. For this example, alice has exactly one follower, on mastodon.social, so we'll need to send this message to the server's "shared inbox" at: https://mastodon.social/inbox^[B]Mastodon servers, by convention, have their shared inbox at /inbox but the canonical URL can be found in the actor's details, at the endpoints.sharedInbox key. to be forwarded on to bob's personal inbox.

We first need a message digest, which by convention is a SHA-256 hash of the message content, encoded as base64; in PHP, one might generate that as below.

PHP code to generate the Digest header
$message = json_encode([ '@context' => [...], ... ]);
// The header carries the raw hash in base64, so ask hash() for binary output
$digest = 'SHA-256=' . base64_encode(hash('sha256', $message, true));

The hash digest is one of the parts of the signature; other parts that we need to include are:

Host of the remote server to which we're sending this message, in this case mastodon.social;
Request target is essentially the first line of the HTTP request we'll be making, in this case post /inbox;
Content type which isn't a requirement, but can be seen emitted by Mastodon hosts when communicating with each other. If this is part of the signature, it would be application/activity+json;
Date of the request, which can be verified by the other side to ensure this message isn't being replayed after the fact. The caveat here is that the date is in a particular format that's different to the date format in the activity JSON, for example: Sat, 30 Mar 2024 15:50:09 GMT

Pulling these together doesn't take too much doing in PHP:

Collating signature parts
// We'll be sending the date separately, so take one fixed point;
// gmdate() keeps the value in GMT whatever the server's timezone
$dt = gmdate("D, d M Y H:i:s \G\M\T");
$url = 'https://mastodon.social/inbox';
$urlparts = parse_url($url);
$sigparts = [
    '(request-target)' => 'post ' . $urlparts['path'],
    'host' => $urlparts['host'],
    'date' => $dt,
    'digest' => $digest,
    'content-type' => 'application/activity+json',
];
$sigsrc = join("\n", array_map(
    fn($k, $v) => "{$k}: {$v}",
    array_keys($sigparts),
    array_values($sigparts)
));

Our collated signature source
(request-target): post /inbox
host: mastodon.social
date: Sat, 30 Mar 2024 15:50:09 GMT
digest: SHA-256=ce9a290f805f....1bbcf5104ad7550191201
content-type: application/activity+json

Once we have the collated signature source, we can invoke OpenSSL to sign the string with the private part of alice's key pair. PHP's OpenSSL bindings expect a PEM-formatted file for this, which for this example we're storing in /etc/pki somewhere:

Signing the collated signature source
openssl_sign($sigsrc, $signature, openssl_get_privatekey(
    file_get_contents('/etc/pki/activitypub/alice/private.pem'),
    'key_passphrase'
), OPENSSL_ALGO_SHA256);

This leaves the signature in $signature, in binary. Our last step in signing the request is to format the signature according to the signing spec, detailing what we've used to generate this signature, and which key was involved.

For the remote end to be able to verify the signature, we'll need to provide a URL at which alice's public key can be fetched; in the ActivityPub protocol, this is included as part of the details of an actor, as we'll see when we go through verification at the other end.

Final header generation
$sig_header_parts = [
    'keyId' => 'https://tech.lgbt/users/alice#main-key',
    'algorithm' => 'rsa-sha256',
    'headers' => join(' ', array_keys($sigparts)),
    'signature' => base64_encode($signature),
];
$headers = [
    'Accept: application/activity+json',
    'Content-Type: application/activity+json',
    'Host: ' . $urlparts['host'],
    'Date: ' . $dt,
    'Digest: ' . $digest,
    'Signature: ' . join(',', array_map(
        fn($k, $v) => "{$k}=\"{$v}\"",
        array_keys($sig_header_parts),
        array_values($sig_header_parts)
    )),
];

Final list of headers
Accept: application/activity+json
Content-Type: application/activity+json
Host: mastodon.social
Date: Sat, 30 Mar 2024 15:50:09 GMT
Digest: SHA-256=ce9a290f805f....1bbcf5104ad7550191201
Signature: keyId="https://tech.lgbt/users/alice#main-key",algorithm="rsa-sha256",headers="(request-target) host date digest content-type",signature="xmpTJEEjodgUP5H....BrN3DwZGy7DoTfRQ=="

And we're ready to send alice's message over to bob's Mastodon instance, with the headers indicating that it's been signed by alice and a digest hash to allow the message content to also be verified.
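For anyone following along outside PHP, the same signing steps can be sketched in JavaScript under Node.js. This is a rough sketch rather than a reference implementation: the key pair is generated on the spot instead of being alice's real key, the body is a stub, and signRequest is a name made up for this example. It leans on Node's built-in crypto module, and encodes the digest as base64.

```javascript
// Sketch only: a throwaway key pair stands in for alice's real key,
// and the activity body is a stub rather than the full Create activity.
const crypto = require('node:crypto');

const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
});

// Hypothetical helper: collate the signature parts, sign them, and build
// the value of the Signature header
function signRequest(method, url, body, keyId, key, date = new Date()) {
  const { host, pathname } = new URL(url);
  // The Digest header is the base64-encoded SHA-256 hash of the body
  const digest = 'SHA-256=' +
    crypto.createHash('sha256').update(body).digest('base64');
  const parts = {
    '(request-target)': `${method.toLowerCase()} ${pathname}`,
    host,
    date: date.toUTCString(), // same format as the HTTP Date header
    digest,
    'content-type': 'application/activity+json',
  };
  const sigsrc = Object.entries(parts)
    .map(([k, v]) => `${k}: ${v}`)
    .join('\n');
  const signature = crypto
    .sign('sha256', Buffer.from(sigsrc), key)
    .toString('base64');
  const header = [
    `keyId="${keyId}"`,
    'algorithm="rsa-sha256"',
    `headers="${Object.keys(parts).join(' ')}"`,
    `signature="${signature}"`,
  ].join(',');
  return { sigsrc, digest, signature, header };
}

const signed = signRequest(
  'POST',
  'https://mastodon.social/inbox',
  JSON.stringify({ type: 'Create' }),
  'https://tech.lgbt/users/alice#main-key',
  privateKey,
  new Date('2024-03-30T15:50:09Z')
);
console.log(signed.sigsrc);
console.log(signed.header);
```

Passing the date in as a parameter means the same instant ends up in both the signature source and the Date header, which is the "one fixed point" the PHP version takes.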

Verifying a signature

As per Figure 1, when mastodon.social receives alice's activity it will first verify that this is both a valid activity and appropriately signed, and then the server will drop the message into the inbox of each of alice's followers recorded as active on the instance.

The first step is to extract the signature and its components. We've seen from the generation code that this is passed through as an HTTP header, so if we place a dumping script at /inbox it might see something like this:

Dumping script to see what lands at /inbox
$message = file_get_contents('php://input');
var_dump([$_SERVER, $message]);

Output of the dumping script
array(2) {
  [0]=> array(29) {
    ["USER"]=> string(8) "www-data"
    ...
    ["HTTP_CONTENT_TYPE"]=> string(25) "application/activity+json"
    ["HTTP_DATE"]=> string(29) "Sat, 30 Mar 2024 15:50:09 GMT"
    ["HTTP_DIGEST"]=> string(52) "SHA-256=ce9a290f805f..."
    ["HTTP_SIGNATURE"]=> string(502) "keyId="https://tech.lgbt/users/alice#main-key",algori..."
    ...
    ["REQUEST_METHOD"]=> string(4) "POST"
    ["REQUEST_URI"]=> string(6) "/inbox"
  }
  [1]=> string(1347) "{"@context":{..."
}

As we'd expect for PHP, the HTTP headers come to the script in the $_SERVER superglobal, with the HTTP_ prefix, so we can extract the parts of the signature that we need from $_SERVER['HTTP_SIGNATURE'].

It should be noted that commas only appear in the signature to delineate the parts; each of the parts themselves is either a URL or a piece of text that won't contain a comma. This means we can treat the signature string as an INI file and parse it fairly simply, with some deft string replacement:

if (!isset($_SERVER['HTTP_SIGNATURE'])) {
    throw new Exception('No signature');
}
$sigconf = parse_ini_string(
    strtr($_SERVER['HTTP_SIGNATURE'], ["," => "\n"])
);
if (!isset(
    $sigconf['keyId'],
    $sigconf['algorithm'],
    $sigconf['headers'],
    $sigconf['signature']
)) {
    throw new Exception('Malformed signature');
}
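The INI trick is a PHP convenience; where no such parser is to hand, the same key="value" structure can be picked apart with a regular expression. The sketch below is in JavaScript, and parseSignatureHeader is a hypothetical helper rather than anything provided by a library; the signature value is a placeholder.

```javascript
// Sketch only: parseSignatureHeader is a hypothetical helper, not part of
// any library. It expects key="value" pairs separated by commas.
function parseSignatureHeader(header) {
  const fields = {};
  // Match name="value"; the quoted value runs up to the next double quote
  for (const [, key, value] of header.matchAll(/(\w+)="([^"]*)"/g)) {
    fields[key] = value;
  }
  for (const required of ['keyId', 'algorithm', 'headers', 'signature']) {
    if (!(required in fields)) {
      throw new Error(`Malformed signature: missing ${required}`);
    }
  }
  return fields;
}

const sigconf = parseSignatureHeader(
  'keyId="https://tech.lgbt/users/alice#main-key",algorithm="rsa-sha256",' +
  'headers="(request-target) host date digest content-type",' +
  'signature="cGxhY2Vob2xkZXI="' // placeholder, not a real signature
);
console.log(sigconf.keyId);
console.log(sigconf.headers.split(' '));
```

Matching on the quotes rather than splitting on commas means the parse would still hold up if a value ever did contain a comma.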

The first item in $sigconf is the URL of the public half of the key pair used to generate this signature. In real-life usage we might expect to have this already cached or stored locally, but if we don't it will need to be fetched, which is something we can make happen through the curl extension.

We've already seen that ActivityPub objects are passed back and forth with a MIME content type of application/activity+json, so we'll need to set that as the MIME type we'll be expecting to Accept. What we get back is an ActivityPub actor object:

Fetching the signing actor
$c = curl_init();
curl_setopt_array($c, [
    CURLOPT_URL => $sigconf['keyId'],
    CURLOPT_TIMEOUT => 5,
    CURLOPT_SSL_VERIFYPEER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER => [
        'Accept: application/activity+json',
    ],
]);
$r = curl_exec($c);
curl_close($c);
if (!$r) {
    throw new Exception('User information not available');
}
$actor = json_decode($r, true);
if (!$actor) {
    throw new Exception('User information not decodable');
}

The actor object (formatted)
{
    "@context": {...},
    "id": "https://tech.lgbt/users/alice",
    "type": "Person",
    "following": "https://tech.lgbt/users/alice/following",
    "followers": "https://tech.lgbt/users/alice/followers",
    ...
    "url": "https://tech.lgbt/@alice",
    "publicKey": {
        "id": "https://tech.lgbt/users/alice#main-key",
        "owner": "https://tech.lgbt/users/alice",
        "publicKeyPem": "-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhk..."
    },
    "endpoints": {
        "sharedInbox": "https://tech.lgbt/inbox"^[C]
    }
}

[C] This is the key mentioned in note B.

The signing actor's public key comes through in the publicKey block, so we'll hold onto that for the verification later. It's worth double-checking that the key we've been served as part of the actor is the same key that was used to sign the message, which we can do by comparing the key's id with the keyId that came through from the signature.

We'll be using the openssl extension to perform verification, and we need to translate the PEM-formatted public key to a binary format for internal use by first calling openssl_get_publickey:

if (!isset(
    $actor['publicKey'],
    $actor['publicKey']['id'],
    $actor['publicKey']['publicKeyPem']
)) {
    throw new Exception('Missing public key');
}
if ($actor['publicKey']['id'] !== $sigconf['keyId']) {
    throw new Exception('Misattributed public key');
}
$pubkey = openssl_get_publickey($actor['publicKey']['publicKeyPem']);
if (!$pubkey) {
    throw new Exception('Malformed public key');
}

We're almost done with the first block of verification from Figure 1; the final stage before verifying the signature against alice's public key is to build the signature string based on the headers key. We'll want to tack together the headers given to us, as specified in the order therein.

There are two things to watch out for here:

HTTP headers are passed through $_SERVER as previously mentioned, but they're transformed into uppercase and hyphens are switched out for underscores;
The (request-target) signature part is a special case, as it consists of the request method (in lowercase) and the request URI, both of which come through in $_SERVER separately.

$sigparts = [];
foreach (explode(' ', $sigconf['headers']) as $hdr) {
    if ($hdr === '(request-target)') {
        $sigparts[] = sprintf(
            '%s: %s %s',
            $hdr,
            strtolower($_SERVER['REQUEST_METHOD']),
            $_SERVER['REQUEST_URI']
        );
    } else {
        // e.g. content-type becomes HTTP_CONTENT_TYPE
        $received_hdr = 'HTTP_' . strtoupper(strtr($hdr, ['-' => '_']));
        if (!isset($_SERVER[$received_hdr])) {
            throw new Exception('Missing signature part: ' . $hdr);
        }
        $sigparts[] = sprintf('%s: %s', $hdr, $_SERVER[$received_hdr]);
    }
}

With the signature constructed, we can finally call out to OpenSSL to perform the verification against alice's public key, which we loaded into binary format earlier. Here, we're assuming that the algorithm specified is a well-known value to OpenSSL like rsa-sha256 (the key algorithm used by Mastodon instances at time of writing), but it can be useful to verify this against a list of allowed values before proceeding if you're feeling particularly paranoid.

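// openssl_verify returns 1 on success, 0 on failure, and -1 or false on
// error; compare against 1 so that an error isn't mistaken for success
if (openssl_verify(
    join("\n", $sigparts),
    base64_decode($sigconf['signature']),
    $pubkey,
    strtoupper($sigconf['algorithm'])
) !== 1) {
    throw new Exception('Signature verification failed');
}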

Finally, with the message's origin verified, we can proceed to verify the message content against the digest provided. Again in this case, we're supporting a limited subset of the possible hash algorithms that could be used for the digest, as Mastodon invariably sends messages with a SHA-256 digest:

// $message holds the raw request body, as read from php://input.
// Split on the first "=" only, as base64 values may end in "=" padding
list($digest_algo, $digest_hash) = explode('=', $_SERVER['HTTP_DIGEST'], 2);
switch (strtoupper($digest_algo)) {
    case 'SHA-256':
        $input_digest = base64_encode(hash('sha256', $message, true));
        break;
    default:
        throw new Exception('Unsupported digest algorithm');
}
if (!hash_equals($input_digest, $digest_hash)) {
    throw new Exception('Digest verification failed');
}

And if we made it this far, past all the exception points, we have a valid ActivityPub message that's verified to have been signed by the owner of the originating key.
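The digest check is the easiest part to get subtly wrong, since the value after the algorithm name is base64 and may itself end in "=" padding. As a cross-check, here's a minimal JavaScript sketch of that one step using Node's crypto module; verifyDigest is a hypothetical name, and the body is a stub.

```javascript
const crypto = require('node:crypto');

// Sketch only: verifyDigest is a hypothetical helper name.
function verifyDigest(digestHeader, body) {
  // Split on the first "=" only: base64 values can end in "=" padding
  const sep = digestHeader.indexOf('=');
  if (sep < 0) return false;
  const algo = digestHeader.slice(0, sep);
  if (algo.toUpperCase() !== 'SHA-256') return false;
  const expected = Buffer.from(digestHeader.slice(sep + 1), 'base64');
  const actual = crypto.createHash('sha256').update(body).digest();
  // timingSafeEqual throws on a length mismatch, so check lengths first
  return expected.length === actual.length &&
    crypto.timingSafeEqual(expected, actual);
}

const body = '{"type":"Create"}';
const digestHeader = 'SHA-256=' +
  crypto.createHash('sha256').update(body).digest('base64');
console.log(verifyDigest(digestHeader, body));       // true
console.log(verifyDigest(digestHeader, body + ' ')); // false
```

The hash has to be taken over the bytes exactly as received; decoding and re-encoding the JSON first would very likely change the digest.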

Slow-Roll Livecoding: Proxying a REST API with PHP

Thu, 04 Apr 2024 09:37:38 +0000

Now and then I run into a situation where I'd like to access a REST API from Postman (or one of its less account-encumbered alternatives for API testing) running locally, but the remote is firewalled and the API provider has only added production and staging IPs to the allowlist. This might be a familiar situation to integration developers, and it leaves you with only a few options:

Testing from the server

As production and staging are in the allowlist, you can simply connect to those IPs and perform tests from there. This does, however, presume a couple of things:

You have access to those servers: A fly-by-night webshop may hand out root access to the server machines and allow their integration developers to hack directly thereupon, but a more organised place may have access restrictions and only allow their infrastructure team direct access.
Those servers exist: An increasingly common setup is for APIs and integrations to run on ephemeral servers (often referred to as "serverless" architecture) with Networking Magic to allow them to present as a fixed IP; in this case there's no machine to connect to in order to perform tests.

Forwarding the remote

Another option is to use NAT-based forwarding to expose the remote API: for example, if the third-party REST API is served from https://their-end.io/api you can use NAT to expose this as https://your-end.com:8443/api by forwarding connections on port 8443 to their-end.io port 443.

This does, however, involve roping in the infrastructure people to either perform iptables-based incantations to set up the NAT, or even more arcane spells to set up containerised NAT. Direct forwarding of the connection also means you lose the opportunity to log usage of the remote API, if you were seeking to keep track of who's connecting.

Proxying the remote

Perhaps the most flexible option is to proxy the remote API: to present an interface to the interface. This allows us to access the API from locations outside the allowlist, while giving us the chance to log access if so required, and has the advantage of being possible to set up in two ways: either through a dedicated proxying package like nginx-proxy, or through code-level changes only.

Having explored the options, here we'll be looking at a code-level proxy written in PHP.

What we're looking to proxy

Our example remote API to which we'd like access is fairly standard as REST goes, offering the following operations:

GET /accounts: Fetch a list of registered accounts, with each account having a canonical URL of /accounts/<uuid> as part of the returned JSON;
POST /accounts: Create a new account given the required details in JSON, returning a canonical URL;
PUT /accounts/<uuid>: Update an account by UUID;
DELETE /accounts/<uuid>: Delete an account by UUID.

We'll be using the curl extension to PHP, which is bundled by default, to send the request on behalf of what was received by our proxy, and to print out what was returned. Let's look at how that might work, as a first cut:

First cut at a proxy script
define('REMOTE_URL', 'https://their-end.io/api/');
define('OUR_URL', 'https://your-end.com/apiproxy.php');

$params = $_GET;
$curl_params = [
    CURLOPT_URL => REMOTE_URL . '?' . http_build_query($params),
    CURLOPT_TIMEOUT => 30,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_SSL_VERIFYHOST => true,
    CURLOPT_SSL_VERIFYPEER => true,
    CURLOPT_CUSTOMREQUEST => $_SERVER['REQUEST_METHOD'],
];

// Anything with a POSTed body needs the body transferring
if ($_SERVER['REQUEST_METHOD'] !== 'GET') {
    $curl_params[CURLOPT_POSTFIELDS] = file_get_contents('php://input');
}

$c = curl_init();
curl_setopt_array($c, $curl_params);
$r = curl_exec($c);
curl_close($c);

echo $r;

If we send a request to our apiproxy.php, we might get back something like this:

Output of our first-cut $ curl https://your-end.com/apiproxy.php Error 404 Not Found

So that didn't work so well; we've got two problems here. Our request went to /api at the remote end, but we need some additional qualifier to get the request to /api/accounts, ideally as part of the URL itself. Additionally, our request to the remote came back with a 404 error, but we just printed the output: that means our proxy script returned a 200, and discarded any other headers that may have come back.

Passing the URL with Rewrite

This is where we cheat a little: our proxy ends up needing a small amount of webserver configuration as well as the PHP script. In our case, we're running on a LAMP stack so Apache is our webserver; that means we can slot a .htaccess file in place which translates the URL into something the script can use:

.htaccess Rewrite rule to translate the URL
RewriteEngine On
# In a .htaccess file the leading slash is stripped before matching
RewriteRule ^api-proxy/(.*)$ /apiproxy.php?__url=$1 [QSA]

This instructs Apache to listen out for URLs starting /api-proxy/, and extract everything after that prefix into a __url parameter to pass on to the script; the QSA flag means any GET parameters provided to the proxy will also be passed on.

Now we can address the other issue we ran into: the headers being lost on their way back. The cURL extension to PHP provides a CURLOPT_HEADER flag that can be set, which returns the headers associated with the response. Let's see how that might look:

Second cut of our script
define('REMOTE_URL', 'https://their-end.io/api/');
define('OUR_URL', 'https://your-end.com/api-proxy/');

$params = $_GET;
unset($params['__url']);

$curl_params = [
    // Include the requested endpoint
    CURLOPT_URL => (
        REMOTE_URL . $_GET['__url'] .
        '?' . http_build_query($params)
    ),
    CURLOPT_TIMEOUT => 30,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_SSL_VERIFYHOST => true,
    CURLOPT_SSL_VERIFYPEER => true,
    // Return the remote's headers
    CURLOPT_HEADER => true,
    CURLOPT_CUSTOMREQUEST => $_SERVER['REQUEST_METHOD'],
];

// Anything with a POSTed body needs the body transferring
if ($_SERVER['REQUEST_METHOD'] !== 'GET') {
    $curl_params[CURLOPT_POSTFIELDS] = file_get_contents('php://input');
}

$c = curl_init();
curl_setopt_array($c, $curl_params);
$r = curl_exec($c);
curl_close($c);

echo $r;

Output of the second run
$ curl https://your-end.com/api-proxy/accounts
HTTP/1.1 401 Unauthorized
Set-Cookie: JSESSIONID=node0qrer3re8934rf0.node0
WWW-Authenticate: Basic realm="API: authentication required"
Cache-Control: must-revalidate,no-cache,no-store
Content-Type: text/html;charset=iso-8859-1
Content-Length: 58
Server: Jetty(9.4.48.v20220622)

Error 401 Unauthorized

Progress! We've now received data from the correct remote URL, but we have the headers mixed in with the body in one returned block. The next step is to break out the headers and return those separately.

Header extraction and provision

Fortunately, the HTTP standard makes it fairly easy to programmatically spot where the headers stop and the response body starts. Headers are specified as ending with \r\n (that's carriage-return and new-line), and a blank header (two of these sequences in a row) is the marker for the end of the headers.

Once we have the headers as a block, we can use the fact that each ends with this newline sequence to output the headers as given:

Header extraction and return
$split_point = strpos($r, "\r\n\r\n");
$headers = trim(substr($r, 0, $split_point));
$body = trim(substr($r, $split_point));

foreach (explode("\r\n", $headers) as $h) {
    header($h);
}
// Only the body is printed now, as the headers have gone out separately
echo $body;

Output after header extraction
$ curl -s -o /dev/null -w "%{http_code}\n" \
    https://your-end.com/api-proxy/accounts
401
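As an aside, the split itself is easy to exercise in isolation, away from the proxy. Here's a minimal sketch in JavaScript of the same blank-line rule; splitResponse is a hypothetical helper name, and the raw response is a cut-down version of the one seen earlier.

```javascript
// Sketch only: splitResponse is a hypothetical helper name.
function splitResponse(raw) {
  const splitPoint = raw.indexOf('\r\n\r\n');
  if (splitPoint < 0) {
    // No blank line found: treat everything as body
    return { status: null, headers: [], body: raw };
  }
  // The first line is the status line; the rest are headers proper
  const [status, ...headers] = raw.slice(0, splitPoint).split('\r\n');
  // Skip the four separator characters to reach the body
  return { status, headers, body: raw.slice(splitPoint + 4) };
}

const raw = [
  'HTTP/1.1 401 Unauthorized',
  'Content-Type: text/html;charset=iso-8859-1',
  '',
  'Error 401 Unauthorized',
].join('\r\n');
console.log(splitResponse(raw));
```

Searching for the first blank line matters here: a body is free to contain blank lines of its own, so splitting on every occurrence would cut the body up as well.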

Now we're starting to see real results: a request that fails to authorise returns a 401. So let's pass in the credentials we were given by the remote API provider:

Passing credentials to our proxy
$ curl -s -o /dev/null -w "%{http_code}\n" \
    -u testuser:Passw0rd \
    https://your-end.com/api-proxy/accounts
401

Error 401, "Authorization Required". Seems our credentials aren't being handed over to the remote; what's happening here is that our username and password are parsed out by the PHP proxy into server-side variables, which we then need to include in the request forwarded to the remote API.

Those server-side variables are PHP_AUTH_USER and PHP_AUTH_PW, and the HTTP specification states that those should be included in an Authorization header, but with a very particular format:

To receive authorization, the client

obtains the user-id and password from the user,

constructs the user-pass by concatenating the user-id, a single colon (":") character, and the password,

encodes the user-pass into an octet sequence (see below for a discussion of character encoding schemes),

and obtains the basic-credentials by encoding this octet sequence using Base64 (RFC4648, Section 4) into a sequence of US-ASCII characters (RFC20).

-- RFC 7617, The 'Basic' HTTP Authentication Scheme
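The steps quoted above are short enough to check by hand. A minimal sketch in JavaScript, assuming UTF-8 as the character encoding for the user-pass; basicAuthHeader is a name invented for this example.

```javascript
// Sketch only: basicAuthHeader is a hypothetical helper name; UTF-8 is
// assumed as the character encoding for the user-pass.
function basicAuthHeader(user, password) {
  // user-id, a single colon, then the password
  const userPass = `${user}:${password}`;
  // Encode to octets, then Base64 those octets
  return 'Authorization: Basic ' +
    Buffer.from(userPass, 'utf8').toString('base64');
}

const header = basicAuthHeader('testuser', 'Passw0rd');
console.log(header);
```

One consequence of the colon separator is that a user-id can't itself contain a colon, although a password can.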

As long as we follow this same scheme, the remote will have no trouble decoding the username and password we're proxying through. PHP's cURL extension allows us to provide custom headers alongside the request, though somewhat confusingly it uses CURLOPT_HTTPHEADER as the option for these headers (as opposed to CURLOPT_HEADER which we've already seen, and specifies that the response headers should be provided).

Including credentials in the request
$req_headers = [];
if (isset($_SERVER['PHP_AUTH_USER'], $_SERVER['PHP_AUTH_PW'])) {
    $req_headers[] = 'Authorization: Basic ' . base64_encode(
        $_SERVER['PHP_AUTH_USER'] . ':' . $_SERVER['PHP_AUTH_PW']
    );
}
// ...
$curl_params = [
    // ...
    CURLOPT_HTTPHEADER => $req_headers,
];

Trying out our credentials
$ curl -u testuser:Passw0rd https://your-end.com/api-proxy/accounts | json_pp
{
    "count" : 1,
    "data" : [
        {
            "id" : "ec3ac5d4-34b6-4086-b88e-21eaff05b23b",
            "name" : "Testing Tester",
            "uri" : "https://their-end.io/api/accounts/ec3ac5d4-34b6-4086-b88e-21eaff05b23b"
        }
    ]
}

Oh goodness, a real return from our remote API. But there's something wrong...

Content replacement and types

We see from our GET request that there is one account record stored in the remote database, and the API provides a unique identifier to perform operations against the record in question. However, the identifier is a URI, and it points (understandably) to the remote's REST API: the one we're having to proxy in the first place.

Fortunately, this particular issue is fairly simple to resolve: as we have the full return string in $body, we just need to switch out any instances of the remote API's root URL with our own, and then we can try making some new requests to our proxied API:

URL replacement in the output
$split_point = strpos($r, "\r\n\r\n");
$headers = trim(substr($r, 0, $split_point));
$body = trim(substr($r, $split_point));

$body = str_replace(REMOTE_URL, OUR_URL, $body);

foreach (explode("\r\n", $headers) as $h) {
    header($h);
}
// Print the rewritten body, not the raw response
echo $body;

Our final GET result
$ curl -u testuser:Passw0rd https://your-end.com/api-proxy/accounts | json_pp
{
    "count" : 1,
    "data" : [
        {
            "id" : "ec3ac5d4-34b6-4086-b88e-21eaff05b23b",
            "name" : "Testing Tester",
            "uri" : "https://your-end.com/api-proxy/accounts/ec3ac5d4-34b6-4086-b88e-21eaff05b23b"
        }
    ]
}

Proxying an update call
$ curl -s -o /dev/null -w "%{http_code}\n" \
    -H 'Content-Type: application/json' \
    -X PUT -d '{"name":"A New Name"}' \
    https://your-end.com/api-proxy/accounts/ec3ac5d4-34b6-4086-b88e-21eaff05b23b
415

Hold on, 415? That's "Unsupported media type", but we're providing a content type. As it turns out, the value of Content-Type provided to the proxy will need to be passed through to the cURL call if a POST or PUT is being made, or cURL will fall back to its default for request bodies, application/x-www-form-urlencoded.

PHP receives this header as CONTENT_TYPE, so let's pull that in:

Providing content type
$req_headers = [];
if (isset($_SERVER['PHP_AUTH_USER'], $_SERVER['PHP_AUTH_PW'])) {
    $req_headers[] = 'Authorization: Basic ' . base64_encode(
        $_SERVER['PHP_AUTH_USER'] . ':' . $_SERVER['PHP_AUTH_PW']
    );
}
if (isset($_SERVER['CONTENT_TYPE'])) {
    $req_headers[] = 'Content-Type: ' . $_SERVER['CONTENT_TYPE'];
}

Proxying an update, with the content type
$ curl -s -o /dev/null -w "%{http_code}\n" \
    -H 'Content-Type: application/json' \
    -X PUT -d '{"name":"A New Name"}' \
    https://your-end.com/api-proxy/accounts/ec3ac5d4-34b6-4086-b88e-21eaff05b23b
200

Thanks for watching

It lives! We can now proxy any call to the firewalled remote API, through a script we've hacked together in PHP, and it'll handle any part of the REST protocol. Along the way, we've learned a little about REST itself, authorisation headers in HTTP, and the various options and settings that can be given to cURL calls in PHP.

This post has been an experiment with a "slow-roll livecoding" format, where a live coding session is written out in sanitised long form. I think this was fun, we might do it again sometime. For reference, here's the final proxy script we came up with:

define('REMOTE_URL', 'https://their-end.io/api/');
define('OUR_URL', 'https://your-end.com/api-proxy/');

// Pass credentials and content type through to the remote
$req_headers = [];
if (isset($_SERVER['PHP_AUTH_USER'], $_SERVER['PHP_AUTH_PW'])) {
    $req_headers[] = 'Authorization: Basic ' . base64_encode(
        $_SERVER['PHP_AUTH_USER'] . ':' . $_SERVER['PHP_AUTH_PW']
    );
}
if (isset($_SERVER['CONTENT_TYPE'])) {
    $req_headers[] = 'Content-Type: ' . $_SERVER['CONTENT_TYPE'];
}

$params = $_GET;
unset($params['__url']);

$curl_params = [
    CURLOPT_URL => (
        REMOTE_URL . $_GET['__url'] .
        '?' . http_build_query($params)
    ),
    CURLOPT_TIMEOUT => 30,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_SSL_VERIFYHOST => true,
    CURLOPT_SSL_VERIFYPEER => true,
    CURLOPT_HEADER => true,
    CURLOPT_HTTPHEADER => $req_headers,
    CURLOPT_CUSTOMREQUEST => $_SERVER['REQUEST_METHOD'],
];

if ($_SERVER['REQUEST_METHOD'] !== 'GET') {
    $curl_params[CURLOPT_POSTFIELDS] = file_get_contents('php://input');
}

$c = curl_init();
curl_setopt_array($c, $curl_params);
$r = curl_exec($c);
curl_close($c);

// Separate the response headers from the body at the first blank line
$split_point = strpos($r, "\r\n\r\n");
$headers = trim(substr($r, 0, $split_point));
$body = trim(substr($r, $split_point));

$body = str_replace(REMOTE_URL, OUR_URL, $body);

foreach (explode("\r\n", $headers) as $h) {
    header($h);
}
// Print the rewritten body, not the raw response
echo $body;

Building a Solitaire "AI" in JavaScript

Sat, 02 Mar 2024 16:30:57 +0000

Every now and then I'll get a few minutes' downtime, and I'll turn to something which lets me pass the time. Most recently, that's been a quick game of online Klondike Solitaire, the classic card game; it usually takes no more than five minutes to either solve the random game that's served up, or to work out that there's probably no solution.

I generally use Solitaired's implementation of the game, and one thing in particular caught my eye while looking at the new-game menu:

Figure 1: Solitaired's New-Game menu

On this menu is a "winnable only" option, whereby the game served is known to be a shuffle of the deck that, when dealt out, definitely has at least one path to solution. The inevitable question arises:

How Can We Know A Game Is Winnable?

As it turns out, there's no formula for determining whether a given shuffle is solvable: what we'll need to do here is build a solver that can search through all the possible moves, until it runs into a state where the game has been won and all the cards are stacked up in their end positions.

Let's take the following initial deal-out:

Figure 2: Initial state of a game of Solitaire

From top-left on the game board, we have:

The "stock" of cards yet to be dealt, face-down;
The "waste" pile onto which cards are put when pulled from the top of the stock;
The "foundations" which act as the end positions for the four suits, stacked in order;
The "tableaus" which are initially dealt out with a number of cards corresponding to their position on the board (one to seven).

Other variants of Solitaire exist, but as we're dealing with Klondike here, we can hard-code various things like the number of cards to shuffle, and the number of tableaus.

When we say "build a solver", what we're looking to do is take an initial deal-out of a shuffled deck like the above, and apply those moves which are both:

Valid moves in Solitaire: This is the list of all moves that could be made under the game rules;
Eligible given the state of the board: This means each of the valid moves needs to be checked against the current state of the game, and dropped if it can't be made right now.

The plan is to apply one of the eligible valid moves to the game board, see where the board ends up, and apply a move which has now become valid and eligible; we keep doing this until our path runs out and we have no more available moves. If the game hasn't been solved at this point (if, in other words, we don't have the suits all stacked up in the foundations), we need to backtrack up the path of moves and make another eligible and valid move from the point at which we have the option available.

If we take Figure 2, and work out those moves which are both valid and eligible, we may end up with a tree of moves that looks something like this:

Figure 3: The first couple of levels of a search tree

The more algorithmically astute reader will recognise this as a description of depth-first search (DFS), a process whereby a tree of things (in this case, game states) can be searched through for some criterion. In the general case, breadth-first search tends to be used if one is looking for the optimal match; for our solver, that would mean the fewest moves to a winning state. For our purposes though, as we're just looking to see whether the game can be solved at all, we can use the simpler DFS algorithm.
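Stripped of anything to do with cards, the backtracking search might be sketched like so. This is a rough outline and not the solver itself: moves and isWon are hypothetical callbacks standing in for the Solitaire rules, states are assumed to serialise to JSON, and the toy "game" at the bottom is only there to show the search running. One thing the sketch adds on its own initiative is a set of already-seen states, without which a pair of moves that undo each other would send the search round in circles.

```javascript
// Sketch only: `moves` and `isWon` are hypothetical callbacks standing in
// for the Solitaire rules, and states are assumed to serialise to JSON.
function solve(initial, moves, isWon) {
  const seen = new Set();
  const path = [];

  function visit(state) {
    if (isWon(state)) return true;
    const key = JSON.stringify(state);
    // Skip states we've already explored, or moves could loop forever
    if (seen.has(key)) return false;
    seen.add(key);
    for (const { name, next } of moves(state)) {
      path.push(name);
      if (visit(next)) return true;
      path.pop(); // dead end: backtrack and try the next move
    }
    return false;
  }

  // The list of moves that wins, or null if every path runs out
  return visit(initial) ? path : null;
}

// Toy stand-in for a game: reach 10 from 1 by doubling or adding 3
const toyMoves = n => (n > 10 ? [] : [
  { name: 'double', next: n * 2 },
  { name: 'add3', next: n + 3 },
]);
console.log(solve(1, toyMoves, n => n === 10));
// [ 'double', 'double', 'add3', 'add3' ]
```

The answer found is the first winning path in move order (1, 2, 4, 7, 10), which is all we need for a yes-or-no "is it winnable" check.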

Representing a Solitaire game board

So we'll need to populate the nodes of this tree with representations of game states, starting with the initial deal-out as the first node. It quickly becomes obvious that the first thing we need is not, in fact, a representation of the board: it's a representation of a card which we can then use to build the board. Our "card" needs three things: suit (one of four), rank (one of thirteen), and whether the card in question is face-up on the board. This lends itself well to a bitfield:

    7        6         5    4      3  2  1  0
+-------+----------+-------------+------------+
| Empty | Face-up  | Suit        | Rank       |
+-------+----------+-------------+------------+
|       | 0 = Down | 0 = Spade   | 0 = Unused |
|       | 1 = Up   | 1 = Heart   | 1 = Ace    |
|       |          | 2 = Club    | ...        |
|       |          | 3 = Diamond | 13 = King  |
+-------+----------+-------------+------------+

Figure 4: Bitfield layout for playing card data
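To make the layout concrete, here's a small sketch of packing and unpacking a card. The helper names are made up for this example; the bit positions follow Figure 4, with suit 0 taken to be spades.

```javascript
// Sketch only: these helper names are hypothetical; the bit positions
// follow Figure 4, with suit 0 taken to be spades.
const FACE_UP = 0x40; // bit 6

// Pack suit (0-3), rank (1-13) and the face-up flag into one number
const encodeCard = (suit, rank, faceUp) =>
  (faceUp ? FACE_UP : 0) | ((suit & 3) << 4) | (rank & 15);

// Unpack a number back into its three fields
const decodeCard = c => ({
  faceUp: (c & FACE_UP) !== 0,
  suit: (c >> 4) & 3,
  rank: c & 15,
});

console.log(encodeCard(0, 2, true)); // 66, or 0x42: 2 of spades, face-up
console.log(decodeCard(107));        // { faceUp: true, suit: 2, rank: 11 }
```

So a value of 107 decodes to the Jack of clubs, face-up.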

Our state representation then becomes an object containing arrays of numbers, where each of the numbers conforms to this bitfield format for a card:

Board state for Figure 2
{
    "stock": [
        4,45,24,12,35,41,29,57,36,37,6,34,23,17,38,2,28,53,51,40,50,59,21,60
    ],
    "waste": [],
    "foundations": [
        [], // Spades
        [], // Hearts
        [], // Clubs
        []  // Diamonds
    ],
    "tableaus": [
        [107],
        [54,69],
        [58,55,77],
        [19,27,18,116],
        [10,8,56,20,106],
        [3,9,61,7,44,86],
        [49,1,11,33,25,39,90]
    ]
}

And moves on the board become changes in this state representation. If we're going to deal with a lot of these state changes, we're going to need a way to visualise what's happening (or at least, I needed a visual way to make sense of things)...
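As a taste of what "a change in state" might look like, here's a sketch of the simplest move: pulling a card from the stock onto the waste. It rests on a couple of assumptions that aren't settled by anything above: that the top of a pile is the last element of its array, and that a move returns a fresh state object and leaves the old one alone. drawFromStock is a hypothetical name.

```javascript
// Sketch only: drawFromStock is a hypothetical helper; it assumes the top
// of a pile is the last array element and that bit 6 marks a face-up card.
const FACE_UP = 0x40;

function drawFromStock(state) {
  if (state.stock.length === 0) return null; // move not eligible
  const card = state.stock[state.stock.length - 1];
  // Return a new state rather than mutating, so earlier states survive
  return {
    ...state,
    stock: state.stock.slice(0, -1),
    waste: [...state.waste, card | FACE_UP],
  };
}

const before = {
  stock: [4, 45, 24],
  waste: [],
  foundations: [[], [], [], []],
  tableaus: [],
};
const after = drawFromStock(before);
console.log(after.stock, after.waste); // [ 4, 45 ] [ 88 ]
```

Leaving the earlier state untouched is what makes backtracking cheap: to undo a move, the search just goes back to the state it already had.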

Rendering a Solitaire game board

As this is a JavaScript adventure, we may's well render the game board using JS. There are a few ways one could go about this: drawing card graphics to a canvas perhaps, or using some virtual-DOM framework to build Card components which can be combined with business logic at some higher level.

That seems a bit much, though. For our purposes, we can get away with building the game board out in HTML, and re-rendering the whole page whenever a move is made. Let's start once again with the individual cards, for which there are complete sets of open-source SVGs online:

HTML sample for an individual playing card
<span class="card s2">2 of Spades</span>

CSS for the above playing card
.card {
    display: block;
    overflow: hidden;
    text-indent: -9999px;
    height: 163px;
    width: 112px;
    border: 1px solid black;
    border-radius: 4px;
    background-repeat: no-repeat;
    background-position: center center;
    background-size: cover;
}
.s1 { background-image: url(ace_of_spades.svg); }
.s2 { background-image: url(2_of_spades.svg); }
/* ... 48 card backgrounds omitted ... */
.d12 { background-image: url(queen_of_diamonds.svg); }
.d13 { background-image: url(king_of_diamonds.svg); }
.facedown { background-image: url(cardback.svg); }

For this project, I've used the simple set of SVGs from hayeah's playing-cards-assets for the card fronts, and Dmitry Fomin's SVG on Wikidata for the card back.

Our JavaScript for the card renderer doesn't need to be all that complicated either, as we can build HTML directly as a string:

JS card rendering class
const Card = {
    SUIT_CLASS: ['s', 'h', 'c', 'd'],
    SUIT_NAMES: ['Spades', 'Hearts', 'Clubs', 'Diamonds'],
    RANK_NAMES: [
        'Ace', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King'
    ],

    toClassName: c => `${Card.SUIT_CLASS[(c >> 4) & 3]}${(c & 15)}`,
    // Ranks run from 1 (Ace) to 13 (King), so shift down to index the names
    toString: c => (
        Card.RANK_NAMES[(c & 15) - 1] + ' of ' +
        Card.SUIT_NAMES[(c >> 4) & 3]
    ),
    render: c => [
        '<span class="card ' + Card.toClassName(c) + '">',
        Card.toString(c),
        '</span>'
    ].join('')
};

document.body.innerHTML = Card.render(0x42); // 2 of spades, face-up

The above code nets us something that looks like this:

Figure 5: The 2 of spades, as rendered by our JS

From here, the individual regions of the board can be rendered as cards or lists of cards. For simplicity, we can treat regions where one card is visible as lists containing one card:

HTML for the playing field

    <body>
        <main id="field">
            <ul id="stock"></ul>
            <ul id="waste"></ul>
            <ul id="foundations"></ul>
            <ul id="tableaus">
                <li id="tableau-0">
                    <ol></ol>
                </li>
                <!-- ... tableau-1 to tableau-5 ... -->
                <li id="tableau-6">
                    <ol></ol>
                </li>
            </ul>
        </main>
    </body>

Grid CSS for the playing field

    :root {
        --card-width: 112px;
        --card-height: 163px;
        --card-stack-gap: 40px;
    }
    body { background: green; }
    #field {
        display: grid;
        grid-template-areas:
            "stock waste foundations"
            "tableaus tableaus tableaus";
        grid-template-columns: var(--card-width) var(--card-width) 1fr;
        grid-template-rows: var(--card-height) 1fr;
        gap: 32px;
    }
    ul, ol { list-style: none inside; }
    #stock { grid-area: stock; }
    #waste { grid-area: waste; }
    #foundations {
        grid-area: foundations;
        display: flex;
        gap: 32px;
        justify-content: flex-end;
    }
    #tableaus {
        grid-area: tableaus;
        display: flex;
        justify-content: space-between;
        gap: 32px;
    }
    #tableaus > li { width: var(--card-width); }
    #tableaus ol { position: relative; width: var(--card-width); }
    #tableaus ol li { position: absolute; }
    li:nth-child(2) { top: calc(var(--card-stack-gap) * 1); }
    li:nth-child(3) { top: calc(var(--card-stack-gap) * 2); }
    /* ... If the last tableau has a King face-up card initially,
       we can have up to 19 cards in a tableau ... */
    li:nth-child(18) { top: calc(var(--card-stack-gap) * 17); }
    li:nth-child(19) { top: calc(var(--card-stack-gap) * 18); }

And the final piece of the puzzle is to shuffle a deck of cards into a representation of the initial game state, and fill in the above list elements. Those'll be two separate functions: let's have a look at the shuffle first.

A full deck of playing cards is 52 cards, but we have two considerations to make:

Format: Our internal representation of a playing card is a bitfield, so we can't use the numbers 1 to 52 (or indeed, 0 to 51); we'll need instead to build an array of values conforming to the bitfield.
Library: Languages such as PHP offer a shuffle library function to randomise an array, but JavaScript's standard library is lacking in this regard. We'll need a scrap of custom shuffling code to do the trick.

The magic is served in the form of Array.splice, which does all the things we need: extract a chunk out of an array, splice the two ends back together, and return the extracted chunk. Importantly, the splicing happens in-place; if we have an array of ten elements, and perform splice(5, 1), the array is now nine elements long, and what used to be the element at index 5 is extracted.
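To make the in-place behaviour concrete, here's a minimal sketch of splice on a ten-element array, along with a draw-until-empty shuffle built on top of it. The drawRandom and shuffle helpers are names made up for this illustration; they aren't part of the solver's code.

```javascript
// Hypothetical helper: remove and return one random element, in place
const drawRandom = (arr) =>
    arr.splice(Math.floor(Math.random() * arr.length), 1)[0];

// Hypothetical helper: shuffle by drawing from a copy until it's empty
const shuffle = (arr) => {
    const source = [...arr], out = [];
    while (source.length > 0) {
        out.push(drawRandom(source));
    }
    return out;
};

const ten = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
const removed = ten.splice(5, 1);

console.log(removed);    // the extracted chunk: an array holding just 5
console.log(ten.length); // 9
console.log(ten[5]);     // 6, as the later elements have shifted down
```

Note that splice always returns an array, even when extracting a single element, which is why the helper takes index 0 of the result.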

With that, we have everything we need to build up the initial state:

JS to generate an initial game state

    const Solitaire = {
        init: () => {
            // Build up a deck of cards in our bitfield format
            let deck = [], shuffled = [];
            for (let i = 0; i < 4; i++) {
                for (let j = 1; j <= 13; j++) {
                    deck.push(i << 4 | j);
                }
            }

            // Pull random cards out of the deck until we run out
            do {
                shuffled.push(deck.splice(
                    0 | (Math.random() * deck.length),
                    1
                )[0]);
            } while (deck.length > 0);

            // Deal out the tableaus, drop what's left in the stock
            const state = {
                stock: [],
                waste: [],
                foundations: [[], [], [], []],
                tableaus: [[], [], [], [], [], [], []],
            };
            state.tableaus.forEach((tb, idx) => {
                // Tableau number idx is dealt idx + 1 cards
                for (let i = 0; i <= idx; i++) {
                    state.tableaus[idx].push(shuffled.pop());
                }
                // Face-up the last card
                state.tableaus[idx][state.tableaus[idx].length - 1] |= 0x40;
            });
            state.stock = [...shuffled];

            Solitaire.render(state);
        },

        render: (state) => {
            // TODO
        },
    };

    window.onload = function() {
        Solitaire.init();
    }

Rendering the board is a simple case of building HTML for each of the regions on the board, filling in the cards as list items. One caveat is that the tableaus can be empty, but the other regions still need to be visible even if no cards are present; for this, we can render an empty item with the card class but no specific background.

JS to render the game board

    render: (state) => {
        // An empty region: the card class, with no specific background
        const emptyCard = '<li><span class="card"></span></li>';

        document.getElementById('stock').innerHTML =
            state.stock.length > 0
                ? Card.render(state.stock[state.stock.length - 1])
                : emptyCard;

        document.getElementById('waste').innerHTML =
            state.waste.length > 0
                ? Card.render(state.waste[state.waste.length - 1])
                : emptyCard;

        document.getElementById('foundations').innerHTML =
            state.foundations.map(f => (f.length > 0
                ? Card.render(f[f.length - 1])
                : emptyCard
            )).join('');

        for (let i = 0; i < 7; i++) {
            document.getElementById(`tableau-${i}`).innerHTML = [
                '<ol>',
                ...(state.tableaus[i].map(t => Card.render(t))),
                '</ol>',
            ].join('');
        }
    },

Building the solver

Now we can see the Solitaire game board, we can visualise moves made towards solving a game. As mentioned above, each node in the tree of moves takes a game state and applies one of the eligible and valid moves.

Valid moves are those available to us under the game rules. In order of preference:

Waste to foundation: A card can go directly from the drawn waste pile to the suit's foundation, if it's already populated up to one rank below this card.
Top of a tableau to foundation: For each tableau, the last face-up card can move to its suit's foundation if that foundation is already populated up to one rank below this card.
King-stack to an empty tableau: Kings can only be placed on empty tableaus; for each tableau, if the face-up stack has a King at the bottom, and an empty tableau exists, moving the stack is valid.
Non-king stack to eligible tableau: For each tableau, any stack (or substack) of face-up cards on tableau A can move to another tableau B if the bottom card on A's substack is of an opposite-coloured suit to the top of B, AND one rank lower.
King waste to empty tableau: If a King was just drawn, and an empty tableau exists, it can be placed therein.
Non-king waste to eligible tableau: For each tableau, if the just-drawn top of waste is of an opposite-coloured suit to the top of the tableau, AND one rank lower, it can be placed at the top of the tableau.
Draw: If the stock has cards, pull a card from the stock, face it up and put it atop the waste.
Reset: If the stock has no cards and the waste has cards, face-down all the waste cards and reverse their order, putting them back into the stock.
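To give a feel for the simpler end of this list, here's a minimal sketch of the Draw and Reset moves, plus the "opposite colour AND one rank lower" test shared by the tableau moves. These are my own illustrative versions, written on the assumption that each handler takes a JSON string of the state and returns an array of next states; the canStack, draw and reset names are hypothetical.

```javascript
// Hypothetical helper: can `card` be placed on top of `target` in a tableau?
// Bit 16 is the low bit of the suit: clear for black suits, set for red
const canStack = (card, target) =>
    ((card ^ target) & 16) !== 0 &&     // opposite-coloured suits
    (card & 15) === (target & 15) - 1;  // one rank lower

// Draw: pull a card from the stock, face it up, put it atop the waste
const draw = (stateJson) => {
    const state = JSON.parse(stateJson);
    if (state.stock.length === 0) return [];
    state.waste.push(state.stock.pop() | 64);
    return [state];
};

// Reset: face-down the waste and reverse it back into the stock
const reset = (stateJson) => {
    const state = JSON.parse(stateJson);
    if (state.stock.length > 0 || state.waste.length === 0) return [];
    state.stock = state.waste.map(c => c & ~64).reverse();
    state.waste = [];
    return [state];
};

// Drawing from a two-card stock leaves one card in the stock,
// and the 3 of spades (0x03) face-up on the waste as 0x43
const before = JSON.stringify({ stock: [0x12, 0x03], waste: [] });
const [afterDraw] = draw(before);
console.log(afterDraw.stock, afterDraw.waste);
```

Reversing the waste on reset means the first card drawn ends up at the end of the stock array, so it's the first to be drawn again on the next pass.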

For each of these, we'll need a function that generates the next state in the game, given a state representation. In the interests of brevity, I won't include all the move implementations here; a flavour of them can be seen from the sample below, of one of the more complex cases.

Next-move handlers, with one example

    const end = (a) => a[a.length - 1];

    const Card = {
        ...
        rank: (c) => c & 15,
        suit: (c) => (c >> 4) & 3,
        areOpposite: (c, d) => (c & 16) ^ (d & 16),
        isFaceUp: (c) => !!(c & 64)
    };

    const Solitaire = {
        ...
        moves: [
            ...
            // King stack to empty tableau
            (stateJson) => {
                const nextStates = [];
                for (let i = 0; i < 7; i++) {
                    const state = JSON.parse(stateJson);

                    // If there's more than one card on this tableau, AND
                    // the first face-up card is a King, AND
                    // there's an empty tableau to move the stack to, AND
                    // we'd be left with at least one card after moving
                    if (state.tableaus[i].length > 1) {
                        if (!Card.isFaceUp(state.tableaus[i][0])) {
                            if (
                                Card.rank(
                                    state.tableaus[i].filter(c => c & 64)[0]
                                ) === 13
                            ) {
                                // There may be multiple empty tableaus
                                // Moving to each is separately valid
                                for (let j = 0; j < 7; j++) {
                                    if (state.tableaus[j].length === 0) {
                                        // Pop the face-up cards off tableau i into j
                                        while (end(state.tableaus[i]) & 64) {
                                            state.tableaus[j].push(
                                                state.tableaus[i].pop()
                                            );
                                        }
                                        // j is now backwards
                                        state.tableaus[j].reverse();

                                        // Face-up the top card left behind
                                        state.tableaus[i][state.tableaus[i].length - 1] |= 64;

                                        // Record this as a valid next state
                                        nextStates.push(state);
                                        break;
                                    }
                                }
                            }
                        }
                    }
                }
                return nextStates;
            }
        ],
    };

Those with an eye for detail will note that the move handlers are receiving a JSON string representing the game state. The search algorithm itself calls for this, as our DFS solver implementation will determine the next state as follows:

If this state represents a full set of foundations, we can stop;
If we've been going for long enough that there's probably no solution, we can stop;
Otherwise, render this state on screen;
Call the move handlers in turn, and build an array of valid next states;
For each valid next state, if we haven't already seen it, recurse.

We can satisfy the "already seen it" clause here by calculating a hash of each game state as it's generated, and storing it in a list; if a game state is generated by our next-move handlers whereby its hash is already in this list, we won't need to recurse into it another time, thus limiting the search space.

Recent browser implementations of JavaScript offer the crypto service which exposes various hashing functions that run at native speed, so we won't need to worry overly about performance:

DFS and hash list implementation

    const sha256 = async (state) => {
        const msg = new TextEncoder('utf-8').encode(JSON.stringify(state));
        const buf = await window.crypto.subtle.digest('SHA-256', msg);
        const arr = Array.from(new Uint8Array(buf));
        return arr.map(b => ('00' + b.toString(16)).slice(-2)).join('');
    };

    const Solitaire = {
        ...
        isSearching: true,
        isWinnable: false,
        visitedMoves: [],
        MAX_MOVES: 50000,

        next: async (state) => {
            const hash = await sha256(state);
            Solitaire.visitedMoves.push(hash);

            // If all the cards are in the foundations,
            // assume they're in order and we've won
            if (state.foundations.filter(
                f => f.length === 13
            ).length === 4) {
                Solitaire.isSearching = false;
                Solitaire.isWinnable = true;
                Solitaire.render(state);
            }

            // If we've been going for ...a while,
            // assume we won't be going to space today
            if (Solitaire.visitedMoves.length > Solitaire.MAX_MOVES) {
                Solitaire.isSearching = false;
                Solitaire.isWinnable = false;
                Solitaire.render(state);
            }

            // Otherwise we're still going
            if (Solitaire.isSearching) {
                Solitaire.render(state);

                // Collect all the eligible moves
                let eligibleMoves = [], newMoves = [];
                Solitaire.moves.forEach((move) => {
                    eligibleMoves = eligibleMoves.concat(
                        move(JSON.stringify(state))
                    );
                });

                // Filter for those we haven't seen before
                for (let i = 0; i < eligibleMoves.length; i++) {
                    const newHash = await sha256(eligibleMoves[i]);
                    if (!Solitaire.visitedMoves.includes(newHash)) {
                        newMoves.push(eligibleMoves[i]);
                    }
                }

                // Dig a little deeper
                for (let i = 0; i < newMoves.length; i++) {
                    await Solitaire.next(newMoves[i]);
                }
            }
        }
    };
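One thing worth noting about the above: Solitaire.visitedMoves is a plain array, so every includes() call scans each hash seen so far. A Set gives the same "have we seen this" test without the scan. The following is a rough sketch of that swap, not what the solver actually does; it keys on the JSON string itself rather than a SHA-256 hash to stay synchronous, and the visited, markVisited and unseenOnly names are assumptions of mine.

```javascript
// Hypothetical replacement for the visitedMoves array
const visited = new Set();

// Record a state; returns true if we hadn't seen it before
const markVisited = (state) => {
    const key = JSON.stringify(state);
    if (visited.has(key)) return false;
    visited.add(key);
    return true;
};

// Keep only the candidate states we haven't seen yet,
// marking each as seen along the way
const unseenOnly = (states) => states.filter(s => markVisited(s));
```

This marks states as seen when they're generated rather than when they're visited, which also weeds out duplicates within a single batch of eligible moves. As with the hash, two states only match if their JSON is identical, key order included.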

And if we've done everything correctly, with a bit of luck from the random number generator, we'll find a shuffle that can be won:

Figure 6: An example of finding a game's winning path

Next steps

So we've answered, to an extent at least, the question of how to determine that a game of Solitaire is winnable. There are a few things that come to mind once we see this solver working:

Minimal search path: Figure 6 above shows a path to solution of over 400 moves, whereas it's rare to see a winnable game of Solitaire that edges past 150. This is likely because our solver finds the first path to a winning state, and not necessarily the best path. Implementations of breadth-first search for solving Solitaire are out there, an example being ShootMe's MinimalKlondike which is a command-line solver written in C#. It's not beyond the realm of possibility to rewrite Solitaire.next in our code to use BFS in this fashion.
Custom shuffles: It's somewhat limiting to have the solver generate a shuffle at random, and determine whether that shuffle is solvable; it would be nice to have the solver accept a shuffled deck in some format, and use that as its source for the initial deal-out and solve. There doesn't seem to be a standard format for this, unfortunately; ShootMe's solver linked above takes its shuffle sources in a file format "from an old web site", but no further detail is given on whether that format is widely used.
Playability: If one were so inclined, this could be the start of an implementation of Solitaire: each of the cards rendered on the page is a HTML element, which could (with some supporting code) be dragged and dropped into an eligible spot on the game board. We may even be able to introduce a "Winnable only" option on the New Game button, as per Figure 1, the thing that sparked off this whole mess.
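On the first of those points, the shape of a breadth-first search is simple enough to sketch. What follows is a generic BFS, not MinimalKlondike's approach and not a drop-in replacement for Solitaire.next: it assumes a move function that takes a JSON string and returns an array of next states (as our handlers do), and it's demonstrated on a toy number puzzle rather than a deck of cards. The bfs and toyMoves names are invented for the example.

```javascript
// Generic breadth-first search: returns the shortest list of states from
// `start` to a goal, or null if we run out of states or hit the limit
const bfs = (start, nextStates, isGoal, maxStates = 50000) => {
    const cameFrom = new Map([[JSON.stringify(start), null]]);
    const queue = [start];

    while (queue.length > 0 && cameFrom.size <= maxStates) {
        const state = queue.shift();
        const key = JSON.stringify(state);

        if (isGoal(state)) {
            // Walk the parent links back to the start
            const path = [];
            for (let k = key; k !== null; k = cameFrom.get(k)) {
                path.unshift(JSON.parse(k));
            }
            return path;
        }

        for (const next of nextStates(key)) {
            const nextKey = JSON.stringify(next);
            if (!cameFrom.has(nextKey)) {
                cameFrom.set(nextKey, key);
                queue.push(next);
            }
        }
    }
    return null;
};

// Toy puzzle standing in for Solitaire:
// get from 1 to 10 by doubling or adding one
const toyMoves = (json) => {
    const n = JSON.parse(json);
    return [n * 2, n + 1].filter(m => m <= 10);
};

const path = bfs(1, toyMoves, n => n === 10);
console.log(path); // [ 1, 2, 4, 5, 10 ]
```

Because states are explored in order of distance from the start, the first goal found is reached by a shortest path; the cost is that every frontier state has to be held in memory at once, which may well be the limiting factor for a full Solitaire search.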

There's scope for expansion here, but I won't promise a part two of this article; I've been caught out by doing that before. Instead, I'll leave the code for the solver as it stands:

solitaire-js: A solver for Klondike Solitaire

Wildcard Email Addresses with Self-Hosted Postfix

Sat, 03 Feb 2024 13:30:07 +0000

If you've ever received spam to your email (and if you've had an email address for more than ten minutes, you've had spam) it can be difficult to find out how the spammers got hold of your address. If you have a domain name of your own, some interesting tools open up that can help to determine the provenance of spam, and from where the spammers get their databases; the tool this article covers is wildcard email addresses.

For example, many years ago I signed up to TV Tropes (probably to post a comment on some trope page). At some point, TV Tropes suffered a data breach and their user database was lifted, including email addresses, which means to this day I get emails like the following:

From: "Equipe RH" To: Subject: I RECORDED YOU! Date: Thu, 13 Oct 2023 15:27:47 +0330 Hello there! Unfortunately, there are some bad news for you. Some time ago your device was infected with my private trojan, R.A.T (Remote Administration Tool), if you want to find out more about it simply use Google. My trojan allows me to access your accounts, your camera and microphone. [cut, but you get the idea]

So tropes@ gets routed to my inbox; in fact, anything @ my domain gets routed to the same inbox. This is what is meant by the term wildcard above: any value is a match. As well as allowing for spam provenance like the example above, this also helps with email filtering: if you're dealing with a certain company by email, your account email address can be thatcompany@yourdoma.in and your preferred email client can automatically filter any mail received to that address, into the appropriate place.

If you've purchased a domain, there are levels of email service available to you: one is fully-hosted service, where the Big Providers like Microsoft and Google offer the ability to use their servers for all email handling, so your domain essentially falls under their control for email purposes.

Configuring Postfix

At the other end of the spectrum is the self-hosted mailserver, where a machine under your control handles and stores email for the domain. For this quick note, we'll be installing and configuring the Postfix mail package on a Debian Linux machine. The Debian Wiki has a useful guide on installation and post-install steps like configuring DKIM and greylists, but it boils down to apt install postfix for our purposes.

The above guide has a section on aliasing, where emails to one address get automatically forwarded to another. We'll be setting up a wildcard alias, which involves a couple of steps; first is to add an aliases file.

Adding an aliases file

    echo "*: youracct" >> /etc/aliases
    newaliases

Configuring postfix to use the aliases

    postconf -e "alias_maps = hash:/etc/aliases"
    postconf -e "alias_database = hash:/etc/aliases"

This enables postfix to treat any incoming address as though it were coming to a user of that name, but we'll also need to add a virtual alias for the domain routing:

Adding a virtual alias map

    echo "yourdoma.in magic" >> /etc/postfix/virtual
    echo "@yourdoma.in youracct" >> /etc/postfix/virtual
    postmap /etc/postfix/virtual

Configuring postfix with the virtual map

    postconf -e "virtual_alias_maps = hash:/etc/postfix/virtual"
    service postfix reload

And in theory, we're done: email sent to any address at your domain should now land in your local mailbox. Delivery of the mail to your client of choice through IMAP is outside the scope of this quick hack, but I've used dovecot for a good while, and haven't had any issues.

The Curious Case of Debian 12 and the SSH Failure

Tue, 02 Jan 2024 08:56:07 +0000

This site is served out of a DigitalOcean droplet that I've had since perhaps 2013, which was set up with Debian 7.0 Wheezy; the same machine is my mailserver. The system has been upgraded over the years, all the way up to Debian 11 (Bullseye), and has always been fun to try to keep up to date.

It starts so innocently...

It was just after Halloween 2023, and I was signing up to some SaaS provider's service. The site popped up a message saying I'd be receiving a verification email with a thing to click on, and then I could log in. Fairly normal signup behaviour for a site at this point.

Except the email never arrived. I headed over to mail.log to see what Postfix was saying...

Excerpt from /var/log/mail.log

    postfix/smtpd[1141504]: connect from m206-43.eu.mailgun.net
    postfix/smtpd[1141504]: SSL_accept error from m206-43.eu.mailgun.net: -1
    postfix/smtpd[1141504]: warning: TLS library problem: error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate:../ssl/record/rec_layer_s3.c:1543: SSL alert number 42:
    postfix/smtpd[1141504]: lost connection after STARTTLS from m206-43.eu.mailgun.net

Well, that's weird. A little searching around reveals a thread on StackExchange where a helpful dave_thompson_085 has this advice:

Receiving alert bad certificate (code 42) means the server demands you authenticate with a certificate, and you did not do so, and that caused the handshake failure...

Find a certificate issued by a CA in the 'acceptable' list...

So there's some kind of problem with the other end's CA. But it's Mailgun, you'd expect them to have a handle on keeping things up to date; let's check what their certificate looks like.

    $ openssl s_client -showcerts -connect api.mailgun.net:443
    CONNECTED(00000005)
    depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root G2
    verify return:1
    depth=1 C = US, O = DigiCert Inc, CN = DigiCert Global G2 TLS RSA SHA256 2020 CA1
    verify return:1
    depth=0 C = US, ST = Texas, L = San Antonio, O = "MAILGUN TECHNOLOGIES, INC", CN = *.api.mailgun.net
    verify return:1
    ...

This isn't the certificate used by Mailgun when attempting to send that email, but it is a certificate used by Mailgun, and it's signed by the DigiCert CA. If we check with DigiCert, we see that the 2020 CA certificate has been superseded as of March 2023, so that's probably our problem: our copy of DigiCert's CA is too old.

No problem, we just update it:

    # apt search ca-certificates
    Sorting... Done
    Full Text Search... Done
    ca-certificates/stable,now 20210119 all [installed]
      Common CA certificates

Ah, right, yes. Bullseye has certificates up to 2021, and any CAs newer than that are in Debian 12 (Bookworm). And this, as I tooted at the time, is where things go awry:

The state of affairs as of 5pm on Nov 9th

And now we have a problem

After the upgrade to Bookworm, connections over ssh would drop immediately:

    $ ssh imrannazar.com
    Connection closed
    $ ssh -v imrannazar.com
    OpenSSH_8.1p1, LibreSSL 2.7.3
    debug1: Reading configuration data /etc/ssh/ssh_config
    debug1: /etc/ssh/ssh_config line 47: Applying options for *
    ...
    debug1: SSH2_MSG_KEXINIT sent
    Connection closed
    $

So what's going on here? Evidently something about the server's new version of ssh has broken incoming connections... Let's get into the VM's console to start a debug copy of ssh on a different port, and see what happens.

Running a debug sshd on the server

    # /usr/sbin/sshd -D -d -o Port=9001
    ...
    Missing privilege separation directory: /run/sshd
    # mkdir /run/sshd
    # /usr/sbin/sshd -D -d -o Port=9001

Connecting to the debug sshd

    $ ssh -p9001 imrannazar.com
    Connection closed
    $

At least it's consistently crashing. Flipping back to the VM console, we get:

VM console with output of sshd having crashed

Aha, a lead! This error looks suspiciously like it would cause things to fall over in a heap:

Fatal glibc error: cannot get entropy for arc4random

arc4random and the getrandom syscall

Why would this suddenly become a problem in Debian 12, when it wasn't a thing before? As it turns out, arc4random is a method of generating random numbers that's been in OpenBSD for years, but only made it to glibc (and thus to OpenSSH on Linux) in July 2022. My previous Bullseye kernel was up to date as of Jan 2021, so it makes sense that my previously installed OpenSSH didn't have the support required.

Now, what is the ssh daemon doing to get this error? A useful way of finding out which system calls are being made by a given program is to run it through strace:

    # strace -f /usr/sbin/sshd -D -d -o Port=9001
    ...
    [pid 1402] poll([{fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 2, -1 <unfinished ...>
    [pid 1640] <... write resumed>) = 81
    [pid 1640] getrandom(0x7f721688bd90, 16, 0) = -1 ENOSYS (Function not implemented)
    [pid 1640] openat(AT_FDCWD, "/dev/urandom", O_RDONLY|O_NOCTTY|O_CLOEXEC) = -1 EMFILE (Too many open files)
    [pid 1640] writev(2, [{iov_base="Fatal glibc error: cannot get en"..., iov_len=53}], 1) = -1 EFBIG (File too large)
    [pid 1640] --- SIGXFSZ {si_signo=SIGXFSZ, si_code=SI_USER, si_pid=1640, si_uid=104} ---
    [pid 1402] <... poll resumed>) = 1 ([{fd=7, revents=POLLIN|POLLHUP}])
    ...

Here we see the ssh daemon start up as process 1402; a few hundred lines of strace's call log have been omitted here, as we're interested in what happens when it blocks waiting for connections. As we connect with an ssh client, the sshd wakes up process 1640 which was spawned as a handler.

The first thing the handler does is try to generate a random block of bytes to use for key exchange. According to the Phoronix page linked earlier:

The implementation is based on scalar Chacha20 with per-thread cache. It uses getrandom or /dev/urandom as fallback to get the initial entropy...

We see here that both the getrandom syscall and the urandom fallback failed. Without random data to start the key exchange, the handler crashes out. Now we know why we can't connect over ssh, what can we do?

Kernel versions

A good overview of the history of random number generators in Linux is this LWN article by Jake Edge, which states that the getrandom syscall was added in Linux 3.17; when combined with this quote from the manual page for the corresponding C function, we're led to a particular conclusion:

Errors

ENOSYS: The glibc wrapper function for getrandom() determined that the underlying kernel does not implement this system call.

So the kernel doesn't support getrandom. But I just upgraded to Bookworm, Debian 12, and I specifically saw linux-image-6.1.0 get installed...

    # uname -a
    Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.41-2+deb7u2 x86_64 GNU/Linux
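The comparison being made here, between the running kernel's release and the 3.17 that introduced getrandom, can be sketched in a few lines. This is purely an illustration of the reasoning: supportsGetrandom is a hypothetical helper, and the 6.1.0 release string below is a made-up example of what a Bookworm kernel might report.

```javascript
// Hypothetical helper: does a `uname -r` style release string describe a
// kernel at or beyond 3.17, where the getrandom syscall was added?
const supportsGetrandom = (release) => {
    const [major, minor] = release.split('.').map(n => parseInt(n, 10));
    return major > 3 || (major === 3 && minor >= 17);
};

console.log(supportsGetrandom('3.2.0-4-amd64'));  // false
console.log(supportsGetrandom('6.1.0-13-amd64')); // true
```

In Node, the release string for the running kernel is available from os.release(), should one want to run this check for real.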

In what can only be seen as a testament to the stability of Linux's syscall API, I've been running the original Wheezy kernel from ten years ago, through all the upgrades. But I've rebooted more than once in the intervening time, so even though I haven't rebooted since the upgrade, I should be running the 5.10 kernel bundled in with Bullseye?

In a stroke of genius from the admins at DigitalOcean, it's possible to set your VM to use a bootloader which isn't the system's default grub installation. As it turns out, that's how my VM was set up:

DigitalOcean admin panel, showing 'Debian 7.0' as the boot kernel

I swapped it over to the recommended GrubLoader, rebooted, and ssh suddenly started working again.

Another day set on fire

And so we lose another day to the vagaries of amateur systems administration: after six hours, lots of cursing, and at least two instances of "wait, what" being uttered, I got my mailserver back to the state it was in that morning, and received that verification email I was waiting for.

Good thing Nov 9th wasn't a workday... Oh wait.

Nested Threads and the Mastodon Context API

Sat, 02 Dec 2023 13:08:27 +0000

If you've ever been doomscrolling on a microblogging site like Mastodon (other brands are available), you'll have noticed that the conversations that result from a given message are presented as a flat list:

The original message, by Alice
Bob, in reply to Alice
Alice, in reply to Bob
Charlie, in reply to Bob
Alice, in reply to Charlie
Dave, in reply to Alice
Eva, in reply to Alice
Dave, in reply to Eva

In this example, we can see some kind of conversation between Alice (the original poster) and Bob, with an interjection by Charlie; we also see two other comment threads. This is all derived from context, however, and there's no obvious structure to the threads. A nested presentation could perhaps be more conducive to understanding the flow of the conversation here:

The original message, by Alice

Bob
- Alice
- Charlie
  - Alice
Dave
Eva
- Dave

As it turns out, we can generate a tree-style view given the parent-child links from the first example; in this article I'll take a look at how one can use Mastodon's Context API to gather and produce the necessary data.

Context: Ancestors and Descendants

If one were to open a particular toot on the above-mentioned thread (say, Charlie's interjection), a Mastodon client would fetch the toot's context: its direct ancestors up the tree, as well as any descendent threads down the tree. Presentationally, this might be shown as:

The original message, by Alice
Bob, in reply to Alice
Charlie, in reply to Bob
Alice, in reply to Charlie

We see the opened toot emphasised, with Alice's reply below; importantly, we only get Bob's reply to the OP because it's in the part of the tree that's a direct ancestor. If a Mastodon client were to present the full conversation, it would need to repeat this process with the top of the tree by fetching the context for Alice's message: every toot in the conversation would then be a member of the context's descendants.

In Mastodon's JSON API, this is implemented by the context endpoint, for which an example partial return looks like:

Fetching context for a toot

    $ curl https://mastodon.world/api/v1/statuses/111148878835275146/context | json_pp
    {
       "ancestors" : [
          {
             ...
             "content" : "This might have slipped under the radar these past few"...,
             "id" : "111148824053932160",
             "in_reply_to_id" : null,
             ...
          },
          {
             ...
             "content" : "...",
             "id" : "111148835195532183",
             "in_reply_to_id" : "111148824053932160",
             ...
          }
       ],
       "descendants" : [
          {
             ...
             "content" : "...",
             "id" : "111149026486499760",
             "in_reply_to_id" : "111148878835275146",
             ...
          }
       ]
    }

Here we have a situation analogous to Charlie's conversation from earlier: the toot for which we requested context has a parent and grandparent, as well as a direct child. A few things should be noted:

No "this" element: The context endpoint returns ancestors and descendants of the given toot ID, but it doesn't return the content of that message; if we want the content of the toot in question, the separate statuses endpoint must be called.
Parent-child relations: Each ancestor and descendant returned follows a standard message format: along with the content (and other fields such as account describing the author), each message has an id and an in_reply_to_id. The latter indicates the parent message for which this is a child.
Root node: The top of the tree is the first ancestor in the list. We can see that it's the OP because the in_reply_to_id is null: it has no parent.
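The last of those notes translates into a one-liner. Here's a small sketch, using a cut-down copy of the response above with only the id and in_reply_to_id fields kept; rootIdFor is a name invented for the example, and the fallback for an empty ancestor list is my assumption about how one would handle a toot that is itself the OP.

```javascript
// Hypothetical helper: given a toot ID and its context response,
// return the ID of the toot at the top of the conversation
const rootIdFor = (tootId, context) =>
    context.ancestors.length > 0
        ? context.ancestors[0].id  // the first ancestor is the top of the tree
        : tootId;                  // no ancestors, so this toot is the root

const sampleContext = {
    ancestors: [
        { id: '111148824053932160', in_reply_to_id: null },
        { id: '111148835195532183', in_reply_to_id: '111148824053932160' },
    ],
    descendants: [
        { id: '111149026486499760', in_reply_to_id: '111148878835275146' },
    ],
};

console.log(rootIdFor('111148878835275146', sampleContext));
// 111148824053932160
```

Toot IDs are kept as strings throughout: they're 18 digits long, which is more than a JavaScript number can hold exactly.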

Having obtained the root node's ID from our context call, we can issue another context request to obtain parent-child links for the full conversation thread. This call will return no ancestors, and a batch of descendants.

Converting parent-child links to a tree

Once we have a list of parent-child relationships between nodes in the conversation, we can build a nested array of node IDs and their children: this will involve attaching a children array to each node of toot data, and filling the array with references to the direct children. In PHP, we could go about the production of the tree as follows:

PHP code to generate a conversation tree given a root toot ID

    $mastodon_inst = 'mastodon.world';
    $root_id = '111148824053932160';

    // First, we fetch the root toot...
    $root = json_decode(file_get_contents(sprintf(
        'https://%s/api/v1/statuses/%s',
        $mastodon_inst,
        $root_id
    )), true);

    // Then context for the root
    $ctx = json_decode(file_get_contents(sprintf(
        'https://%s/api/v1/statuses/%s/context',
        $mastodon_inst,
        $root_id
    )), true);

    // Initialise a map of toots by ID
    $toots_by_id = [$root['id'] => ($root + ['children' => []])];

    // There will only ever be descendants in the context
    foreach ($ctx['descendants'] as &$child) {
        $child['children'] = [];
        $toots_by_id[$child['id']] = &$child;
    }

    // Finally, add each toot to the map as a child of its parent
    foreach ($toots_by_id as &$subtoot) {
        if ($subtoot['in_reply_to_id']) {
            $toots_by_id[$subtoot['in_reply_to_id']]['children'][] = &$subtoot;
        }
    }

With the map filled in, a little light recursion is sufficient to generate a printed representation of the tree:

Printing the tree structure function print_toot($toot, $level = 0) { printf( "%s[%s] (%s)\n", str_repeat(' ', $level), $toot['id'], $toot['account']['acct'] ); foreach ($toot['children'] as $child) { print_toot($child, $level + 1); } } print_toot($toots_by_id[$root_id]); Our sample tree, as printed by the above [111148824053932160] (briankrebs@infosec.exchange) [111148835195532183] (QuatermassTools@infosec.exchange) [111148878835275146] (penguin42@mastodon.org.uk) [111149026486499760] (mkoek@mastodon.nl) [111148886753997430] (ozdreaming@infosec.exchange) [111148912289735390] (ePD5qRxX@mastodon.online) [111149031550141696] (dwaites@infosec.exchange) [111149299225622339] (CyberLeech@cyberplace.social) [111149308672212244] (neurovagrant@masto.deoan.org) [111149319647286908] (CyberLeech@cyberplace.social) [111149323173850002] (VZ@fosstodon.org) [111149387367883525] (systemadminihater@cyberplace.social) [111149724821007295] (j@jaesharp.social)
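The same map-then-link approach carries over to other languages without much fuss. Below is a rough JavaScript sketch of it, run against the Alice and Bob conversation from the top of this article rather than live API data: the single-digit IDs are made up, and a flat author field stands in for the account.acct value that a real toot would carry. buildTree and treeLines are names chosen for the example.

```javascript
// Build a tree from a root toot and a flat list of its descendants
const buildTree = (root, descendants) => {
    // Map each toot by ID, giving it an empty list of children
    const byId = new Map();
    for (const toot of [root, ...descendants]) {
        byId.set(toot.id, { ...toot, children: [] });
    }
    // Attach each toot to its parent; the root has none, so it's skipped
    for (const toot of byId.values()) {
        const parent = byId.get(toot.in_reply_to_id);
        if (parent) parent.children.push(toot);
    }
    return byId.get(root.id);
};

// Produce one line per toot, indented two spaces per level of nesting
const treeLines = (toot, level = 0) => [
    `${'  '.repeat(level)}[${toot.id}] (${toot.author})`,
    ...toot.children.flatMap(child => treeLines(child, level + 1)),
];

const root = { id: '1', in_reply_to_id: null, author: 'Alice' };
const replies = [
    { id: '2', in_reply_to_id: '1', author: 'Bob' },
    { id: '3', in_reply_to_id: '2', author: 'Alice' },
    { id: '4', in_reply_to_id: '2', author: 'Charlie' },
    { id: '5', in_reply_to_id: '4', author: 'Alice' },
    { id: '6', in_reply_to_id: '1', author: 'Dave' },
    { id: '7', in_reply_to_id: '1', author: 'Eva' },
    { id: '8', in_reply_to_id: '7', author: 'Dave' },
];

console.log(treeLines(buildTree(root, replies)).join('\n'));
```

The printed result should have the same shape as the nested list from the introduction, with Bob, Dave and Eva one level in from Alice's original message.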

And we're done

An implementation much like this has been used on ThreadTree, the page I wrote when the original annoyance came up. The only additional note regarding the code behind ThreadTree is that in my case, the annoyance started on Twitter, so the code still refers to tweets in many places even though the site functionality has been ported to Mastodon; the principle of nested threading and contexts applies in much the same fashion.

Thanks go to Terence Eden's article on this very topic which helped with the building of ThreadTree, and mapgie who (re-)introduced me to Twitter and caused this whole mess.

Seven Snippets of Modern CSS I Used To Rebuild My Site

Wed, 01 Nov 2023 08:19:48 +0000

Back in the depths of time (we're talking perhaps 2008), I set up a site on the imrannazar.com domain with a set of articles lifted from an even older page, and some CSS based on what seemed like a good idea at the time. This is how that came out in the Way Back When:

Figure 1: Screenshot of this place before the redesign
(Courtesy of the Wayback Machine)

Fifteen years later, this build was starting to really show its age: built in an era before mobile viewports were common, it would always render as a desktop site, and it had various vestiges of Internet Explorer compatibility, such as this excellent hack that was common at the time to allow for transparent PNG support.

    #wrapper #foot {
        filter: progid:DXImageTransform.Microsoft.AlphaImageLoader(
            sizingMethod=crop,
            src='../img/foot.png'
        );
    }
    #wrapper > #foot {
        background: url(../img/foot.png) no-repeat top left;
    }

When the idea of rebuilding this place came about towards the end of 2023, it seemed high time to make use of some of the more modern CSS features that have slowly crept into common usage and good browser support in the intervening span. Let's pick through the stylesheet and see which CSS techniques are being used that weren't at the time of the previous design.

Variables

At the top of the new stylesheet, we find:

:root {
    --g-bg: #e5dacc;
    --g-text: black;
    --g-fig: #d4c1aa;
    ...
}
body {
    background: var(--g-bg);
    color: var(--g-text);
}

Defining these variables on the :root pseudo-class makes them available throughout the stylesheet, meaning we can define colours or other values in one place. This makes switching out and testing of colour themes and palettes much easier; I tried a few palettes before settling on the current values, which was made simple by not having to hunt for usages of the particular colour values to swap them out.

It also makes colour usage more semantic: the page background isn't an arbitrary hex string, it's --g-bg and that's much more readable when scanning through the rest of the styling. This means that such features as, for example, switching the site to dark mode are made possible simply by switching out the definitions in this block.

Now there's an idea... Let me get back to that another time.

Webfonts

Immediately below the variable definition, we find a couple of webfont definitions for the header (Over the Rainbow) and code blocks (Inconsolata):

@font-face {
    font-family: 'OverTheRainbow';
    font-weight: normal;
    font-style: normal;
    src: url(/assets/overtherainbow.ttf) format('truetype');
}
@font-face {
    font-family: 'Inconsolata';
    font-weight: normal;
    font-style: normal;
    src: url(/assets/inconsolata.ttf) format('truetype');
}

With these fonts defined, their names can be used in any following rule. For example, code blocks have this styling:

main samp pre { font: 16pt Inconsolata, monospace; background: var(--g-fig); }

Webfont definitions support multiple file formats; for simplicity and highest compatibility, I've used the TTF files here.

Now, webfonts had been a thing for some time when the previous site design was put together, but the main thing preventing their use in the design at the time was...

transform: rotate

A stylistic choice for the page header was to rotate the text slightly, but back in '08 this wasn't possible in a well-supported way. Modern browser support for the transform rule is fairly complete, so this is finally useful. The syntax of the rule itself is fairly simple:

h1 { ... text-align: center; transform: rotate(-3deg); }

This also means I don't have to implement the headers of each page as a transparent PNG laid over the header graphic, which significantly reduces page weight and helps not only with load speed, but with accessibility: having the page's main header be an image can be confusing when passed through a screenreader or other accessibility tools, but as text the header's existence is made clear.

font-display: fallback

The main disadvantage of webfonts is that they need to be loaded in by the browser, and that can only occur after the CSS has been downloaded:

Browser loads the page HTML
Page HTML contains a <link> to the stylesheet
Browser downloads the CSS
Stylesheet contains a @font-face rule defining a webfont
Browser downloads the webfont file(s)

For the period between the CSS being available, and the webfonts loading in, the default behaviour is to blank out the element being rendered for lack of a font to use. You can, however, specify a fallback behaviour which is to use the second font defined against the element (and if that's also a webfont, fall back further until the browser reaches a standard font that's on the machine).

Having these elements visible (in the wrong font) before their fonts load in may seem weird from a design perspective, but it's good for accessibility and page responsiveness: if you can see all the text on the page immediately, a screenreader (or a search engine's bot, for that matter) can too, and it won't have to wait for everything to load.

Fallback behaviour can be configured on the @font-face:

@font-face {
    font-family: 'OverTheRainbow';
    font-weight: normal;
    font-style: normal;
    src: url(/assets/overtherainbow.ttf) format('truetype');
    font-display: fallback;
}
h1 {
    ...
    font-family: OverTheRainbow, sans-serif;
}

There's one more piece of the puzzle regarding the behaviour of the page heading on this site, and it's how the header responds to resizing and different sizes of viewport. For that, we come to perhaps the best new addition to CSS of all:

font-size: clamp

With clamp, one is able to specify a flexible measurement with a range above/below which the measurement won't go. For example, the h1 on this page is defined in full as below:

h1 { font: clamp(24px, 3.5vw, 60px)/1.4 OverTheRainbow, sans-serif; }

Let's break this down:

The font family is our OverTheRainbow webfont, with a fallback to the browser's built-in sans-serif;
The spacing between lines is 1.4, leaving a gap between lines of 40% of a line;
The size (the height of a line) is 3.5% of the viewport width, meaning wider windows get a larger font, but;
The font size won't grow past 60px on extremely wide browsers, and won't shrink below 24px as the browser gets narrower. It's clamped between these two constraints.

And if you're thinking these figures look a little arbitrary, that's because they are: some experimentation was needed to arrive at values which work sensibly with the design.
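To see how those three numbers interact, here's a rough sketch in Python of the arithmetic that clamp performs. This isn't anything the browser runs, and the function names are mine; it only assumes that 1vw is 1% of the viewport width.

```python
def clamp(minimum, preferred, maximum):
    """Mirror CSS clamp(): the preferred value, pinned between two bounds."""
    return max(minimum, min(preferred, maximum))


def h1_font_size(viewport_width_px):
    """Font size in px for clamp(24px, 3.5vw, 60px) at a given viewport width."""
    preferred = viewport_width_px * 3.5 / 100  # 1vw is 1% of the viewport width
    return clamp(24, preferred, 60)


# A narrow phone, a mid-sized window and a very wide monitor
for width in (400, 1000, 2000):
    print(width, h1_font_size(width))
```

With these values, the lower bound takes over below a viewport width of roughly 686px (24 / 0.035), and the upper bound above roughly 1714px (60 / 0.035); in between, the heading scales with the window.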

Clamping the font size handles responsiveness of the heading text, but the other component that resizes is the scrap-of-paper design, which is an element background. For this, we need...

background-size: contain and aspect-ratio

In the way back when, you could define image backgrounds on elements in CSS, but your options were to have them repeat horizontally/vertically or ...not. Modern CSS gives you two new tools as options to background-size: there's cover, which tries to completely cover the element by sizing up the background and cropping the edges, and there's contain which tries to fit the image into the element in full, perhaps leaving uncovered gaps in either the vertical or the horizontal.

For our use-case, we need contain because we want the whole image visible:

header {
    background: url(/assets/headback.webp) no-repeat top left;
    background-size: contain;
    width: 100%;
    min-width: 900px;
    max-width: 1450px;
    aspect-ratio: 145/35;
    max-height: 350px;
}

With this set of rules, we're giving the browser significant constraints in how it's to render the header: full-width, but within a particular range, and always maintaining a 145/35 aspect ratio. In addition, when the browser comes to fill in the background, contain means it will lean towards keeping the full image visible rather than trying to crop any edges.
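As a back-of-the-envelope check on those constraints, the header's box can be approximated in Python. This is a simplified sketch: it assumes the header's containing block is as wide as the viewport, and it ignores the finer points of how a browser resolves min/max sizes against aspect-ratio. The function name is made up for illustration.

```python
def header_box(viewport_width_px):
    """Approximate (width, height) in px of the header for a viewport width."""
    # width: 100%, held between min-width and max-width
    width = max(900, min(viewport_width_px, 1450))
    # aspect-ratio: 145/35 derives the height from the width, capped by max-height
    height = min(width * 35 / 145, 350)
    return width, height


for viewport in (600, 1200, 1920):
    print(viewport, header_box(viewport))
```

Note that at the maximum width of 1450px the aspect ratio gives a height of exactly 350px, so in this simplified model the max-height only restates the limit the other rules already imply.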

Things not covered here

We've made it as far as defining the header, and already modern CSS has come in very useful for creating a design that was simply not possible in the dark days of the late oughts. As we come down the page, there are more CSS features in use that we won't cover in detail here:

Flexbox: Figures are defined as columnar flex containers, meaning multiple images or tables can be held in a figure and the browser will work out how to position them to maintain the column.
Grid: The lists of links in the footer are held in a grid, which simulates the old-world table layout in a semantically sensible fashion; in particular, the third column is specified to be wider than the others, and the browser works out the actual widths without us having to do the menial calculation.
Media queries: Responsiveness is all about being able to handle different sizes of viewport, and below 900px the desktop variant of the site begins to fall apart. With media queries, we can define a new variant that applies down to 600px, and yet another variant that works on mobile devices below that size; with the smallest variant in particular, we can even replace the header element and use a new mobile-specific background and aspect ratio.

And there are features coming to wider support in CSS that haven't yet been employed here, the most compelling of which is rule nesting. To take a sample from further down the stylesheet:

main samp {
    font: 16pt Inconsolata, monospace;
    background: var(--g-fig);
    display: block;
}
main samp pre {
    margin: 0;
    padding: 1em;
    overflow: auto;
}
main samp kbd {
    color: var(--g-code-keyword);
}
main samp var {
    color: var(--g-code-var);
}
main samp s {
    color: var(--g-code-comment);
    text-decoration: none;
}

There's a bunch of redundancy in the match specifications here, and one of the more compelling reasons to use CSS pre-processors like Sass has, in the past, been that one can nest rules like this to make for cleaner code to work with:

main samp {
    font: 16pt Inconsolata, monospace;
    background: var(--g-fig);
    display: block;

    pre {
        margin: 0;
        padding: 1em;
        overflow: auto;
    }
    kbd {
        color: var(--g-code-keyword);
    }
    var {
        color: var(--g-code-var);
    }
    s {
        color: var(--g-code-comment);
        text-decoration: none;
    }
}

A pre-processor means a build step, which means a build process, and suddenly your plain HTML/CSS site has become a Whole Thing. With CSS nesting coming to browsers, this will soon be natively supported client-side, and will no longer be a reason to have a build process at all.

Overall, I'm happy with how the refresh has come out; here's to another fifteen years. Let's see how CSS advances in the interim, and what's possible (and widely supported) the next time I get bored and decide to rebuild this place.

Binary Golfing in Commodore BASIC

Sun, 01 Oct 2023 08:25:31 +0000

When I say I partake in the occasional bit of golf, I don't mean the sport involving balls and clubs. I've never tried that, but I have gone in for code golf in the past; the object of code golf is similar in that you're aiming for a low number. For code golf, the aim is to achieve the required functionality or implement the algorithm with the smallest code.

Explorations of the smallest possible thing in a particular area aren't unprecedented here: the first article on this site is The Smallest Nintendo DS ROM, from 2006. So when I heard about the Binary Golf Grand Prix I was left with no choice but to put in an attempt. Unfortunately, I learned of the existence of BGGP #4 around two hours before it closed for submissions, so I ended up handing in a scrap of PHP. But let's look at what could've been...

The Task at Hand

This year's problem is specified as follows:

Create the smallest self-replicating file.

A valid submission will:

Produce exactly 1 copy of itself

Name the copy "4"

Not execute the copied file

Print, return, or display the number 4

The most natural thought that arises, unbidden, is how one would go about this on the Commodore 64. Commodore BASIC has the built-in SAVE command, so our submission might be as simple as:

C64 BASIC program that saves itself to disk
10 SAVE "4", 8
20 PRINT 4

As the task here is to generate the smallest file, we should look at how this program is stored in memory (and thus, on disk). Commodore BASIC stores programs almost entirely as written, and the code execution process includes a parsing step which tokenises the program before it's executed. So the above program looks like this if we use VICE's memory monitor:

The above BASIC program, in memory
(C:$e5cf) m 0800
>C:0800  00 0e 08 0a 00 94 20 22 34 22 2c 20 38 00 16 08   ...... "4", 8...
>C:0810  14 00 99 20 34 00 00 00 00 00 ff ff ff ff 00 00   ... 4.....????..

BASIC program storage and tokenisation

We can see the program starts at address $0801 hexadecimal: this is the case for every BASIC program on the C64, whether typed in or loaded. Each line starts with a pointer to the next line, allowing the BASIC interpreter to rapidly work through the program if it needs to determine the program length or find a particular line to change its content.

After the pointer to the next line comes the line number: two bytes, in little-endian format (small byte first), which is standard for the 6502-derived processor inside the C64. So we see the first line is numbered $000a, 10. And after this comes the line content itself: 94 20 22 34 22 2c 20 38 00.

As a program is typed into the BASIC interpreter, it's tokenised: any keywords in the line get replaced by token values before being stored in memory. We can see in this line that SAVE has been replaced by command token $94, such that if this value comes at the start of a command the interpreter knows this is a SAVE command without needing to read and parse the individual letters of the word "SAVE".

The rest of the command is not tokenised, and responsibility for parsing this arguments string is passed to whichever routine in the BASIC interpreter is handling the command. In this case, the rest of the command is ASCII text: ` "4", 8` (bytes 20 22 34 22 2c 20 38); note that the spaces are preserved as they were entered in the program. The arguments are terminated by a zero byte.

The second line is similarly structured: a $99 token representing the PRINT command, and an arguments string of 20 34, representing 4.
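The layout described above can be walked with a few lines of Python. This is only a sketch: the bytes are copied from the monitor dump (starting at $0801), the token table holds just the two tokens seen so far, and the function name is my own. It also treats every byte outside the table as plain ASCII, which holds for this program but not for PETSCII in general.

```python
# Token values taken from the memory dump above; the real table is much longer.
TOKENS = {0x94: "SAVE", 0x99: "PRINT"}


def list_program(image, load_addr=0x0801):
    """Walk the linked list of BASIC lines in `image`, which starts at load_addr."""
    lines = []
    offset = 0
    while True:
        # Each line begins with a little-endian pointer to the next line
        next_ptr = image[offset] | (image[offset + 1] << 8)
        if next_ptr == 0:  # a null pointer marks the end of the program
            break
        line_number = image[offset + 2] | (image[offset + 3] << 8)
        end = image.index(0, offset + 4)  # content runs up to the zero byte
        text = "".join(TOKENS.get(b, chr(b)) for b in image[offset + 4:end])
        lines.append((line_number, text))
        offset = next_ptr - load_addr  # follow the pointer to the next line
    return lines


dump = bytes.fromhex(
    "0e 08 0a 00 94 20 22 34 22 2c 20 38 00 "  # line 10
    "16 08 14 00 99 20 34 00 "                 # line 20
    "00 00"                                    # end of program
)
print(list_program(dump))
```

Running this against the dump gives back line 10 as SAVE "4", 8 and line 20 as PRINT 4, spaces included.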

Golfing in C64 BASIC

Now we've seen how the BASIC program is stored internally, we have direction for a round of golf. The first step is to remove the spaces:

Golf, first hole: Removing spaces
10SAVE"4",8
20PRINT4

One might be tempted to remove the quotes around the filename, relying on whatever type coercion may exist in the BASIC interpreter to convert 4 into a string. Unfortunately, this isn't a thing on the C64:

Golf, bogey on the second hole
10SAVE4,8
20PRINT4
RUN
?TYPE MISMATCH ERROR IN 10

Commodore BASIC does, however, support multiple commands on one line, with the colon separator. Thus, we can combine the two commands of our program and remove the four bytes associated with the second line pointer and line number:

Golf, third hole: Combining lines
10SAVE"4",8:PRINT4

This program is stored in memory as below:

Fully golfed out BASIC program
(C:$e5cf) m 0800
>C:0800  00 0f 08 0a 00 94 22 34 22 2c 38 3a 99 34 00 00   ......"4",8:.4..

We see that the two commands have been stored as one line, so there's one next-line pointer (which refers to the byte after the end of the program); it also becomes plain that a zero byte isn't the only thing that can end an arguments string in C64 BASIC. The colon, byte $3a, can also act as an end-of-arguments delimiter, denoting the end of a command and the start of another.
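To double-check the byte count, the tokenising step can be imitated in Python. This is a toy version rather than the interpreter's actual routine: it knows only the two keywords used here, leaves anything inside quotes alone, and the function name is my own invention.

```python
# Only the two keywords this program uses; values as seen in the dumps above.
KEYWORDS = {"SAVE": 0x94, "PRINT": 0x99}


def tokenise_line(line_number, text, addr=0x0801):
    """Build the in-memory form of one BASIC line stored at `addr`."""
    content = bytearray()
    in_quotes = False
    i = 0
    while i < len(text):
        if text[i] == '"':
            in_quotes = not in_quotes
        keyword = None
        if not in_quotes:
            # Look for a keyword starting at this position
            keyword = next((k for k in KEYWORDS if text.startswith(k, i)), None)
        if keyword:
            content.append(KEYWORDS[keyword])
            i += len(keyword)
        else:
            content.append(ord(text[i]))
            i += 1
    content.append(0)  # end-of-line marker
    next_ptr = addr + 4 + len(content)  # pointer + line number take four bytes
    return (next_ptr.to_bytes(2, "little")
            + line_number.to_bytes(2, "little")
            + bytes(content))


golfed = tokenise_line(10, 'SAVE"4",8:PRINT4')
print(golfed.hex(" "), len(golfed))
```

The result matches the dump from $0801 up to the line's terminating zero: a next-line pointer of $080f, line number 10, nine bytes of content and the terminator.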

Our final result in Commodore BASIC is 14 bytes. Is it possible to produce a smaller program by switching to machine language, and communicating with the disk drive through Commodore DOS directly?

Saving files in Commodore DOS

One advantage of using such a venerable computer as the C64 for this task is that the documentation is extensive and copious. There are eleven separate commentaries on the SAVE routine available in the C64 Kernal API reference, including detailed explanation of the internal workings; for our purposes, we'll use the example routine provided by Commodore's own Programmer's Reference, which gives a sequence of operations for using SAVE:

Use the SETLFS routine and the SETNAM routine (unless a SAVE with no file name is desired on "a save to the tape recorder"),

Load two consecutive locations on page 0 with a pointer to the start of your save (in standard 6502 low byte first, high byte next format).

Load the accumulator with the single byte page zero offset to the pointer.

Load the X and Y registers with the low byte and high byte respectively of the location of the end of the save.

Call this routine.

In this case, we'll be saving to disk (device 8), so we'll want to call both SETLFS and SETNAM before running through the save. There are two things we'll need to note for the save itself:

Load location

As mentioned above, when a BASIC program is loaded from disk, it always loads to the same location (the start of BASIC memory, $0801 hexadecimal); a machine-language program doesn't have this restriction, and can be loaded into memory anywhere. To facilitate this, the first two bytes of the program as stored on disk are actually the address to which it should be loaded. Accordingly, when saving the program this must be accounted for by setting up the load address in memory beforehand (in little-endian format, as the Programmer's Reference mentions above).

We also have the extra stipulation that this program needs to be self-replicating: if the file saved by this code is reloaded, it will need to behave the same way. To that end, the load address for the program will need to be written by the program itself, to a location just before the program, and the concatenated block saved together.
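The on-disk layout this implies is simple enough to sketch in Python: a two-byte little-endian load address, followed by the program bytes. The helper names below are hypothetical, and the two-byte payload is just the first instruction of an imagined program.

```python
import struct


def build_program_file(load_addr, payload):
    """Prefix the program bytes with their little-endian load address."""
    return struct.pack("<H", load_addr) + payload


def split_program_file(data):
    """Recover (load address, program bytes) from a saved file."""
    (load_addr,) = struct.unpack_from("<H", data)
    return load_addr, data[2:]


saved = build_program_file(0xC0C0, b"\xa9\x01")  # lda #1, as a stand-in payload
print(saved.hex(" "))
print(build_program_file(0x0801, b"").hex(" "))  # a BASIC program's header
```

A load address of $C0C0 is convenient for golfing purposes: both bytes of the header are the same, so one value can be written to both locations.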

Program size

BASIC keeps track of how long the entered program is, so it can determine how much memory needs to be saved to disk; our machine-language program will have to set up the amount of data to save without this help being available.

At this point, it's already becoming obvious that our machine-language attempt will be longer than 14 bytes to perform the same task as the BASIC program above, but to quote from a certain classic game show:

I've started, so I'll finish...

C64 machine language program that saves itself to disk
FILENAME  = $002A       ; An unused byte in zero page
PROGPTR   = $009B       ; An unused block of two bytes
PROGSTART = $C0C0       ; Program runs from here
MEMSTART  = $C0BE       ; SAVE starts from here

    processor 6502
    org PROGSTART

start:
    lda #1              ; Logical file 1
    ldx #8              ; Disk drive 0 (device 8)
    ldy #255            ; No secondary command
    jsr $FFBA           ; SETLFS

    lda #'4             ; Our filename: "4"
    sta FILENAME        ; Store the name
    lda #1              ; It's one byte long
    ldx #<FILENAME      ; And it's stored in zero page
    ldy #>FILENAME      ; at this address
    jsr $FFBD           ; SETNAM

    lda #$c0            ; Our load location ($C0C0)
    sta MEMSTART        ; Written to the two bytes
    sta MEMSTART + 1    ; before the program in memory

    lda #<MEMSTART      ; Set up our indirect pointer
    sta PROGPTR         ; from the start of the file
    lda #>MEMSTART      ; to our two-byte pointer store
    sta PROGPTR + 1     ; in zero page

    lda #<PROGPTR       ; Pointer to the start of file
    ldx #<end           ; Actual location of the end of file
    ldy #>end           ; (including the two-byte header)
    jsr $FFD8           ; SAVE

    lda #4              ; We need to print or return 4
    rts                 ; So let's return 4
end:

Conclusion and caveats

The above machine-language program clocks in at 50 bytes, almost four times as large as the equivalent BASIC routine. There's some scope for reducing that number, but we don't have a hope of reaching the 14 bytes that a higher-level representation affords us. It should also be noted that we're not attempting to print the number 4 as per the task description, as this would require another call out to the Kernal and/or recycling of BASIC routines to do the same.

There is a level below this, of course, which we haven't reached today: direct communication with the disk drive to push the program to disk, bypassing Commodore DOS (at least at the computer end). The Kernal API is an abstraction on top of this, meaning we don't have to worry about the disk drive's serial bus and other vagaries of the implementation.

This little game of code golf has allowed us a look into why higher-level languages exist at all: not only do they allow us to perform tasks such as "save a file to disk" with less code, they also abstract away complications such as setting the filename before writing a file. Perhaps BASIC shouldn't be maligned quite so much.

Thanks go to the Online 6502 Disassembler by Norbert Landsteiner, and the dASM assembler whose name always makes me think it's a disassembler.

Preprocessor Definitions in WebAssembly

Tue, 05 Sep 2023 14:37:04 +0000

So I've slowly been working on converting the emulation core behind Commodore Clicker to hand-written WebAssembly, to see if it helps with performance at all. Writing WASM by hand is something that's been covered excellently by others (I first picked up the concepts from Colin at Scott Logic), but one thing that's been a bugbear recently is an inability to name constants.

The Problem

The Commodore 64 has, as the name implies, 64kB of static RAM that can be written to at any time; however, its processor only has access to a 64kB address space, and somehow needs to fit the BASIC and Kernal DOS ROMs, as well as peripheral access, into that same space. Commodore achieved this by overloading the processor's onboard I/O port so you can "switch in" the ROMs and peripheral space, and switch them out if you need access to all the RAM.

Figure 1: Commodore 64 memory map

For example, the Kernal ROM maps into memory at $E000, but only if it's been enabled at the CPU port level by switching on bit 1 (HIRAM). In the emulator's memory controller, this particular mapping might read as follows in JavaScript.

MMU.js: Kernal area mapping
const CPUPORT = 0x0001;
const HIRAM = 2;

export const readByte = (addr) => {
    switch (addr & 0xF000) {
        // RAM, BASIC, peripheral areas
        ...

        // Kernal ROM
        case 0xE000:
        case 0xF000:
            if (memory[CPUPORT] & HIRAM) {
                return ROM.kernal[addr & 0x1FFF];
            } else {
                return memory[addr];
            }
            break;
    }
};

If we're to translate this to WAT, we'll first convert the switch statement to a function table:

MMU.wat: Indirect call table
(memory (import "mem") 2)
(table 16 anyfunc)
(elem (i32.const 0)
    ;; RAM, BASIC, peripheral areas
    ;; ...
    $read_kernal $read_kernal
)
(type $readfunc (func (param i32) (result i32)))

(func $read (param $addr i32) (result i32)
    ;; table[(addr & 0xF000) >> 12]()
    (call_indirect (type $readfunc)
        (get_local $addr)
        (i32.shr_u
            (i32.and (get_local $addr) (i32.const 0xF000))
            (i32.const 12)
        )
    )
)

Now, this isn't terrible so far. We have two contiguous 64k-value blocks of memory (one for RAM, one for the mapped ROMs), and there are some magic constants in the $read main handler, but they make sense in the context of needing to extract four bits from the address and using those to index the function table. Where the constants start to make less sense is in $read_kernal:

MMU.wat: Read from Kernal if it's mapped in
(func $read_kernal (param $addr i32) (result i32)
    (i32.load8_u
        (if (result i32)
            (i32.and (i32.load (i32.const 0x0001)) (i32.const 2))
            (then
                (i32.add
                    (i32.and (get_local $addr) (i32.const 0xFFFF))
                    (i32.const 0x10000)
                )
            )
            (else
                (i32.and (get_local $addr) (i32.const 0xFFFF))
            )
        )
    )
)

This is more inscrutable, especially if (as was the case for me) this code has been written and then left to marinate for a year before coming back to attempt to read it again. It would be much more readable if the above function could instead use defined constants:

MMU.wat: Read from Kernal, but with constants
(define CPUPORT (i32.const 0x0001))
(define HIRAM (i32.const 2))
(define ROM_MEMORY_START (i32.const 0x10000))

(func $read_kernal (param $addr i32) (result i32)
    (i32.load8_u
        (if (result i32)
            (i32.and (i32.load CPUPORT) HIRAM)
            (then
                (i32.add
                    (i32.and (get_local $addr) (i32.const 0xFFFF))
                    ROM_MEMORY_START
                )
            )
            (else
                (i32.and (get_local $addr) (i32.const 0xFFFF))
            )
        )
    )
)

Handling S-expressions

The above code makes use of a define keyword that doesn't exist in WASM, so we need something akin to the C preprocessor to parse through the WAT file picking up definitions and replacing their occurrences. Fortunately, WAT was designed to be a format that's quick and easy to handle programmatically (while still being at least halfway usable by human standards): a WebAssembly Text file is one big Lisp-style S-expression, and each element within the file is itself a nested S-expression.

This means we can use S-expression handling libraries to quickly move from the WAT file to an internal representation that can be worked with. One such library is sexpdata by Joshua Boyd, which is an S-expression parser and dumper for Python. If we point our fledgling MMU file at sexpdata.load, something usable starts to fall out:

mwat.py: Loading the WAT file
import sys
from sexpdata import load

def main():
    filename = sys.argv[1]
    print(load(open(filename)))

if __name__ == "__main__":
    main()

Initial debug output
[Symbol('module'), [Symbol('define'), Symbol('CPUPORT'), [Symbol('i32.const'), Symbol('0x0001')]], [Symbol('define'), ...

Arrays with Symbol objects (which are an internal construct to sexpdata) and other arrays inside. We can work with this by using a recursive function to handle the array representing the whole file: if an array is found inside, it will need to be processed recursively, unless it's an array that holds a define statement.

Definitions can be made at any level of the WAT file, so any extracted definitions will need to be passed down to deeper levels of recursion, and merged with any definitions that are passed in from higher levels. So that means the list of definitions will need to be a parameter to the preprocessor:

mwat.py: Preprocessor function call
import sys
from sexpdata import load, dumps

def process(sexp, defines):
    new_defines = {}

    # First pass: Find immediate descendent s-exp's which define macros
    # TODO

    # Macros defined at this level override any of the same name higher up
    # NOTE: We don't want to mutate the parent's dict, we want our own copy
    merged_defines = defines.copy() | new_defines

    # Second pass: Replace at this level, recurse if deeper levels encountered
    # TODO
    return [i for i in sexp]

def main():
    filename = sys.argv[1]

    # Top-level process, with no existing definitions in the dict
    print(dumps(process(load(open(filename)), {})))

if __name__ == "__main__":
    main()

Detecting and replacing definitions

We come to the crux of the problem: filling in the TODOs above. In the first pass, this is fairly simple: we'd like to find any arrays where the first element is a Symbol('define'), and pull out the associated definition. Let's take the first line of debug output again.

Initial debug output [Symbol('module'), [Symbol('define'), Symbol('CPUPORT'), [Symbol('i32.const'), Symbol('0x0001')]], [Symbol('define'), Symbol('HIRAM'), [Symbol('i32.const'), 2]], ...

We see that some elements in the file have been parsed as scalar values, and some as Symbol's. As Python doesn't have a native is-array, we can't directly find arrays in order to perform the first-element check mentioned above; we can, however, use numpy's isscalar to detect scalar values, and isinstance for Symbol's. From here, it's fairly simple to detect and extract definition clauses.

Extracting definitions
from numpy import isscalar
from sexpdata import Symbol

def is_scalar_or_symbol(i):
    return isscalar(i) or isinstance(i, Symbol)

def process(sexp, defines):
    new_defines = {}
    for i in sexp:
        if not is_scalar_or_symbol(i):
            if i[0] == Symbol('define'):
                new_defines[i[1]] = i[2]
    ...

And once we have the definitions to be used on this level, replacement is fairly simple. There are four types of item that will need action:

Arrays where the first element is Symbol('define'): Exclude;
Arrays where the first element is not Symbol('define'): Recurse;
Scalars or plain Symbol's which are defined: Replace;
Scalars or plain Symbol's which aren't defined: Copy.

List comprehension for replacing definitions
    return [
        merged_defines[i] if (
            is_scalar_or_symbol(i) and merged_defines.get(i)
        )                                       # Replace
        else i if is_scalar_or_symbol(i)        # Copy
        else process(i, merged_defines)         # Recurse
        for i in sexp
        if is_scalar_or_symbol(i) or i[0] != Symbol('define')  # Exclude
    ]

Putting everything together, the following GitHub Gist contains the final preprocessor script, as well as a sample input WAT file and the generated output:

https://gist.github.com/Two9A/427985064d360342caaf4f7d5769aeef
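For anyone who'd like to poke at the idea without installing anything, the same two-pass approach can be sketched on plain nested Python lists. To be clear, this is not the script from the Gist: it's a simplified stand-in that assumes symbols are ordinary strings rather than sexpdata's Symbol objects, and it uses isinstance against list to tell nested expressions from atoms.

```python
def process(sexp, defines):
    """Expand (define NAME VALUE) forms in a nested-list S-expression."""
    # First pass: collect the definitions made directly at this level
    new_defines = {
        item[1]: item[2]
        for item in sexp
        if isinstance(item, list) and item and item[0] == "define"
    }
    # Definitions at this level override same-named ones from higher up,
    # and the parent's dict is left untouched
    merged = {**defines, **new_defines}

    # Second pass: drop the define forms, replace known names, recurse into lists
    result = []
    for item in sexp:
        if isinstance(item, list):
            if item and item[0] == "define":
                continue                              # Exclude
            result.append(process(item, merged))      # Recurse
        else:
            result.append(merged.get(item, item))     # Replace or copy
    return result


# Plain strings stand in for Symbols in this hypothetical input
module = [
    "module",
    ["define", "CPUPORT", ["i32.const", "0x0001"]],
    ["define", "HIRAM", ["i32.const", 2]],
    ["i32.and", ["i32.load", "CPUPORT"], "HIRAM"],
]
print(process(module, {}))
```

The define forms disappear from the output, and each use of CPUPORT and HIRAM is swapped for its i32.const expression.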

Caveats and improvements

The astute observer will already have noticed that circular definitions are not handled at all well by this script: if one define contains a keyword that's handled by another define, the final output is dependent on the order of definition. In addition, this replicates the behaviour of C's #define but it doesn't help with any of the other C-style preprocessor directives, notably #include; support for these is a matter for future expansion.

Imran Nazar, Sep 2023

Sci-fi Shorts: Ticketed

Wed, 15 Mar 2017 00:00:00 +0000

Originally released on DeviantArt

The initial probe had been dubbed Hyper One, the first successful test of a hyperspace tunneling engine: launched on a Paludis III twelve years ago, it had made its ponderous way to the Lagrange point sixty degrees ahead of the Earth in its orbit, taking a few months to get to its testing position. When the drive was spun up, it flicked across to somewhere in the vicinity of L5, sixty degrees behind Earth, in half a second. The tunneling engine was a miracle of high-speed space travel, but Hyper One and two subsequent probes had only carried bacterial or other small samples.

Hyper Four would be the first test of a hyperspace tunneling engine with humans on board. The engine had been retrofitted to one of SpaceX's old service capsules: with their Mars cycler fully up and running as of a few years ago, their stocks of Dragon capsules were simply taking up space and any use for them was encouraged. Life support and environmental controls were still working perfectly in this particular capsule: in its previous job as fifth (and seventh) crew service mission to the Space Station, it had encountered no issues except for a harder-than-usual ocean landing on the second use. That made it a fairly cheap capsule for the Hyper team to pick up.

James Kent, veteran of the Hyper series, had been voluntold as the crew member for this fourth mission: Dragon had made its way out under its own steam, and was now parked in orbit around the Earth-Moon system's L2 point: that helped to simplify the calculations by removing nearby gravitational influences, while still being visible from Earth so the Hyper team could track Four from the ground.

James thought back to the first test of Hyper One, and the astonishment they'd all felt when One had jumped more than 0.2 AU in the blink of an eye. And now he'd be doing the same, looking out the portholes of a capsule at the space below (or was it above?) realspace. To say he was excited was probably understating it.

The radio crackled into life.

"Hyper Four, this is Houston again. Our board is green for spin-up; we're sending you updated hyper-coordinates for the tunneling engine. Please confirm."

"Programmed in," James stated. "Let's do this, over."

"Copy, Hyper Four. Engage when ready."

James flipped a switch over his head, and the carrier signal from Earth immediately redshifted into oblivion. Light flooded in through the capsule's portholes.

Nothing else seemed to happen for a few seconds. Time passed on board Hyper Four, the clock by James's hand ticked by at its normal rate, but the light from outside remained unchanging and constant. Then the radio crackled into life again.

License and registration, please.

"Er, Houston... do you read?"

James didn't see how he could be receiving radio from above (or was it below?) the skein of realspace, since the light waves wouldn't make it outside the constraints of untunneled space...

Come on, you can't pop up in the middle of the B6631 and cause disruption to traffic, then plead ignorance. You've been pulled over to the shoulder; your license, please.

"Alright, if this is Guinea base, Funny Joke, guys. I'll have to ask you how you're tunneling radio into the hyperskein when I get back; over."

We're talking at cross-purposes here. Do you have a license, sir?

"I guess I'll play along. No, I guess?"

Oh, you're one of those. Right, well, you may regard your patch of space as sovereign, but if you make use of the federal galactic highways, you'll need to abide by federal law. As you don't have your license, I'm issuing you a summons to Sirius district court: you'll need to appear in-

"Wait, wait, hold on. What do you mean, federal highways? We only just connected to hyperspace, I didn't know that-"

Oh, you're one of those. Right, well. Er, let's see... Firstly, it's infraspace, the interbrane region below realspace, that you're in.

"Clears that one up, I guess..."

Secondly, your tunnel intersected the B6631 galactic highway and has left a hole in lane four. And not a smooth hole, either; your engine is terribly noisy in its hookup to the interbrane.

"Er, right. I wasn't aware that the B6631 ...ran through here. It's not like we have a map..."

Right, yeah, first-contacters, sure. I could've sworn there was a protocol for this... In light of your status, I've rescinded the summons to Sirius-f. Let me go talk to someone, I'll be back.

James was left alone for a few minutes. Or at least, a few minutes passed on board the capsule; who knows what was happening in realspace. This... highways officer? that James had talked with, showed up on the radio again.

Alright, you're free to go. As I mentioned, you've been pulled over to the shoulder of the B6631, and my superiors have authorized me to write a map of the highway network to your support equipment's silicate substrate. Your nearest on-ramp is the core of the planet you refer to as Jupiter; please try not to drill any more holes in our roads.

"Well, thank you. This is... amazing. I just have a couple of questi-"

"How do you speak my language" comes up surprisingly often with first-contacters, my superiors mentioned. Consider that your vehicle is built in realspace, and you're in the interbrane at the present time: all atomic connections are visible to those who can travel directly in the interbrane.

"Right, yes... I guess my only other question is, how do we get to the core of Jupiter to join the highway?"

Surface roads aren't our concern, sir. Your local on-ramp was constructed some time ago.

"Ok, well, er. Thank you..."

One more thing before I eject you into realspace, sir. I've also written our current emissions regulations into your vehicle's silicate; a vehicle with such noisy tunneling output as yours runs the risk of being impounded, and that's a court appearance that can't be rescinded. Have a good journey.

And the light pouring into the capsule flicked off. The capsule's console screen started spewing text:

Determining position
12 degrees above the orbital plane; reorienting comms
Carrier signal obtained
Time elapsed since signal loss: 0.48 seconds

"Er, Houston, Hyper Four. Boy, do I have a tale for you."

That Time My Encrypted RAID Failed

Wed, 27 May 2015 00:00:00 +0000

It was early December, 2012. The world was all a-flutter about the impending reset of the Mayan calendar, but I was unconcerned, streaming music from my 8TB RAID5. Then the music stopped, at 30 seconds or so. I tried a few more files: some worked but only for a short while, some didn't load at all. This was an md RAID5 set of four 2TB disks, with a LUKS encrypted volume inside. Somehow, the area of disk 1 holding the critical "key material" had been corrupted, so the encryption key was in memory but not on the disk when I rebooted. I made three mistakes around that time:

No backups. The traditional mistake.
Specifically, I didn't have the encryption master key written down, or any backups of the key material. (For future reference, you can get the master key for a running volume with dmsetup table --showkeys, and the LUKS header with key material from cryptsetup luksHeaderBackup.)
I set to the array in a panic, frantically swapping disks and overwriting md superblocks in an attempt to get a working configuration, all from the initrd's shell. This meant that, when I finally gave up, the disks were in an unknown order.

Any sane person would write the data off as lost, but I decided to hold out a sliver of hope that it was only one disk that had gone bad, and that I'd just failed to arrange things properly. I didn't want to futz with the disks any further, so any more work on them would have to wait until a disk large enough to hold all the images was available.

Fast forward to May 2015, and the release of that crazy SMR 8TB from Seagate. I ran out and pre-ordered one, seeing my chance, then set to dd'ing the RAID5 member partitions over to the 8TB disk. (Yes, I should've used ddrescue, but I got lucky, and the disks were physically fine.) Then I wrote a script and some permutations to run over the images of disks 1, 2 and 3 (in the order I'd left them in) trying four different RAID5 layouts, four different chunk sizes and 24 permutations. It should be noted that I deliberately assembled the RAID from 3 out of 4 disks, to prevent a rebuild overwriting anything.

That script ran on disk combinations 1/2/3, 1/3/4 and 2/3/4, and generated 384 LUKS header backups. Then I ran cryptsetup luksAddKey against each of those backups, using a file containing 24 possible variations on the passphrase I'd used to set up the encryption. So that's 9,216 attempts, most of which came back with "No key available with this passphrase". But one attempt out of all of those looked different:

luks.g413.ls.512: No key available with this passphrase.
luks.g413.ls.64: No key available with this passphrase.
luks.g431.la.128: No key available with this passphrase.
luks.g431.la.256: No key available with this passphrase.
luks.g431.la.512: No key available with this passphrase.
luks.g431.la.64: No key available with this passphrase.
luks.g431.ls.128: No key available with this passphrase.
luks.g431.ls.256: No key available with this passphrase.
luks.g431.ls.512: No key available with this passphrase.
luks.g431.ls.64:
Trying passphrase: In a hole in the ground, there lived a hobbit!
luks.1g23.ra.128: No key available with this passphrase.
luks.1g23.ra.256: No key available with this passphrase.
luks.1g23.ra.512: No key available with this passphrase.
luks.1g23.ra.64: No key available with this passphrase.

The degraded RAID set that didn't contain disk 2 was the right set. It turns out that the order I left the disks in all those years ago was 2/4/3/1, and the layout and chunk size were the defaults. Amazingly, the only thing that had been corrupted was the LUKS header on disk "2", and all the encrypted blocks were fine.
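The size of that search space is easy to reproduce. The sketch below is illustrative rather than the original script: it assumes the file names in the listing encode the slot order (with g marking the missing disk), the RAID5 layout (la, ls, ra, rs) and the chunk size in KB, and the function name candidates is made up for the example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: list every candidate label for one degraded set.
// `slots` holds one character per array slot, e.g. "g134" for disks 1, 3
// and 4 plus a gap ('g') where the fourth disk is deliberately left out.
std::vector<std::string> candidates(std::string slots)
{
    // Assumed meaning of the abbreviations seen in the output listing
    static const char* layouts[] = {"la", "ls", "ra", "rs"};
    static const int chunks[] = {64, 128, 256, 512};

    std::vector<std::string> out;

    // next_permutation visits every ordering once if we start sorted
    std::sort(slots.begin(), slots.end());
    do {
        for (const char* layout : layouts) {
            for (int chunk : chunks) {
                out.push_back("luks." + slots + "." + layout + "." +
                              std::to_string(chunk));
            }
        }
    } while (std::next_permutation(slots.begin(), slots.end()));

    return out;
}
```

Calling candidates("g134") yields 24 orderings x 4 layouts x 4 chunk sizes = 384 labels, one of which is the winning luks.g431.ls.64.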

Earlier today, I bought a second 8TB disk to copy all the data off; the first 8TB will be repurposed as its mirror. Ten minutes ago, I finished the song that was so rudely aborted by the Mayan apocalypse.

Lesson learned: if you're going to encrypt your disks, keep a backup of the master key somewhere.

RFC 7168: Hypertext Coffeepot Control Protocol for Tea Efflux Appliances

Tue, 01 Apr 2014 09:15:00 +0000

This document was published by the RFC Editor on Apr 1, 2014.

Independent Submission
Request for Comments: 7168
Updates: 2324
Category: Informational
ISSN: 2070-1721

Abstract

The Hyper Text Coffee Pot Control Protocol (HTCPCP) specification does not allow for the brewing of tea, in all its variety and complexity. This paper outlines an extension to HTCPCP to allow for pots to provide networked tea-brewing facilities.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This is a contribution to the RFC Series, independently of any other RFC stream. The RFC Editor has chosen to publish this document at its discretion and makes no statement about its value for implementation or deployment. Documents approved for publication by the RFC Editor are not a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7168.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

As noted in the Hyper Text Coffee Pot Control Protocol, coffee is renowned worldwide as an artfully brewed caffeinated beverage, but coffee shares this quality with many other varied preparations based on the filtration of plant material. Foremost, among these are the category of brews based on the straining of water through prepared leaves from a tea tree: the lineage and history of the tea genus will not be recounted as part of this paper, but evidence shows that the production of tea existed many thousands of years ago.

The deficiency of HTCPCP in addressing the networked production of such a venerable beverage as tea is noteworthy: indeed, the only provision given for networked teapots is that they not respond to requests for the production of coffee, which, while eminently reasonable, does not allow for communication with the teapot for its intended purpose.

This paper specifies an extension to HTCPCP to allow communication with networked tea production devices and teapots. The additions to the protocol specified herein permit the requests and responses necessary to control all devices capable of making, arguably, the most popular caffeinated hot beverage.

1.1. Terminology

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.

2. HTCPCP-TEA Protocol Additions

The TEA extension to HTCPCP adapts the operation of certain HTCPCP methods.

2.1. `BREW` and `POST` Methods

Control of a TEA-capable pot is performed, as described in the base HTCPCP specification, through the sending of BREW requests. POST requests are treated equivalently, but they remain deprecated. Tea production differs from coffee, however, in that a choice of teas is often provided for client selection before the tea is brewed. To this end, a TEA-capable pot that receives a BREW message of content type message/teapot MUST respond in accordance with the URI requested, as below.

2.1.1. The `/` URI

For the URI /, brewing will not commence. Instead, an Alternates header as defined in RFC 2295 MUST be sent, with the available tea bags and/or leaf varieties as entries. An example of such a response is as follows:

Alternates: {"/darjeeling" {type message/teapot}}, {"/earl-grey" {type message/teapot}}, {"/peppermint" {type message/teapot}}

The following example demonstrates the possibility of interoperability of a TEA-capable pot that also complies with the base HTCPCP specification:

Alternates: {"/" {type message/coffeepot}}, {"/pot-0/darjeeling" {type message/teapot}}, {"/pot-0/earl-grey" {type message/teapot}}, {"/pot-1/peppermint" {type message/teapot}}

TEA-capable HTCPCP clients MUST check the contents of the Alternates header returned by a BREW request, and provide a specific URI for subsequent requests of the message/teapot type.

A request to the / URI with a Content-Type header of message/coffeepot SHOULD also be responded to with an Alternates header in the above format, to allow TEA-capable clients the opportunity to present the selection of teas to the user if inferior caffeinated beverages have initially been requested.
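As a non-normative illustration of the response format above, the following C++ sketch assembles an Alternates header value from a list of URIs and media types. The Variant structure and the alternatesHeader function are hypothetical names introduced for this example; they are not defined by this document or by RFC 2295.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one entry in the Alternates header
struct Variant {
    std::string uri;   // e.g. "/darjeeling"
    std::string type;  // e.g. "message/teapot"
};

// Build the header line in the shape used by the examples in Section 2.1.1
std::string alternatesHeader(const std::vector<Variant>& variants)
{
    std::string header = "Alternates: ";
    for (std::size_t i = 0; i < variants.size(); ++i) {
        if (i > 0) {
            header += ", ";
        }
        header += "{\"" + variants[i].uri + "\" {type " +
                  variants[i].type + "}}";
    }
    return header;
}
```

Given the three varieties /darjeeling, /earl-grey and /peppermint, all of type message/teapot, this reproduces the first example header shown above.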

2.1.2. Variety-Specific URIs

TEA-capable pots follow the base HTCPCP specification when presented with a BREW request for a specific variety of tea. Pots SHOULD follow the recommendations for brewing strength given by each variety, and stop brewing when this strength is reached; it is suggested that the strength be measured by detection of the opacity of the beverage currently under brew by the pot.

TEA-capable clients SHOULD indicate the end of brewing by sending a BREW request with an entity body containing stop; the pot MAY continue brewing beyond the recommended strength until this is received. If the stop request is not sent by the client, this may result in a state inversion in the proportion of tea to water in the brewing pot, which may be reported by some pots as a negative strength.

If a BREW command with an entity body containing stop is received before the recommended strength is achieved, the pot MUST abort brewing and serve the resultant beverage at lesser strength. Finding the preferred strength of beverage when using this override is a function of the time between the TEA-capable pot receiving a start request and the subsequent stop. Clients SHOULD be prepared to make multiple attempts to reach the preferred strength.

2.2. Modified Header Fields

HTCPCP-TEA modifies the definition of one header field from the base HTCPCP specification.

2.2.1. The `Accept-Additions` Header Field

It has been observed that some users of blended teas have an occasional preference for teas brewed as an emulsion of cane sugar with hints of water. To allow for this circumstance, the Accept-Additions header field defined in the base HTCPCP specification is updated to allow the following options:

Implementers should be aware that excessive use of the Sugar addition may cause the BREW request to exceed the segment size allowed by the transport layer, causing fragmentation and a delay in brewing.

2.3. Response Codes

HTCPCP-TEA makes use of normal HTTP error codes and those defined in the base HTCPCP specification.

2.3.1. 300 Multiple Options

A BREW request to the / URI, as defined in Section 2.1.1, will return an Alternates header indicating the URIs of the available varieties of tea to brew. It is RECOMMENDED that this response be served with a status code of 300, to indicate that brewing has not commenced and further options must be chosen by the client.

2.3.2. 403 Forbidden

Services that implement the Accept-Additions header field MAY return a 403 status code for a BREW request of a given variety of tea, if the service deems the combination of additions requested to be contrary to the sensibilities of a consensus of drinkers regarding the variety in question.

A method of garnering and collating consensus indicators of the most viable combinations of additions for each variety to be served is outside the scope of this document.

2.3.3. 418 I'm a Teapot

TEA-capable pots that are not provisioned to brew coffee may return either a status code of 503, indicating temporary unavailability of coffee, or a code of 418 as defined in the base HTCPCP specification to denote a more permanent indication that the pot is a teapot.

3. The `message/teapot` Media Type

To distinguish messages destined for TEA-capable HTCPCP services from pots compliant with the base HTCPCP specification, a new MIME media type is defined by this document. The Content-Type header of a POST or BREW request sent to a TEA-capable pot MUST be message/teapot if tea is to be requested.

4. Environmental Considerations

As noted in Section 2.1, a BREW request with a Content-Type header field of message/teapot to a TEA-capable pot will result in an Alternates header being sent with the response, and a pot will not be brewed. However, if the BREW request has a Content-Type of message/coffeepot, and the pot is capable of brewing coffee, the service's behavior will fall back to the base HTCPCP specification and a pot will be brewed.

If the entity returned by the server when brewing commences contains a TEA-compliant Alternates header indicating message/coffeepot and the client does not want coffee, the client SHOULD then send a BREW request with an entity body containing stop. This will result in wasted coffee; whether this is regarded as a bad thing is user-defined.

Such waste can be prevented by TEA-capable clients, by first requesting a BREW of type message/teapot and then allowing selection of an available beverage.

5. Security Considerations

As with the base HTCPCP specification, most TEA-capable pots are expected to heat water through the use of electric elements, and as such will not be in proximity to fire. Therefore, no firewalls are necessary for communication with these pots to proceed.

This extension does support communication with fired pots, however, which may require heat retention and control policies. Care should be taken so that coal-fired pots and electrically heated kettles are not connected to the same network, to prevent pots from referring to any kettles on the network as darkened or otherwise smoke driven.

6. Acknowledgements

This extension to the HTCPCP specification would not be possible without the base specification, and research on networked beverage production leading up thereto. In that vein, the author wishes to acknowledge the sterling work of Larry Masinter in the development of the leading protocol for coffee pot communication.

Many thanks also to Kevin Waterson and Pete Davis, for providing guidance and suggestions during the drafting of this document.

Let's Build a JPEG Decoder: Frames and Bitstreams

Sun, 05 May 2013 22:18:50 +0000

This article is part four of a series exploring the concepts and implementation of JPEG decoding; four parts are currently available, and others are expected to follow.

Download the code: http://imrannazar.com/content/files/jpegparse.zip

Previously, I discussed the Huffman compression algorithm as implemented by JPEG, and a mechanism by which JPEG encoders pick the substitution values used for the compression. The image itself, in a baseline entropy-coded JPEG file, is stored in one "scan" as a stream of Huffman codes; the codes are of a variable length, and are not necessarily at even byte boundaries.

For this reason, it's a requirement that some kind of queue be introduced between the bytes of the file, and the values as seen by the JPEG decoder: this allows bits to be pushed onto the queue from the bytes of the file, and pulled out in varying amounts by the decoder. A single routine can thus be made the central routing point for any requests for variable-width symbols.

A queue of things, when used in this fashion, generally has two modes of operation: either there are enough things in the queue for the request to deal with, or additional reading has to be done before there are enough things in the queue to handle the request. In the case of a bit-level queue for reading from a file, as we need here, the number of bits requested is the determining factor. In the first example below, a queue holding five bits is asked to return the first three, and is able to handle the request without a problem.

Figure 1: Request handling with sufficient content in the queue

If a request for four bits then comes in, the queue doesn't contain enough bits to handle the request, and must first fetch another full byte from the JPEG file (shown here in red).

Figure 2: Request handling when the queue is empty

Implementing the bit queue

A complication arises when trying to code an implementation of the bit queue as shown above: the output end of the queue is at the left, and if one contiguous value is used to store the queue, the left is the most-significant end with the highest-value bits. If we demarcate the queue as being 32 positions long, and define one 32-bit value as holding its entire contents, the above scenarios break down as follows:

Pulling three bits

The request is shorter than the queue length of 5, no reading from file happens.
The output value is the top three bits of the queue, shifted down (32 - 3) bits to become the bottom three bits of an output value;
The queue is shifted left three bits, and left with length of 2.

Pulling another four bits

The request is longer than the queue length of 2: a byte is read in, and shifted up (24 - 2) bits to join the queue;
The queue is now 10 bits long, no more bytes need reading;
The output value is the top four bits of the queue, shifted down (32 - 4);
The queue is shifted left four bits, and left with length of 6.

At all times during the queue's operation, the next available bits are at the high-value end of the contiguous queue variable. An implementation of this may look like the following.

jpeg.h: Definition of bit-resolution file reader

    class JPEG {
        private:
            // Read a number of bits from file
            u16 readBits(int);
    };

jpeg.cpp: Bit-resolution file reader

    u16 JPEG::readBits(int len)
    {
        // The number of bits left in the queue, and the value represented
        static int queueLen = 0;
        static u32 queueVal = 0;

        u8 readByte;
        u16 output;

        if (len > queueLen) {
            do {
                // Read a byte in, shift it up to join the queue
                // (widened to u32 first, so the shift cannot overflow a signed int)
                readByte = fgetc(fp);
                queueVal = queueVal | ((u32)readByte << (24 - queueLen));
                queueLen += 8;
            } while (len > queueLen);
        }

        // Shift the requested number of bits down to the other end
        output = ((queueVal >> (32 - len)) & ((1 << len) - 1));

        queueLen -= len;
        queueVal <<= len;

        return output;
    }
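To experiment with the queue without a JPEG file to hand, the same logic can be run over an in-memory buffer. This is a minimal sketch rather than part of the series' code: the BitQueue class is a made-up name, it assumes requests of between 1 and 16 bits, and it pads with zero bits once the buffer runs out (an arbitrary choice for the example).

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-alone version of the bit queue, fed from a byte vector
class BitQueue {
public:
    explicit BitQueue(std::vector<std::uint8_t> bytes)
        : data(std::move(bytes)) {}

    // Pull `len` bits (1 to 16) from the most-significant end of the queue
    std::uint16_t readBits(int len)
    {
        while (len > queueLen) {
            // Fetch a byte (zero once the input is exhausted: an assumption)
            std::uint32_t next = pos < data.size() ? data[pos++] : 0;
            // Shift it up to sit directly below the bits already queued
            queueVal |= next << (24 - queueLen);
            queueLen += 8;
        }

        // The requested bits are the top `len` bits of the 32-bit queue
        std::uint16_t output =
            static_cast<std::uint16_t>(queueVal >> (32 - len));

        queueVal <<= len;
        queueLen -= len;
        return output;
    }

    int length() const { return queueLen; }

private:
    std::vector<std::uint8_t> data;
    std::size_t pos = 0;
    std::uint32_t queueVal = 0;
    int queueLen = 0;
};
```

With the bytes 0xB5 and 0x3C as input, pulling three bits leaves five in the queue; pulling three more leaves two, and a request for four then forces a second byte to be read and leaves six bits behind, mirroring the walkthrough above.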

The JPEG SOF segment

As mentioned in the previous part of this series, a JPEG file can define up to 32 Huffman code tables, each in their own DHT segment. A JPEG file holds the data corresponding to the image itself in a "frame", denoted by a "Start of Frame" segment header. The SOF header contains a part of the information required to decode the frame, and is structured according to the following table.

Field	Value	Size (bytes)
Precision (the number of bits per sample)	8	1
Image height	Up to 65535	2
Image width	Up to 65535	2
Components	Number of colour components	1
For each component (in a YUV-colour file, three)
ID	Identifier for later use	1
Sampling resolution	For later examination	1
Quantisation table	For later examination	1

Table 1: Structure of the SOF header

As can be seen in the above table, some of the fields involve operations that we have not yet examined (the sampling resolution and the quantisation table for each component). For the purposes of completing the SOF segment handler, we can hold onto this information for later use.

A "frame" consists of the SOF header, and a number of "scans"; as the name implies, each scan is a pass over the full rectangle of the image. In a progressive JPEG file for example, multiple scans are present for the single frame, each of them adding more detail than the last; in a baseline JPEG file, there is typically just one scan containing all the information for the image. Since this series is concerned with building a decoder for baseline JPEG files, we'll focus on dealing with a single scan in the frame.

It turns out that the information required by Huffman decoding, in particular which of the DHT tables to use, is defined by each scan in the frame instead of by the frame itself. We'll look at the scan-level information in more detail in the next part of this series; for now, it's sufficient to decouple our representation of a colour component in the image from the SOF data, and define the exact metadata for a component later.

Structure definitions

With the information detailing the structure of an SOF header, it becomes relatively simple to build a segment parser to plug into our existing code. The only complication arises from the fact that multi-byte values in JPEG are stored in big-endian format, which may not necessarily be the host format for large integers. It's useful to have a set of definitions for transparently handling big-endian values, which is presented below.

byteswap.h: Big-endian value handling macros

    /**
     * Let's Build a JPEG Decoder
     * Big-endian value handling macros
     * Imran Nazar, May 2013
     */

    #ifndef __BYTESWAP_H_
    #define __BYTESWAP_H_

    #if __SYS_BIG_ENDIAN == 1
    # define htoms(x) (x)
    # define htoml(x) (x)
    # define mtohs(x) (x)
    # define mtohl(x) (x)
    #else
    # define htoms(x) (((x)>>8)|((x)<<8))
    # define htoml(x) (((x)<<24)|(((x)&0xFF00)<<8)|(((x)&0xFF0000)>>8)|((x)>>24))
    # define mtohs(x) (((x)>>8)|((x)<<8))
    # define mtohl(x) (((x)<<24)|(((x)&0xFF00)<<8)|(((x)&0xFF0000)>>8)|((x)>>24))
    #endif

    #endif//__BYTESWAP_H_

jpeg.h: SOF header structures

    // Prevent padding bytes from creeping into structures
    #define PACKED __attribute__((packed))

    class JPEG {
        private:
            // Information in the SOF header
            struct PACKED {
                u8 precision;
                u16 height;
                u16 width;
                u8 component_count;
            } sofHead;

            typedef struct PACKED {
                u8 id;
                u8 sampling;
                u8 q_table;
            } sofComponent;

            // Internal information about a colour component
            typedef struct PACKED {
                u8 id;
                // There is likely to be more data here...
            } Component;

            // The set of colour components in the image
            std::vector<Component> components;

            // The SOF segment handler
            int SOF();
    };

jpeg.cpp: Passing control to the segment handler

    int JPEG::parseSeg()
    {
        ...
        switch (id) {
            // The SOF segment defines the components and resolution
            // of the JPEG frame for a baseline Huffman-coded image
            case 0xFFC0:
                size = READ_WORD() - 2;
                if (SOF() != size) {
                    printf("Unexpected end of SOF segment\n");
                    return JPEG_SEG_ERR;
                }
                break;
            ...
        }

        return JPEG_SEG_OK;
    }

jpeg.cpp: SOF segment handling

    int JPEG::SOF()
    {
        int ctr = 0, i;

        fread(&sofHead, sizeof(sofHead), 1, fp);
        ctr += sizeof(sofHead);

        sofHead.width = mtohs(sofHead.width);
        sofHead.height = mtohs(sofHead.height);
        printf("Image resolution: %dx%d\n", sofHead.width, sofHead.height);

        for (i = 0; i < sofHead.component_count; i++) {
            sofComponent s;
            fread(&s, sizeof(sofComponent), 1, fp);
            ctr += sizeof(sofComponent);

            Component c;
            c.id = s.id;
            components.push_back(c);
        }

        return ctr;
    }
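An alternative to byte-swapping macros and packed structures is to assemble each big-endian value from individual bytes, which works the same way on any host. The following is a sketch of that approach over an in-memory copy of the SOF payload, not the series' implementation; SofInfo, SofComponent, readBE16 and parseSof are hypothetical names chosen for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain structures mirroring Table 1
struct SofComponent {
    std::uint8_t id, sampling, qTable;
};

struct SofInfo {
    std::uint8_t precision = 0;
    std::uint16_t height = 0, width = 0;
    std::vector<SofComponent> components;
    bool ok = false;  // false if the payload was too short
};

// Combine two bytes, most-significant first, whatever the host byte order
inline std::uint16_t readBE16(const std::uint8_t* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Parse the SOF payload (the bytes following the two-byte segment length)
SofInfo parseSof(const std::vector<std::uint8_t>& seg)
{
    SofInfo info;
    if (seg.size() < 6) {
        return info;
    }

    info.precision = seg[0];
    info.height = readBE16(&seg[1]);
    info.width = readBE16(&seg[3]);

    // Each component occupies three bytes after the six-byte header
    std::size_t count = seg[5];
    if (seg.size() < 6 + 3 * count) {
        return info;
    }
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint8_t* c = &seg[6 + 3 * i];
        info.components.push_back({c[0], c[1], c[2]});
    }

    info.ok = true;
    return info;
}
```

The trade-off is a copy of the segment into memory in exchange for not depending on compiler-specific packing attributes or an endianness flag.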

Next time: The Minimum Coded Unit

As mentioned above, the image frame in a baseline JPEG file is encoded as a scan, composed of a series of blocks; depending on the sampling resolution of the components in the image, these blocks can be larger than the 8x8-pixel base block size of the JPEG algorithm. In the next part of this series, I'll examine the relationship between these larger units and the colour components of the image.

Imran Nazar <tf@imrannazar.com>, May 2013.

Let's Build a JPEG Decoder: Huffman Tables

Sun, 24 Feb 2013 22:06:00 +0000

This article is part three of a series exploring the concepts and implementation of JPEG decoding; four parts are currently available, and others are expected to follow.

Download the code: http://imrannazar.com/content/files/jpegparse.zip

In the previous part, I mentioned that most JPEG files employ an encoding technique on top of the image compression, in an attempt to remove any trace of redundant information from the image. The technique used by the most common JPEG encoding is an adaptation of one seen throughout the world of data compression, known as Huffman coding, so it's useful to explore in detail the structure and implementation of a Huffman decoder.

Because Huffman coding is the last thing performed by a JPEG encoder when saving an image file, it needs to be the first thing done by our decoder. This can be achieved in two ways:

Reconstitution: The full image scan is decoded from its Huffman-coded state into a temporary placeholder, and further processing is performed on the temporary copy.
On-the-fly: The encoded image scan is processed one code at a time, and each JPEG block is handled when enough information is available.

This article will take the second approach, to save memory and sacrifice time; full reconstitution can be implemented using the code built below in a very similar fashion.

The Huffman algorithm

The concept behind Huffman coding and other entropy-based schemes is similar to the concept behind the substitution cipher: each unique character in an input is transformed into a unique output character. The simplest example is the Caesar substitution, which can be represented in tabular form as follows:

A => D
B => E
C => F
...
Y => B
Z => C

This is an example of a Caesar cipher
Wklv lv dq hadpsoh ri d Fdhvdu flskhu

An improvement on the standard substitution cipher can be made by noting the relative frequency of characters in the input, and designing a table that gives shorter codes to the most frequent characters than to rarer ones. Taking a look at the frequency of letters in the above example, with their ASCII representations included, we can produce a table of increasing unique codes such as the following:

Character	ASCII	Frequency	Code
Space	00100000	7	00
a	01100001	5	01
e	01100101	4	100
i	01101001	3	1010
s	01110011	3	1011
h	01101000	2	11000
p	01110000	2	11001
r	01110010	2	11010
C	01000011	1	110110
T	01010100	1	110111
c	01100011	1	111000
f	01100110	1	111001
l	01101100	1	111010
m	01101101	1	111011
n	01101110	1	111100
o	01101111	1	111101
x	01111000	1	111110

Table 1: Frequency of characters in the string "This is an example of a Caesar cipher"

Substituting these codes for the characters in the original text, it can be seen how the encoded data is much smaller than the original.

This is an example of a Caesar cipher

ASCII, in binary:
01010100 01101000 01101001 01110011 00100000 01101001 01110011 00100000 01100001 01101110 00100000 01100101 01111000 01100001 01101101 01110000 01101100 01100101 00100000 01101111 01100110 00100000 01100001 00100000 01000011 01100001 01100101 01110011 01100001 01110010 00100000 01100011 01101001 01110000 01101000 01100101 01110010

Huffman-coded, in binary:
110111 11000 1010 1011 00 1010 1011 00 01 111100 00 100 111110 01 111011 11001 111010 100 00 111101 111001 00 01 00 110110 01 100 1011 01 11010 00 111000 1010 11001 11000 100 11010

ASCII, in hexadecimal:
54 68 69 73 20 69 73 20 61 6E 20 65 78 61 6D 70 6C 65 20 6F 66 20 61 20 43 61 65 73 61 72 20 63 69 70 68 65 72

Huffman-coded, in hexadecimal:
DF 15 65 58 F8 4F 9E F3 D4 3D E4 4D 99 6E 8E 2B 38 9A
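The packing of codes into bytes can be checked mechanically. The sketch below is illustrative only: it hard-codes the codes from Table 1 in a map named kCodes, and the encode function (a made-up name) appends each code bit by bit, zero-padding the final byte if the stream does not end on a byte boundary.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Code table copied from Table 1; only valid for this one example string
const std::map<char, std::string> kCodes = {
    {' ', "00"},     {'a', "01"},     {'e', "100"},    {'i', "1010"},
    {'s', "1011"},   {'h', "11000"},  {'p', "11001"},  {'r', "11010"},
    {'C', "110110"}, {'T', "110111"}, {'c', "111000"}, {'f', "111001"},
    {'l', "111010"}, {'m', "111011"}, {'n', "111100"}, {'o', "111101"},
    {'x', "111110"},
};

// Substitute each character's code and pack the bits into bytes, MSB first.
// Throws std::out_of_range for a character that has no code in the table.
std::vector<std::uint8_t> encode(const std::string& text)
{
    std::vector<std::uint8_t> out;
    std::uint8_t current = 0;
    int used = 0;

    for (char ch : text) {
        for (char bit : kCodes.at(ch)) {
            current = static_cast<std::uint8_t>(
                (current << 1) | (bit == '1' ? 1 : 0));
            if (++used == 8) {
                out.push_back(current);
                current = 0;
                used = 0;
            }
        }
    }

    // Left-align any leftover bits in a final, zero-padded byte
    if (used > 0) {
        out.push_back(static_cast<std::uint8_t>(current << (8 - used)));
    }
    return out;
}
```

For the 37-character example string the codes add up to 144 bits, so the result is exactly 18 bytes with no padding: a little under half the size of the ASCII original.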

The main disadvantage of the Huffman coding method is that the table of codes needs to be stored alongside the compressed data: in the above example, the final string of encoded bytes would be meaningless without the corresponding code table. The table of codes and their corresponding characters can be recorded in full, but there is a more space-efficient way to save the codes, if attention is paid to the pattern of their occurrence. Two things are of note here: firstly that the codes increase in length, but also that within a group of the same length, codes are sequential. This means the code table can be written down as:

2 codes of length two  , starting at 00
1 code  of length three, starting at 100
2 codes of length four , starting at 1010
3 codes of length five , starting at 11000
9 codes of length six  , starting at 110110

A careful eye on the codes themselves can yield further improvements on how much space it takes to record the encoding table. If we take a look at the codes in conjunction with the list of code length above, we can start counting as follows.

00 (zero)
01 (one)
Next code would be 10 (two)

100 (four)
Next code would be 101 (five)

1010 (ten)
1011 (eleven)
Next code would be 1100 (twelve)

11000 (twenty four)
11001 (twenty five)
11010 (twenty six)
Next code would be 11011 (twenty seven)

110110 (fifty four)
110111 (fifty five)
111000 (fifty six)
111001 (fifty seven)
111010 (fifty eight)
111011 (fifty nine)
111100 (sixty)
111101 (sixty one)
111110 (sixty two)

In every case, when the requisite number of codes has been counted for the given code length, all that is needed is to double the counter and continue for the next code length. In other words, there is no need to record the "starting at" part of the code lengths list above, since it can be inferred by starting at zero. The final code list therefore looks as follows.

2 codes of length two
1 code  of length three
2 codes of length four
3 codes of length five
9 codes of length six

The above codes correspond to the following characters, in this order:
Space,a,e,i,s,h,p,r,C,T,c,f,l,m,n,o,x
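The count-and-double rule can be turned into a short routine that rebuilds the full code table from nothing but the per-length counts and the ordered character list. This is a sketch for the worked example, with assignCodes as a made-up name; codes are returned as strings of '0' and '1' purely for readability.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// counts[i] is the number of codes of length i + 1;
// symbols lists the characters in the order their codes are handed out
std::vector<std::pair<char, std::string>> assignCodes(
    const std::vector<int>& counts, const std::string& symbols)
{
    std::vector<std::pair<char, std::string>> table;
    unsigned code = 0;     // the running counter
    std::size_t next = 0;  // index of the next symbol to receive a code

    for (std::size_t i = 0; i < counts.size(); ++i) {
        int length = static_cast<int>(i) + 1;

        for (int j = 0; j < counts[i] && next < symbols.size(); ++j) {
            // Write the counter out as `length` binary digits
            std::string bits;
            for (int b = length - 1; b >= 0; --b) {
                bits.push_back(((code >> b) & 1u) ? '1' : '0');
            }
            table.emplace_back(symbols[next++], bits);
            ++code;
        }

        // Moving on to the next length: double the counter
        code <<= 1;
    }
    return table;
}
```

Calling assignCodes({0, 2, 1, 2, 3, 9}, " aeishprCTcflmnox") reproduces the Code column of Table 1; the leading zero in the counts records that there are no codes of length one.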

The JPEG DHT segment

A JPEG file's Huffman tables are recorded in precisely this manner: a list of how many codes are present of a given length (between 1 and 16), followed by the meanings of the codes in order. This information is held in the file's "Define Huffman Table" (DHT) segments, of which there can be up to 32, according to the JPEG standard.

As seen above, data encoded by the Huffman algorithm ends up recorded as a series of codes wedged together in a bit-stream; this also applies to the image scan in a JPEG file. A simple routine for reading codes from the bit stream may look like this:

Pseudocode for reading a Huffman-coded value

    Code = 0
    Length = 0
    Found = False

    Do
        Code = Code << 1
        Code = Code | (The next bit in the stream)
        Length = Length + 1
        If ((Length, Code) is in the Huffman list) Then
            Found = True
        End If
    While Found = False

In order to facilitate this algorithm, the Huffman codes should be stored in a way that allows us to determine if a code is in the map at a given length. The canonical way to represent a Huffman code list is as a binary tree, where the sequence of branches defines the code and the depth of the tree tells us how long the code is. The C++ STL abstracts this out for us, into the map construct.
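As a rough illustration of how the pseudocode maps onto a std::map lookup, the sketch below decodes a run of bits given as a string of '0' and '1' characters. The names HuffKey, HuffMap and decodeBits are assumptions made for this example, and a real decoder would pull bits from the file rather than from a string.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// (length, code) -> decoded value
using HuffKey = std::pair<int, std::uint16_t>;
using HuffMap = std::map<HuffKey, std::uint8_t>;

// Decode every complete code found in `bits`; trailing bits that do not
// form a complete code are ignored
std::string decodeBits(const HuffMap& table, const std::string& bits)
{
    std::string out;
    std::uint16_t code = 0;
    int length = 0;

    for (char bit : bits) {
        // Append the next bit to the code being built
        code = static_cast<std::uint16_t>((code << 1) | (bit == '1' ? 1 : 0));
        ++length;

        // A hit at this exact length means a full code has been read
        HuffMap::const_iterator hit = table.find(HuffKey(length, code));
        if (hit != table.end()) {
            out.push_back(static_cast<char>(hit->second));
            code = 0;
            length = 0;
        }
    }
    return out;
}
```

Keying on the length as well as the code matters: 01 at length two and 001 at length three have the same numeric value but are different codes.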

Since there are up to 32 possible Huffman tables that can be defined in a JPEG file, our implementation will require 32 maps to be available. It's also worth defining at this point how the DHT segment handler will be called by the parseSeg method developed in the previous part of this series.

jpeg.h: DHT data definition

class JPEG {
    private:
        // Defines a tuple of length and code, for use in the Huffman maps
        typedef std::pair<int, u16> huffKey;

        // The array of Huffman maps: (length, code) -> value
        std::map<huffKey, u8> huffData[32];

        // DHT segment handler
        int DHT();
};

jpeg.cpp: Passing control to the segment handler

int JPEG::parseSeg()
{
    ...
    switch (id) {
        // The DHT segment defines a Huffman table. The handler should
        // read exactly as many bytes from the file as are in the
        // segment; if not, something's gone wrong
        case 0xFFC4:
            size = READ_WORD() - 2;
            if (DHT() != size) {
                printf("Unexpected end of DHT segment\n");
                return JPEG_SEG_ERR;
            }
            break;
        ...
    }

    return JPEG_SEG_OK;
}

jpeg.cpp: DHT segment handler, builds a Huffman map

int JPEG::DHT()
{
    int i, j;

    // A counter of how many bytes have been read
    int ctr = 0;

    // The incrementing code to be used to build the map
    u16 code = 0;

    // First byte of a DHT segment is the table ID, between 0 and 31
    u8 table = fgetc(fp);
    ctr++;

    // Next sixteen bytes are the counts for each code length
    u8 counts[16];
    for (i = 0; i < 16; i++) {
        counts[i] = fgetc(fp);
        ctr++;
    }

    // Remaining bytes are the data values to be mapped
    // Build the Huffman map of (length, code) -> value
    for (i = 0; i < 16; i++) {
        for (j = 0; j < counts[i]; j++) {
            huffData[table][huffKey(i + 1, code)] = fgetc(fp);
            code++;
            ctr++;
        }
        code <<= 1;
    }

    // Once the map has been built, print it out
    printf("Huffman table #%02X:\n", table);

    std::map<huffKey, u8>::iterator iter;
    for (iter = huffData[table].begin(); iter != huffData[table].end(); iter++) {
        printf("    %04X at length %d = %02X\n",
            iter->first.second, iter->first.first, iter->second);
    }

    return ctr;
}

As with the previous part, the JPEG class can be instantiated with a filename; if this is done, the above code will produce output along the following lines:

Found segment at file position 177: Huffman table
Huffman table #00:
    0000 at length 2 = 04
    0002 at length 3 = 02
    0003 at length 3 = 03
    0004 at length 3 = 05
    0005 at length 3 = 06
    0006 at length 3 = 07
    000E at length 4 = 01
    001E at length 5 = 00
    003E at length 6 = 08
    007E at length 7 = 09

Next time: Reading the bitstream

Once the Huffman maps have been built for a JPEG file, the image scan can be decoded for further processing. In the next part, I'll take a look at the Huffman decoding of the scan, in the wider context of reading blocks from the image and examining the process through which they are transformed.

Imran Nazar <tf@imrannazar.com>, Feb 2013.

Let's Build a JPEG Decoder: File Structure

Sun, 20 Jan 2013 11:30:18 +0000

This article is part two of a series exploring the concepts and implementation of JPEG decoding; four parts are currently available, and others are expected to follow.

Download the code: http://imrannazar.com/content/files/jpegparse.zip

In the previous part, I gave a brief overview of the techniques used by JPEG to compress an image. Before examining the detailed implementation of those techniques, it's useful to look at the overall structure of a JPEG file, for two reasons:

Some of the processes used by the encoder employ tables of values, which are stored alongside the image information itself, so it's sensible to retrieve these into memory before they're needed.
We'll also need somewhere to put the implementation of the image decompression algorithm, and having a framework in place for that facilitates this.

The implementation developed in this series of articles will be written in C++, but the constructs can be transplanted to a language of your choice with little additional complexity.

Types of JPEG image

It should be stated at this juncture that the implementation developed here will only apply to one common subset of all the possible types of JPEG image. Firstly, there are four types of compression supported by the standard:

Baseline: The most common compression type, where all the image information is contained in one series of 8x8 blocks.
Extended Sequential: Most used in medical imaging, this type allows for more levels per pixel.
Progressive: Information from the frequency domain is written in a series of scans, the most important values for each block coming first in the file. This allows the whole image to be rendered at a low resolution, and the details filled in as the image downloads.
Lossless: A rare encoding based on a predicted difference between the target pixel and its surroundings.

Further, there are two forms of encoding that are applied on top of the image compression, to further compress the file data:

Huffman-based entropy: The image is made into a bit-stream, with the most common values encoded as short (2- or 3-bit) stream entries, and less common values recorded as longer strings of bits.
Arithmetic: A formerly patented encoding method, where the image data is represented as a series of probabilities of the values occurring, and combined into one fractional number.

This series will implement an entropy-coded baseline JPEG decoder.

File segments

A JPEG file is made up of segments of varying length, each of which starts with a "marker" to denote which kind of segment it is. There are 254 possible types of segment, but only a few are found in the type of image we'll be decoding:

Name	Short Name	Marker	Description	Length (bytes)
Start of Image	SOI	FF D8	Delimits the start of the file	2
Define Quantisation Table	DQT	FF DB	Values used by the decoder	69
Define Huffman Table	DHT	FF C4	Values used by the decompressor	Variable
Start of Frame	SOF	FF C0	Information for an entropy-coded baseline frame	19
Start of Scan	SOS	FF DA	Encoded and compressed image bitstream	Variable
End of Image	EOI	FF D9	Delimits the end of the file	2

Table 1: Segments present in an entropy-coded baseline JPEG file

Most of the different types of segment have a "length" value just after the marker, which denotes how long the segment is in bytes (including the length value); this can be used to skip over segments that a decoder doesn't know about. There are three exceptions to this general rule:

SOI and EOI: Because these are more delimiters than markers, they consist only of the marker value.
SOS: The scan is a bitstream, and automatically "ends" when the image is fully coded. The length value after the marker covers only a short scan header; no length is written into the file for the bitstream that follows. There are two strategies for dealing with this: either we can assume the rest of the file is part of the scan, or we can read through the file looking for markers which would denote the start of a new segment.

For this article, I'll assume the rest of the file is part of a scan if we run into an SOS segment, and skip straight to the EOI.
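For comparison, the second strategy (which this article doesn't use) could be sketched as below. It rests on two details of the format that aren't covered in the text above, so treat them as assumptions of this example: inside a scan, a data byte of 0xFF is followed by a stuffed 0x00, and the restart markers FF D0 to FF D7 may appear without ending the scan. The function name findScanEnd and the in-memory buffer are also choices made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return the offset of the first marker at or after start that ends the
// scan, or data.size() if no such marker is found.
std::size_t findScanEnd(const std::vector<std::uint8_t>& data, std::size_t start) {
    assert(start <= data.size());
    for (std::size_t i = start; i + 1 < data.size(); ++i) {
        if (data[i] != 0xFF) continue;
        const std::uint8_t next = data[i + 1];
        if (next == 0x00) { ++i; continue; }  // stuffed byte: a literal 0xFF
        if (next == 0xFF) continue;           // fill byte: re-check from the next one
        if (next >= 0xD0 && next <= 0xD7) { ++i; continue; }  // restart marker
        return i;                             // a real segment marker
    }
    return data.size();
}
```

The trade-off is that every byte of the scan has to be examined, where the approach taken in this article needs only a single seek.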

Implementation: Listing the segments in a JPEG file

As a first step, it makes sense to write a program to open a JPEG file, and run through it looking for segment markers. The structure of such a program can be expanded upon with implementation for processing of the different kinds of segments, and the mechanism for skipping over segments given their size can be used later to skip over the parts of the file which are non-essential to the decoding process.

Since the sizes of values in a JPEG file are specified in absolute terms of number of bytes, it's a good idea to abstract the basic integer types into types which refer to size. For this, we'll use a short header file.

inttypes.h: Architecture-independent integer size definitions

#ifndef __INTTYPES_H_
#define __INTTYPES_H_

typedef unsigned char  u8;
typedef unsigned short u16;
typedef unsigned int   u32;

typedef signed char  s8;
typedef signed short s16;
typedef signed int   s32;

#endif//__INTTYPES_H_

The above file is set up for 32-bit compilation, but can be adapted if 64- or 16-bit code is required. The advantage of this is that references to integers in the JPEG decoder implementation itself can be agnostic of architecture, and simply refer to u16 and other types defined here.
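Where a C++11 compiler is available, one way to avoid adapting the header per architecture is to build the same names on the standard <cstdint> types. This is an alternative sketch, not the header used in the rest of the series, and the readWord helper is a made-up name that shows the types in use on a big-endian value of the kind JPEG stores.

```cpp
#include <cassert>
#include <cstdint>

// The same short names as inttypes.h, built on fixed-width standard types
typedef std::uint8_t  u8;
typedef std::uint16_t u16;
typedef std::uint32_t u32;
typedef std::int8_t   s8;
typedef std::int16_t  s16;
typedef std::int32_t  s32;

// Fail at compile time if a size isn't what the decoder expects
static_assert(sizeof(u16) == 2, "u16 must be two bytes");
static_assert(sizeof(u32) == 4, "u32 must be four bytes");

// Combine two consecutive bytes into a big-endian 16-bit word
inline u16 readWord(const u8* bytes) {
    assert(bytes != nullptr);
    return static_cast<u16>((bytes[0] << 8) | bytes[1]);
}
```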

With these abstractions in place, the implementation of a segment listing is quite simple. Since we'll be building the decoding functionality into a class, it's worth defining the class itself at this time.

jpeg.h: JPEG decoder class definition

#ifndef __JPEG_H_
#define __JPEG_H_

#include "inttypes.h"
#include <cstdio>
#include <string>

// Read a 16-bit word from file. The two reads are separate statements,
// since C++ does not fix the order in which the operands of | are evaluated
static inline u16 readWord(FILE *fp)
{
    int hi = fgetc(fp);
    int lo = fgetc(fp);
    return (u16)((hi << 8) | lo);
}

// Macro to read a 16-bit word from file
#define READ_WORD() readWord(fp)

// Segment parsing error codes
#define JPEG_SEG_ERR  0
#define JPEG_SEG_OK   1
#define JPEG_SEG_EOF -1

class JPEG {
    private:
        // Names of the possible segments
        std::string segNames[64];

        // The file to be read from, opened by constructor
        FILE *fp;

        // Segment parsing dispatcher
        int parseSeg();

    public:
        // Construct a JPEG object given a filename
        JPEG(std::string);
};

#endif//__JPEG_H_

jpeg.cpp: JPEG segment listing implementation

#include "jpeg.h"
#include <cstdio>
#include <string>

//-------------------------------------------------------------------------
// Function: Parse JPEG file segment (parseSeg)
// Purpose: Retrieves 16-bit block ID from file, shows name

int JPEG::parseSeg()
{
    if (!fp) {
        printf("File failed to open.\n");
        return JPEG_SEG_ERR;
    }

    u32 fpos = ftell(fp);
    u16 id = READ_WORD(), size;
    if (id < 0xFFC0) {
        printf("Segment ID expected, not found.\n");
        return JPEG_SEG_ERR;
    }

    printf(
        "Found segment at file position %u: %s\n",
        fpos, segNames[id-0xFFC0].c_str());

    switch (id) {
        // The SOI and EOI segments are the only ones not to have
        // a length, and are always a fixed two bytes long; do
        // nothing to advance the file position
        case 0xFFD9:
            return JPEG_SEG_EOF;
        case 0xFFD8:
            break;

        // An SOS segment has a length determined only by the
        // length of the bitstream; for now, assume it's the rest
        // of the file less the two-byte EOI segment
        case 0xFFDA:
            fseek(fp, -2, SEEK_END);
            break;

        // Any other segment has a length specified at its start,
        // so skip over that many bytes of file
        default:
            size = READ_WORD();
            fseek(fp, size-2, SEEK_CUR);
            break;
    }

    return JPEG_SEG_OK;
}

//-------------------------------------------------------------------------
// Function: Array initialisation (constructor)
// Purpose: Fill in arrays used by the decoder, decode a file
// Parameters: filename (string) - File to decode

JPEG::JPEG(std::string filename)
{
    // Debug messages used by parseSeg to tell us which segment we're at
    segNames[0x00] = std::string("Baseline DCT; Huffman");
    segNames[0x01] = std::string("Extended sequential DCT; Huffman");
    segNames[0x02] = std::string("Progressive DCT; Huffman");
    segNames[0x03] = std::string("Spatial lossless; Huffman");
    segNames[0x04] = std::string("Huffman table");
    segNames[0x05] = std::string("Differential sequential DCT; Huffman");
    segNames[0x06] = std::string("Differential progressive DCT; Huffman");
    segNames[0x07] = std::string("Differential spatial; Huffman");
    segNames[0x08] = std::string("[Reserved: JPEG extension]");
    segNames[0x09] = std::string("Extended sequential DCT; Arithmetic");
    segNames[0x0A] = std::string("Progressive DCT; Arithmetic");
    segNames[0x0B] = std::string("Spatial lossless; Arithmetic");
    segNames[0x0C] = std::string("Arithmetic coding conditioning");
    segNames[0x0D] = std::string("Differential sequential DCT; Arithmetic");
    segNames[0x0E] = std::string("Differential progressive DCT; Arithmetic");
    segNames[0x0F] = std::string("Differential spatial; Arithmetic");
    segNames[0x10] = std::string("Restart");
    segNames[0x11] = std::string("Restart");
    segNames[0x12] = std::string("Restart");
    segNames[0x13] = std::string("Restart");
    segNames[0x14] = std::string("Restart");
    segNames[0x15] = std::string("Restart");
    segNames[0x16] = std::string("Restart");
    segNames[0x17] = std::string("Restart");
    segNames[0x18] = std::string("Start of image");
    segNames[0x19] = std::string("End of image");
    segNames[0x1A] = std::string("Start of scan");
    segNames[0x1B] = std::string("Quantisation table");
    segNames[0x1C] = std::string("Number of lines");
    segNames[0x1D] = std::string("Restart interval");
    segNames[0x1E] = std::string("Hierarchical progression");
    segNames[0x1F] = std::string("Expand reference components");
    segNames[0x20] = std::string("JFIF header");
    segNames[0x21] = std::string("[Reserved: application extension]");
    segNames[0x22] = std::string("[Reserved: application extension]");
    segNames[0x23] = std::string("[Reserved: application extension]");
    segNames[0x24] = std::string("[Reserved: application extension]");
    segNames[0x25] = std::string("[Reserved: application extension]");
    segNames[0x26] = std::string("[Reserved: application extension]");
    segNames[0x27] = std::string("[Reserved: application extension]");
    segNames[0x28] = std::string("[Reserved: application extension]");
    segNames[0x29] = std::string("[Reserved: application extension]");
    segNames[0x2A] = std::string("[Reserved: application extension]");
    segNames[0x2B] = std::string("[Reserved: application extension]");
    segNames[0x2C] = std::string("[Reserved: application extension]");
    segNames[0x2D] = std::string("[Reserved: application extension]");
    segNames[0x2E] = std::string("[Reserved: application extension]");
    segNames[0x2F] = std::string("[Reserved: application extension]");
    segNames[0x30] = std::string("[Reserved: JPEG extension]");
    segNames[0x31] = std::string("[Reserved: JPEG extension]");
    segNames[0x32] = std::string("[Reserved: JPEG extension]");
    segNames[0x33] = std::string("[Reserved: JPEG extension]");
    segNames[0x34] = std::string("[Reserved: JPEG extension]");
    segNames[0x35] = std::string("[Reserved: JPEG extension]");
    segNames[0x36] = std::string("[Reserved: JPEG extension]");
    segNames[0x37] = std::string("[Reserved: JPEG extension]");
    segNames[0x38] = std::string("[Reserved: JPEG extension]");
    segNames[0x39] = std::string("[Reserved: JPEG extension]");
    segNames[0x3A] = std::string("[Reserved: JPEG extension]");
    segNames[0x3B] = std::string("[Reserved: JPEG extension]");
    segNames[0x3C] = std::string("[Reserved: JPEG extension]");
    segNames[0x3D] = std::string("[Reserved: JPEG extension]");
    segNames[0x3E] = std::string("Comment");
    segNames[0x3F] = std::string("[Invalid]");

    // Open the requested file, keep parsing blocks until we run
    // out of file, then close it.
    fp = fopen(filename.c_str(), "rb");
    if (fp) {
        while(parseSeg() == JPEG_SEG_OK);
        fclose(fp);
    }
    else {
        perror("JPEG");
    }
}

When constructed with a file, an object of this JPEG class will provide output similar to the following.

Output of JPEG segment listing

Found segment at file position 0: Start of image
Found segment at file position 2: JFIF header
Found segment at file position 20: Quantisation table
Found segment at file position 89: Quantisation table
Found segment at file position 158: Baseline DCT; Huffman
Found segment at file position 177: Huffman table
Found segment at file position 208: Huffman table
Found segment at file position 289: Huffman table
Found segment at file position 318: Huffman table
Found segment at file position 371: Start of scan
Found segment at file position 32675: End of image

Next time: Examining the encoding of a scan

As can be seen above, the "scan" constitutes the majority of an entropy-coded baseline JPEG; since the entirety of the image data is encoded within the scan, this makes sense. Entropy coding is based on the Huffman compression algorithm, so in the next article I'll examine the parts of a JPEG file that provide the information needed to decode the scan from a bitstream into something usable for further processing.

Imran Nazar <tf@imrannazar.com>, Jan 2013.

Let's Build a JPEG Decoder: Concepts

Sat, 05 Jan 2013 00:03:48 +0000

This article is part one of a series exploring the concepts and implementation of JPEG decoding; four parts are currently available, and others are expected to follow.

Build a JPEG decoder? Whatever for, when we have so many of them already?

JPEG is something we all take for granted: most of the Web comprises pictures transmitted as JPEG files, and video files based on JPEG technology. As it turns out, the concepts that lie behind these images span nearly two hundred years of mathematics and computing theory, and going from the raw file to an image takes a bunch of interesting work.

At the core: frequency analysis

In An Introduction to Compression, I looked at the difference between "lossless" and "lossy" compression: the difference between the two methods is that lossless compression preserves all the inherent information of the input, whereas lossy compression throws much of it away. Throwing information away only works when it can be deemed unnecessary for proper handling of the file; this would never be the case for a computer program, for example, where every byte is a statement that must be retained.

Images or videos, and their cousins in the audio world, rely on perception to work out what needs to be thrown away: just as the human ear can only distinguish sounds in a particular region of frequencies, the human eye has a particular resolution and any colour changes that happen within a very short distance are essentially invisible. Resolution can also be thought of as "visual frequency", and can be manipulated in much the same way as sound wave frequency or other kinds of wave.

It follows that a chunk of sound or a piece of an image can be compressed by removing those parts of the frequency range that are outside the human experience: those frequency ranges that we don't care about, without which the essence isn't lost. There are three steps to doing that:

Transform the sample: Sound waves are changes of sound level through time, and images are changes of colour through two dimensions of space. We need somehow to change these into frequency representations of themselves, through a transformation.
Band-pass: Once we're in the frequency domain, we need to cut out the section, or band, of frequencies that interests us. The technical term for this is a "band-pass filter".
Inverse transform: Our newly reduced signal can then be sent backwards through the transformation, resulting in a sound or image that's had its excess information stripped.

The transformation used is derived from the Fourier transform, which takes an integrable mathematical function f(x) and generates a function F(s) describing its frequency spectrum. The Fourier transform only works with continuous ranges of numbers, and extends from negative infinity to positive infinity; this makes it unusable for digital transformations like those we need here. Instead, a discrete version of the Fourier transform is used: the MP3 and JPEG techniques use the discrete cosine transform (DCT) to change a set of data values into an equivalent set of frequencies.
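To give a feel for what "a set of data values into an equivalent set of frequencies" means, here is a minimal sketch of a one-dimensional DCT in its plain, unnormalised form (the variant usually labelled DCT-II). JPEG itself applies a scaled, two-dimensional version to 8x8 blocks, which is left out here; the function name dct is simply a choice for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unnormalised DCT-II:
//   X[k] = sum over n of x[n] * cos(pi / N * (n + 0.5) * k)
std::vector<double> dct(const std::vector<double>& x) {
    assert(!x.empty());
    const double pi = std::acos(-1.0);
    const std::size_t N = x.size();
    std::vector<double> X(N, 0.0);
    for (std::size_t k = 0; k < N; ++k) {
        for (std::size_t n = 0; n < N; ++n) {
            X[k] += x[n] * std::cos(pi / N * (n + 0.5) * k);
        }
    }
    return X;
}
```

Feeding in eight identical samples should leave all of the signal in the first output value, with the remaining seven at (very nearly) zero: a flat input has no frequency content beyond its base level.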

Two examples of the compression process are presented in the following figures: first for a short sound sample.

Figure 1: Chorus riff, "More Than a Feeling" (Boston, 1976)
Encoded to 40kbps MPEG audio Layer 3
Frequency analysis courtesy of ARTA for Windows, by Ivo Mateljan

The below figure represents the same process as above, but applied in two dimensions of space as opposed to one of time. It is generally considered inefficient to transform the entire image into a visual frequency domain at once, so the JPEG algorithm transforms blocks of eight pixels square at a time.

Figure 2: Icon from the DryIcons Shine set
Encoded to JPEG, q=25%

In Figure 2, the right-hand set of images shows the DCT and its filtered version. The most important figure in each 8x8 block is the "DC component" in the top-left, which determines the base level for the whole block. Values to the right of this give information as to how often variances happen horizontally, and conversely values below the DC component provide vertical frequency information. It follows that values in the bottom-right of a DCT block describe the highest-frequency changes in the image, and that filtering consists of drawing a diagonal across each block and retaining the top-left portion, throwing away the information regarding high-frequency changes.
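Taken literally, the diagonal filter described above could be sketched as follows. This is only an illustration of the idea: the Block type, the diagonalFilter name and the cutoff parameter are all assumptions of the example, and an actual JPEG encoder scales coefficients by a quantisation table rather than cutting them off along a hard line.

```cpp
#include <array>
#include <cassert>

typedef std::array<std::array<double, 8>, 8> Block;

// Zero every coefficient past the diagonal row + col == cutoff, keeping the
// top-left (low-frequency) corner. Returns how many coefficients were kept.
int diagonalFilter(Block& block, int cutoff) {
    assert(cutoff >= 0 && cutoff <= 14);
    int kept = 0;
    for (int row = 0; row < 8; ++row) {
        for (int col = 0; col < 8; ++col) {
            if (row + col > cutoff) {
                block[row][col] = 0.0;
            } else {
                ++kept;
            }
        }
    }
    return kept;
}
```

With a cutoff of 3, for instance, only ten of the sixty-four values in a block survive.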

Colour spaces and downsampling

JPEG makes use of the fact that the human eye has a maximum resolution when it comes to visual changes. Another feature of the eye is that it is less sensitive to changes in colour than in brightness: the frequency-sensitive "cone" cells of the retina occur at a lower density than the simpler frequency-agnostic "rods", which means a lower visual resolution for colour.

It is possible for images to be further compressed by utilising this information, reducing the amount of information used to encode colour values in relation to brightness. Unfortunately, the traditional additive colour space of red/green/blue used by computer and television displays retains no information about relative brightness and colour saturation; in order to retrieve this information RGB values must be transformed to another colour model.

The JPEG format most commonly uses Y'CbCr colour, where Y' denotes the luminance of a particular pixel, and the Cb and Cr components describe the amount of chrominance on two axes, corresponding to percentages of blue and of red. Transformations from RGB pixel values to Y'CbCr act as a rotation between the cube of all possible RGB values and the cube of possible luminance/chrominance values, as shown in Figure 3.

Figure 3: The RGB cube, rotated within the Y'CbCr cube
From the Intel Integrated Performance Primitives manual
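As a concrete version of that rotation, the sketch below converts one RGB pixel using the full-range coefficients commonly quoted for JFIF files. The coefficients are brought in for this example rather than taken from the text above, and the YCbCr struct and function names are likewise made up here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct YCbCr { std::uint8_t y, cb, cr; };

// Round to the nearest integer and clamp into the 0..255 range of a byte
static std::uint8_t toByte(double v) {
    const long r = std::lround(v);
    return static_cast<std::uint8_t>(std::min(255L, std::max(0L, r)));
}

// Convert one RGB pixel (each channel 0..255) to Y'CbCr
YCbCr rgbToYCbCr(int r, int g, int b) {
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
    YCbCr out;
    out.y  = toByte(0.299 * r + 0.587 * g + 0.114 * b);
    out.cb = toByte(128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b);
    out.cr = toByte(128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b);
    return out;
}
```

Any shade of grey should come out with both chrominance values at the midpoint of 128, which is the sense in which Cb and Cr carry only colour.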

Once transformed to Y'CbCr, the chrominance channels are separated out and can be manipulated. In this case, "downsampling" is employed: the resolution of the colour channels is halved in both dimensions, such that one "block" of colour information covers the equivalent area of four luminance blocks. In Figure 4, the colour channels have been downsampled by this ratio: it can be seen that the Y' channel is the most important for the integrity of the image, and thus its resolution remains high.

Figure 4: A JPEG image broken into its Y'CbCr channels, with chrominance downsampling
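Halving a channel in both dimensions can be sketched as below. Averaging each 2x2 group of samples is an assumption of this example, since encoders are free to choose how they pick the reduced value; the function name and the requirement for even dimensions are also simplifications.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Halve a single channel in both dimensions by averaging each 2x2 group.
// src holds width * height samples, row by row.
std::vector<std::uint8_t> downsample2x2(const std::vector<std::uint8_t>& src,
                                        std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    assert(src.size() == width * height);
    std::vector<std::uint8_t> dst;
    dst.reserve((width / 2) * (height / 2));
    for (std::size_t y = 0; y < height; y += 2) {
        for (std::size_t x = 0; x < width; x += 2) {
            const unsigned sum = src[y * width + x] + src[y * width + x + 1]
                               + src[(y + 1) * width + x] + src[(y + 1) * width + x + 1];
            dst.push_back(static_cast<std::uint8_t>((sum + 2) / 4));  // rounded mean
        }
    }
    return dst;
}
```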

For the remainder of this series of articles, bidirectional downsampling of the type shown here will be assumed: most JPEG images on the Web employ this colour compression, so it's useful to explore. Because of the reduction in resolution, the previously mentioned eight-pixel square used by the JPEG algorithm becomes a minimum unit size of 16 pixels square, with four 8x8 luminance blocks accompanying one block for each axis of colour information.

Next time: The file format

JPEG files store additional information alongside the encoded image: lookup tables to be used by the decoding process, comments and resolution information. In part two, I'll take a look at the segments that make up a JPEG file, and how to hold onto some of the information provided for use by the decoding implementation of subsequent parts.

Imran Nazar <tf@imrannazar.com>, Jan 2013.

svn branchlist: Print branch names for multiple repos

Wed, 03 Oct 2012 11:06:47 +0000

I sometimes run into the situation where I'm working on multiple related Subversion repositories, and all their working copies are in a "repository root" on my drive. With these repos being switched to different branches, it became tedious to flip through each one to find which branch they were attached to.

Like any lazy programmer, I automated the process into a Bash script.

svnbr: A script to print the current branch name of multiple Subversion working copies

#!/bin/bash

REPOROOT="/home/inazar/code"
PREFIX="foo-"
POSTFIX="-bar"

t=`mktemp /tmp/svnbr_XXXXX`
pushd . > /dev/null
cd $REPOROOT

for a in `find . -maxdepth 1 -name "$PREFIX*$POSTFIX"`; do
    cd $a
    b=`svn info 2> /dev/null | grep URL | awk -F'/' '{print $NF}'`

    # Strip the leading "./" left by find first, so that the prefix
    # sits at the start of the name and can be removed
    c=${a:2}
    c=${c%$POSTFIX}
    c=${c#$PREFIX}
    d=""

    # Caveat: Display will fall over with repo names past 16 chars
    [[ ${#c} -lt 8 ]] && d="\t"
    echo -e "$c\t$d $b" >> $t
    cd ..
done

popd > /dev/null
sort $t
rm $t

How Emulators Work: a presentation

Wed, 27 Jun 2012 10:38:40 +0000

The following 17 slides constitute an Ignite talk given at deviantART's technology team meetup in France, earlier this month.

I've written a couple of emulation cores in my time, and people often ask me how computer emulators work. Well, let's take an example...

Doesn't look like a computer: there's no keyboard, no hard disk. But all the central components are there: processor, memory, keypad input, screen.

This is how the Gameboy looks on the inside: A processor talks, through a memory interface, to various memories. The program sits in one of those memories, but what _is_ the program?

All a program is, is a set of numbers. Each number means something to the processor, and it goes through a simple construct to work out what the number means.

Take a number from the "current location", work out what it means, do that. Fetch, Decode, Execute. How does the CPU know what the "current location" is, though?

As long as the CPU's running, it holds onto a few state variables. One of those is the "current location", or where to find the next instruction. There are others, to help with calculation.

Once we have a number, we decode it. Most instructions will either change one of the state variables, or change a piece of memory; some might request more numbers from the program to fully decode.

One thing to keep in mind is that none of this is free: it all takes time. For now, that's unimportant, but make a mental note of it.

To implement the Fetch/Decode/Execute cycle, the CPU needs a few distinct blocks of functionality, and an interface to memory in order to get more program, and hold onto results.

To emulate the cycle, we tear each block out and replace it with JavaScript: a big switch() to decode, discrete functions to execute routines and update the state, and a plain object to hold the state.

So those instructions we saw earlier, they become JS. Notice how each instruction changes PC, the "current location", so it points at the next one in the line.

That's the CPU: the memories are easy to emulate. They're just huge arrays. The interface object, "MMU", takes care of mapping out which memory the CPU's looking to talk to; that's generally another switch().

All well and good, but the CPU's plugging away without showing anything. Ideally, we need to emulate the other pieces: graphics, sound, keypad. The most important, of course, is graphics.

The Gameboy has 144 lines on its screen, and the graphics chip treats the screen like a traditional tube: scan a line, fly back to the left, at the bottom take a long time to fly back to the top-left.

Each of those scanlines, and blanking periods, has a precise time for how long it should take. But we're emulating the whole system, at a low precision.

Remember these from earlier; they allow the CPU to act as "emulated time". We can count the CPU instruction times, and know when to draw a GPU's scanline.

We end up with a central dispatcher, which runs the CPU until it's time to switch GPU mode. And that's a (very) basic introduction to how emulators work.
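The slides build all of this in JavaScript; as a rough companion, the sketch below shows the same shape in C++: a fetch/decode/execute step that advances PC and counts time, and a dispatcher that uses that time to decide when a scanline is due. The four-opcode instruction set, the cycle costs and all of the names are invented for the illustration and bear no relation to the Gameboy's real CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU state: the "current location" (pc), one register, and emulated time
struct Cpu {
    std::uint16_t pc = 0;
    std::uint8_t a = 0;
    unsigned long clock = 0;
};

// Hypothetical instruction set:
//   0x00 NOP (4 cycles), 0x01 n LOAD A,n (8 cycles), 0x02 INC A (4 cycles);
//   anything else halts.
bool step(Cpu& cpu, const std::vector<std::uint8_t>& memory) {
    assert(cpu.pc < memory.size());
    const std::uint8_t opcode = memory[cpu.pc++];      // fetch
    switch (opcode) {                                  // decode
        case 0x00: cpu.clock += 4; return true;        // execute
        case 0x01:
            if (cpu.pc >= memory.size()) return false; // operand missing
            cpu.a = memory[cpu.pc++];
            cpu.clock += 8;
            return true;
        case 0x02: ++cpu.a; cpu.clock += 4; return true;
        default: return false;
    }
}

// Central dispatcher: run the CPU, and count a scanline every time
// cyclesPerLine cycles of emulated time have passed.
unsigned run(Cpu& cpu, const std::vector<std::uint8_t>& memory, unsigned cyclesPerLine) {
    assert(cyclesPerLine > 0);
    unsigned lines = 0;
    unsigned long nextLine = cyclesPerLine;
    while (cpu.pc < memory.size() && step(cpu, memory)) {
        while (cpu.clock >= nextLine) {
            ++lines;  // a real emulator would draw the scanline here
            nextLine += cyclesPerLine;
        }
    }
    return lines;
}
```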

Sci-fi Shorts: Behind the Mirror

Tue, 24 Jan 2012 15:02:50 +0000

Coffee was John's friend this morning. A bottle of Everclear had vanished somewhere between 8pm and midnight yesterday, and it was likely that it'd gone into him: since his head felt three times as large as it should, that seemed to place him at the scene of the theft.

Another espresso later, John took to the task of surveying the damage. It wasn't too bad, actually: Ryan had been even more trashed, and had managed to spray the contents of the bookcase around the room. John found Tolkien behind the TV, and Dostoyevsky perched miraculously above the wall clock, but Asimov would have to wait until the pile of beer cans could be examined more closely. John wasn't feeling up to that right now.

It looked like one thing was broken: the frame around the mirror. Evidently, a coffee table had been lobbed at that wall at some point (a missing chunk of plaster, and the fine coating of gypsum on the table in question, attested to that). Amazingly, the mirror wasn't broken: the wooden frame had collapsed and lay around it, but the mirror itself was intact on the floor.

John picked up the glass, and felt that there was a slightly serrated edge on the bottom. He'd never seen the back of the mirror before, and looking closer at it revealed an intricate pattern of lines, converging on a group of 20-some parallel rows corresponding to the bumps along the bottom. John had seen something like this only a few times before, and every time it was something quite special.

An electronic circuit.

Obviously, no clue was forthcoming about what the circuit contained, or the secrets encoded within any solid-state memories that might be on the back of the mirror. John thought about enlisting Ryan's help to try to decipher the tracings, but he was still knocked out on the sofa, and would probably remain there for much of the morning.

John would have to go outside for this one.

Steganography with Brainfuck

Sat, 12 Nov 2011 18:16:58 +0000

Steganography is the name given to a set of techniques where the targeted message is hidden inside other data, most often in the context of an image in some form. Common forms of steganography include deviations in the level of individual pixels, or changes to the low-level noise components of the image. In this article, I'll look at a technique that produces an image which is also a program, which can be run to produce the targeted message.

In order to keep the examples here simple, I've sought a form of program that has a simple instruction set. Many processors have a simple instruction set: the 6502 has around 150 unique codes in its architecture, which can be reduced further with careful selection of instructions. However, having written about Brainfuck before, it seems the logical choice: eight instructions, each with a unique code, and all other codes are ignored during execution.

The program to be encoded, then, is as follows:

Message before encoding

>>++++++++[<+++[<+++++>-]>-]<<[>+>+>+>+>+<<<<<-]++++++++[>->-->
--->---->-----<<<<<-]<++++[>++++++++<-]>>>>>>++++.<<<.>+++++.<<
<.>>+++++.++.-.---.>.<<+++++++++.<.>>--.<------.<.>>.+++++.<<.>
+.>------.>.<<<.>>>-.<+.<-.>-.<++++.>>---.<<----.>.>++++.<<-.

The monkey is in the dishwasher

Encoding program code into pixels

In order for the program to be converted to an image, the first step is to take the program as a stream of numbers. In Brainfuck, the eight operators can be treated as eight numbers to be inserted into a stream:

+	0x2B	-	0x2D
[	0x5B	]	0x5D
<	0x3C	>	0x3E
.	0x2E	,	0x2C

Table 1: Character values for the Brainfuck operators

Hiding these values within image data would be difficult if the data were full-colour (24- or 32-bit), since the colour component would need to correspond exactly at a given point in the image. Instead of using full-colour images, it makes more sense to use palette-based images, where an 8-bit palette index refers to a table of colours defined beforehand.

A simple example of palette data would be a graded greyscale image of sixteen shades, like the one below:

Figure 1: Lena, in sixteen-shade greyscale

If this image is produced in a standard fashion, a palette of sixteen entries and 240 blanks is saved alongside the image. By expanding the palette entries used to the maximum allotted number of 256, it's possible to use the extra entries for hiding information:

	B	C	D	E
0
1
2	+	,	-	.
3		<		>
4
5	[		]
6
7
8
9
A
B
C
D
E
F

Table 2: Sixteen-shade greyscale palette, with Brainfuck operators indicated

As can be seen above, each row of the palette is the same shade of grey, except where the Brainfuck operators map onto the palette. By changing arbitrary pixels in the image, it's possible to replace any given #2-grey pixel with, for example, a + operator without changing the row of the palette used. Shifting pixel values along the palette row in this fashion allows us to change the value without changing the look of the image.
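The shift along a palette row can be sketched as below, assuming the layout of Table 2: the high nibble of a palette index selects the shade, so an operator can replace a pixel only when the two share a high nibble. The function names are hypothetical, and the sketch leaves aside the other half of the job, which is making sure the untouched pixels don't happen to be operators themselves.

```cpp
#include <cassert>
#include <cstdint>

// True if the byte is one of the eight Brainfuck operators
bool isOperator(std::uint8_t byte) {
    switch (byte) {
        case '+': case '-': case '[': case ']':
        case '<': case '>': case '.': case ',':
            return true;
        default:
            return false;
    }
}

// Try to hide operator op in a pixel. This works only if the operator sits
// on the same palette row (same high nibble, so same shade) as the pixel.
bool embedOperator(std::uint8_t& pixel, char op) {
    const std::uint8_t value = static_cast<std::uint8_t>(op);
    assert(isOperator(value));
    if ((value >> 4) != (pixel >> 4)) return false;
    pixel = value;
    return true;
}
```

A #2-grey pixel stored as 0x20 can therefore take a + (0x2B), but a #5-grey pixel cannot; the encoder would have to move along the image until it finds a pixel of the right shade.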

Before applying this concept to the example image in Figure 1, it's important to consider the type of image needed for this to work.

Uncompressed images: the BMP format

The major issue presented by image formats is that of compression: in order to make the image file size smaller, various methods of compression can be used to translate areas of similarity in the image to a much smaller amount of data than that represented by the raw image. Of course, when this is done the resultant compressed data bears no superficial relation to the raw image data; attempting to encode a Brainfuck program into the compressed data stream is beyond the scope of this article.

Instead, I'll be focussing on the raw image data itself; storage of the raw data can be achieved through an uncompressed data format such as TIFF. Examples of uncompressed 8-bit palette image formats include BMP and PCX; the former will be used here, due to its prevalence as a major image file format. When used as an uncompressed 8-bit format, the structure of a BMP file is as follows:

Field	Size (bytes)	Value
Signature	2	"BM"
File size	4	Size of the full BMP
Reserved	4	0x00000000
Data offset	4	Offset of the pixel array
BITMAPINFOHEADER
Header size	4	0x00000028
Image width	4	Width of the image in pixels
Image height	4	Height of the image in pixels
Colour planes	2	0x0001
Colour depth	2	0x0008
Compression method	4	0x00000000
Uncompressed image size	4	Width * height * bytes-per-pixel
Horizontal DPM	4	Pixels per metre, horizontal
Vertical DPM	4	Pixels per metre, vertical
Colours used	4	Number of unique colours
Important colours	4	Number of important colours
Colour table (256 entries)
Colour	4	Blue, green, red, reserved (one byte each)
Pixel data, 1 byte per pixel

Table 3: BMP file structure (all numeric fields are little-endian)

The end result of this process is to produce a bitmap file that can be run as a Brainfuck program; to that end, we must be careful that the header data and colour table don't contain Brainfuck operators that may corrupt the initial state of the program. This is, fortunately, relatively easy: all that is needed is to pick an image width and height that don't contain an operator within the byte values, and the rest should automatically fall into place.
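One way to check a candidate image size is to build the header and look for operator bytes in it. The Python sketch below follows the field order of Table 3; the helper names are hypothetical, and the resolution of 2835 pixels per metre (roughly 72 DPI) is an assumed value, not one taken from the original image.

```python
import struct

OPERATOR_BYTES = b"+-[]<>.,"


def bmp_header(width: int, height: int) -> bytes:
    """Build the 54 header bytes of an uncompressed 8-bit BMP."""
    row_size = (width + 3) & ~3          # BMP rows are padded to 4 bytes
    image_size = row_size * height
    data_offset = 14 + 40 + 256 * 4      # file header + info header + palette
    file_header = struct.pack("<2sIII", b"BM",
                              data_offset + image_size, 0, data_offset)
    info_header = struct.pack("<IiiHHIIiiII",
                              40, width, height, 1, 8, 0, image_size,
                              2835, 2835,  # assumed resolution, ~72 DPI
                              256, 0)
    return file_header + info_header


def operator_offsets(data: bytes) -> list:
    """List the offsets at which a Brainfuck operator byte appears."""
    return [i for i, b in enumerate(data) if b in OPERATOR_BYTES]


if __name__ == "__main__":
    for size in ((64, 64), (43, 64)):
        print(size, operator_offsets(bmp_header(*size)))
```

A width of 43 pixels is 0x2B, which is the + operator, so that size is reported as unsuitable, while 64 by 64 yields a clean header under these assumptions.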

One other quirk of the BMP format is the order in which pixel data is stored. Most image formats treat the first row as the top row, in accordance with computer graphics principles; BMP lines up with mathematical graphing principles, and treats the bottom row as the first.

Figure 2: Position of the rows in a BMP

This means that the Brainfuck operators need to be encoded into the image starting at the bottom, running left-right, then moving up to the next row. By examining each operator in the program in turn, and finding the next appropriate pixel (of the same row in the palette) to replace, the following result is obtained.

Figure 3: Lena, encoded with the targeted message, at 400% zoom
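The replacement step can be sketched as below, in Python. It assumes the pixel data has already been read in file order (bottom row first) as one palette index per pixel, with every low nibble zero; the function name embed is hypothetical and this is not the tool used to produce the figures.

```python
def embed(pixels: bytes, program: bytes) -> bytearray:
    """Hide each operator in the next pixel that shares its palette row."""
    out = bytearray(pixels)
    cursor = 0
    for op in program:
        # Scan forward for a pixel with the same high nibble (same shade).
        while cursor < len(out) and out[cursor] >> 4 != op >> 4:
            cursor += 1
        if cursor == len(out):
            raise ValueError("ran out of suitable pixels")
        out[cursor] = op
        cursor += 1
    return out


if __name__ == "__main__":
    image = bytes([0x10, 0x20, 0x50, 0x20, 0x30, 0x50, 0x20])
    encoded = embed(image, b"+[-]")
    print(encoded.hex(" "))
    # Reading back only the operator bytes recovers the program.
    print(bytes(b for b in encoded if b in b"+-[]<>.,"))
```

The cursor only ever moves forward, so the operators appear in the pixel stream in program order, which is all an interpreter needs.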

Once the appropriate pixels have been replaced with versions shifted along the palette row, the palette can be amended to set the Brainfuck operators to the same colour as the rest of the row in which they reside. Doing this, and producing an uncompressed bitmap of the result, provides the following.

Figure 4: The original image, and the image encoded with the target message

Running the bitmap image, shown on the right, through a Brainfuck interpreter results in the targeted message being revealed.

PHP Brainfuck interpreter

    <?php
    // Program and input are separated by the first "!" in the file
    list($c, $i) = array_pad(explode('!', file_get_contents($_SERVER['argv'][1]), 2), 2, '');

    // Discard everything that isn't a Brainfuck operator
    $c = preg_replace('#[^\[\]\<\>,.+-]#', "", $c);

    // Translate the program into PHP, and run it
    $m = array_fill(0, 30000, 0);
    $p = $v = 0;
    eval(strtr($c, array(
        "]" => "}",
        "[" => 'while($m[$p]){',
        "+" => '$m[$p]++;',
        "-" => '$m[$p]--;',
        ">" => '$p++;',
        "<" => '$p--;',
        "," => 'if(strlen($i)>$v)$m[$p]=ord($i[$v++]);',
        "." => 'echo chr($m[$p]);'
    )));
    ?>

Sample result

    $ php -f brainfuck.php http://imrannazar.com/content/img/bf-bmp-final.bmp
    The monkey is in the dishwasher$
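For readers without PHP to hand, the same idea can be sketched in Python: keep only the operator bytes of the file, then interpret them. This is an illustrative equivalent rather than a port; the function name is mine, and the 30,000-cell tape with 8-bit wrapping cells is an assumption (a common Brainfuck convention) that the PHP version does not make.

```python
def run_brainfuck(data: bytes, stdin: bytes = b"") -> bytes:
    """Run the Brainfuck operators found in data, ignoring all other bytes."""
    code = bytes(b for b in data if b in b"+-[]<>.,")

    # Pair up the brackets once, so loops can jump directly.
    jumps, stack = {}, []
    for pos, op in enumerate(code):
        if op == ord("["):
            stack.append(pos)
        elif op == ord("]"):
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start

    tape = [0] * 30000
    ptr = pc = consumed = 0
    output = bytearray()
    while pc < len(code):
        op = chr(code[pc])
        if op == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif op == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif op == ">":
            ptr += 1
        elif op == "<":
            ptr -= 1
        elif op == ".":
            output.append(tape[ptr])
        elif op == "," and consumed < len(stdin):
            tape[ptr] = stdin[consumed]
            consumed += 1
        elif op == "[" and tape[ptr] == 0:
            pc = jumps[pc]
        elif op == "]" and tape[ptr] != 0:
            pc = jumps[pc]
        pc += 1
    return bytes(output)


if __name__ == "__main__":
    # Leading non-operator bytes stand in for a file header.
    print(run_brainfuck(b"BM\x00++++++++[>++++++++<-]>+."))
```

Passing the raw contents of the encoded bitmap as data would exercise the whole pipeline, since the header, palette and untouched pixels contain no operator bytes.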

Caveats and improvements

The process outlined above is simple enough that it can be automated: given a sixteen-step greyscale image, seek from the bottom left for a pixel in the desired palette row, replace the pixel and move forward. This, however, isn't ideal for steganographic purposes; the Brainfuck operators leave a unique signature in a bitmap of otherwise uniform numbers. There are two simple ways to increase the level of obfuscation:

Colour: Instead of using a greyscale image, which offers easy colour matching, it's possible to use a coloured palette with any 8-bit image as the source. This involves replacing any existing instances of the operator characters with a close colour match, then further colour matching can be used to fill in the program, hiding the signal in an increased level of noise.
Compression: Noise levels can be increased further by compressing the image once it has been encoded, hiding any relation between the bitmap data and any Brainfuck program contained within. This increases the complexity of both the encoding and decoding algorithms, since a compression step must be employed around the transmission of the image, but it allows the images to be transmitted in a format that avoids suspicion.

Imran Nazar <tf@imrannazar.com>, Nov 2011.

Sci-fi Shorts: Betel

Wed, 17 Aug 2011 19:06:20 +0000

There were never any more stars in the sky, or fewer. This far away from any other known solar system, there were only thirty stars visible up there, and every child grew up to know them well.

Cyrius, glowing a faint blue: the scientists said it was one of the brightest stars for a million light years, but out here it was a dull glow in the sky. Betel, a faraway red, constant in the sky since time immemorial. And the others, each one with a name known by all: even the alphabet had been adapted to thirty letters, one for each star in the sky.

Tonight was different. There were twenty-nine stars: the letter B was missing. Betel had somehow gone out, its ruddy glow doused by the dark of space. Either something had happened to it, or the light was being... blocked in some way. The central astronomy agency soon found out that Betel wasn't missing; the latter suspicion was in fact true. A device of some sort was approaching from the rough direction of Betel, and only now was it close enough to block out the star's gaze upon us.

Morning came, and the strange entity was now visible as a blot against the iodine sky. Then it began to... expand, to unfurl, as though it were a net to cover the entire sky. Before too long, it had covered the sky in a grid of wire-thin lines. Then came a voice, deep and resonating: "SIMULATION ENDS IN TEN MINUTES."

Printable Opcodes in x86 Real Mode

Sat, 18 Jun 2011 21:30:30 +0000

One of the major historical issues with the transmission of computer programs over the Internet has been how to avoid encoding problems: with FTP, for example, a file sent in text mode may have its newline characters translated by the FTP client into different values. For a program file, this would change the meaning of the code, and may result in failure when the downloaded file is executed.

The issue of transmission encoding is obviously a solved problem: FTP has a binary mode, which avoids text-based translation, while email has MIME-formatted attachments for the inclusion of files. Another way to approach the issue, however, is to produce executables which avoid the occurrence of the problem. Since a program, like any file, is nothing more than a stream of numbers, we can alleviate the transmission issue by producing programs which only use numbers that would be found in a text file; in the ASCII encoding standard, we can define these numbers as the "printable range", between 32 and 126.

Depending on the target CPU for the executable program, the limitation of only using the printable range will have different effects: for some processors, it is impossible to write a program under these constraints. In this article, I'll be looking at the combination of the x86 PC and MS-DOS, since this allows for a wide range of instructions to be used.

Opcodes in the printable range

Since the x86 base instruction set is derived from the 8080 CPU, a number of groups of instructions fall under the printable range, including conditional branches and arithmetic manipulation. The full list is as follows.

20/	AND r/m,r (8-bit)	40/@	INC AX	60/`	PUSHA
21/!	AND r/m,r (16-bit)	41/A	INC CX	61/a	POPA
22/"	AND r,r/m (8-bit)	42/B	INC DX	62/b	BOUND r,m
23/#	AND r,r/m (16-bit)	43/C	INC BX	63/c	ARPL r/m,r (16-bit)
24/$	AND AL,imm	44/D	INC SP	64/d	FS:
25/%	AND AX,imm	45/E	INC BP	65/e	GS:
26/&	ES:	46/F	INC SI	66/f	Operand-size prefix
27/'	DAA	47/G	INC DI	67/g	Address-size prefix
28/(	SUB r/m,r (8-bit)	48/H	DEC AX	68/h	PUSH imm16
29/)	SUB r/m,r (16-bit)	49/I	DEC CX	69/i	IMUL r,r/m,imm16
2A/*	SUB r,r/m (8-bit)	4A/J	DEC DX	6A/j	PUSH imm8
2B/+	SUB r,r/m (16-bit)	4B/K	DEC BX	6B/k	IMUL r,r/m,imm8
2C/,	SUB AL,imm8	4C/L	DEC SP	6C/l	INSB
2D/-	SUB AX,imm16	4D/M	DEC BP	6D/m	INSW
2E/.	CS:	4E/N	DEC SI	6E/n	OUTSB
2F//	DAS	4F/O	DEC DI	6F/o	OUTSW
30/0	XOR r/m,r (8-bit)	50/P	PUSH AX	70/p	JO rel8
31/1	XOR r/m,r (16-bit)	51/Q	PUSH CX	71/q	JNO rel8
32/2	XOR r,r/m (8-bit)	52/R	PUSH DX	72/r	JC rel8
33/3	XOR r,r/m (16-bit)	53/S	PUSH BX	73/s	JNC rel8
34/4	XOR AL,imm	54/T	PUSH SP	74/t	JZ rel8
35/5	XOR AX,imm	55/U	PUSH BP	75/u	JNZ rel8
36/6	SS:	56/V	PUSH SI	76/v	JBE rel8
37/7	AAA	57/W	PUSH DI	77/w	JNBE rel8
38/8	CMP r/m,r (8-bit)	58/X	POP AX	78/x	JS rel8
39/9	CMP r/m,r (16-bit)	59/Y	POP CX	79/y	JNS rel8
3A/:	CMP r,r/m (8-bit)	5A/Z	POP DX	7A/z	JP rel8
3B/;	CMP r,r/m (16-bit)	5B/[	POP BX	7B/{	JNP rel8
3C/<	CMP AL,imm	5C/\	POP SP	7C/|	JL rel8
3D/=	CMP AX,imm	5D/]	POP BP	7D/}	JNL rel8
3E/>	DS:	5E/^	POP SI	7E/~	JLE rel8
3F/?	AAS	5F/_	POP DI

Table 1: Printable opcodes in the x86 instruction map
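A quick way to test a fragment of machine code against this table is to check every byte against the printable range. The Python sketch below does that; the function name is my own, and the sample byte sequences are the standard encodings of PUSH AX / POP BX (50 5B) and MOV AX,0013h (B8 13 00).

```python
def is_printable(code: bytes) -> bool:
    """True if every byte lies in the printable ASCII range 32 to 126."""
    return all(0x20 <= b <= 0x7E for b in code)


if __name__ == "__main__":
    push_pop = bytes([0x50, 0x5B])        # PUSH AX / POP BX
    mov_imm = bytes([0xB8, 0x13, 0x00])   # MOV AX, 0013h
    print(push_pop, is_printable(push_pop))
    print(mov_imm.hex(" "), is_printable(mov_imm))
```

The first pair passes and reads as the text "P[", while the MOV fails on all three of its bytes, which is why the decoder later in this article has to build its values by other means.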

As can be seen in the above list, a good selection of opcodes is available in the x86 printable set; this allows three possible methods for producing programs using these operations:

Programs from scratch: Using printable opcodes from the first stages of program production, to make applications that can be directly transmitted over a plain-text channel. This is obviously the most difficult option, and provides no consideration for transmission of existing programs; as such, it will not be looked at here.
External decoding: Programs are encoded in a plain-text representation, similar to Base64, and transmitted; a decoding application at the receiving end then processes the transmission into the original program. This solution is used by the uuencode utility, but is not an integrated solution.
Inline decoding: The program is encoded as with external decoding, but is provided alongside a decoding routine written with printable opcodes; this eliminates the need for an external decoder, and is the simplest solution: the program can simply be run as received, and will perform as expected.

Producing the encoding

As mentioned above, the concept of Base64 can be applied in this case. Base64 encoding treats the program file as one long number written in base 64: since three bytes hold 24 bits, and a base-64 digit holds six, a stream of bytes divides evenly into a stream of digits, four for every three bytes.

The canonical Base64 encoding is an adaptation of the concept behind hexadecimal: instead of using the standard denary digits in their normal position, the letters of the alphabet are used first. Values 0 to 25 are represented by the uppercase letters A-Z, 26 to 51 by the lowercase letters, and 52 to 61 by the digits 0-9. The remaining encodings 62 and 63 change between variations of the Base64 model, but are used as + and / in most variations.

This canonical encoding skips around the ASCII character set, which makes for complexity in the encoding and decoding algorithm. In the case of this article, the particular characters used for the encoding are unimportant as long as they are within the printable range: the algorithms are simplified if a contiguous range is used, so for simplicity a range from 32-95 will serve as the translation endpoint.

Encoding of the program for transmission can be performed by an external utility; the following C extract will produce a contiguous-base64 encoding of a source file.

cb64-encode.c: Encoder to contiguous base64

    #include <stdio.h>

    /* Read one byte; a short final group is padded with 0xFF */
    static unsigned long next_byte(FILE *in) {
        int c = fgetc(in);
        return (c == EOF) ? 0xFF : (unsigned long)c;
    }

    int main(int argc, char **argv) {
        FILE *in, *out;
        long fsize, i;
        unsigned long n;
        int o[4];

        if(argc != 2) {
            printf("Usage: encode <file>\n");
            return 1;
        }

        in = fopen(argv[1], "rb");
        out = fopen("encode.out", "wb");
        if(!in || !out) {
            perror("encode");
            return 1;
        }

        /* Find out the size of the file */
        fseek(in, 0, SEEK_END);
        fsize = ftell(in);
        fseek(in, 0, SEEK_SET);

        for(i=0; i<fsize; i+=3) {
            /* Retrieve 24 bits from the file */
            n  = (next_byte(in) << 0);
            n |= (next_byte(in) << 8);
            n |= (next_byte(in) << 16);

            /* Break out into four 6-bit values */
            o[0] = ((n >> 18) & 63) + 32;
            o[1] = ((n >> 12) & 63) + 32;
            o[2] = ((n >>  6) & 63) + 32;
            o[3] = ((n      ) & 63) + 32;

            /* Write encoded values */
            fputc(o[0], out);
            fputc(o[1], out);
            fputc(o[2], out);
            fputc(o[3], out);
        }

        fclose(in);
        fclose(out);
        return 0;
    }
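The same scheme can be modelled in Python to check an encoding by hand. This sketch mirrors the bit layout of the C encoder (three bytes packed low byte first, then read out as four 6-bit digits from the top) and adds the matching decoder. The function names are mine, and padding a short final group with 0xFF is an assumption, chosen because it reproduces the final "___#" group of the encoded string used later in this article.

```python
def cb64_encode(data: bytes) -> bytes:
    """Encode bytes as contiguous base64: digits 0..63 map to ASCII 32..95."""
    out = bytearray()
    for i in range(0, len(data), 3):
        group = data[i:i + 3].ljust(3, b"\xff")   # assumed padding value
        n = group[0] | (group[1] << 8) | (group[2] << 16)
        out += bytes(((n >> shift) & 63) + 32 for shift in (18, 12, 6, 0))
    return bytes(out)


def cb64_decode(text: bytes) -> bytes:
    """Reverse cb64_encode; the result may carry up to two padding bytes."""
    out = bytearray()
    for i in range(0, len(text), 4):
        n = 0
        for digit in text[i:i + 4]:
            n = (n << 6) | (digit - 32)           # shift in six bits at a time
        out += n.to_bytes(3, "little")
    return bytes(out)


if __name__ == "__main__":
    start = bytes([0xB0, 0x13, 0xCD])             # MOV AL,13h / first byte of INT
    print(cb64_encode(start))
    print(cb64_decode(cb64_encode(start)).hex(" "))
```

The decoder's inner loop, shifting the accumulated value along by six bits and merging in the next digit, is the operation the inline decoder has to reproduce using printable opcodes.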

The inline decoder

With the source program encoded into the printable range, the remaining part of the process is the inline decoder. In theory, this is a simple reversal of the above code; the issue is complicated by the fact that the decoder must be written using printable opcodes only. The restriction also extends past the opcode itself: many opcodes in the x86 instruction set are followed by a ModR/M byte, which defines the source and destination for the operation, and this byte must be printable too. A few examples:

ModR/M samples

    XOR AX,BX               ; 33 C3
    MOV AL,BYTE [SI]        ; 8A 04
    AND CX,WORD [SI+0293]   ; 23 8C 93 02

As can be seen above, the ModR/M byte can take on any value, and each of the possible values encodes to a combination of source and destination. Of course, only some of these combinations encode to printable byte values: those selections are shown below.

	Destination
	8-bit	AL	CL	DL	BL	AH	CH	DH	BH
	16-bit	AX	CX	DX	BX	SP	BP	SI	DI
Source	[BX+SI]					20	28	30	38
	[BX+DI]					21	29	31	39
	[BP+SI]					22	2A	32	3A
	[BP+DI]					23	2B	33	3B
	[SI]					24	2C	34	3C
	[DI]					25	2D	35	3D
	[addr]					26	2E	36	3E
	[BX]					27	2F	37	3F
	[BX+SI+disp]	40	48	50	58	60	68	70	78
	[BX+DI+disp]	41	49	51	59	61	69	71	79
	[BP+SI+disp]	42	4A	52	5A	62	6A	72	7A
	[BP+DI+disp]	43	4B	53	5B	63	6B	73	7B
	[SI+disp]	44	4C	54	5C	64	6C	74	7C
	[DI+disp]	45	4D	55	5D	65	6D	75	7D
	[BP+disp]	46	4E	56	5E	66	6E	76	7E
	[BX+disp]	47	4F	57	5F	67	6F	77

Table 2: Printable values for the ModR/M byte
addr is 16-bit, disp is 8-bit
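The table follows directly from the bit layout of the ModR/M byte: two bits of mode, three bits of register, and three bits of register/memory operand. The Python sketch below splits a byte into those fields and names them for 16-bit operands; the function names and the string formatting are my own.

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
RM16 = ["[BX+SI]", "[BX+DI]", "[BP+SI]", "[BP+DI]",
        "[SI]", "[DI]", "[BP]", "[BX]"]


def split_modrm(value: int):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    return value >> 6, (value >> 3) & 7, value & 7


def describe(value: int):
    """Name the register and the r/m operand selected by a ModR/M byte."""
    mod, reg, rm = split_modrm(value)
    if mod == 3:
        operand = REG16[rm]                  # register-to-register form
    elif mod == 0 and rm == 6:
        operand = "[addr]"                   # direct 16-bit address
    else:
        suffix = ("", "+disp8", "+disp16")[mod]
        operand = RM16[rm][:-1] + suffix + "]"
    return REG16[reg], operand


if __name__ == "__main__":
    for byte in (0xC3, 0x8C, 0x44, 0x26):
        printable = 0x20 <= byte <= 0x7E
        print(hex(byte), describe(byte), "printable" if printable else "not printable")
```

Every printable value has a mode of 0 or 1, which is why register-to-register forms such as XOR AX,BX (ModR/M C3) are unavailable to the decoder.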

Due to the limitations imposed by both the opcode range and the allowable addressing of ModR/M bytes, a few techniques have to be employed in code production:

Use of DOS initial values: When a program is loaded by MS-DOS or the Windows VDM, the registers have certain values at the start of execution. One of the important instances of this is that AX is zero. This value can be used throughout the program if it's saved to a register that's used solely for the purpose of providing zeroes. For example:

    ; Use BX for zero register
    PUSH AX
    POP BX

XOR masking: Since the MOV opcode is disallowed from use, and most immediate values cannot be represented as printable bytes, a value can instead be built by XORing together two printable immediates whose unwanted bits cancel out. For example:

    ; MOV AX, 0013h
    PUSH BX
    POP AX
    XOR AX, 2020h
    XOR AX, 2033h

    ; MOV AL, [SI+1Fh]
    DEC SI
    PUSH BX
    POP AX
    XOR AX, [SI+20h]

Jump padding: The jumps in the printable range are short relative jumps, and their displacement byte must itself be printable, so a forward jump can only reach a destination between 32 and 126 bytes away. To facilitate this, extra bytes can be inserted if the destination isn't far enough away. This requires that the encoded output of the operations between the jump and its destination is known, since the code length determines the amount of padding needed.

Manual encoding: The assembler will not always select a printable encoding for an operation. To avoid this, the operation can be assembled by hand and inserted at the appropriate point. For the example of XOR where the source byte is 65 bytes from the opcode:

    ; XOR AL, [SI+($-PRSTART)]
    DB 32h, 44h, 41h
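Two of these techniques reduce to small calculations, sketched here in Python with hypothetical helper names. The first searches for a pair of printable immediates that XOR to a wanted 16-bit value; the second works out how many padding bytes a forward jump needs, on the assumption that the displacement must be at least 32.

```python
PRINTABLE = range(0x20, 0x7F)


def xor_byte_pair(target: int):
    """Find printable bytes a, b with a ^ b == target, or None if impossible."""
    for a in PRINTABLE:
        if a ^ target in PRINTABLE:
            return a, a ^ target
    return None


def xor_word_pair(target: int):
    """Find two 16-bit immediates, printable in both bytes, that XOR to target."""
    low = xor_byte_pair(target & 0xFF)
    high = xor_byte_pair(target >> 8)
    if low is None or high is None:
        return None
    return (high[0] << 8) | low[0], (high[1] << 8) | low[1]


def jump_padding(body_length: int) -> int:
    """Bytes to insert so that a forward jump spans at least 32 bytes."""
    return max(0, 32 - body_length)


if __name__ == "__main__":
    pair = xor_word_pair(0x0013)
    print([hex(v) for v in pair])
    print(xor_word_pair(0x00E9))     # top bit set: no printable pair exists
    print(jump_padding(25))
```

For 0013h the search arrives at the same 2020h and 2033h pair used in the XOR masking example. Any byte with its top bit set has no printable pair at all, so such values need a subtraction step as well.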

Using these techniques, it's possible to write the reverse decoder for the printable-opcode encoder detailed above, and attach the encoded program to it. The resultant program is shown below, in NASM assembly format.

linenoise.com: Input program to be encoded

    mov al, 0x13
    int 0x10                ; Set 320x200x8 graphics mode
    push 0xA000
    pop es                  ; Set destination segment

    lineloop:
    mov cx, 0x0140          ; For 200 lines
    in ax, 0x40             ; Get a "random" number from the timer
    and ax, 0xBF
    mul cx
    mov di, ax              ; Write to that line on screen (mod 191)
    in al, 0x40             ; Get a "random" number
    rep stosb               ; Write a line of that colour
    in al, 0x60             ; Check for a key
    dec al
    jnz lineloop            ; Loop back if not ESC

    end:
    ret                     ; Return to DOS

linenoise-encoded.com: Encoded into printable opcodes

    pusha                   ; Save all registers for post-decoding

    ; Initialise registers and zero-holder
    push ax
    push ax
    push ax
    push ax
    push ax
    pop edx
    pop ebx
    pop cx

    ; Rewrite jpouter and jpinner
    xor al, jpouter-256-96  ; Get a printable-range 8-bit value
    xor ax, 0x235B
    xor ax, 0x225B          ; Add 256
    push ax
    pop si                  ; Use this as the address
    push bx
    pop ax
    xor al, 0x20
    sub al, 0x55            ; AL = 0xCB
    xor [si+0x60], al       ; Opcode = 0xEB (JMP rel)
    xor [si+0x62], al
    xor al,0x34             ; AL = 0xFF
    xor [si+0x61], al       ; Set jump point to known value
    xor [si+0x63], al
    inc ax                  ; AX = 0x0100
    push ax
    pop bp                  ; BP = 0x0100

    ; Set destination point (CS:1000 - 32)
    xor ax,0x3E30
    xor ax,0x3030
    sub al,32
    push ax
    pop di

    ; Set source point (+256+8)
    push bx
    pop ax
    xor ax,0x3120
    xor ax,0x307E
    push ax
    pop si

    ; Perform the decoding
    lpouter:
    ; In blocks of 4
    push bx
    push bx
    pop eax
    and [di+0x20],eax
    inc cx
    inc cx
    inc cx
    inc cx

    ; Read 6 bits
    lpinner:
    push bx
    pop ax
    db 0x32, 0x44, 0x41     ; xor al, [si+(source-$)]
    sub al,32
    cmp al,94
    je end

    ; Push along by 6 bits, shift in the new bits
    imul edx,[di+0x20],byte 64
    and [di+0x20],ebx
    xor [di+0x20],edx       ; [Dest] = Shifted val
    xor [di+0x20],al        ; [Dest] += next
    inc si
    dec cx
    jnz jpinner

    ; Once the block of 4 is done, DI += 3
    inc di
    inc di
    inc di
    jnz jpouter             ; Jump indirect

    ; Jump padding, since the above code is 25 bytes long
    db '@@@@@@@'

    end:
    push bp
    pop ax                  ; AX = 0x0100

    ; Rewrite jump to CS:1000
    xor al,jpouter-256-64
    push ax
    pop si
    push bx
    pop ax
    and [si+0x60],ax
    and [si+0x62],ax        ; Clear opcode
    sub al,0x37
    xor al,0x20             ; AL = 0xE9
    xor [si+0x60],al        ; JMP (rel16)
    push bx
    pop ax
    xor ax,0x3E30
    xor ax,0x3072           ; AX = 0x1000 - $
    xor [si+0x61],ax
    dec di                  ; Clear Z flag
    popa                    ; Retrieve all registers
    jnz jpouter+32          ; Jump indirect to CS:1000

    ; Rewritable jump points
    jpouter:
    db 0x20
    db ($-lpouter)
    jpinner:
    db 0x20
    db ($-lpinner)

    ; The encoded program
    source:
    db 'S1.P &@0N0>@Y0% OR5 X?< Y,>)JO- _F#DZG7(___#'
    db 126                  ; EOF marker

The above encoded program runs in the same manner as the original, and produces the following output:

Figure 1: Output of linenoise-encoded.com

Caveats and conclusion

There are a few issues with this encoding mechanism. Firstly, it is heavily tied to MS-DOS .com programs, since the decoder is itself a DOS program and relies on the initial conditions provided by the MS-DOS program loader. Furthermore, it has a fixed destination for the decoded program of CS:1000h, which only allows programs of a little under 4 kilobytes to be used without modifying the decoder.

These issues, however, can be overlooked if the use of the encoding mechanism is kept to the domain of graphic demos and other simple programs which are to be written using printable opcodes. That domain may, of course, not be a large one.

Imran Nazar <tf@imrannazar.com>, Jun 2011.

Edge Detection with Android Native Code

Sat, 21 May 2011 22:40:13 +0000

The source code to this project is now available in full, at: http://imrannazar.com/content/files/android-sobel.zip

In the previous part of this set of articles, I began an introduction to augmented reality, using the simple example of edge detection on Android smartphones; in that part, the camera hardware was introduced, and the framework of an application developed for the use of the camera preview. In this concluding part, the edge detection algorithm itself and its implementation will be explored.

The Sobel operator

The algorithm that will be used is the Sobel operator, which works as a filter applied to each pixel in an image. The process iterates over each pixel in a row of the image, and over each row in turn, multiplying the 3x3 block of pixels around the current one by a grid of weights and summing the results:

Figure 1: Sobel operator formulation and example

    For Y = 1 to (Height-2)
        For X = 1 to (Width-2)
            Horiz_Sobel = (Input[Y-1][X-1] * -1) + (Input[Y-1][X] *  0) + (Input[Y-1][X+1] *  1) +
                          (Input[Y]  [X-1] * -2) + (Input[Y]  [X] *  0) + (Input[Y]  [X+1] *  2) +
                          (Input[Y+1][X-1] * -1) + (Input[Y+1][X] *  0) + (Input[Y+1][X+1] *  1)

            Vert_Sobel  = (Input[Y-1][X-1] * -1) + (Input[Y-1][X] * -2) + (Input[Y-1][X+1] * -1) +
                          (Input[Y]  [X-1] *  0) + (Input[Y]  [X] *  0) + (Input[Y]  [X+1] *  0) +
                          (Input[Y+1][X-1] *  1) + (Input[Y+1][X] *  2) + (Input[Y+1][X+1] *  1)

            Output[Y][X] = Pythag(Horiz_Sobel, Vert_Sobel)
        Next X
    Next Y

The calculation of the Sobel operator index can be simplified in two ways:

Removal of multiplication: Some of the coefficients used by the algorithm are zero, which means that the associated terms are not used in the calculation at all; others are negative, which means the term is subtracted instead of added, and a coefficient of two means the term is simply added twice. Replacing multiplications with addition and subtraction of terms means that fewer operations are required to produce the value, making the calculation quicker.
Approximation of Pythagorean addition: For the purposes of this application, an exact value for the resultant Sobel value is not required, merely an approximation; a serviceable approximation of the Pythagorean operator is a simple average of the absolute values of the two components. This average never exceeds the actual value (it lies between 50% and roughly 71% of it), but will serve as a fair replacement once a threshold is applied.

With these modifications, the calculation can be adapted to the following.

    For Y = 1 to (Height-2)
        For X = 1 to (Width-2)
            Horiz_Sobel = Input[Y+1][X+1] - Input[Y+1][X-1]
                        + Input[Y][X+1] + Input[Y][X+1]
                        - Input[Y][X-1] - Input[Y][X-1]
                        + Input[Y-1][X+1] - Input[Y-1][X-1]

            Vert_Sobel  = Input[Y+1][X+1] + Input[Y+1][X] + Input[Y+1][X] + Input[Y+1][X-1]
                        - Input[Y-1][X+1] - Input[Y-1][X] - Input[Y-1][X] - Input[Y-1][X-1]

            Output[Y][X] = Clamp((Abs(Horiz_Sobel) + Abs(Vert_Sobel)) / 2)
        Next X
    Next Y
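The simplified calculation is easy to try out away from the phone. The Python sketch below applies it to a small image held as a list of rows; it is an illustration rather than the application's code, and it makes one assumption worth stating: the two gradients are made positive before averaging, so that gradients of opposite sign cannot cancel each other out.

```python
def sobel(image):
    """Return an edge map for a 2-D list of luminance values (0 to 255)."""
    height, width = len(image), len(image[0])
    output = [[0] * width for _ in range(height)]
    # The outermost rows and columns have no full 3x3 neighbourhood.
    for y in range(1, height - 1):
        above, row, below = image[y - 1], image[y], image[y + 1]
        for x in range(1, width - 1):
            horiz = (below[x + 1] - below[x - 1]
                     + row[x + 1] + row[x + 1] - row[x - 1] - row[x - 1]
                     + above[x + 1] - above[x - 1])
            vert = (below[x + 1] + below[x] + below[x] + below[x - 1]
                    - above[x + 1] - above[x] - above[x] - above[x - 1])
            # Average of absolute gradients, clamped to a byte
            output[y][x] = min(255, (abs(horiz) + abs(vert)) // 2)
    return output


if __name__ == "__main__":
    # A dark left half against a lighter right half: one vertical edge.
    picture = [[0, 0, 100, 100] for _ in range(4)]
    for line in sobel(picture):
        print(line)
```

The border pixels stay at zero because the loops skip them, which matches the bounds used in the pseudocode.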

Before this filter can be applied to the camera preview image, the image must be taken from the camera and made ready for processing.

Handling the Camera Preview

As introduced in Part 1, the camera hardware is capable of automatically calling a predefined function whenever a frame of the preview is ready; this function is referred to as the "preview callback", and receives a byte[] containing the raw image data. By default, the preview image is in NV21 format, a standard luminance/chrominance format; for the example of a 320x240 pixel NV21 image:

The first 76,800 bytes of the image are a direct luminance map, with each byte corresponding to a "brightness" or greyscale value for the corresponding pixel in the image;
The following 38,400 bytes are a 2x2 subsampling of chrominance: for each 2x2-pixel block in the image, one byte encodes a V-chrominance value, and the following byte a U-value.
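The sizes quoted above follow from the frame dimensions, as this short Python sketch shows. The function names are hypothetical, and the frame is assumed to have an even width and height.

```python
def nv21_layout(width: int, height: int):
    """Return (luminance bytes, chrominance bytes) for an NV21 frame."""
    luma = width * height
    # Two chrominance bytes for every 2x2 block of pixels
    chroma = 2 * (width // 2) * (height // 2)
    return luma, chroma


def luminance_at(frame: bytes, width: int, x: int, y: int) -> int:
    """Read the greyscale value of one pixel from the start of the frame."""
    return frame[y * width + x]


if __name__ == "__main__":
    luma, chroma = nv21_layout(320, 240)
    print(luma, chroma, luma + chroma)
```

Since the luminance plane comes first and is stored row by row, an edge detector can ignore the chrominance entirely and index the first width * height bytes as a greyscale image.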

It's relatively straightforward to perform a Sobel calculation on the luminance part of the NV21 image, and a thresholded result can be placed into the overlay canvas for each output pixel:

src/sobel/OverlayView.java: Sobel operation

    private int[] mFrameSobel;

    private void setPreviewSize(Camera.Size s) {
        // Allocate a 32-bit buffer as large as the preview
        mFrameSobel = new int[s.width * s.height];
        mFrameSize = s;
    }

    // Luminance values are unsigned, but Java bytes are signed
    private int lum(byte[] frame, int pos) {
        return frame[pos] & 0xFF;
    }

    private void setCamera(Camera c) {
        mCam = c;
        mCam.setPreviewCallback(new PreviewCallback() {
            // Called by camera hardware, with preview frame
            public void onPreviewFrame(byte[] frame, Camera c) {
                Canvas cOver = mOverSH.lockCanvas(null);
                try {
                    int x, y;
                    int w = mFrameSize.width, pos;
                    int sobelX, sobelY, sobelFinal;

                    for(y=1; y<(mFrameSize.height-1); y++) {
                        pos = y * w + 1;
                        for(x=1; x<(mFrameSize.width-1); x++, pos++) {
                            sobelX = lum(frame, pos+w+1) - lum(frame, pos+w-1)
                                   + lum(frame, pos+1) + lum(frame, pos+1)
                                   - lum(frame, pos-1) - lum(frame, pos-1)
                                   + lum(frame, pos-w+1) - lum(frame, pos-w-1);
                            sobelY = lum(frame, pos+w+1) + lum(frame, pos+w)
                                   + lum(frame, pos+w) + lum(frame, pos+w-1)
                                   - lum(frame, pos-w+1) - lum(frame, pos-w)
                                   - lum(frame, pos-w) - lum(frame, pos-w-1);

                            // Average the absolute values, so that
                            // opposite-signed gradients don't cancel
                            sobelFinal = (Math.abs(sobelX) + Math.abs(sobelY)) / 2;

                            // Threshold at 48 (for example)
                            if(sobelFinal < 48) sobelFinal = 0;
                            if(sobelFinal >= 48) sobelFinal = 255;

                            // Build a 32-bit ARGB value, either
                            // transparent black or opaque white
                            mFrameSobel[pos] = (sobelFinal << 0) +
                                               (sobelFinal << 8) +
                                               (sobelFinal << 16) +
                                               (sobelFinal << 24);
                        }
                    }

                    // Copy calculated frame to bitmap, then
                    // translate onto overlay canvas
                    Rect src = new Rect(0, 0, mFrameSize.width, mFrameSize.height);
                    Rect dst = new Rect(0, 0, cOver.getWidth(), cOver.getHeight());
                    Paint pt = new Paint();
                    Bitmap bmp = Bitmap.createBitmap(mFrameSobel,
                                                     mFrameSize.width,
                                                     mFrameSize.height,
                                                     Bitmap.Config.ARGB_8888);
                    pt.setColor(Color.WHITE);
                    pt.setAlpha(0xFF);
                    cOver.drawBitmap(bmp, src, dst, pt);
                } catch(Exception e) {
                    // Log/trap rendering errors
                } finally {
                    mOverSH.unlockCanvasAndPost(cOver);
                }
            }
        });
    }

The above code, when run as part of the camera preview, yields the following view.

Figure 2: Application output

Optimising the operation

As written, there's a problem with this application: speed. When run on a hardware device, the overlay calculation is incapable of maintaining a near-real-time speed of augmented display; in the case of my own hardware, a rendering speed of around 3 frames per second was achieved. This is due, in the main, to the calculations being performed within a buffer of managed memory in the Dalvik virtual machine: every access to the camera preview data is checked for boundary conditions, as is every pixel value written to the overlay canvas. All of these checks for boundary conditions take time away from the Sobel operation.

To alleviate this issue, the calculation can be performed in native code bypassing the virtual machine; this is done through the Android Native Development Kit (NDK). Native code built with the NDK is called through the Java Native Interface (JNI), and as such behaves in a very similar way to standard JNI: native code is placed into functions conforming to a particular naming standard, and they can then be called from the Java VM as specially marked native functions.

NDK native functions are named according to the package and class they're destined for: the standard format is Java_<package>_<class>_<function>. In this particular case, the destination is package sobel and class OverlayView, so the interface can be built as below.

jni/native.c: NDK processing interface

    #include <jni.h>

    JNIEXPORT void JNICALL Java_sobel_OverlayView_nativeSobel(
        /* Two parameters passed to every JNI function */
        JNIEnv *env, jobject this,
        /* Four parameters specific to this function */
        jbyteArray frame, jint width, jint height, jobject out)
    {
        /* Perform Sobel operation, filling "out" */
    }

src/sobel/OverlayView.java: Native function definition

    class OverlayView {
        private native void nativeSobel(byte[] frame,
                                        int width,
                                        int height,
                                        IntBuffer out);
    }

Note that in the above code, the int[] array used beforehand for overlay output has been replaced by an IntBuffer; this is to allow access to the raw memory buffer for native work, since a standard int[] lives on the JVM's managed heap, and native code may only be handed a copy of it. Direct buffers are designed to expose their memory to native code: the JNI function GetDirectBufferAddress returns a pointer to the buffer's storage, which we can use for writing the output of the Sobel operation.

The Java code shown above for the operation can be translated directly to C code, as below:

jni/native.c: Sobel implementation

    #include <jni.h>
    #include <stdlib.h>

    JNIEXPORT void JNICALL Java_sobel_OverlayView_nativeSobel(
        JNIEnv *env, jobject this,
        jbyteArray frame, jint width, jint height, jobject out)
    {
        /* Get a pointer to the raw output buffer */
        jint *dest_buf = (jint*) ((*env)->GetDirectBufferAddress(env, out));

        /* Get a pointer to (probably a copy of) the input */
        jboolean frame_copy;
        jbyte *frame_buf = (*env)->GetByteArrayElements(env, frame, &frame_copy);

        /* Luminance values are unsigned bytes */
        unsigned char *src_buf = (unsigned char*) frame_buf;

        int x, y, w = width, pos = width+1;
        int maxX = width-1, maxY = height-1;
        int sobelX, sobelY, sobelFinal;
        unsigned int pixel;

        /* pos skips the last pixel of one row and the first of the next */
        for(y=1; y<maxY; y++, pos+=2) {
            for(x=1; x<maxX; x++, pos++) {
                sobelX = src_buf[pos+w+1] - src_buf[pos+w-1]
                       + src_buf[pos+1] + src_buf[pos+1]
                       - src_buf[pos-1] - src_buf[pos-1]
                       + src_buf[pos-w+1] - src_buf[pos-w-1];
                sobelY = src_buf[pos+w+1] + src_buf[pos+w]
                       + src_buf[pos+w] + src_buf[pos+w-1]
                       - src_buf[pos-w+1] - src_buf[pos-w]
                       - src_buf[pos-w] - src_buf[pos-w-1];

                sobelFinal = (abs(sobelX) + abs(sobelY)) >> 1;
                if(sobelFinal < 48) sobelFinal = 0;
                if(sobelFinal >= 48) sobelFinal = 255;

                pixel = (unsigned int) sobelFinal;
                dest_buf[pos] = (jint) ((pixel << 0) |
                                        (pixel << 8) |
                                        (pixel << 16) |
                                        (pixel << 24));
            }
        }

        /* Hand the input back without copying any changes */
        (*env)->ReleaseByteArrayElements(env, frame, frame_buf, JNI_ABORT);
    }

src/sobel/OverlayView.java: Calling the native function

    private IntBuffer mFrameSobel;

    private void setPreviewSize(Camera.Size s) {
        // Allocate a 32-bit direct buffer as large as the preview
        mFrameSobel = ByteBuffer.allocateDirect(s.width * s.height * 4)
                                .asIntBuffer();
        mFrameSize = s;
    }

    private void setCamera(Camera c) {
        mCam = c;
        mCam.setPreviewCallback(new PreviewCallback() {
            // Called by camera hardware, with preview frame
            public void onPreviewFrame(byte[] frame, Camera c) {
                Canvas cOver = mOverSH.lockCanvas(null);
                try {
                    nativeSobel(frame, mFrameSize.width, mFrameSize.height, mFrameSobel);

                    // Rewind the array after operation
                    mFrameSobel.position(0);

                    Rect src = new Rect(0, 0, mFrameSize.width, mFrameSize.height);
                    Rect dst = new Rect(0, 0, cOver.getWidth(), cOver.getHeight());
                    Paint pt = new Paint();

                    // A bitmap can't be created directly from a buffer:
                    // make an empty one, then copy the pixels in
                    Bitmap bmp = Bitmap.createBitmap(mFrameSize.width,
                                                     mFrameSize.height,
                                                     Bitmap.Config.ARGB_8888);
                    bmp.copyPixelsFromBuffer(mFrameSobel);

                    pt.setColor(Color.WHITE);
                    pt.setAlpha(0xFF);
                    cOver.drawBitmap(bmp, src, dst, pt);
                } catch(Exception e) {
                    // Log/trap rendering errors
                } finally {
                    mOverSH.unlockCanvasAndPost(cOver);
                }
            }
        });
    }

Once the Java code has been configured to call the native function for processing, the lack of extraneous work by the JVM results in a significant speed-up: under testing on my hardware, a speed of 15-20 frames per second was easily achievable, and this can be improved through further optimisation of the algorithm.

In conclusion

The Android documentation for the NDK states:

"Using native code does not result in an automatic performance increase, but always increases application complexity."

In the case of the memory-intensive processing presented here, the NDK has a significant advantage over the Java virtual machine, in that it doesn't perform bounds checking on array and pointer accesses. Since most augmented reality applications will need to work on the camera preview image, and provide an overlay on top of the preview, the technique of shunting processing into an NDK function can be useful.

Imran Nazar <tf@imrannazar.com>, May 2011.

Camera Overlays with Android Native Code

Thu, 21 Apr 2011 19:07:47 +0000

The source code to this project is now available in full, at: http://imrannazar.com/content/files/android-sobel.zip

One of the most demanding tasks for a smartphone application to take on is "augmented reality": producing a display of the world with information overlaid in real-time. This is generally done by using the smartphone's camera, in preview mode, to provide a base for a translucent overlay; the intensity of the task lies in calculating the contents of the overlay in a time-sensitive environment.

This article hopes to provide a gentle two-part introduction to augmented reality as implemented on Android-based smartphone devices. The process will be introduced using the example of an edge detector run on the camera's current view, and updated alongside the camera view in real-time. Many of the processes involved in producing such a view will apply to any software that seeks to provide a view based on the camera, so the code presented here will have wider application to programs of this class.

The edge detection algorithm that will be used in this article is the Sobel operator; the algorithm will be covered in detail later, but the application developed here will, as a whole, be named after this operator. An example output for the application is shown below.

Figure 1: Sample output

Providing a camera view

In order to overlay data on the camera preview screen, it's a prerequisite to be able to display the camera preview; this is done by rendering the preview onto a surface. For that to occur, the simplest method is to place a SurfaceView-type view on the application's main layout, and position it such that it covers the screen. This can be done through the standard layout XML:

res/layout/main.xml: Main layout for camera preview

    <FrameLayout xmlns:android="http://schemas.android.com/apk/res/android"
        android:orientation="vertical"
        android:layout_width="fill_parent"
        android:layout_height="fill_parent">
        <SurfaceView android:id="@+id/surface_camera"
            android:layout_width="fill_parent"
            android:layout_height="fill_parent" />
    </FrameLayout>

With a SurfaceView made available, the application's main activity can place a surface and its associated canvas onto the view. To do this, the activity registers itself with the view's SurfaceHolder and implements the methods of SurfaceHolder.Callback; this allows the Android operating system to notify the activity as the rendering surface is created, resized and destroyed. In code, it's a simple process to define an activity as a surface holder callback: three methods are made available by the SurfaceHolder.Callback interface.

src/sobel/Sobel.java: Main activity

    package sobel;

    import android.app.Activity;
    import android.os.Bundle;
    import android.view.SurfaceHolder;

    public class Sobel extends Activity implements SurfaceHolder.Callback
    {
        /* Activity event handlers */

        // Called when activity is initialised by OS
        @Override
        public void onCreate(Bundle inst)
        {
            super.onCreate(inst);
            setContentView(R.layout.main);

            // Initialise camera
            initCamera();
        }

        // Called when activity is closed by OS
        @Override
        public void onDestroy()
        {
            // Turn off the camera
            stopCamera();

            // Android requires the superclass handler to be called
            super.onDestroy();
        }

        /* SurfaceHolder event handlers */

        // Called when the surface is first created
        public void surfaceCreated(SurfaceHolder sh)
        {
            // No action required
        }

        // Called when surface dimensions etc change
        public void surfaceChanged(SurfaceHolder sh, int format, int width, int height)
        {
            // Start camera preview
            startCamera(sh, width, height);
        }

        // Called when the surface is closed/destroyed
        public void surfaceDestroyed(SurfaceHolder sh)
        {
            // No action required
        }
    }

The above code will deal with the initialisation of the application and its surface, but the camera hardware needs to be initialised and set up for the preview to be available. This is done in three steps:

Open the camera, when the application is initialised;
Set parameters for the camera, including the width and height of the preview;
Start preview and set surface for the preview output.

The camera helper functions mentioned in the above code sample can be filled in to perform these steps:

src/sobel/Sobel.java: Camera initialisation

    private Camera mCam;
    private SurfaceView mCamSV;
    private SurfaceHolder mCamSH;

    // Initialise camera and surface
    private void initCamera()
    {
        mCamSV = (SurfaceView)findViewById(R.id.surface_camera);
        mCamSH = mCamSV.getHolder();
        mCamSH.addCallback(this);

        mCam = Camera.open();
    }

    // Setup camera based on surface parameters
    private void startCamera(SurfaceHolder sh, int width, int height)
    {
        Camera.Parameters p = mCam.getParameters();
        p.setPreviewSize(width, height);

        mCam.setParameters(p);

        try {
            mCam.setPreviewDisplay(sh);
        } catch(Exception e) {
            // Log surface setting exceptions
        }

        mCam.startPreview();
    }

    // Stop camera when application ends
    private void stopCamera()
    {
        mCamSH.removeCallback(this);

        mCam.stopPreview();
        mCam.release();
    }

One consideration to make when setting up the camera is that the size of the surface prepared for preview may not be a size supported by the camera subsystem. If this is the case, and the activity attempts to set a preview size based on the surface size, the application may force-close when it starts. A work-around for this is not to use the surface's dimensions when setting a preview size, but instead to ask the camera which preview sizes are supported, and to use one of those. The list of preview sizes can be retrieved through the camera's Parameters object:

src/sobel/Sobel.java: Using supported preview sizes

    private void startCamera(SurfaceHolder sh, int width, int height)
    {
        Camera.Parameters p = mCam.getParameters();
        for(Camera.Size s : p.getSupportedPreviewSizes()) {
            // In this instance, simply use the first available
            // preview size; could be refined to find the closest
            // values to the surface size
            p.setPreviewSize(s.width, s.height);
            break;
        }

        mCam.setParameters(p);

        try {
            mCam.setPreviewDisplay(sh);
        } catch(Exception e) {
            // Log surface setting exceptions
        }

        mCam.startPreview();
    }

The application is now equipped to produce a preview of the camera's current field of view. The preview may appear alongside an application title bar, notification area and so forth; to remove these and gain an unobstructed rendering of the preview, the application can request to be made fullscreen:

src/sobel/Sobel.java: Fullscreen activity

    @Override
    public void onCreate(Bundle inst)
    {
        super.onCreate(inst);
        getWindow().setFlags(WindowManager.LayoutParams.FLAG_FULLSCREEN,
                             WindowManager.LayoutParams.FLAG_FULLSCREEN);
        setContentView(R.layout.main);
        initCamera();
    }

Providing an overlay

Now that the camera preview is being rendered into a SurfaceView, the next step in augmented reality is the ability to draw pixels and/or shapes over the preview image. Since the camera hardware is directly drawing to the surface made available to it, this surface cannot be used for additional drawing: any output made to the surface will be automatically overwritten by the camera.

This problem can be resolved by providing an additional surface, positioned over the top of the camera preview, onto which things can be drawn by the application. The new surface can also be a SurfaceView, but if the base Android view is utilised in this instance, it cannot be used to draw dynamic content: the SurfaceView must be extended into a new class. For the purposes of this application, the class can be referred to as OverlayView:

src/sobel/OverlayView.java: Class definition

    package sobel;

    import android.content.Context;
    import android.util.AttributeSet;
    import android.view.SurfaceHolder;
    import android.view.SurfaceView;

    public class OverlayView extends SurfaceView
    {
        private SurfaceHolder mOverSH;

        // Constructor used when the view is inflated from layout XML;
        // a constructor has no return type
        public OverlayView(Context ctx, AttributeSet attr)
        {
            super(ctx, attr);
            mOverSH = getHolder();
        }
    }

src/sobel/Sobel.java: Initialising the OverlayView

    private OverlayView mOverSV;

    private void initCamera()
    {
        mCamSV = (SurfaceView)findViewById(R.id.surface_camera);
        mCamSH = mCamSV.getHolder();
        mCamSH.addCallback(this);

        mCam = Camera.open();

        mOverSV = (OverlayView)findViewById(R.id.surface_overlay);
        mOverSV.getHolder().setFormat(PixelFormat.TRANSLUCENT);
        mOverSV.setCamera(mCam);
    }

    private void startCamera(SurfaceHolder sh, int width, int height)
    {
        Camera.Parameters p = mCam.getParameters();
        for(Camera.Size s : p.getSupportedPreviewSizes()) {
            p.setPreviewSize(s.width, s.height);
            mOverSV.setPreviewSize(s);
            break;
        }

        // ...
    }

In order to lay this new view class over the camera's preview surface, the layout XML needs to be modified to load in the overlay view beforehand:

res/layout/main.xml: Main layout for overlaid preview

    <FrameLayout xmlns:android="http://schemas.android.com/apk/res/android"
        android:orientation="vertical"
        android:layout_width="fill_parent"
        android:layout_height="fill_parent">
        <sobel.OverlayView android:id="@+id/surface_overlay"
            android:layout_width="fill_parent"
            android:layout_height="fill_parent" />
        <SurfaceView android:id="@+id/surface_camera"
            android:layout_width="fill_parent"
            android:layout_height="fill_parent" />
    </FrameLayout>

With an overlay in place, the content on the overlay needs to be drawn, and regularly updated. Drawing onto a surface is a familiar concept from computer graphics, requiring the locking of a canvas and the drawing of primitives to the canvas; keeping the canvas regularly updated against the camera preview is a little less familiar. A regular update can be achieved in one of two ways:

Timing: A method of the OverlayView is called every few milliseconds, which fetches the current camera preview from its rendered surface. This is a theoretically sound concept, but the camera hardware keeps the preview surface permanently locked, so the application is unable to gain access to it.
Callback: A method of the OverlayView is defined as a "preview callback", and is called automatically by the camera hardware whenever a preview is rendered. The innate advantage to this method is that the camera provides a byte[] of the contents of the camera preview, which can easily be used for calculation of an overlay.

To set up a callback to a method in the OverlayView, the view must first know about the camera: a handle to the camera must be passed over from the main activity. In addition, it's useful for the OverlayView to know the size of preview image it's working with, since the callback method doesn't provide dimensions. The calls to these methods can be seen in the above code sample from Sobel.java, made at initialisation time; the methods are outlined below.

src/sobel/OverlayView.java: Setting up a callback

    private Camera mCam;
    private Camera.Size mFrameSize;

    // Frame counter, held by the view so that both
    // setPreviewSize and the callback can reach it
    private int mFrameCount;

    // Called by Sobel.surfaceChanged, to set dimensions
    // (public, since it's called from the activity)
    public void setPreviewSize(Camera.Size s)
    {
        mFrameSize = s;
        mFrameCount = 0;
    }

    // Called by Sobel.initCamera, to set callback
    public void setCamera(Camera c)
    {
        mCam = c;
        mCam.setPreviewCallback(new Camera.PreviewCallback() {
            // Called by camera hardware, with preview frame
            public void onPreviewFrame(byte[] frame, Camera c)
            {
                Canvas cOver = mOverSH.lockCanvas(null);

                // lockCanvas returns null if the surface isn't ready
                if(cOver == null) return;

                try {
                    // Perform overlay rendering here
                    // Here, draw an incrementing number onscreen
                    Paint pt = new Paint();
                    pt.setColor(Color.WHITE);
                    pt.setTextSize(16);
                    cOver.drawText(Integer.toString(mFrameCount++), 10, 10, pt);
                } catch(Exception e) {
                    // Log/trap rendering errors
                } finally {
                    mOverSH.unlockCanvasAndPost(cOver);
                }
            }
        });
    }

Running the above code on hardware results in something akin to the following image:

Figure 2: Overlay canvas rendering

In Part 2: Edge detection

The above code takes the application to a point where it can retrieve data from the camera preview (through the preview frame callback's byte[] parameter), and render an overlay. In the second part of this article, I'll look at how the preview data can be run through the Sobel edge detection filter, and how the result can be displayed on the overlay.

Imran Nazar <tf@imrannazar.com>, Apr 2011.

GameBoy Emulation in JavaScript: Timers

Fri, 25 Feb 2011 00:43:38 +0000

This is part 10 of an article series on emulation development in JavaScript; ten parts are currently available, and others are expected to follow.

Part 1: The CPU

Part 2: Memory

Part 3: GPU Timings

Part 4: Graphics

Part 5: Integration

Part 6: Input

Part 7: Sprites

Part 8: Interrupts

Part 9: Memory Banking

Part 10: Timers

Since the first computers were put together, one of their basic functions has been to keep time: to coordinate actions according to timers. Even the simplest of games has an element of time to it: Pong, for example, needs to move the ball across the screen at a particular rate. In order to handle these timing issues, every games console has some form of timer to allow for things to happen at a given moment, or at a specific rate.

The GameBoy is no exception to this rule, and contains a set of registers which automatically increment based on a programmable schedule. In this part of the series, I'll be investigating the structure and operation of the timer, and how it can be used to seed pseudo-random number generators, such as the one contained in Tetris and its various clones. One example of a Tetris clone which uses the timer, to pick random pieces for the game, is demonstrated below.

Figure 1: jsGB implementation with timer

Timer structure

The GameBoy's CPU, as described in the first part of this series, runs on a 4,194,304Hz clock, with two internal measures of the time taken to execute each instruction: the T-clock, which increments with each clock step, and the M-clock, which increments at a quarter of the speed (1,048,576Hz). These clocks are used as the source of the timer, which counts up, in turn, at a quarter of the rate of the M-clock: 262,144Hz. In this article, I'll refer to this final value as the timer's "base speed".

The GameBoy's timer hardware offers two separate timer registers: the system works by incrementing the value in each of these registers at a pre-determined rate. The "divider" timer is permanently set to increment at 16384Hz, one sixteenth of the base speed; since it's only an eight-bit register, its value will go back to zero after it reaches 255. The "counter" timer is more programmable: it can be set to one of four speeds (the base divided by 1, 4, 16 or 64), and it can be set to go back to a value that isn't zero when it overflows past 255. In addition, the timer hardware will send an interrupt to the CPU, as described in part 8, whenever the "counter" timer does overflow.

There are four registers used by the timer; these are made available for use by the system as part of the I/O page, just like the graphics and interrupt registers:

Address	Register	Details
0xFF04	Divider	Counts up at a fixed 16384Hz; reset to 0 whenever written to
0xFF05	Counter	Counts up at the specified rate; triggers INT 0x50 when going 255->0
0xFF06	Modulo	When Counter overflows to 0, it's reset to start at Modulo
0xFF07	Control	Bits 0-1: Speed (00: 4096Hz, 01: 262144Hz, 10: 65536Hz, 11: 16384Hz); bit 2: Running (1 to run timer, 0 to stop); bits 3-7: Unused

Table 1: Timer registers
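As a rough illustration of how the Control register in Table 1 maps onto real frequencies, the following sketch decodes a Control value into a divider of the base speed. The names decodeTac and overflowHz are made up for this example and are not part of jsGB; the overflow rate assumes the Counter restarts from Modulo after each overflow, so it takes (256 - Modulo) steps to overflow again.

```javascript
// Clock rates as described in the text
var CPU_HZ  = 4194304;        // T-clock
var M_HZ    = CPU_HZ / 4;     // M-clock: 1,048,576Hz
var BASE_HZ = M_HZ / 4;       // Timer base speed: 262,144Hz

// Divider of the base speed for each value of Control bits 0-1
var TAC_DIVIDERS = [64, 1, 4, 16];

// Hypothetical helper: break a Control value into its fields
function decodeTac(tac) {
    var divider = TAC_DIVIDERS[tac & 3];
    return {
        running: (tac & 4) !== 0,   // Bit 2: timer running
        divider: divider,           // Base-speed ticks per Counter step
        hz: BASE_HZ / divider       // Counter increment rate
    };
}

// Hypothetical helper: how often the Counter overflows (and
// thus how often the timer interrupt fires) for a given Modulo
function overflowHz(tac, modulo) {
    return decodeTac(tac).hz / (256 - (modulo & 255));
}
```

For instance, a Control value of 0x05 runs the Counter at the full base speed, while 0x04 with a Modulo of zero produces an overflow sixteen times a second.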

Since the "counter" timer triggers an interrupt when it overflows, it can be especially useful if a game requires something to happen at a regular interval. However, a GameBoy game can generally use the vertical blank to much the same effect, since it occurs at a regular pace of almost 60Hz; the vertical blanking handler can be used not only to refresh the screen contents, but to check the keypad and update the game state. Therefore, there's little call for use of the timer in traditional GameBoy games, though it can be used to greater effect in graphic demos.

Implementing the timer emulation

The emulation developed in this article series uses the CPU's clock as the basic unit of time. For that reason, it's simplest to maintain a clock for the timer that runs in step with the CPU clock, and is updated by the dispatch function. It's convenient at this stage to keep the DIV register as a separate entity to the controllable timer, incremented at 1/16th the rate again of the fastest timer step:

Timer.js: Clock increment

    TIMER = {
        _clock: { main: 0, sub: 0, div: 0 },
        _reg: { div: 0, tima: 0, tma: 0, tac: 0 },

        inc: function() {
            // Increment by the last opcode's time
            TIMER._clock.sub += Z80._r.m;

            // No opcode takes longer than 4 M-times,
            // so we need only check for overflow once
            if(TIMER._clock.sub >= 4) {
                TIMER._clock.main++;
                TIMER._clock.sub -= 4;

                // The DIV register increments at 1/16th
                // the rate, so keep a count of this
                TIMER._clock.div++;
                if(TIMER._clock.div == 16) {
                    TIMER._reg.div = (TIMER._reg.div+1) & 255;
                    TIMER._clock.div = 0;
                }
            }

            // Check whether a step needs to be made in the timer
            TIMER.check();
        }
    };

Z80.js: Dispatcher

    while(true) {
        // Run execute for this instruction
        var op = MMU.rc(Z80._r.pc++);
        Z80._map[op]();
        Z80._r.pc &= 65535;
        Z80._clock.m += Z80._r.m;
        Z80._clock.t += Z80._r.t;

        // Update the timer
        TIMER.inc();

        Z80._r.m = 0;
        Z80._r.t = 0;

        // If IME is on, and some interrupts are enabled in IE, and
        // an interrupt flag is set, handle the interrupt
        if(Z80._r.ime && MMU._ie && MMU._if) {
            // Mask off ints that aren't enabled
            var ifired = MMU._ie & MMU._if;

            if(ifired & 0x01) {
                MMU._if &= (255 - 0x01);
                Z80._ops.RST40();
            }
        }

        Z80._clock.m += Z80._r.m;
        Z80._clock.t += Z80._r.t;

        // Update timer again, in case a RST occurred
        TIMER.inc();
    }

From here, the controllable timer is made up of varying divisions of the base speed, making it relatively simple to check whether the timer values need to be stepped up, and to provide the registers as part of the memory I/O page. The interface between the following section of code and the MMU I/O page handler is left as an exercise for the reader.

Timer.js: Register check and update

        check: function() {
            if(TIMER._reg.tac & 4) {
                // Number of base-speed ticks per timer step
                var threshold;
                switch(TIMER._reg.tac & 3) {
                    case 0: threshold = 64; break;    // 4K
                    case 1: threshold = 1; break;     // 256K
                    case 2: threshold = 4; break;     // 64K
                    case 3: threshold = 16; break;    // 16K
                }

                if(TIMER._clock.main >= threshold) TIMER.step();
            }
        },

        step: function() {
            // Step the timer up by one
            TIMER._clock.main = 0;
            TIMER._reg.tima++;

            if(TIMER._reg.tima > 255) {
                // At overflow, refill with the Modulo
                TIMER._reg.tima = TIMER._reg.tma;

                // Flag a timer interrupt to the dispatcher
                MMU._if |= 4;
            }
        },

        rb: function(addr) {
            switch(addr) {
                case 0xFF04: return TIMER._reg.div;
                case 0xFF05: return TIMER._reg.tima;
                case 0xFF06: return TIMER._reg.tma;
                case 0xFF07: return TIMER._reg.tac;
            }
        },

        wb: function(addr, val) {
            switch(addr) {
                case 0xFF04: TIMER._reg.div = 0; break;
                case 0xFF05: TIMER._reg.tima = val; break;
                case 0xFF06: TIMER._reg.tma = val; break;
                case 0xFF07: TIMER._reg.tac = val & 7; break;
            }
        }
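One possible shape for the hand-off between the I/O page and the timer is sketched below. This is an assumption about how the routing might be arranged, not the jsGB source: the IO object and its rb/wb methods are hypothetical names, and the TIMER here is a cut-down stand-in carrying only the register handlers, so that the snippet runs on its own.

```javascript
// Stand-in for the timer, holding just the register handlers
var TIMER = {
    _reg: { div: 0, tima: 0, tma: 0, tac: 0 },

    rb: function(addr) {
        switch(addr) {
            case 0xFF04: return TIMER._reg.div;
            case 0xFF05: return TIMER._reg.tima;
            case 0xFF06: return TIMER._reg.tma;
            case 0xFF07: return TIMER._reg.tac;
        }
    },

    wb: function(addr, val) {
        switch(addr) {
            case 0xFF04: TIMER._reg.div = 0; break;
            case 0xFF05: TIMER._reg.tima = val; break;
            case 0xFF06: TIMER._reg.tma = val; break;
            case 0xFF07: TIMER._reg.tac = val & 7; break;
        }
    }
};

// Hypothetical I/O page handler: addresses FF04-FF07 are
// passed through to the timer, anything else is ignored here
var IO = {
    rb: function(addr) {
        if(addr >= 0xFF04 && addr <= 0xFF07) return TIMER.rb(addr);
        return 0;    // Other I/O registers are not modelled
    },

    wb: function(addr, val) {
        if(addr >= 0xFF04 && addr <= 0xFF07) TIMER.wb(addr, val & 255);
    }
};
```

In a full emulator the same range check would sit inside the MMU's existing handler for the I/O page, alongside the graphics and interrupt registers.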

Seeding a pseudo-random number generator

A major component of many games is unpredictability: Tetris, for instance, will throw an unknown pattern of pieces down the well, and the game consists of building rows using these pieces. Ideally, a computer provides unpredictability by generating random numbers, but this runs contrary to the methodical nature of a computer; it's not possible for a computer to provide a truly random pattern of numbers. Various algorithms exist to produce sequences of numbers that look superficially like they're random, and these are called pseudo-random number generation (PRNG) algorithms.

A PRNG is generally implemented as a formula that, given a particular input number, will produce another number with almost no relation to the input. For Tetris, nothing so complicated is required; instead, the following code is used to produce a seemingly random block.

Tetris.asm: Select new block

    BLK_NEXT = 0xC203
    BLK_CURR = 0xC213
    REG_DIV  = 0x04

    NBLOCK:
        ld hl, BLK_CURR      ; Bring the next block
        ld a, (BLK_NEXT)     ; forward to current
        ld (hl),a
        and 0xFC             ; Clear out any rotations
        ld c,a               ; and hold onto previous
        ld h,3               ; Try the following 3 times

    .seed:
        ldh a, (REG_DIV)     ; Get a "random" seed
        ld b,a
    .loop:
        xor a                ; Step down in sevens
    .seven:
        dec b                ; until zero is reached
        jr z, .next          ; This loop is equivalent
        inc a                ; to (a%7)*4
        inc a
        inc a
        inc a
        cp 28
        jr z, .loop
        jr .seven

    .next:
        ld e,a               ; Copy the new value
        dec h                ; If this is the
        jr z, .end           ; last try, just use this
        or c                 ; Otherwise check
        and 0xFC             ; against the previous block
        cp c                 ; If it's the same again,
        jr z, .seed          ; try another random number

    .end:
        ld a,e               ; Get the copy back
        ld (BLK_NEXT), a     ; This is our next block

The basis of the Tetris block selector is the DIV register: since the selection routine is only run once every few seconds, the register will have an unknown value on any given run, and it thus makes a fair approximation of a random number source. With the timer system having been emulated, Tetris and its clones can be emulated to full functionality, as shown in Figure 1.
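To make the behaviour of the selector easier to follow, here is a JavaScript walk-through of the assembly above. It's a sketch for illustration only: the function names are invented, and readDiv is a hypothetical callback standing in for a read of the DIV register. Following the loop step by step, a DIV value of n produces n-1 increments (with a value of 0 wrapping round to 255), so the closed form works out as (((n + 255) & 255) % 7) * 4.

```javascript
// Literal translation of the .loop/.seven countdown
function stepDownInSevens(div) {
    var a = 0;
    var b = div & 255;
    while(true) {
        b = (b - 1) & 255;         // dec b (eight-bit wrap)
        if(b === 0) return a;      // jr z, .next
        a += 4;                    // inc a, four times
        if(a === 28) a = 0;        // cp 28; jr z, .loop; xor a
    }
}

// The same result without looping
function closedForm(div) {
    return (((div + 255) & 255) % 7) * 4;
}

// Pick the next block, given the current one and a function
// that returns the DIV register's value at the time of asking
function pickBlock(current, readDiv) {
    var c = current & 0xFC;        // Clear out any rotations
    var e = 0;
    for(var tries = 3; tries > 0; tries--) {
        e = stepDownInSevens(readDiv());
        if(tries === 1) break;                 // Last try: use it
        if(((e | c) & 0xFC) !== c) break;      // Differs: use it
    }
    return e;
}
```

Calling pickBlock(0, function() { return 2; }) yields 4, since a DIV value of 2 steps down to the second block type and that differs from the current block.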

Coming up: Sound

One aspect of game emulation which has been overlooked until now is the generation of sound, and the synchronisation of sound to the speed of the emulation. Over and above the aspect of sound generation by the emulator, is the method by which sound is output to the browser; the next part of this series will investigate the issues surrounding sound output mechanisms, and whether a coherent strategy can be put together for sound production in JavaScript.

Imran Nazar <tf@imrannazar.com>, Feb 2011.

Sci-fi Shorts: The Harness

Wed, 02 Feb 2011 09:55:13 +0000

The war had almost drained our combined resources of energy: even though we were closer to the Sun, the solar power simply couldn't be collected without materials to build solar collectors. With every last scrap of metal engaged in the war effort, or otherwise up in space somewhere, it was impossible to get anything done on the surface.

The black hole changed that. It was spotted a few years ago, slowly drifting across the plane of the solar system; it was only seen at all because it crossed the path of Venus, and the planet changed shape for a few hours. In their haste to focus attacks on Earth, the invaders missed the shift in the appearance of Venus, and we got there first.

With so much detritus up in high orbit, and so many people based on the Atlantic tower platform, it didn't take long for an old warship module to be staffed out with a skeleton crew, and pushed off in the direction of the black hole. The crew ended up being us, the second-shift engineering staff of the Nagios. The warship was a decrepit 301 with no gun batteries to speak of, but it did have one thing in its favour: the experimental coil we were dispatched to try out.

I took on the role of pilot, simply because I'd done some strong gravity training back at the Academy. Jackson was our chief engineer, and Fuller had the day off, so he got roped in too. Three of us, and a crazy theory, to try to change the war.

The theory was simple, when compared to others in the field of quantum relativity. A black hole has a gravitational field, which radiates out in lines, much like a magnetic field. Black holes also spin, which means the lines of gravitational field are moving all the time. Vassilev theorised that a superconducting coil placed around the hole, near the Schwarzschild radius, would cut through the field lines and generate an electrogravitic field in the coil. It was impossible to even attempt a confirmation of the theory at the time, since there were no black holes, and no superconductive material to spare.

With the war, the situation was different. Even our old and broken 301 had a few miles of ceramic-wire running to the weapons systems, which could carry incredible amounts of charge. Measurements of the black hole mass put it at around one thousandth of the sun's mass, which left the hole's Schwarzschild radius at 30 metres: there would be plenty of wire for the coil. Getting the coil into place, that was my job as ship's pilot.

As we came up towards Venus, I called down from the pilot's chair, into the pit. "Jackson, get the ceramic down the back; let's give this hole a good run."

It's not easy to steer a course around any strong gravity source, as we'd all found out when the Weichinger carrier came too close to an enemy neutron barrage: it had been torn clean in half by the tidal forces. A black hole was more predictable, but much scarier at the same time. At least with a neutron barrage, it was possible to come out the other side.

I'd done this kind of thing in simulators before, with a whole term of spaceflight at the Academy focussed on getting around when strong gravity was in the area. One thing you never expect is that the outside universe gets smaller. We were close enough to the hole now, that I could barely see the Earth as a small point in what seemed like the far distance, and stars were beginning to show trails as we moved.

The 301 wasn't a highly manoeuvrable warship at the height of its career, over thirty years ago: it was often referred to as the Bucket Class by the current cadets of the Academy. Moving around near a black hole was another adventure, with the constant tug to the left as I wound the ship anti-clockwise around the point.

"For what it's worth," Fuller called up from the cargo hold, "we're nearly out of gas for the manoeuvring thrusters."

"We're low on everything," I said in reply. "No worries, I'm nearly done here."

The hole was at least uniform, so it was pretty short work to get the coil deployed in four loops around its equator. Jackson came up the ladder to the viewscreen, while Fuller stayed down in the hold to get the coil's ends into a power meter. Based on the shape of the black hole, and how stationary I could keep the ship against it, we were looking at maybe three hundred gigawatts: not bad for a proof of concept.

Fuller fired up the meter, and there was silence for a few seconds. "Er. Jackson, Irvine? You'll want to see this."

"I'm holding the ship straight," I answered. "Do I need to come down there to see what the meter says?"

"Well, it's off the scale, and the scale goes up to a terawatt."

Jackson looked down into the pit with a slightly incredulous look. "Nah, that can't be right. Lemme get down there, I'll check the wire."

Even superconducting ceramic heats up when enough charge flows through it. Jackson put his hand against the coil, and drew it back quickly. "That's, er, warm. You might want to get someone else out here, Irvine; we might have more than we bargained for here."

That's how it started, of course. Eventually, we had wires criss-crossing that hole and pulling out around ten terawatts: more than enough to feed a new generation of powered weapons. Thanks to the hole, we fought the invaders off, and Earth managed to claw its way out into the light.

A few years after the war ended, Stan Vassilev was awarded the Peace Prize, which he found suitably ironic. It was his opinion that we'd bled our new power source dry, and it would soon evaporate; a month later, the hole vanished, and the 10TW of power went with it.

I still don't know where it came from, and why it showed up to help Earth when it did. Religion's outlawed by the military, so I can't be a religious man, but I do wonder sometimes. Even as an engineer, it strikes me as odd that Earth was dealt such a massive stroke of luck: someone out there's looking after our interests.

GameBoy Emulation in JavaScript: Memory Banking

Fri, 03 Dec 2010 19:20:30 +0000

This is part 9 of an article series on emulation development in JavaScript; ten parts are currently available, and others are expected to follow.

Part 1: The CPU

Part 2: Memory

Part 3: GPU Timings

Part 4: Graphics

Part 5: Integration

Part 6: Input

Part 7: Sprites

Part 8: Interrupts

Part 9: Memory Banking

Part 10: Timers

Thus far in this series, we've been dealing with the loading and emulation of a simple memory map for the GameBoy, with the entirety of the game ROM fitting into the lower half of memory. There aren't many games that fit into memory in full (Tetris is one of the few); most games are larger than this, and have to employ an independent mechanism to swap "banks" of game ROM into the GameBoy CPU's view.

Some of the first games in the GameBoy library were built with a Memory Bank Controller inside the cartridge, which did this job of swapping banks of ROM into view; over the years, various versions of the cartridge MBC were built for increasingly large games. In the particular example of the demo associated with this part, the first version of the MBC is used to handle the loading of a 64kB ROM.

Figure 1: jsGB implementation with MBC1 support

Banking and memory expansion

Through the years, many computer systems have had to deal with the problem of having too much program to fit into memory. Traditionally, there have been two ways to deal with this problem.

Increase the address space: Build a new CPU with more address lines, allowing it to see and understand a larger amount of memory. This is the preferred solution, but requires significant time to redevelop the computer system in question, and may need more changes to be made in the supporting chipset for the CPU.
Virtual memory: This can either refer to the holding of chunks of RAM on disk, and their swapping in when required; or the swapping in of chunks of pre-written ROM when required. In both cases, the system hardware needs little extension but any software for the system has to be aware of the paging/banking system, in order to use it.

Since the GameBoy is a fixed hardware platform with wide distribution, there's no way to increase the address space when larger games are produced; instead, the Memory Bank Controller built into the cartridge offers a way to switch 16kB banks of ROM into view. In addition to this, the MBC1 supports up to 32kB of "external RAM", which is writable memory in the cartridge; this can be banked into the [A000-BFFF] space in the memory map, if it's available.

In order to facilitate software that uses the MBC1, the first 16kB bank of ROM (bank 0) is fixed at address 0000; the second half of the ROM space can be made into a window on any ROM bank between 1 and 127, for a maximum ROM size of 2048kB. One of the oddities of the MBC1 is that it deals internally in 32's: banks #32, #64 and #96 are inaccessible, since the low five bits of the bank number are zero for each of them, and the banking system treats that value in the same way as it does for bank #0. This means that 124 switchable banks are usable, or 125 including the fixed bank #0.
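The bank numbering can be checked with a short enumeration. The sketch below assumes the behaviour described in Table 1, where a value of 0 in the low five bits is seen as 1; the function names are invented for this example. Running through every combination of the two bank registers gives 124 distinct switchable banks, which together with the fixed bank #0 makes 125.

```javascript
// Combine the two MBC1 bank registers into a bank number
function mbc1Bank(low5, high2) {
    low5 &= 0x1F;
    if(low5 === 0) low5 = 1;       // Value 0 is seen as 1
    return ((high2 & 3) << 5) | low5;
}

// List every bank that some register combination can select
function reachableBanks() {
    var seen = {};
    for(var high = 0; high < 4; high++) {
        for(var low = 0; low < 32; low++) {
            seen[mbc1Bank(low, high)] = true;
        }
    }

    var banks = [];
    for(var b = 0; b < 128; b++) {
        if(seen[b]) banks.push(b);
    }
    return banks;
}
```

An attempt to select bank #32 (low bits 0, high bits 1) comes out as mbc1Bank(0, 1), which is bank #33; the same happens for #64 and #96.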

There are four registers within the MBC1 chip, that allow for switching of banks for the ROM and RAM; these can be changed by writing to the (normally read-only) ROM space anywhere within a certain range. The details are given in the below table.

Locations	Register	Details
0000-1FFF	Enable external RAM	4 bits wide; value of 0x0A enables RAM, any other value disables
2000-3FFF	ROM bank (low 5 bits)	Switch between banks 1-31 (value 0 is seen as 1)
4000-5FFF	ROM bank (high 2 bits) RAM bank	ROM mode: switch ROM bank "set" {1-31}-{97-127} RAM mode: switch RAM bank 0-3
6000-7FFF	Mode	0: ROM mode (no RAM banks, up to 2MB ROM) 1: RAM mode (4 RAM banks, up to 512kB ROM)

Table 1: MBC1 register set

MBCs and the cartridge header

Since there are multiple kinds of controller for banking, any given game must state which MBC is used, in the cartridge header data. This is the first chunk of data in the cartridge ROM, and follows a specific format.

Location(s)	Value	Size (bytes)	Details
0100-0103h	Entry point	4	Where the game starts Usually "NOP; JP 0150h"
0104-0133h	Nintendo logo	48	Used by the BIOS to verify checksum
0134-0143h	Title	16	Uppercase, padded with 0
0144-0145h	Publisher	2	Used by newer GameBoy games
0146h	Super GameBoy flag	1	Value of 3 indicates SGB support
0147h	Cartridge type	1	MBC type/extras
0148h	ROM size	1	Usually between 0 and 7 Size = 32kB << [0148h]
0149h	RAM size	1	Size of external RAM
014Ah	Destination	1	0 for Japan market, 1 otherwise
014Bh	Publisher	1	Used by older GameBoy games
014Ch	ROM version	1	Version of the game, usually 0
014Dh	Header checksum	1	Checked by BIOS before loading
014E-014Fh	Global checksum	2	Simple summation, not checked
0150h	Start of game

Table 2: Cartridge header format

In this particular case, we're interested in the value of 0147h, the cartridge type. The cartridge type can be one of the following values, if an MBC1 is fitted to the cartridge:

Value	Definition
00h	No MBC
01h	MBC1
02h	MBC1 with external RAM
03h	MBC1 with battery-backed external RAM

Table 3: Cartridge type values pertaining to MBC1
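As an illustration of reading these header fields, the following sketch pulls out the title, cartridge type and ROM size from a ROM image. It assumes the image is held as an array of byte values rather than the string used elsewhere in this series, and parseHeader is a name invented for the example.

```javascript
// Read a few fields from the cartridge header;
// rom is an array (or Uint8Array) of byte values
function parseHeader(rom) {
    // Title: up to 16 characters, padded with zeroes
    var title = '';
    for(var i = 0x0134; i <= 0x0143; i++) {
        if(rom[i] === 0) break;
        title += String.fromCharCode(rom[i]);
    }

    var type = rom[0x0147];
    return {
        title: title,
        carttype: type,
        mbc1: (type >= 1 && type <= 3),          // Table 3, values 01h-03h
        externalRam: (type === 2 || type === 3),
        sgb: (rom[0x0146] === 3),                // Super GameBoy support
        romKb: 32 << rom[0x0148]                 // Size = 32kB << [0148h]
    };
}
```

A 64kB MBC1 cartridge, for example, would carry a value of 01h at 0147h and 01h at 0148h.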

For the purposes of this article, a system of battery backing will not be implemented for the external RAM; this feature is often used by games to save their state for later use, and will be looked at in more detail in a later part.

Implementation of MBC1

The memory bank controllers are an obvious manipulation of memory, and thus fit neatly into the MMU. Since the first ROM bank (bank #0) is fixed, an offset need only be maintained for the MBC to indicate where it's reading for the second bank. In order to allow for more MBC handling to be added later, an array of data can be used to hold the state of a given controller:

MMU.js: MBC state and reset

    MMU = {
        // MBC states
        _mbc: [],

        // Offset for second ROM bank
        _romoffs: 0x4000,

        // Offset for RAM bank
        _ramoffs: 0x0000,

        // Copy of the ROM's cartridge-type value
        _carttype: 0,

        reset: function() {
            ...

            // In addition to previous reset code,
            // initialise MBC internal data
            MMU._mbc[0] = {};
            MMU._mbc[1] = {
                rombank: 0,    // Selected ROM bank
                rambank: 0,    // Selected RAM bank
                ramon: 0,      // RAM enable switch
                mode: 0        // ROM/RAM expansion mode
            };

            MMU._romoffs = 0x4000;
            MMU._ramoffs = 0x0000;
        },

        load: function(file) {
            ...

            MMU._carttype = MMU._rom.charCodeAt(0x0147);
        }
    }

As can be seen in the above code, the internal state of the MBC1's four registers is represented by an object within the MMU, associated with MBC type 1. When these are changed, the ROM and RAM offsets can be modified to point into the appropriate bank of memory; once the pointers are set, access to the memory can proceed almost as normal.

MMU.js: MBC1-based access

MMU = {
    rb: function(addr) {
        switch(addr & 0xF000) {
            ...
            // ROM (switched bank)
            case 0x4000:
            case 0x5000:
            case 0x6000:
            case 0x7000:
                return MMU._rom.charCodeAt(MMU._romoffs + (addr & 0x3FFF));

            // External RAM
            case 0xA000:
            case 0xB000:
                return MMU._eram[MMU._ramoffs + (addr & 0x1FFF)];
        }
    }
};

The calculation of these pointer offsets is performed when the MBC registers are written, as shown below.

MMU.js: MBC1 control

wb: function(addr, val) {
    switch(addr & 0xF000) {
        // MBC1: External RAM switch
        case 0x0000:
        case 0x1000:
            switch(MMU._carttype) {
                case 2:
                case 3:
                    MMU._mbc[1].ramon = ((val & 0x0F) == 0x0A) ? 1 : 0;
                    break;
            }
            break;

        // MBC1: ROM bank
        case 0x2000:
        case 0x3000:
            switch(MMU._carttype) {
                case 1:
                case 2:
                case 3:
                    // Set lower 5 bits of ROM bank (skipping #0)
                    val &= 0x1F;
                    if(!val) val = 1;
                    MMU._mbc[1].rombank = (MMU._mbc[1].rombank & 0x60) + val;

                    // Calculate ROM offset from bank
                    MMU._romoffs = MMU._mbc[1].rombank * 0x4000;
                    break;
            }
            break;

        // MBC1: RAM bank
        case 0x4000:
        case 0x5000:
            switch(MMU._carttype) {
                case 1:
                case 2:
                case 3:
                    if(MMU._mbc[1].mode) {
                        // RAM mode: Set bank
                        MMU._mbc[1].rambank = val & 3;
                        MMU._ramoffs = MMU._mbc[1].rambank * 0x2000;
                    } else {
                        // ROM mode: Set high bits of bank
                        MMU._mbc[1].rombank = (MMU._mbc[1].rombank & 0x1F) + ((val & 3) << 5);
                        MMU._romoffs = MMU._mbc[1].rombank * 0x4000;
                    }
                    break;
            }
            break;

        // MBC1: Mode switch
        case 0x6000:
        case 0x7000:
            switch(MMU._carttype) {
                case 2:
                case 3:
                    MMU._mbc[1].mode = val & 1;
                    break;
            }
            break;

        ...

        // External RAM
        case 0xA000:
        case 0xB000:
            MMU._eram[MMU._ramoffs + (addr & 0x1FFF)] = val;
            break;
    }
}

In the above control code, the RAM enable switch and the mode switch are only handled for the MBC1 variants declared as having external RAM attached (cartridge types 2 and 3), since these are the ones which have RAM banking. With this code in place, the demo shown in Figure 1 loads and runs properly; without the MBC1 handler, the code would crash while attempting to access sprite and background data for the display.
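To make the bank arithmetic concrete, the following sketch pulls the ROM bank calculation out of the MMU into a standalone model and works through two register writes. The mbc1 object and the function names are hypothetical, introduced only for this illustration; the arithmetic itself is copied from the control code above.

```javascript
// Standalone model of the MBC1 ROM bank arithmetic; `mbc1` is a
// hypothetical stand-in for MMU._mbc[1] and MMU._romoffs.
var mbc1 = { rombank: 0, romoffs: 0x4000 };

// Write to 2000-3FFF: lower 5 bits of the ROM bank
function writeLowBits(val) {
  val &= 0x1F;
  if (!val) val = 1;                       // bank #0 is skipped
  mbc1.rombank = (mbc1.rombank & 0x60) + val;
  mbc1.romoffs = mbc1.rombank * 0x4000;
}

// Write to 4000-5FFF in ROM mode: upper 2 bits of the ROM bank
function writeHighBits(val) {
  mbc1.rombank = (mbc1.rombank & 0x1F) + ((val & 3) << 5);
  mbc1.romoffs = mbc1.rombank * 0x4000;
}

// Translate a CPU address in 4000-7FFF into an index into the ROM image
function romIndex(addr) {
  return mbc1.romoffs + (addr & 0x3FFF);
}

writeLowBits(0x03);                  // select bank 3
var first = romIndex(0x4000);        // 3 * 0x4000 = 0xC000
writeHighBits(0x01);                 // bank becomes 32 + 3 = 35
var second = romIndex(0x4010);       // 35 * 0x4000 + 0x10 = 0x8C010
```

Because the two writes each preserve the bits they don't own, the order in which a game sets the low and high parts of the bank number doesn't matter.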

Coming up

Aside from being able to fit larger games into memory, one of the more important aspects of a game is the ability to keep time: a clock-based game, for example, is useless without some kind of timing mechanism on which to base its clock. As mentioned previously, many games use the vertical blanking interrupt for this timing, but some require a finer-grained time structure; this is provided in the GameBoy by a hardware timer, tied into the CPU clock.

The timer also provides a method of examining the CPU clock, which makes it useful as a seed for random number generators; Tetris, for example, picks its blocks using this functionality of the hardware timer. In the next part, I'll look at the details of how the timer works, and how it can be implemented.

Imran Nazar <tf@imrannazar.com>, Dec 2010.

Sci-fi Shorts: Power

Tue, 09 Nov 2010 19:39:48 +0000

Mera sat at his station, watching the slow spin of the heavy water reactor as it spat out brackish waste fluid. As the water cascaded through the generator, he had another problem to solve: one of power.

The ship had always been furtive while hopping through space: making random jumps like a fly being preyed upon. The engines were growing torpid with old age, jumping less often and not as far, with progress deteriorating by the day. The only way to fix things was to take them offline for refurbishment, which left the ship stationary and open to attack.

The go-ahead had been given, budgeted to two hours: it generally took the Alliance ships at least three to find them at rest stops, so two hours was plenty. Mera shut down the reactor; as its spin slowed, he opened the engine casing and saw the problem immediately.

Some of the waste water had crept into the engines, evaporated with the heat, and left salt deposits encrusted around the antimatter injectors. This would take more than two hours to fix; the ship would have to land on a safe planet for a full overhaul of the injector matrix.

Mera closed up the casing, and fired up the reactor. Nothing happened: the reactor had itself seized with salt. The catch was that the Jump engine needed power from the reactor: without the Jump drive, they'd never get to a planet before Alliance hordes were crawling over them.

Now they were screwed.

GameBoy Emulation in JavaScript: Interrupts

Fri, 05 Nov 2010 21:03:20 +0000

This is part 8 of an article series on emulation development in JavaScript; ten parts are currently available, and others are expected to follow.

Part 1: The CPU

Part 2: Memory

Part 3: GPU Timings

Part 4: Graphics

Part 5: Integration

Part 6: Input

Part 7: Sprites

Part 8: Interrupts

Part 9: Memory Banking

Part 10: Timers

Please note: This article has been updated to remove an incorrect interrupt handling procedure. --12th Nov, 2010

In the previous part, the foundations for simulating a game were laid, with the introduction of sprites. However, one aspect was missing from the emulator: the vertical blanking interrupt. In this part, interrupts as a whole will be introduced, and the blanking interrupt in particular will be implemented; once this has been done, the emulator will run Tetris.

Figure 1: jsGB implementation with vertical blanking interrupt

Imagine that you have a computer with a network card, and some software that processes data from the network. From the perspective of the computer, data only comes in every so often, so you need some way for the software to know that new data has arrived. There are two ways for this to happen:

Polling: The software asks the network card every so often whether new data has arrived. This is a simplistic way of doing things, but has disadvantages:
- The software doesn't know about new data until its periodic check, which means a delay between the data arriving at the computer and it being handled by the software;
- Time has to be taken out on a periodic basis for the checks to be made, taking time away from other work even if no data has arrived;
- If the polling process handles one piece of new data each time it checks, but data is arriving at a faster rate, a backlog of data is created in the network card, and there's potential for some data to be lost;
- If there is no other work to be done, the software still has to check for data, which keeps the computer running at full speed with no work to do.
Interrupts: The network card informs the software that new data has arrived. This is a more complicated way of receiving data, with more steps involved, but it alleviates all the disadvantages of polling:
- New data can be processed as soon as it arrives, with no delay between arrival and the data being handled;
- The software need only take time to handle data when there's definitely data to be handled, and the processing routines can be called as often as necessary to clear any backlogs;
- If there is no other work to be done, the computer can enter a low-power mode until the network card awakens it for new data.

Interrupts and interrupt handlers

It's obvious that the concept of interrupts is a useful one, but there are both hardware and software requirements for interrupts to work. In hardware terms, the CPU has to temporarily stop execution of what it's doing when an interrupt arrives, and instead begin execution of an interrupt handler (sometimes referred to as an Interrupt Service Routine). In the above scenario, a wire is run between the network card and the CPU, allowing the card to inform the CPU when data has arrived.

Figure 2: Hardware implementation of interrupts

The CPU will check its interrupt inputs at the end of every instruction. If an interrupt signal has been given by some attached peripheral like the network card, steps are taken by the CPU to start the interrupt handler: the CPU will save the location where it left off normal execution, register the fact that the interrupt happened, and jump across to the handler.

Figure 3: CPU interrupt handling procedure

In the GameBoy, there are five different interrupt wires, feeding in from the various peripherals. Each one has its own ISR at a different address in memory; the list of interrupts is as follows.

Interrupt	ISR address (hex)
Vertical blank	`0040`
LCD status triggers	`0048`
Timer overflow	`0050`
Serial link	`0058`
Joypad press	`0060`

Table 1: Interrupts in the GameBoy
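The handler addresses in Table 1 are spaced eight bytes apart, so an address can be computed from an interrupt's position in the table rather than looked up. A small sketch of this, with isrAddress and INTERRUPT_NAMES as names assumed for the example:

```javascript
// Interrupts in the order of Table 1; the index doubles as the
// interrupt's bit number in the enable and flag registers.
var INTERRUPT_NAMES = [
  'Vertical blank',
  'LCD status triggers',
  'Timer overflow',
  'Serial link',
  'Joypad press'
];

// Hypothetical helper: handlers start at 0040h, eight bytes apart
function isrAddress(bit) {
  return 0x0040 + (bit << 3);
}

var vectors = INTERRUPT_NAMES.map(function(name, bit) {
  return name + ': ' + isrAddress(bit).toString(16);
});
// vectors[0] is 'Vertical blank: 40', vectors[4] is 'Joypad press: 60'
```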

In the case of the vertical blank, a wire is threaded into the bottom of the LCD; as soon as the GPU has finished scanning all the LCD lines and runs into the bottom of the screen, the interrupt fires and the CPU jumps to 0040, executing the blanking ISR.

Implementation: Interrupt flags

Most CPUs contain a "master flag" for interrupts: they will only be handled by the CPU if this flag is enabled. The Z80 in the GameBoy is no exception, but there are additional registers that deal with the individual interrupts available in the GameBoy. These are memory registers, so they are handled by the memory management unit:

Register	Address	Notes
Interrupt enable	FFFF	When bits are set, the corresponding interrupt can be triggered
Interrupt flags	FF0F	When bits are set, an interrupt has happened; bits in the same order as FFFF

Bit	When 0	When 1
0	Vblank off	Vblank on
1	LCD stat off	LCD stat on
2	Timer off	Timer on
3	Serial off	Serial on
4	Joypad off	Joypad on

Table 2: Interrupt flags in the MMU

Since these are memory registers, their implementation is something for the MMU:

MMU.js: Interrupt flags

MMU = {
    _ie: 0,
    _if: 0,

    rb: function(addr) {
        switch(addr & 0xF000) {
            ...
            case 0xF000:
                switch(addr & 0x0F00) {
                    ...
                    // Zero-page
                    case 0xF00:
                        if(addr == 0xFFFF) {
                            return MMU._ie;
                        } else if(addr >= 0xFF80) {
                            return MMU._zram[addr & 0x7F];
                        } else {
                            // I/O control handling
                            switch(addr & 0x00F0) {
                                case 0x00:
                                    if(addr == 0xFF0F) return MMU._if;
                                    break;
                                ...
                            }
                            return 0;
                        }
                }
        }
    },
    ...
};

The Z80's "master enable" switch is, in a similar manner, something for the Z80 implementation. The CPU provides opcodes for software to flick the master enable into either On or Off position, so these will also need to be implemented:

Z80.js: Interrupt master enable

Z80 = {
    _r: {
        ime: 0,
        ...
    },

    reset: function() {
        ...
        Z80._r.ime = 1;
    },

    // Disable IME
    DI: function() {
        Z80._r.ime = 0;
        Z80._r.m = 1;
        Z80._r.t = 4;
    },

    // Enable IME
    EI: function() {
        Z80._r.ime = 1;
        Z80._r.m = 1;
        Z80._r.t = 4;
    }
};

Implementation: Interrupt handling

With the interrupt flags in place, the main execution loop can be redeveloped to fall more in line with the execution path from Figure 3. After each instruction executes, the interrupt flags need checking to see whether an enabled interrupt has occurred; if it has, its handler can be called.

Z80.js: Vblank interrupt handler

Z80 = {
    _ops: {
        ...
        // Start vblank handler (0040h)
        RST40: function() {
            // Disable further interrupts
            Z80._r.ime = 0;

            // Save current PC on the stack
            Z80._r.sp -= 2;
            MMU.ww(Z80._r.sp, Z80._r.pc);

            // Jump to handler
            Z80._r.pc = 0x0040;
            Z80._r.m = 3;
            Z80._r.t = 12;
        },

        // Return from interrupt (called by handler)
        RETI: function() {
            // Restore interrupts
            Z80._r.ime = 1;

            // Jump to the address on the stack
            Z80._r.pc = MMU.rw(Z80._r.sp);
            Z80._r.sp += 2;

            Z80._r.m = 3;
            Z80._r.t = 12;
        }
    }
};

while(true) {
    // Run execute for this instruction
    var op = MMU.rb(Z80._r.pc++);
    Z80._map[op]();
    Z80._r.pc &= 65535;
    Z80._clock.m += Z80._r.m;
    Z80._clock.t += Z80._r.t;
    Z80._r.m = 0;
    Z80._r.t = 0;

    // If IME is on, and some interrupts are enabled in IE, and
    // an interrupt flag is set, handle the interrupt
    if(Z80._r.ime && MMU._ie && MMU._if) {
        // Mask off ints that aren't enabled
        var ifired = MMU._ie & MMU._if;

        if(ifired & 0x01) {
            MMU._if &= (255 - 0x01);
            Z80._ops.RST40();
        }
    }

    // Add the time taken by the interrupt dispatch, if any
    Z80._clock.m += Z80._r.m;
    Z80._clock.t += Z80._r.t;
}
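The loop above only dispatches the vertical blank. As a sketch of how the same check might extend to all five interrupts, the code below models the three flags as fields of a plain state object and returns the handler address to jump to. The names nextInterrupt and serviceInterrupt are hypothetical, and servicing the lowest-numbered bit first is an assumption made here about priority, which the article does not specify.

```javascript
// Hypothetical helper: return the bit number of the interrupt to
// service, or -1 if no interrupt is both enabled and flagged.
function nextInterrupt(ie, iflags) {
  var fired = ie & iflags & 0x1F;
  for (var bit = 0; bit < 5; bit++) {
    if (fired & (1 << bit)) return bit;   // assumed: lowest bit wins
  }
  return -1;
}

// Clear the chosen flag, switch off the master enable, and return
// the handler address; returns null if nothing is to be done.
function serviceInterrupt(state) {
  if (!state.ime) return null;
  var bit = nextInterrupt(state.ie, state.iflags);
  if (bit < 0) return null;

  state.iflags &= ~(1 << bit) & 0xFF;
  state.ime = 0;
  return 0x0040 + (bit << 3);
}

// Usage: vblank and timer enabled; LCD stat and timer flagged
var cpuState = { ime: 1, ie: 0x05, iflags: 0x06 };
var vector = serviceInterrupt(cpuState);
// vector is 0x50 (timer); the LCD stat flag stays set but is not enabled
```

Pushing the program counter and accounting for the dispatch time would remain the job of the handler-start routine, as RST40 does above.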

Next time: Bigger games

As shown in Figure 1, the emulator has reached a reasonable stage: it's able to emulate a released game in at least some form. It does, however, have the problem of game size. Tetris is a 32kB ROM, and fits perfectly into the "ROM" space in the memory map. Games tend to have larger ROMs than this, and the cartridge follows a process of mapping portions of the ROM into memory. Next time, I'll look at the simplest available form of ROM mapping for the GameBoy, and its implementation on a 64kB game ROM.

Imran Nazar <tf@imrannazar.com>, Nov 2010.

GameBoy Emulation in JavaScript: Sprites

Sun, 10 Oct 2010 22:34:05 +0000

This is part 7 of an article series on emulation development in JavaScript; ten parts are currently available, and others are expected to follow.

Part 1: The CPU

Part 2: Memory

Part 3: GPU Timings

Part 4: Graphics

Part 5: Integration

Part 6: Input

Part 7: Sprites

Part 8: Interrupts

Part 9: Memory Banking

Part 10: Timers

Previously in this series, the emulator was extended to enable keypad input, which meant that a game of tic-tac-toe could be played. The problem left by this was that the game had to be played blind: there was no indication of where the next move would be made, nor of where on the board a keypress would move you. Traditionally, two-dimensional gaming consoles have solved this issue through the use of sprites: movable object blocks that can be placed independently of the background, and which contain data separate to that of the background.

The GameBoy is no exception in this regard: it provides for sprites to be placed above or below the background, and multiple sprites to be on screen at the same time. Once this has been implemented in the emulator, the tic-tac-toe game runs as below.

Figure 1: jsGB implementation with sprites

Introduction: GameBoy sprites

GameBoy sprites are graphic tiles, just like those used for the background: this means that each sprite is 8x8 pixels. As stated above, a sprite can be placed anywhere on the screen, including halfway or all the way off-screen, and it can be placed above or below the background. What this means technically is that sprites below the background show through where the background has colour value 0.

Figure 2: Sprite priorities

In the above figure, the sprite above the background shows the background through the middle of it, since these pixels in the sprite are set to colour 0; in the same way, the background lets through the sprites below it where the background colour is 0. In order to simulate this in an emulator, the simplest procedure would be to render the sprites below the background, then the background itself, and finally the sprites above it. However, this is a somewhat naive algorithm, since it duplicates the sprite rendering process; it's simpler instead to draw the background first, then work out whether a given pixel in the sprite should appear based on its priority and the background colour at that position.

Pseudocode for sprite rendering

For each row in sprite
    If this row is on screen
        For each pixel in row
            If this pixel is on screen
                If this pixel is transparent
                    * Do nothing
                Else
                    If the sprite has priority
                        Draw pixel
                    Else if this pixel in the background is 0
                        Draw pixel
                    Else
                        * Do nothing
                    End If
                End If
            End If
        End For
    End If
End For
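The innermost decision in the pseudocode can be written as a small pure function, which makes the three outcomes easy to check in isolation. This is a sketch only; the name spritePixelVisible and its parameters are assumptions for the example, with colours given as palette indices from 0 to 3.

```javascript
// Hypothetical helper: should one sprite pixel be drawn?
//   spriteColour    - colour index of the sprite pixel (0 is transparent)
//   aboveBackground - true if the sprite has priority over the background
//   bgColour        - colour index of the background at the same position
function spritePixelVisible(spriteColour, aboveBackground, bgColour) {
  if (spriteColour === 0) return false;   // transparent: do nothing
  if (aboveBackground) return true;       // sprite has priority
  return bgColour === 0;                  // shows through background colour 0
}

var cases = [
  spritePixelVisible(0, true, 0),    // false: transparent sprite pixel
  spritePixelVisible(2, true, 3),    // true: sprite sits above the background
  spritePixelVisible(2, false, 0),   // true: background is colour 0 here
  spritePixelVisible(2, false, 1)    // false: hidden behind the background
];
```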

One additional complication to the GameBoy sprite system is that a sprite can be "flipped" horizontally or vertically by the hardware, at the time it's rendered; this saves space in the game, since (for example) a spaceship flying backwards can be represented by the same sprite as forward motion, with the appropriate flip applied.
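Flipping amounts to reading the tile's rows or columns in reverse order. A minimal sketch, assuming a tile is held as an 8x8 array of colour indices addressed as tile[y][x]; the function name flipTile is hypothetical:

```javascript
// Hypothetical helper: return a flipped copy of an 8x8 tile
function flipTile(tile, xflip, yflip) {
  var out = [];
  for (var y = 0; y < 8; y++) {
    // A Y-flip reads rows from the opposite side of the tile
    var src = tile[yflip ? 7 - y : y];
    var row = [];
    for (var x = 0; x < 8; x++) {
      // An X-flip reads the pixels of a row in reverse order
      row[x] = src[xflip ? 7 - x : x];
    }
    out[y] = row;
  }
  return out;
}

// Usage: a blank tile with one marked pixel near the top-left corner
var blank = [];
for (var ty = 0; ty < 8; ty++) blank[ty] = [0, 0, 0, 0, 0, 0, 0, 0];
blank[0][1] = 3;

var flippedX = flipTile(blank, true, false);    // marked pixel at [0][6]
var flippedY = flipTile(blank, false, true);    // marked pixel at [7][1]
var flippedXY = flipTile(blank, true, true);    // marked pixel at [7][6]
```

A renderer doesn't need to build the flipped copy; it can apply the same index reversal while reading the original tile, one scanline at a time.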

Sprite data: Object Attribute Memory

The GameBoy can hold information about 40 sprites, in a dedicated region of memory called Object Attribute Memory (OAM). Each of the 40 sprites has four bytes of data in the OAM associated with it, as detailed below.

Byte	Description
0	Y-coordinate of top-left corner (value stored is Y-coordinate plus 16)
1	X-coordinate of top-left corner (value stored is X-coordinate plus 8)
2	Data tile number
3	Options, with the following bits:

Bit	Description	When 0	When 1
7	Sprite/background priority	Above background	Below background (except colour 0)
6	Y-flip	Normal	Vertically flipped
5	X-flip	Normal	Horizontally flipped
4	Palette	OBJ palette #0	OBJ palette #1

Table 1: OAM data for a sprite

In order to more easily access this information when it comes to rendering a scanline, it's useful to build a structure to hold the sprite data, which is filled in based on the contents of the OAM. When data is written to the OAM, the MMU in concert with the graphics emulation can update this structure for later use. An implementation of this would be as follows.

MMU.js: OAM access

rb: function(addr) {
    switch(addr & 0xF000) {
        ...
        case 0xF000:
            switch(addr & 0x0F00) {
                ...
                // OAM
                case 0xE00:
                    return (addr < 0xFEA0) ? GPU._oam[addr & 0xFF] : 0;
            }
    }
},

wb: function(addr, val) {
    switch(addr & 0xF000) {
        ...
        case 0xF000:
            switch(addr & 0x0F00) {
                ...
                // OAM
                case 0xE00:
                    if(addr < 0xFEA0) GPU._oam[addr & 0xFF] = val;
                    GPU.buildobjdata(addr - 0xFE00, val);
                    break;
            }
    }
}

GPU.js: Sprite structure

_oam: [],
_objdata: [],

reset: function() {
    // In addition to previous reset code:
    for(var i=0, n=0; i < 40; i++, n+=4) {
        GPU._oam[n + 0] = 0;
        GPU._oam[n + 1] = 0;
        GPU._oam[n + 2] = 0;
        GPU._oam[n + 3] = 0;
        GPU._objdata[i] = {
            'y': -16, 'x': -8,
            'tile': 0, 'palette': 0,
            'xflip': 0, 'yflip': 0, 'prio': 0, 'num': i
        };
    }
},

buildobjdata: function(addr, val) {
    var obj = addr >> 2;
    if(obj < 40) {
        switch(addr & 3) {
            // Y-coordinate
            case 0: GPU._objdata[obj].y = val-16; break;

            // X-coordinate
            case 1: GPU._objdata[obj].x = val-8; break;

            // Data tile
            case 2: GPU._objdata[obj].tile = val; break;

            // Options
            case 3:
                GPU._objdata[obj].palette = (val & 0x10) ? 1 : 0;
                GPU._objdata[obj].xflip   = (val & 0x20) ? 1 : 0;
                GPU._objdata[obj].yflip   = (val & 0x40) ? 1 : 0;
                GPU._objdata[obj].prio    = (val & 0x80) ? 1 : 0;
                break;
        }
    }
}

Sprite palettes

As hinted above, the GPU offers a choice of two palettes for the sprites: each of the 40 sprites can use one of the two palettes, as specified in its OAM entry. These object palettes are stored in the GPU, in addition to the background palette, and can be changed through I/O registers in much the same manner as the palette for the background.

GPU.js: Sprite palette handling

_pal: {
    bg: [],
    obj0: [],
    obj1: []
},

wb: function(addr, val) {
    switch(addr) {
        // ...
        // Background palette
        case 0xFF47:
            for(var i = 0; i < 4; i++) {
                switch((val >> (i * 2)) & 3) {
                    case 0: GPU._pal.bg[i] = [255,255,255,255]; break;
                    case 1: GPU._pal.bg[i] = [192,192,192,255]; break;
                    case 2: GPU._pal.bg[i] = [ 96, 96, 96,255]; break;
                    case 3: GPU._pal.bg[i] = [  0,  0,  0,255]; break;
                }
            }
            break;

        // Object palettes
        case 0xFF48:
            for(var i = 0; i < 4; i++) {
                switch((val >> (i * 2)) & 3) {
                    case 0: GPU._pal.obj0[i] = [255,255,255,255]; break;
                    case 1: GPU._pal.obj0[i] = [192,192,192,255]; break;
                    case 2: GPU._pal.obj0[i] = [ 96, 96, 96,255]; break;
                    case 3: GPU._pal.obj0[i] = [  0,  0,  0,255]; break;
                }
            }
            break;

        case 0xFF49:
            for(var i = 0; i < 4; i++) {
                switch((val >> (i * 2)) & 3) {
                    case 0: GPU._pal.obj1[i] = [255,255,255,255]; break;
                    case 1: GPU._pal.obj1[i] = [192,192,192,255]; break;
                    case 2: GPU._pal.obj1[i] = [ 96, 96, 96,255]; break;
                    case 3: GPU._pal.obj1[i] = [  0,  0,  0,255]; break;
                }
            }
            break;
    }
}
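Since all three palette registers decode in the same way, the repeated loops above could be folded into one helper. The sketch below is one possible shape for this, with SHADES and decodePalette as names assumed for the example; the RGBA values are the ones used in the listing above.

```javascript
// The four shades used by the emulator, as RGBA values
var SHADES = [
  [255, 255, 255, 255],
  [192, 192, 192, 255],
  [ 96,  96,  96, 255],
  [  0,   0,   0, 255]
];

// Hypothetical helper: turn a palette register value into four
// RGBA entries, two bits per entry starting from the low bits
function decodePalette(val) {
  var pal = [];
  for (var i = 0; i < 4; i++) {
    pal[i] = SHADES[(val >> (i * 2)) & 3];
  }
  return pal;
}

var identity = decodePalette(0xE4);   // binary 11 10 01 00: entry i is shade i
var inverted = decodePalette(0x1B);   // binary 00 01 10 11: entry i is shade 3-i
```

The entries share the arrays in SHADES rather than copying them, which is safe as long as nothing writes to a palette entry in place.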

Rendering sprites

The GameBoy graphics system renders each line of the screen as it's encountered: this includes not only the background, but the sprites below and above it. In other words, rendering of the sprites must be added to the scanline renderer, as a process that occurs after drawing the background. Just as with the background, there's a switch to enable sprites within the LCDC register, and this must be added to the I/O handling for the GPU.

Since a sprite can be anywhere on the screen, including positioned somewhere off-screen, the renderer has to check which sprites are positioned within the current scanline. The simplest algorithm for this is to check the position of each one, and render the appropriate line of the sprite if it falls within the bounds of the scanline. The sprite data can be retrieved in the same way as it is for the background, through the pre-calculated tile set. An example of these things brought together is as follows.

GPU.js: Rendering a scanline with sprites

renderscan: function() {
    // Scanline data, for use by sprite renderer
    var scanrow = [];

    // Render background if it's switched on
    if(GPU._switchbg) {
        var mapoffs = GPU._bgmap ? 0x1C00 : 0x1800;

        // Each row of the map holds 32 tiles
        mapoffs += (((GPU._line + GPU._scy) & 255) >> 3) << 5;
        var lineoffs = (GPU._scx >> 3);
        var y = (GPU._line + GPU._scy) & 7;
        var x = GPU._scx & 7;
        var canvasoffs = GPU._line * 160 * 4;
        var colour;
        var tile = GPU._vram[mapoffs + lineoffs];

        // If the tile data set in use is #1, the
        // indices are signed; calculate a real tile offset
        if(GPU._bgtile == 1 && tile < 128) tile += 256;

        for(var i = 0; i < 160; i++) {
            // Re-map the tile pixel through the palette
            colour = GPU._pal.bg[GPU._tileset[tile][y][x]];

            // Plot the pixel to canvas
            GPU._scrn.data[canvasoffs+0] = colour[0];
            GPU._scrn.data[canvasoffs+1] = colour[1];
            GPU._scrn.data[canvasoffs+2] = colour[2];
            GPU._scrn.data[canvasoffs+3] = colour[3];
            canvasoffs += 4;

            // Store the pixel for later checking
            scanrow[i] = GPU._tileset[tile][y][x];

            // When this tile ends, read another
            x++;
            if(x == 8) {
                x = 0;
                lineoffs = (lineoffs + 1) & 31;
                tile = GPU._vram[mapoffs + lineoffs];
                if(GPU._bgtile == 1 && tile < 128) tile += 256;
            }
        }
    }

    // Render sprites if they're switched on
    if(GPU._switchobj) {
        for(var i = 0; i < 40; i++) {
            var obj = GPU._objdata[i];

            // Check if this sprite falls on this scanline
            if(obj.y <= GPU._line && (obj.y + 8) > GPU._line) {
                // Palette to use for this sprite
                var pal = obj.palette ? GPU._pal.obj1 : GPU._pal.obj0;

                // Where to render on the canvas
                var canvasoffs = (GPU._line * 160 + obj.x) * 4;

                // Data for this line of the sprite
                var tilerow;

                // If the sprite is Y-flipped,
                // use the opposite side of the tile
                if(obj.yflip) {
                    tilerow = GPU._tileset[obj.tile][7 - (GPU._line - obj.y)];
                } else {
                    tilerow = GPU._tileset[obj.tile][GPU._line - obj.y];
                }

                var colour;
                var pixel;

                for(var x = 0; x < 8; x++) {
                    // If the sprite is X-flipped,
                    // read pixels in reverse order
                    pixel = tilerow[obj.xflip ? (7-x) : x];

                    // If this pixel is still on-screen, AND
                    // if it's not colour 0 (transparent), AND
                    // if this sprite is above the bg (prio bit clear)
                    // OR shows through bg colour 0,
                    // then render the pixel
                    if((obj.x + x) >= 0 && (obj.x + x) < 160 &&
                       pixel &&
                       (!obj.prio || !scanrow[obj.x + x])) {
                        colour = pal[pixel];

                        GPU._scrn.data[canvasoffs+0] = colour[0];
                        GPU._scrn.data[canvasoffs+1] = colour[1];
                        GPU._scrn.data[canvasoffs+2] = colour[2];
                        GPU._scrn.data[canvasoffs+3] = colour[3];
                    }

                    // Move along the canvas whether or not
                    // a pixel was drawn
                    canvasoffs += 4;
                }
            }
        }
    }
},

rb: function(addr) {
    switch(addr) {
        // LCD Control
        case 0xFF40:
            return (GPU._switchbg  ? 0x01 : 0x00) |
                   (GPU._switchobj ? 0x02 : 0x00) |
                   (GPU._bgmap     ? 0x08 : 0x00) |
                   (GPU._bgtile    ? 0x10 : 0x00) |
                   (GPU._switchlcd ? 0x80 : 0x00);

        // ...
    }
},

wb: function(addr, val) {
    switch(addr) {
        // LCD Control
        case 0xFF40:
            GPU._switchbg  = (val & 0x01) ? 1 : 0;
            GPU._switchobj = (val & 0x02) ? 1 : 0;
            GPU._bgmap     = (val & 0x08) ? 1 : 0;
            GPU._bgtile    = (val & 0x10) ? 1 : 0;
            GPU._switchlcd = (val & 0x80) ? 1 : 0;
            break;

        // ...
    }
}

Coming up

With sprites in place, basic games like the tic-tac-toe running in Figure 1 can work in full. Many games, however, will not run without something else: a method of determining when the screen can be redrawn. Almost every game will perform a "refresh" of the screen data while the screen is in vertical blanking, since changes to the screen won't show up until the next time the GPU comes to draw a frame.

Basic games and demos sometimes do this by checking whether the GPU has hit line #144 in its redrawing process, but this takes up a lot of processing power in repeated looping. The more common method is for the game to be informed when an event has occurred: this message is referred to as an interrupt. In the next part, I'll take a look at the vertical blanking interrupt in particular, and how it can be simulated to provide this message passing process to an emulated game.

Imran Nazar <tf@imrannazar.com>, Oct 2010.

GameBoy Emulation in JavaScript: Input

Sun, 19 Sep 2010 21:41:40 +0000

This is part 6 of an article series on emulation development in JavaScript; ten parts are currently available, and others are expected to follow.

Part 1: The CPU

Part 2: Memory

Part 3: GPU Timings

Part 4: Graphics

Part 5: Integration

Part 6: Input

Part 7: Sprites

Part 8: Interrupts

Part 9: Memory Banking

Part 10: Timers

With a working emulator and interface developed over the previous five parts, the emulation system is able to run a basic test ROM, and produce graphical output. What the emulator is currently unable to do is take keypresses as keypad input, and feed them through to the ROM under test; in order for this to be done, the keypad's influence on the I/O registers must be emulated.

With the addition of keypad I/O, the emulator runs as follows.

Figure 1: jsGB implementation with key input

The keypad

The GameBoy has a single method of input, an eight-key pad out of which any number of keys can be depressed. With most keyboards, the keys are laid out in a grid of columns and rows: these can be treated as wires, between which a key can form a connection. When one of the columns is activated, any rows connected to that column will also activate, and the hardware is able to detect the active rows to determine the currently pressed keys.

With the GameBoy, the keyboard grid has two columns and four rows, which has the advantage that all the required connections can be made within one 8-bit I/O register.

Figure 2: Keyboard wiring

Since all six lines are tied to the same register, the GameBoy procedure for reading a key is slightly convoluted:

Write either 0x10 or 0x20 to JOYP: this will activate either bit 4 or 5, one of the column lines;
Wait a few cycles for the row connections to propagate to JOYP;
Check the low four bits of JOYP, to find which rows were active for this column.
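From the point of view of a game, the procedure above might look like the following sketch. The joyp object is a minimal stand-in for the register, assumed purely for illustration, and the wait step is omitted since the model responds instantly; row bits read as 0 for a pressed key, as the implementation section below explains.

```javascript
// Minimal stand-in for the JOYP register, assumed for illustration:
// one set of four row bits per column, all 1 when nothing is pressed
var joyp = {
  rows: { 0x10: 0x0F, 0x20: 0x0F },
  column: 0,
  write: function(val) { this.column = val & 0x30; },
  read: function() {
    return this.rows.hasOwnProperty(this.column) ? this.rows[this.column] : 0;
  }
};

// Steps 1 and 3 of the procedure: select a column, then read its rows
function readColumn(column) {
  joyp.write(column);
  return joyp.read() & 0x0F;
}

// Usage: pull one row of the 0x20 column low, as a keypress would
joyp.rows[0x20] &= 0x0E;
var raw = readColumn(0x20);          // 0x0E: bit 0 has dropped to zero
var pressed = (~raw) & 0x0F;         // 0x01: invert to get pressed keys as 1s
var other = readColumn(0x10);        // 0x0F: nothing pressed in this column
```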

Implementation of the keypad

Writing code to simulate keypad presses is relatively simple, but two factors complicate the issue: allowing for a column to be set in the grid before rows are read, and the keypress codes that are used by JavaScript. In order to accommodate the two columns, two values must be used by the emulation, each of which holds the intersections between that column and the rows. One additional factor to take into account is that the values are reversed for the keypad: a row is left at high voltage by default, and is dropped to zero voltage when it intersects a column. This is interpreted by the I/O register as the row bits being 1 for no key pressed, and 0 for a keypress.

The JavaScript keydown and keyup events can be used to find out when a key has been pressed or released; tying these into the keypad handler can be done in the following manner.

Key.js: Object interface

KEY = {
    _rows: [0x0F, 0x0F],
    _column: 0,

    reset: function() {
        KEY._rows = [0x0F, 0x0F];
        KEY._column = 0;
    },

    rb: function(addr) {
        switch(KEY._column) {
            case 0x10: return KEY._rows[0];
            case 0x20: return KEY._rows[1];
            default: return 0;
        }
    },

    wb: function(addr, val) {
        KEY._column = val & 0x30;
    },

    kdown: function(e) {
        // Reset the appropriate bit
    },

    kup: function(e) {
        // Set the appropriate bit
    }
};

window.onkeydown = KEY.kdown;
window.onkeyup = KEY.kup;

In addition to this, the MMU must be extended to handle the keypad I/O register, with an addition to the zero-page handling routines; an example of this is given below.

MMU.js: Keypad I/O interface

rb: function(addr) {
    switch(addr & 0xF000) {
        ...
        case 0xF000:
            switch(addr & 0x0F00) {
                ...
                // Zero-page
                case 0xF00:
                    if(addr >= 0xFF80) {
                        return MMU._zram[addr & 0x7F];
                    } else if(addr >= 0xFF40) {
                        // GPU (64 registers)
                        return GPU.rb(addr);
                    } else switch(addr & 0x3F) {
                        case 0x00: return KEY.rb();
                        default: return 0;
                    }
            }
    }
}

With the keypad handler plumbed in, the remaining issue is the handling of keypresses, and the ability of the keypad code to distinguish between different keys being pressed. This can be done through the JavaScript event object; any event that runs through the browser, such as a mouse click or a keypress, will be passed to the code if it's requested, along with an object that describes the event that's just occurred. In the case of a keypress, the event object contains a character code and a "key scan" code, which both describe the key in question.

Through testing by Peter-Paul Koch, it has been determined that the character code passed by browsers to JavaScript code is unreliable, and will change depending on which browser is used. The only case on which all browsers agree is the key-scan code produced for keyup and keydown events; in any browser, pressing a given key will yield a particular value.

For the purposes of this emulator, eight keys need to be handled by the keypad code:

Scan code	Key	Mapping
13	Enter	Start
32	Space	Select
37	Left arrow	Left
38	Up arrow	Up
39	Right arrow	Right
40	Down arrow	Down
88	X	B
90	Z	A

Table 1: Key-scan codes used by jsGB

As stated above, the appropriate bits must be reset when a key is pressed, and set when the key is released. This can be implemented as follows.

Key.js: Keypress handling

kdown: function(e) {
    switch(e.keyCode) {
        case 39: KEY._rows[1] &= 0xE; break;
        case 37: KEY._rows[1] &= 0xD; break;
        case 38: KEY._rows[1] &= 0xB; break;
        case 40: KEY._rows[1] &= 0x7; break;
        case 90: KEY._rows[0] &= 0xE; break;
        case 88: KEY._rows[0] &= 0xD; break;
        case 32: KEY._rows[0] &= 0xB; break;
        case 13: KEY._rows[0] &= 0x7; break;
    }
},

kup: function(e) {
    switch(e.keyCode) {
        case 39: KEY._rows[1] |= 0x1; break;
        case 37: KEY._rows[1] |= 0x2; break;
        case 38: KEY._rows[1] |= 0x4; break;
        case 40: KEY._rows[1] |= 0x8; break;
        case 90: KEY._rows[0] |= 0x1; break;
        case 88: KEY._rows[0] |= 0x2; break;
        case 32: KEY._rows[0] |= 0x4; break;
        case 13: KEY._rows[0] |= 0x8; break;
    }
}
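The two switch statements above mirror each other, so the mapping from Table 1 could instead be held in a single lookup table shared by both handlers. A sketch of that alternative follows; KEYMAP, rows, keyDown and keyUp are names assumed for the example, and the handlers take a bare key code rather than an event object so that they can be exercised outside a browser.

```javascript
// Key-scan code -> [column index, row bit], following the listing above
var KEYMAP = {
  39: [1, 0x1],   // Right arrow
  37: [1, 0x2],   // Left arrow
  38: [1, 0x4],   // Up arrow
  40: [1, 0x8],   // Down arrow
  90: [0, 0x1],   // Z: A
  88: [0, 0x2],   // X: B
  32: [0, 0x4],   // Space: Select
  13: [0, 0x8]    // Enter: Start
};

// Row bits for the two columns; 1 means "not pressed"
var rows = [0x0F, 0x0F];

function keyDown(keyCode) {
  var m = KEYMAP[keyCode];
  if (m) rows[m[0]] &= ~m[1] & 0x0F;    // reset the bit on press
}

function keyUp(keyCode) {
  var m = KEYMAP[keyCode];
  if (m) rows[m[0]] |= m[1];            // set the bit on release
}

// Usage: hold Start and A together, then release Start
keyDown(13);
keyDown(90);
var bothHeld = rows[0];     // 0x06: bits 3 and 0 are low
keyUp(13);
var onlyA = rows[0];        // 0x0E: only bit 0 is low
```

Keys that aren't in the table are ignored, which matches the behaviour of the switch statements when no case applies.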

Testing and next steps

Figure 1 above shows the result of these additions to the emulator, when running a basic tic-tac-toe game. In this example, the initial screen can be advanced to the credits by pressing the Start key, which is mapped to Enter by this emulator. Another press of the Start key will bring up the game screen, and the game can be played with the player as one side, and the computer as the other; pressing the GameBoy's A key (mapped to Z) will place a cross or circle on behalf of the player.

Right now, the game must be played blind, since there is no indicator of where the player places a mark. The game produces this indicator by using a sprite: a tile which can be placed by the graphics chip above the background, and moved independently. Most games produce their gameplay through use of sprites, so building them into the simulation is an important next step for this series. Next time, I'll be taking a look at the facilities provided by the GameBoy for the rendering of sprites, and how they can be implemented in JavaScript.

Imran Nazar <tf@imrannazar.com>, Sep 2010.

GameBoy Emulation in JavaScript: Integration

Sun, 05 Sep 2010 19:11:40 +0000

This is part 5 of an article series on emulation development in JavaScript; ten parts are currently available, and others are expected to follow.

Part 1: The CPU

Part 2: Memory

Part 3: GPU Timings

Part 4: Graphics

Part 5: Integration

Part 6: Input

Part 7: Sprites

Part 8: Interrupts

Part 9: Memory Banking

Part 10: Timers

In part 4, the GameBoy's graphics subsystem was explored in detail, and an emulation put together. Without a set of register mappings for the GPU to be dealt with in software, the graphics subsystem cannot be used by the emulator; once these registers have been made available, the emulator is essentially ready for basic use.

With the additions detailed below to add the GPU registers, and a basic interface for the control of the emulator, the result is as follows.

Figure 1: jsGB implementation with graphics

GPU registers

The graphics unit of the GameBoy has a series of registers which are mapped into memory, in the I/O space of the memory map. In order to get a working emulation with a background image, the following registers will be needed by the GPU (other registers are also available to the GPU, and will be explored in later parts of this series).

Address	Register	Status
0xFF40	LCD and GPU control	Read/write
0xFF42	Scroll-Y	Read/write
0xFF43	Scroll-X	Read/write
0xFF44	Current scan line	Read only
0xFF47	Background palette	Write only

Table 1: Basic GPU registers

The background palette register has previously been explored, and consists of four 2-bit palette entries. The scroll registers and scanline counter are full-byte values; this leaves the LCD control register, which is made up of 8 separate flags controlling the sections of the GPU.

Bit	Function	When 0	When 1
0	Background: on/off	Off	On
1	Sprites: on/off	Off	On
2	Sprites: size (pixels)	8x8	8x16
3	Background: tile map	#0	#1
4	Background: tile set	#0	#1
5	Window: on/off	Off	On
6	Window: tile map	#0	#1
7	Display: on/off	Off	On

Table 2: GPU control register

In the above table, the additional features of the GPU appear: a "window" layer which can appear above the background, and sprite objects which can be moved against the background and window. These additional features will be covered as the need for them arises; in the meantime, the background flags are most important for basic rendering functions. In particular, it can be seen here how the background tile map and tile set can be changed, simply by flipping bits in the register 0xFF40.
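As a minimal sketch of how the flags in Table 2 might be unpacked, the helper below splits a value written to 0xFF40 into named fields. The function name `decodeLcdc` and its field names are illustrative assumptions, and are not part of the emulator's source.

```javascript
// Hypothetical helper: unpack the LCD control byte (0xFF40)
// into the eight flags listed in Table 2.
function decodeLcdc(val) {
    return {
        bgOn:       (val & 0x01) !== 0,
        spritesOn:  (val & 0x02) !== 0,
        spriteTall: (val & 0x04) !== 0,   // false: 8x8, true: 8x16
        bgMap:      (val & 0x08) ? 1 : 0,
        bgTileSet:  (val & 0x10) ? 1 : 0,
        windowOn:   (val & 0x20) !== 0,
        windowMap:  (val & 0x40) ? 1 : 0,
        displayOn:  (val & 0x80) !== 0
    };
}

// 0x91 is 10010001 in binary: display on, tile set #1, background on
var flags = decodeLcdc(0x91);
```

Flipping bit 3 or bit 4 of the value changes only `bgMap` or `bgTileSet`, which is how a program switches maps and tile sets without touching the other flags.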

Implementation: GPU registers

Armed with the conceptual GPU register layout, an emulation can be implemented simply by adding handlers for these addresses to the MMU. This can either be done by hard-coding the GPU updates into the MMU, or defining a range of registers wherein the GPU will be called from the MMU, for more specialised handling to be done from there. In the interests of modularity, the latter approach has been taken here.

MMU.js: Zero-page I/O: GPU

    rb: function(addr) {
        switch(addr & 0xF000) {
            ...
            case 0xF000:
                switch(addr & 0x0F00) {
                    ...
                    // Zero-page
                    case 0xF00:
                        if(addr >= 0xFF80) {
                            return MMU._zram[addr & 0x7F];
                        } else {
                            // I/O control handling
                            switch(addr & 0x00F0) {
                                // GPU (64 registers)
                                case 0x40: case 0x50:
                                case 0x60: case 0x70:
                                    return GPU.rb(addr);
                            }
                            return 0;
                        }
                }
        }
    },

    wb: function(addr, val) {
        switch(addr & 0xF000) {
            ...
            case 0xF000:
                switch(addr & 0x0F00) {
                    ...
                    // Zero-page
                    case 0xF00:
                        if(addr >= 0xFF80) {
                            MMU._zram[addr & 0x7F] = val;
                        } else {
                            // I/O
                            switch(addr & 0x00F0) {
                                // GPU
                                case 0x40: case 0x50:
                                case 0x60: case 0x70:
                                    GPU.wb(addr, val);
                                    break;
                            }
                        }
                        break;
                }
                break;
        }
    }

GPU.js: Register handling

    rb: function(addr) {
        switch(addr) {
            // LCD Control
            case 0xFF40:
                return (GPU._switchbg  ? 0x01 : 0x00) |
                       (GPU._bgmap     ? 0x08 : 0x00) |
                       (GPU._bgtile    ? 0x10 : 0x00) |
                       (GPU._switchlcd ? 0x80 : 0x00);

            // Scroll Y
            case 0xFF42:
                return GPU._scy;

            // Scroll X
            case 0xFF43:
                return GPU._scx;

            // Current scanline
            case 0xFF44:
                return GPU._line;
        }
    },

    wb: function(addr, val) {
        switch(addr) {
            // LCD Control
            case 0xFF40:
                GPU._switchbg  = (val & 0x01) ? 1 : 0;
                GPU._bgmap     = (val & 0x08) ? 1 : 0;
                GPU._bgtile    = (val & 0x10) ? 1 : 0;
                GPU._switchlcd = (val & 0x80) ? 1 : 0;
                break;

            // Scroll Y
            case 0xFF42:
                GPU._scy = val;
                break;

            // Scroll X
            case 0xFF43:
                GPU._scx = val;
                break;

            // Background palette
            case 0xFF47:
                for(var i = 0; i < 4; i++) {
                    switch((val >> (i * 2)) & 3) {
                        case 0: GPU._pal[i] = [255,255,255,255]; break;
                        case 1: GPU._pal[i] = [192,192,192,255]; break;
                        case 2: GPU._pal[i] = [ 96, 96, 96,255]; break;
                        case 3: GPU._pal[i] = [  0,  0,  0,255]; break;
                    }
                }
                break;
        }
    }

Running one frame

At present, the dispatch loop for the emulator's CPU runs forever, without pause. The most basic interface for an emulator allows for the simulation to be reset or paused; in order to allow for this, a known amount of time must be used as the base unit of the emulator interface. There are three possible units of time that can be used for this:

Instruction: Providing the opportunity to pause after every CPU instruction. This causes a great deal of overhead, since the dispatch function must be called for each step made by the CPU; at 4.19MHz, many steps must be made for an appreciable amount to happen.
Scanline: Pausing after the rendering of each line by the GPU. This produces less of an overhead, but the dispatcher must still be called a few thousand times a second; in addition, the emulation can be paused in a state where the canvas display doesn't correspond to the current scanline.
Frame: Allowing for the emulation to stop after a whole frame is emulated, rendered and pushed to the canvas. This provides the best compromise of timing accuracy and optimal speed, while ensuring that the emulated canvas is consistent with the GPU state.

Since a frame is made of 144 scanlines and a 10-line vertical blank, and each scanline takes 456 clock cycles to run, the length of a frame is 70224 clocks. In conjunction with an emulator-level reset function, which initialises each subsystem at the start of the emulation, the emulator itself can be run, and a rudimentary interface provided.
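The arithmetic behind that figure can be checked with a few lines of JavaScript. This is only a sketch: the constant names are made up for illustration, and the clock rate is the 4.19MHz T-clock mentioned above.

```javascript
// Timing constants taken from the text; the names are illustrative.
var CLOCK_HZ = 4194304;        // T-clock of the GameBoy CPU
var CLOCKS_PER_LINE = 456;
var VISIBLE_LINES = 144;
var VBLANK_LINES = 10;

// One frame is every visible line plus the vertical blank
var CLOCKS_PER_FRAME = (VISIBLE_LINES + VBLANK_LINES) * CLOCKS_PER_LINE;

// How often the dispatcher would be called for each unit of time
var linesPerSecond = CLOCK_HZ / CLOCKS_PER_LINE;
var framesPerSecond = CLOCK_HZ / CLOCKS_PER_FRAME;
```

Running per frame means the dispatcher is entered just under 60 times a second, against roughly nine thousand times a second when pausing per scanline.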

index.html: Emulator interface

    Reset | Run

jsGB.js: Reset and dispatch

    jsGB = {
        reset: function() {
            GPU.reset();
            MMU.reset();
            Z80.reset();

            MMU.load('test.gb');
        },

        frame: function() {
            // Run the CPU and GPU for one frame's worth of clocks
            var fclk = Z80._clock.t + 70224;
            do {
                Z80._map[MMU.rb(Z80._r.pc++)]();
                Z80._r.pc &= 65535;
                Z80._clock.m += Z80._r.m;
                Z80._clock.t += Z80._r.t;
                GPU.step();
            } while(Z80._clock.t < fclk);
        },

        _interval: null,

        run: function() {
            if(!jsGB._interval) {
                // setInterval keeps calling frame() until paused,
                // and pairs with the clearInterval call below
                jsGB._interval = setInterval(jsGB.frame, 1);
                document.getElementById('run').innerHTML = 'Pause';
            } else {
                clearInterval(jsGB._interval);
                jsGB._interval = null;
                document.getElementById('run').innerHTML = 'Run';
            }
        }
    };

    window.onload = function() {
        document.getElementById('reset').onclick = jsGB.reset;
        document.getElementById('run').onclick = jsGB.run;
        jsGB.reset();
    };

Testing

Previously shown in Figure 1 is the result of bringing this code together: the emulator is capable of loading and running a graphics-based demo. In this case, the test ROM being loaded is a scrolling test written by Doug Lanford: the background displayed will scroll when one of the directional keypad buttons is pressed. In this particular case, with the keypad un-emulated, a static background is displayed.

In the next part, this piece of the jigsaw will be put in place: a keypad simulation which can provide the appropriate inputs to the emulated program. I'll also be looking at how the keypad works, and how the inputs are mapped into memory.

Imran Nazar <tf@imrannazar.com>, Sep 2010.

GameBoy Emulation in JavaScript: Graphics

Wed, 25 Aug 2010 12:21:45 +0000

This is part 4 of an article series on emulation development in JavaScript; ten parts are currently available, and others are expected to follow.

Previously in this series, the shape of a GameBoy emulator was brought together, and the timings established between the CPU and graphics processor. A canvas has been initialised and is ready for graphics to be drawn by the emulated GameBoy; the GPU emulation now has structure, but is still unable to render graphics to the framebuffer. In order for the emulation to render graphics, the concepts behind GameBoy graphics must be briefly examined.

Backgrounds

Just like most consoles of the era, the GameBoy didn't have enough memory to allow for a direct framebuffer to be held in memory. Instead, a tile system is employed: a set of small bitmaps is held in memory, and a map is built using references to these bitmaps. The innate advantage to this system is that one tile can be used repeatedly through the map, simply by using its reference.

The GameBoy's tiled graphics system operates with tiles of 8x8 pixels, and 256 unique tiles can be used in a map; there are two maps of 32x32 tiles that can be held in memory, and one of them can be used for the display at a time. There is space in the GameBoy memory for 384 tiles, arranged as two overlapping tile sets of 256 which share 128 tiles between them: one set uses tile numbers from 0 to 255, and the other uses numbers between -128 and 127 for its tiles.

In video memory, the layout of the tile data and maps runs as follows.

Region	Usage
8000-87FF	Tile set #1: tiles 0-127
8800-8FFF	Tile set #1: tiles 128-255; Tile set #0: tiles -1 to -128
9000-97FF	Tile set #0: tiles 0-127
9800-9BFF	Tile map #0
9C00-9FFF	Tile map #1

Table 1: VRAM layout
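To make the overlap in Table 1 concrete, the sketch below turns a tile number read from a map into the address of that tile's data. The names `tileSlot` and `tileAddress` are hypothetical, and the sketch assumes each tile occupies 16 bytes (8x8 pixels at two bits each).

```javascript
// Hypothetical helper: position of a tile among the 384 tiles in VRAM.
// raw is the byte read from the tile map (0 to 255).
function tileSlot(raw, tileSet) {
    // Tile set #1 uses the byte directly: 0 to 255
    if(tileSet === 1) return raw;

    // Tile set #0 treats the byte as signed: -128 to 127,
    // with tile 0 sitting at slot 256 (address 9000)
    var signed = (raw < 128) ? raw : raw - 256;
    return 256 + signed;
}

// Address of the first byte of the tile's data
function tileAddress(raw, tileSet) {
    return 0x8000 + tileSlot(raw, tileSet) * 16;
}
```

With these definitions, `tileAddress(0, 1)` is 0x8000 and `tileAddress(0, 0)` is 0x9000, while tile 255 of set #1 and tile -1 of set #0 (byte 0xFF) both land on 0x8FF0, in the shared region.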

When a background is defined, its map and tile data interact to produce the graphical display:

Figure 1: Background mapping

The background map is, as previously mentioned, 32x32 tiles; this comes to 256 by 256 pixels. The display of the GameBoy is 160x144 pixels, so there's scope for the background to be moved relative to the screen. The GPU achieves this by defining a point in the background that corresponds to the top-left of the screen: by moving this point between frames, the background is made to scroll on the screen. For this reason, the definition of the top-left corner is held by two GPU registers: Scroll X and Scroll Y.

Figure 2: Background scroll registers
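As a rough illustration of the scroll registers, the function below works out which map entry and which pixel inside that tile sit under a given screen position. The name `backgroundPosition` is an assumption for this sketch, as is the wrap-around at 256 pixels, which is how the scan renderer later in this part treats the background.

```javascript
// Hypothetical helper: locate a screen pixel within the background.
// scx and scy are the Scroll X and Scroll Y register values.
function backgroundPosition(screenX, screenY, scx, scy) {
    // Position in the 256x256 background, wrapping at the edges
    var bgX = (screenX + scx) & 255;
    var bgY = (screenY + scy) & 255;

    return {
        mapIndex: (bgY >> 3) * 32 + (bgX >> 3),  // entry in the 32x32 map
        tileX: bgX & 7,                          // pixel column in the tile
        tileY: bgY & 7                           // pixel row in the tile
    };
}

// Top-left of the screen with the background scrolled to (250, 9)
var pos = backgroundPosition(0, 0, 250, 9);
```

Here `pos.mapIndex` is 63 (the last tile of the second map row), and a pixel ten columns further right wraps around to the first tile of that row.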

Palettes

The GameBoy is often described as a monochrome machine, capable of displaying only black and white. This isn't quite true: the GameBoy can also handle light and dark grey, for a total of four colours. Representing one of these four colours in the tile data takes two bits, so each tile in the tile data set is held in (8x8x2) bits, or 16 bytes.

One additional complication for the GameBoy background is that a palette is interposed between the tile data and the final display: each of the four possible values for a tile pixel can correspond to any of the four colours. This is used mainly to allow easy colour changes for the tile set; if, for example, a set of tiles is held corresponding to the English alphabet, an inverse-video version can be built by changing the palette, instead of taking up another part of the tile set. The four palette entries are all updated at once, by changing the value of the Background Palette GPU register; the colour references used, and the structure of the register, are shown below.

Value	Pixel	Emulated colour
0	Off	`[255, 255, 255]`
1	33% on	`[192, 192, 192]`
2	66% on	`[96, 96, 96]`
3	On	`[0, 0, 0]`

Table 2: Colour reference values

Figure 3: Background palette register
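The inverse-video trick can be sketched as follows. The sketch assumes that palette entry 0 sits in the lowest two bits of the register and entry 3 in the highest two; `SHADES` and `decodePalette` are illustrative names rather than anything from the emulator.

```javascript
// Emulated colours from Table 2, indexed by colour reference value
var SHADES = [
    [255, 255, 255],
    [192, 192, 192],
    [ 96,  96,  96],
    [  0,   0,   0]
];

// Hypothetical helper: expand a palette register value into
// the colour used for each of the four tile pixel values.
// Assumes entry i occupies bits 2i and 2i+1 of the register.
function decodePalette(val) {
    var pal = [];
    for(var i = 0; i < 4; i++) {
        pal[i] = SHADES[(val >> (i * 2)) & 3];
    }
    return pal;
}

var normal = decodePalette(0xE4);    // 11 10 01 00: each value maps to itself
var inverse = decodePalette(0x1B);   // 00 01 10 11: each value maps to its opposite
```

The tile data is identical in both cases; only the register value changes, so pixel value 0 is drawn white under `normal` and black under `inverse`.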

Implementation: tile data

As stated above, each pixel in the tile data set is represented by two bits: these bits are read by the GPU when the tile is referenced in the map, run through the palette and pushed to screen. The hardware of the GPU is wired such that one whole row of the tile is accessible at the same time, and the pixels are cycled through by running up the bits. The only issue with this is that one row of the tile is two bytes: from this results the slightly convoluted scheme for storage of the bits, where each pixel's low bit is held in one byte, and the high bit in the other byte.

Figure 4: Tile data bitmap structure
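A small sketch of this storage scheme: given the two bytes that make up one row of a tile, the function below recovers the eight pixel values. It assumes the first byte carries the low bits and the second the high bits, with the leftmost pixel in bit 7; `decodeTileRow` is a name chosen for illustration.

```javascript
// Hypothetical helper: combine the two bytes of a tile row
// into eight pixel values between 0 and 3.
function decodeTileRow(lowByte, highByte) {
    var row = [];
    for(var x = 0; x < 8; x++) {
        // The leftmost pixel is held in the highest bit
        var mask = 1 << (7 - x);
        row[x] = ((lowByte & mask) ? 1 : 0) +
                 ((highByte & mask) ? 2 : 0);
    }
    return row;
}

// Low bits 00111100, high bits 01111110
var row = decodeTileRow(0x3C, 0x7E);   // [0, 2, 3, 3, 3, 3, 2, 0]
```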

Since JavaScript isn't ideally suited for manipulating bitmap structures quickly, the most time-efficient way of handling the tile data set is to maintain an internal data set alongside the video memory, with a more expanded view where each pixel's value has been pre-calculated. In order for this to accurately reflect the tile data set, any writes to the video RAM must trigger the function to update the GPU's internal tile data.

GPU.js: Internal tile data

    _tileset: [],

    reset: function() {
        // In addition to previous reset code:
        GPU._tileset = [];
        for(var i = 0; i < 384; i++) {
            GPU._tileset[i] = [];
            for(var j = 0; j < 8; j++) {
                GPU._tileset[i][j] = [0,0,0,0,0,0,0,0];
            }
        }
    },

    // Takes a value written to VRAM, and updates the
    // internal tile data set
    updatetile: function(addr, val) {
        // Get the "base address" for this tile row
        addr &= 0x1FFE;

        // Offsets from 0x1800 upwards hold the tile maps rather
        // than tile data, so there is no tile to update
        if(addr >= 0x1800) return;

        // Work out which tile and row was updated
        var tile = (addr >> 4) & 511;
        var y = (addr >> 1) & 7;

        var sx;
        for(var x = 0; x < 8; x++) {
            // Find bit index for this pixel
            sx = 1 << (7-x);

            // Update tile set
            GPU._tileset[tile][y][x] =
                ((GPU._vram[addr] & sx)   ? 1 : 0) +
                ((GPU._vram[addr+1] & sx) ? 2 : 0);
        }
    }

MMU.js: Tile update trigger

    wb: function(addr, val) {
        switch(addr & 0xF000) {
            // Only the VRAM case is shown:
            case 0x8000:
            case 0x9000:
                GPU._vram[addr & 0x1FFF] = val;
                GPU.updatetile(addr, val);
                break;
        }
    }

Implementation: Scan rendering

With these pieces in place, it's possible to begin rendering the GameBoy screen. Since this is being done on a line-by-line basis, the renderscan function referred to in Part 3 must, before it renders a scanline, work out where it is on the screen. This involves calculating the X and Y coordinates of the position in the background map, using the scroll registers and the current scanline counter. Once this has been determined, the scan renderer can advance through each tile in that row of the map, pulling in new tile data as it encounters each tile.

GPU.js: Scan rendering

    renderscan: function() {
        // VRAM offset for the tile map
        var mapoffs = GPU._bgmap ? 0x1C00 : 0x1800;

        // Which line of tiles to use in the map
        // (each line of the map holds 32 tile indices)
        mapoffs += (((GPU._line + GPU._scy) & 255) >> 3) << 5;

        // Which tile to start with in the map line
        var lineoffs = (GPU._scx >> 3);

        // Which line of pixels to use in the tiles
        var y = (GPU._line + GPU._scy) & 7;

        // Where in the tileline to start
        var x = GPU._scx & 7;

        // Where to render on the canvas
        var canvasoffs = GPU._line * 160 * 4;

        // Read tile index from the background map
        var colour;
        var tile = GPU._vram[mapoffs + lineoffs];

        // If the tile data set in use is #0, the
        // indices are signed; calculate a real tile offset
        if(GPU._bgtile == 0 && tile < 128) tile += 256;

        for(var i = 0; i < 160; i++) {
            // Re-map the tile pixel through the palette
            colour = GPU._pal[GPU._tileset[tile][y][x]];

            // Plot the pixel to canvas
            GPU._scrn.data[canvasoffs+0] = colour[0];
            GPU._scrn.data[canvasoffs+1] = colour[1];
            GPU._scrn.data[canvasoffs+2] = colour[2];
            GPU._scrn.data[canvasoffs+3] = colour[3];
            canvasoffs += 4;

            // When this tile ends, read another
            x++;
            if(x == 8) {
                x = 0;
                lineoffs = (lineoffs + 1) & 31;
                tile = GPU._vram[mapoffs + lineoffs];
                if(GPU._bgtile == 0 && tile < 128) tile += 256;
            }
        }
    }

Next steps: Output

With a CPU, memory handling and a graphics subsystem, the emulator is nearly capable of producing output. In part 5, I'll be looking at what's required to get the system from a disparate set of module files to a coherent whole, capable of loading and running a simple ROM file: tying the graphics registers to the MMU, and a simple interface to control the running of the emulation.

Imran Nazar <tf@imrannazar.com>, Aug 2010.

GameBoy Emulation in JavaScript: GPU Timings

Sat, 14 Aug 2010 22:14:31 +0000

This is part 3 of an article series on emulation development in JavaScript; ten parts are currently available, and others are expected to follow.

The emulator described in this series is available in source form: https://github.com/Two9A/jsGB

In the previous parts of this series, a structure for a GameBoy emulator was laid out, and brought to the point where a game ROM could be loaded, and stepped through by the emulated CPU. With the emulated processor attached to a memory mapping structure, it's now possible to attach peripherals to the system. One of the primary peripherals used by the GameBoy, and by any games console, is the graphics processor (GPU): it's the primary method of output for the console, and much of the processor's work goes on generating graphics for the GPU.

Emulating the screen

Nintendo's internal name for the GameBoy is "Dot Matrix Game"; its display is a pixel LCD of dimensions 160x144. If each pixel in the LCD is treated as a pixel in an HTML5 <canvas>, a direct mapping can be made to a canvas of width 160 and height 144. In order to directly address each pixel in the LCD, the contents of the canvas can be manipulated as a "framebuffer": a single block of memory containing the entirety of the canvas, as a series of 4-byte RGBA values.

index.html: Canvas tag

    <canvas id="screen" width="160" height="144"></canvas>

GPU.js: Canvas initialisation

    GPU = {
        _canvas: {},
        _scrn: {},

        reset: function() {
            var c = document.getElementById('screen');
            if(c && c.getContext) {
                GPU._canvas = c.getContext('2d');
                if(GPU._canvas) {
                    if(GPU._canvas.createImageData)
                        GPU._scrn = GPU._canvas.createImageData(160, 144);
                    else if(GPU._canvas.getImageData)
                        GPU._scrn = GPU._canvas.getImageData(0,0, 160,144);
                    else
                        GPU._scrn = {
                            'width': 160,
                            'height': 144,
                            'data': new Array(160*144*4)
                        };

                    // Initialise canvas to white
                    for(var i=0; i<160*144*4; i++)
                        GPU._scrn.data[i] = 255;

                    GPU._canvas.putImageData(GPU._scrn, 0, 0);
                }
            }
        }
    }

Once a block of memory has been allocated for the screen data, an individual pixel's colour can be set by writing RGBA components to the four values at that pixel position in the block; the pixel's index is y * 160 + x, and since each pixel occupies four values, its first component sits at offset (y * 160 + x) * 4.
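A minimal sketch of plotting one pixel into such a block follows. Because each pixel takes up four consecutive values, the pixel index is multiplied by four to find where its red component lives. The `setPixel` helper and the stand-in `screenData` array are assumptions for illustration; in a browser, the `data` member of the canvas image data would be used instead.

```javascript
var WIDTH = 160, HEIGHT = 144;

// Stand-in for the screen data block: four values (RGBA) per pixel
var screenData = new Uint8ClampedArray(WIDTH * HEIGHT * 4);

// Hypothetical helper: write one RGBA colour at position (x, y)
function setPixel(data, x, y, rgba) {
    var offs = (y * WIDTH + x) * 4;
    data[offs + 0] = rgba[0];   // red
    data[offs + 1] = rgba[1];   // green
    data[offs + 2] = rgba[2];   // blue
    data[offs + 3] = rgba[3];   // alpha
}

// Second pixel of the top row, and the bottom-right corner
setPixel(screenData, 1, 0, [192, 192, 192, 255]);
setPixel(screenData, 159, 143, [96, 96, 96, 255]);
```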

Raster graphics

With a canvas in place to receive the graphic output of the GameBoy, the next step is to emulate the production of graphics. The original GameBoy hardware simulates a cathode-ray tube (CRT) in its timings: in a CRT, the screen is scanned in rows by an electron beam, and the scanning process returns to the top of the screen after the end of scanning.

Figure 1: Scanlines and blanking periods

As can be seen above, a CRT requires more time to draw a scanline than simply running over the pixels in question: a "horizontal blanking" period is needed, for the beam to move from the end of one line to the start of the next. Similarly, the end of each frame means a "vertical blanking" period, while the beam travels back to the top-left corner. Since the beam has to move further in vertical blanking, this time period is commonly much longer than the horizontal blanking time.

In the same way, a GameBoy display exhibits horizontal and vertical blanking periods. In addition, time spent within the scanline itself is separated into two parts: the GPU flips between accessing video memory, and accessing sprite attribute memory, while it draws the scanline. For the purpose of this emulation, these two parts are distinct, and follow each other. The following table states how long the GPU stays in each period, in terms of the CPU's T-clock which runs at 4194304 Hz.

Period	GPU mode number	Time spent (clocks)
Scanline (accessing OAM)	2	80
Scanline (accessing VRAM)	3	172
Horizontal blank	0	204
One line (scan and blank)		456
Vertical blank	1	4560 (10 lines)
Full frame (scans and vblank)		70224

Table 1: GPU frame timings
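One way to read Table 1 is as a lookup from a clock position to a GPU state. The sketch below does exactly that, under the assumptions that a frame starts with the OAM access of line 0 and that the argument is a T-clock count between 0 and 70223; `gpuStateAt` is a hypothetical name, and this is not how the stepping code later in this part is organised.

```javascript
// Hypothetical helper: which line and mode the GPU is in,
// a given number of T-clocks after the start of a frame.
function gpuStateAt(frameClock) {
    var line = Math.floor(frameClock / 456);
    var lineClock = frameClock % 456;
    var mode;

    if(line >= 144) {
        mode = 1;                       // Vertical blank (lines 144 to 153)
    } else if(lineClock < 80) {
        mode = 2;                       // Scanline, accessing OAM
    } else if(lineClock < 80 + 172) {
        mode = 3;                       // Scanline, accessing VRAM
    } else {
        mode = 0;                       // Horizontal blank
    }

    return { line: line, mode: mode };
}
```

For example, `gpuStateAt(252)` reports line 0 in horizontal blank, and `gpuStateAt(144 * 456)` is the first clock of the vertical blank.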

In order to maintain these timings relative to the emulated CPU, a timing update function must exist, which gets called after the execution of every instruction. This can be done from an expanded version of the CPU dispatch process, covered in part 1.

Z80.js: Dispatcher

    while(true) {
        Z80._map[MMU.rb(Z80._r.pc++)]();
        Z80._r.pc &= 65535;
        Z80._clock.m += Z80._r.m;
        Z80._clock.t += Z80._r.t;

        GPU.step();
    }

GPU.js: Clock step

    _mode: 0,
    _modeclock: 0,
    _line: 0,

    step: function() {
        GPU._modeclock += Z80._r.t;

        switch(GPU._mode) {
            // OAM read mode, scanline active
            case 2:
                if(GPU._modeclock >= 80) {
                    // Enter scanline mode 3
                    GPU._modeclock = 0;
                    GPU._mode = 3;
                }
                break;

            // VRAM read mode, scanline active
            // Treat end of mode 3 as end of scanline
            case 3:
                if(GPU._modeclock >= 172) {
                    // Enter hblank
                    GPU._modeclock = 0;
                    GPU._mode = 0;

                    // Write a scanline to the framebuffer
                    GPU.renderscan();
                }
                break;

            // Hblank
            // After the last hblank, push the screen data to canvas
            case 0:
                if(GPU._modeclock >= 204) {
                    GPU._modeclock = 0;
                    GPU._line++;

                    // Lines 0 to 143 are visible, so vblank
                    // starts once the counter reaches 144
                    if(GPU._line == 144) {
                        // Enter vblank
                        GPU._mode = 1;
                        GPU._canvas.putImageData(GPU._scrn, 0, 0);
                    } else {
                        GPU._mode = 2;
                    }
                }
                break;

            // Vblank (10 lines)
            case 1:
                if(GPU._modeclock >= 456) {
                    GPU._modeclock = 0;
                    GPU._line++;

                    if(GPU._line > 153) {
                        // Restart scanning modes
                        GPU._mode = 2;
                        GPU._line = 0;
                    }
                }
                break;
        }
    }

Next time: backgrounds and palettes

In the above code, the timings for the GPU are established, but the work of the GPU isn't yet in place: renderscan is where the work happens. In the next part of this series, the concepts behind the GameBoy's background graphics system will be looked at, and code will be put inside the rendering function to emulate them.

Imran Nazar <tf@imrannazar.com>, Aug 2010.

GameBoy Emulation in JavaScript: Memory

Mon, 02 Aug 2010 12:03:45 +0000

This is part 2 of an article series on emulation development in JavaScript; ten parts are currently available, and others are expected to follow.

The emulator described in this series is available in source form: https://github.com/Two9A/jsGB

In the previous part of this series, the computer was introduced as a processing unit, which fetches its instructions from memory. In almost every case, a computer's memory is not a simple contiguous region; the GameBoy is no exception in this regard. Since the GameBoy CPU can access 65,536 individual locations on its address bus, a "memory map" can be drawn of all the regions where the CPU has access.

Figure 1: Memory map of the GameBoy address bus

A more detailed look at the memory regions is as follows:

[0000-3FFF] Cartridge ROM, bank 0: The first 16,384 bytes of the cartridge program are always available at this point in the memory map. Special circumstances apply:
- [0000-00FF] BIOS: When the CPU starts up, PC starts at 0000h, which is the start of the 256-byte GameBoy BIOS code. Once the BIOS has run, it is removed from the memory map, and this area of the cartridge ROM becomes addressable.
- [0100-014F] Cartridge header: This section of the cartridge contains data about its name and manufacturer, and must be written in a specific format.
[4000-7FFF] Cartridge ROM, other banks: Any subsequent 16k "banks" of the cartridge program can be made available to the CPU here, one by one; a chip on the cartridge is generally used to switch between banks, and make a particular area accessible. The smallest programs are 32k, which means that no bank-selection chip is required.
[8000-9FFF] Graphics RAM: Data required for the backgrounds and sprites used by the graphics subsystem is held here, and can be changed by the cartridge program. This region will be examined in further detail in part 3 of this series.
[A000-BFFF] Cartridge (External) RAM: There is a small amount of writeable memory available in the GameBoy; if a game is produced that requires more RAM than is available in the hardware, additional 8k chunks of RAM can be made addressable here.
[C000-DFFF] Working RAM: The GameBoy's internal 8k of RAM, which can be read from or written to by the CPU.
[E000-FDFF] Working RAM (shadow): Due to the wiring of the GameBoy hardware, an exact copy of the working RAM is available 8k higher in the memory map. This copy is available up until the last 512 bytes of the map, where other areas are brought into access.
[FE00-FE9F] Graphics: sprite information: Data about the sprites rendered by the graphics chip are held here, including the sprites' positions and attributes.
[FF00-FF7F] Memory-mapped I/O: Each of the GameBoy's subsystems (graphics, sound, etc.) has control values, to allow programs to create effects and use the hardware. These values are available to the CPU directly on the address bus, in this area.
[FF80-FFFF] Zero-page RAM: A high-speed area of 128 bytes of RAM is available at the top of memory. Oddly, though this is "page" 255 of the memory, it is referred to as page zero, since most of the interaction between the program and the GameBoy hardware occurs through use of this page of memory.
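The list above can be condensed into a lookup from address to region, sketched here as a chain of range checks. The function name `memoryRegion` is illustrative, and the label given to FEA0-FEFF, a gap which the list doesn't describe, is an assumption.

```javascript
// Hypothetical helper: name the region an address falls into.
// addr is expected to be between 0x0000 and 0xFFFF.
function memoryRegion(addr) {
    if(addr < 0x4000) return 'Cartridge ROM, bank 0';
    if(addr < 0x8000) return 'Cartridge ROM, other banks';
    if(addr < 0xA000) return 'Graphics RAM';
    if(addr < 0xC000) return 'Cartridge (External) RAM';
    if(addr < 0xE000) return 'Working RAM';
    if(addr < 0xFE00) return 'Working RAM (shadow)';
    if(addr < 0xFEA0) return 'Graphics: sprite information';
    if(addr < 0xFF00) return 'Unused';   // Assumption: not covered by the list
    if(addr < 0xFF80) return 'Memory-mapped I/O';
    return 'Zero-page RAM';
}
```

The memory management unit needs the same decisions when handling a read, though it can make them more cheaply by switching on the top bits of the address.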

Interfacing to the CPU

In order for the emulated CPU to access these regions separately, each must be handled as a special case in the memory management unit. This part of the code was alluded to in the previous part, and a basic interface described for the MMU object; the fleshing out of the interface can be as simple as a switch statement.

MMU.js: Mapped read

    MMU = {
        // Flag indicating BIOS is mapped in
        // BIOS is unmapped with the first instruction above 0x00FF
        _inbios: 1,

        // Memory regions (initialised at reset time)
        _bios: [],
        _rom: [],
        _wram: [],
        _eram: [],
        _zram: [],

        // Read a byte from memory
        rb: function(addr) {
            switch(addr & 0xF000) {
                // BIOS (256b)/ROM0
                case 0x0000:
                    if(MMU._inbios) {
                        if(addr < 0x0100)
                            return MMU._bios[addr];
                        else if(Z80._r.pc == 0x0100)
                            MMU._inbios = 0;
                    }
                    return MMU._rom[addr];

                // ROM0
                case 0x1000:
                case 0x2000:
                case 0x3000:
                    return MMU._rom[addr];

                // ROM1 (unbanked) (16k)
                case 0x4000:
                case 0x5000:
                case 0x6000:
                case 0x7000:
                    return MMU._rom[addr];

                // Graphics: VRAM (8k)
                case 0x8000:
                case 0x9000:
                    return GPU._vram[addr & 0x1FFF];

                // External RAM (8k)
                case 0xA000:
                case 0xB000:
                    return MMU._eram[addr & 0x1FFF];

                // Working RAM (8k)
                case 0xC000:
                case 0xD000:
                    return MMU._wram[addr & 0x1FFF];

                // Working RAM shadow
                case 0xE000:
                    return MMU._wram[addr & 0x1FFF];

                // Working RAM shadow, I/O, Zero-page RAM
                case 0xF000:
                    switch(addr & 0x0F00) {
                        // Working RAM shadow
                        case 0x000: case 0x100: case 0x200: case 0x300:
                        case 0x400: case 0x500: case 0x600: case 0x700:
                        case 0x800: case 0x900: case 0xA00: case 0xB00:
                        case 0xC00: case 0xD00:
                            return MMU._wram[addr & 0x1FFF];

                        // Graphics: object attribute memory
                        // OAM is 160 bytes, remaining bytes read as 0
                        case 0xE00:
                            if(addr < 0xFEA0)
                                return GPU._oam[addr & 0xFF];
                            else
                                return 0;

                        // Zero-page
                        case 0xF00:
                            if(addr >= 0xFF80) {
                                return MMU._zram[addr & 0x7F];
                            } else {
                                // I/O control handling
                                // Currently unhandled
                                return 0;
                            }
                    }
            }
        },

        // Read a 16-bit word
        rw: function(addr) {
            return MMU.rb(addr) + (MMU.rb(addr+1) << 8);
        }
    };

In the above section of code, it should be noted that the region of memory between 0xFF00 and 0xFF7F is unhandled; these locations are used as memory-mapped I/O for the various chips that provide I/O, and will be defined as these systems are covered in later parts.

Writing a byte is handled in a very similar manner; each operation is reversed, and values are written to the various regions of memory instead of returned from the function. For this reason, it is not necessary to provide a full listing of the wb function here.
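For a flavour of what the write side might look like, here is a cut-down sketch covering only working RAM, its shadow and zero-page RAM. The object name `MiniMMU` and the 16-bit `ww` function are assumptions made for this example; every other region is simply ignored here, which a full wb would not do.

```javascript
// Hypothetical, reduced MMU handling writes to three regions only
var MiniMMU = {
    _wram: new Array(8192),
    _zram: new Array(128),

    // Write a byte to memory
    wb: function(addr, val) {
        switch(addr & 0xF000) {
            // Working RAM (8k) and the start of its shadow
            case 0xC000:
            case 0xD000:
            case 0xE000:
                MiniMMU._wram[addr & 0x1FFF] = val;
                break;

            case 0xF000:
                if(addr >= 0xFF80) {
                    // Zero-page RAM
                    MiniMMU._zram[addr & 0x7F] = val;
                } else if(addr < 0xFE00) {
                    // Remainder of the working RAM shadow
                    MiniMMU._wram[addr & 0x1FFF] = val;
                }
                break;

            // Writes elsewhere are dropped in this sketch
        }
    },

    // Write a 16-bit word, low byte first, mirroring rw
    ww: function(addr, val) {
        MiniMMU.wb(addr, val & 255);
        MiniMMU.wb(addr + 1, val >> 8);
    }
};

MiniMMU.ww(0xC000, 0x1234);   // Low byte 0x34 first, then 0x12
MiniMMU.wb(0xE002, 0x56);     // Lands in working RAM through the shadow
MiniMMU.wb(0xFF80, 0x78);     // First byte of zero-page RAM
```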

Loading a ROM

Just as a CPU emulation is useless without its supporting elements of memory access, graphics and so on, being able to read a program from memory is useless without a program loaded. There are two main ways to pull a program into an emulator: hard-code it into the emulator's source code, or allow for loading of a ROM file from a certain location. The obvious disadvantage of hard-coding the program is that it's fixed, and cannot easily be changed.

In the case of this JavaScript emulator, the GameBoy BIOS is hard-coded into the MMU, because it isn't liable to change; the program file is, however, loaded from the server asynchronously, after the emulator has initialised. This can be done through XMLHTTP, using a binary file reader such as Andy Na's BinFileReader; the result of this is a string containing the ROM file.

MMU.js: ROM file loading

    MMU.load = function(file) {
        var b = new BinFileReader(file);
        MMU._rom = b.readString(b.getFileSize(), 0);
    };

Since the ROM file is held as a string, instead of an array of numbers, the rb and wb functions must be changed to index a string:

MMU.js: ROM file indexing

    case 0x1000:
    case 0x2000:
    case 0x3000:
        return MMU._rom.charCodeAt(addr);
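The effect of indexing a string this way can be tried in isolation. In the sketch below, a three-character string stands in for a loaded ROM, and `readRomByte` is a hypothetical helper; the mask to 8 bits is a defensive assumption, for loaders that might leave higher bits set in a character.

```javascript
// A three-byte stand-in for a ROM image held as a string
var rom = String.fromCharCode(0x00, 0xC3, 0x50);

// Hypothetical helper: read one byte from a ROM held as a string
function readRomByte(romString, addr) {
    // Keep only the low 8 bits of the character code
    return romString.charCodeAt(addr) & 0xFF;
}

var second = readRomByte(rom, 1);   // 0xC3
```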

Next steps

With a CPU and MMU in place, it is possible to watch a program being executed, step by step: an emulation can be achieved, and produce the expected values in the right registers. What's missing is a sense of what that means for graphical output. In the next part of this series, the issue of graphics will be looked at, including how the GameBoy structures its graphic output, and how to render graphics onto the screen.

As with part 1, the source for this article is available at: http://imrannazar.com/content/files/jsgb.mmu.js.

Imran Nazar <tf@imrannazar.com>, Aug 2010.

GameBoy Emulation in JavaScript: The CPU

Thu, 22 Jul 2010 18:11:40 +0000

This is part 1 of an article series on emulation development in JavaScript; ten parts are currently available, and others are expected to follow.

The emulator described in this series is available in source form: https://github.com/Two9A/jsGB

It's often stated that JavaScript is a special-purpose language, designed for use by web sites to enable dynamic interaction. However, JavaScript is a full object-oriented programming language, and is used in arenas besides the Web: the Widgets available for recent versions of Windows and Apple's Mac OS are implemented in JavaScript, as is the GUI for the Mozilla application suite.

With the recent introduction of the <canvas> tag to HTML, the question arises as to whether a JavaScript program is capable of emulating a system, much like desktop applications are available to emulate the Commodore 64, GameBoy Advance and other gaming consoles. The simplest way of checking whether this is viable is, of course, to write such an emulator in JavaScript.

This article sets out to implement the basis for a GameBoy emulation, by laying the groundwork for emulating each part of the physical machine. The starting point is the CPU.

The model

The traditional model of a computer is a processing unit, which gets told what to do by a program of instructions; the program might be accessed with its own special memory, or it might be sitting in the same area as normal memory, depending on the computer. Each instruction takes a short amount of time to run, and they're all run one by one. From the CPU's perspective, a loop starts up as soon as the computer is turned on, to fetch an instruction from memory, work out what it says, and execute it.

In order to keep track of where the CPU is within the program, a number is held by the CPU called the Program Counter (PC). After an instruction is fetched from memory, the PC is advanced by however many bytes make up the instruction.

Figure 1: The fetch-decode-execute loop
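The loop in Figure 1 can be sketched in a few lines, using a toy processor that understands only three one-byte instructions. The `memory`, `cpu` and `ops` names are invented for this example, and the structure is deliberately simpler than the emulator built in the rest of this article.

```javascript
// A four-byte program: NOP, INC A, INC A, HALT
var memory = [0x00, 0x3C, 0x3C, 0x76];

// Toy processor state: program counter and one register
var cpu = { pc: 0, a: 0, halted: false };

// Table mapping opcodes onto functions that simulate them
var ops = {
    0x00: function() {},                                // NOP
    0x3C: function() { cpu.a = (cpu.a + 1) & 255; },    // INC A
    0x76: function() { cpu.halted = true; }             // HALT
};

while(!cpu.halted) {
    var opcode = memory[cpu.pc];      // Fetch the instruction
    cpu.pc = (cpu.pc + 1) & 65535;    // Advance PC past it
    ops[opcode]();                    // Decode and execute
}
```

Once the loop stops, `cpu.a` holds 2 and `cpu.pc` has moved past all four bytes of the program.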

The CPU in the original GameBoy is a modified Zilog Z80, so the following things are pertinent:

The Z80 is an 8-bit chip, so all the internal workings operate on one byte at a time;
The memory interface can address up to 65,536 bytes (a 16-bit address bus);
Programs are accessed through the same address bus as normal memory;
An instruction can be anywhere between one and three bytes.

In addition to the PC, other numbers are held inside the CPU that can be used for calculation, and they're referred to as registers: A, B, C, D, E, H, and L. Each of them is one byte, so each one can hold a value from 0 to 255. Most of the instructions in the Z80 are used to handle values in these registers: loading a value from memory into a register, adding or subtracting values, and so forth.

If there are 256 possible values in the first byte of an instruction, that makes for 256 possible instructions in the basic table. That table is detailed in the Gameboy Z80 opcode map released on this site. Each of these can be simulated by a JavaScript function, that operates on an internal model of the registers, and produces effects on an internal model of the memory interface.

There are other registers in the Z80, that deal with holding status: the flags register (F), whose operation is discussed below; and the stack pointer (SP) which is used alongside the PUSH and POP instructions for basic LIFO handling of values. The basic model of the Z80 emulation would therefore require the following components:

An internal state:
- A structure for retaining the current state of the registers;
- The amount of time used to execute the last instruction;
- The amount of time that the CPU has run in total;
Functions to simulate each instruction;
A table mapping said functions onto the opcode map;
A known interface to talk to the simulated memory.

The internal state can be held as follows:

Z80.js: Internal state values

Z80 = {
    // Time clock: The Z80 holds two types of clock (m and t)
    _clock: {m:0, t:0},

    // Register set
    _r: {
        a:0, b:0, c:0, d:0, e:0, h:0, l:0, f:0,    // 8-bit registers
        pc:0, sp:0,                                // 16-bit registers
        m:0, t:0                                   // Clock for last instr
    }
};

The flags register (F) is important to the functioning of the processor: it automatically calculates certain bits, or flags, based on the result of the last operation. There are four flags in the Gameboy Z80:

Zero (0x80): Set if the last operation produced a result of 0;
Operation (0x40): Set if the last operation was a subtraction;
Half-carry (0x20): Set if, in the result of the last operation, the lower half of the byte overflowed past 15;
Carry (0x10): Set if the last operation produced a result over 255 (for additions) or under 0 (for subtractions).

Since the basic calculation registers are 8-bits, the carry flag allows for the software to work out what happened to a value if the result of a calculation overflowed the register. With these flag handling issues in mind, a few examples of instruction simulations are shown below. These examples are simplified, and don't calculate the half-carry flag.

Z80.js: Instruction simulations

Z80 = {
    // Internal state
    _clock: {m:0, t:0},
    _r: {a:0, b:0, c:0, d:0, e:0, h:0, l:0, f:0, pc:0, sp:0, m:0, t:0},

    // Add E to A, leaving result in A (ADD A, E)
    ADDr_e: function() {
        Z80._r.a += Z80._r.e;                      // Perform addition
        Z80._r.f = 0;                              // Clear flags
        if(!(Z80._r.a & 255)) Z80._r.f |= 0x80;    // Check for zero
        if(Z80._r.a > 255) Z80._r.f |= 0x10;       // Check for carry
        Z80._r.a &= 255;                           // Mask to 8-bits
        Z80._r.m = 1; Z80._r.t = 4;                // 1 M-time taken
    },

    // Compare B to A, setting flags (CP A, B)
    CPr_b: function() {
        var i = Z80._r.a;                          // Temp copy of A
        i -= Z80._r.b;                             // Subtract B
        Z80._r.f = 0x40;                           // Clear flags, set subtraction flag
        if(!(i & 255)) Z80._r.f |= 0x80;           // Check for zero
        if(i < 0) Z80._r.f |= 0x10;                // Check for underflow
        Z80._r.m = 1; Z80._r.t = 4;                // 1 M-time taken
    },

    // No-operation (NOP)
    NOP: function() {
        Z80._r.m = 1; Z80._r.t = 4;                // 1 M-time taken
    }
};
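The simplified examples skip the half-carry flag. As a rough sketch of how it might be worked out for an addition, the low four bits of each operand can be added on their own and checked for overflow past 15. The helper name addWithFlags is hypothetical and is not part of the published core; the masks are the ones listed in the flag table above.

```javascript
// Hypothetical helper (not part of jsgb.z80.js): add two 8-bit values and
// compute all four flags, using the masks from the flag list in the article.
function addWithFlags(a, b) {
  const sum = a + b;
  let f = 0;                                    // Operation flag (0x40) stays clear for additions
  if ((sum & 255) === 0) f |= 0x80;             // Zero: the masked result is 0
  if (((a & 15) + (b & 15)) > 15) f |= 0x20;    // Half-carry: low nibble overflowed past 15
  if (sum > 255) f |= 0x10;                     // Carry: result did not fit in 8 bits
  return { result: sum & 255, flags: f };
}
```

Returning the result and flags together keeps the sketch free of global state; in the core itself, the same checks would write straight into Z80._r.a and Z80._r.f.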

Memory interfacing

A processor that can manipulate registers within itself is all well and good, but it must be able to put results into memory to be useful. In the same way, the above CPU emulation requires an interface to emulated memory; this can be provided by a memory management unit (MMU). Since the Gameboy itself doesn't contain a complicated MMU, the emulated unit can be quite simple.

At this point, the CPU only needs to know that an interface is present; the details of how the Gameboy maps banks of memory and hardware onto the address bus are inconsequential to the processor's operation. Four operations are required by the CPU:

MMU.js: Memory interface

MMU = {
    rb: function(addr) { /* Read 8-bit byte from a given address */ },
    rw: function(addr) { /* Read 16-bit word from a given address */ },

    wb: function(addr, val) { /* Write 8-bit byte to a given address */ },
    ww: function(addr, val) { /* Write 16-bit word to a given address */ }
};
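To make the four operations concrete before the real memory map is introduced, here is a minimal sketch that backs them with a flat 64 KiB array. The flat array and the _mem property are assumptions for illustration only; the Gameboy's actual memory map is more involved. Words are stored low byte first, as the Gameboy's CPU is little-endian.

```javascript
// Sketch only: a flat 64 KiB array stands in for the real memory map.
const MMU = {
  _mem: new Uint8Array(65536),

  // Read 8-bit byte from a given address
  rb(addr) { return this._mem[addr & 0xFFFF]; },

  // Read 16-bit word: low byte at addr, high byte at addr+1
  rw(addr) { return this.rb(addr) | (this.rb(addr + 1) << 8); },

  // Write 8-bit byte to a given address
  wb(addr, val) { this._mem[addr & 0xFFFF] = val & 255; },

  // Write 16-bit word: low byte first, then high byte
  ww(addr, val) {
    this.wb(addr, val & 255);
    this.wb(addr + 1, (val >> 8) & 255);
  }
};
```

Building rw and ww on top of rb and wb means any later address decoding only has to be written once, in the byte-level functions.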

With these in place, the rest of the CPU instructions can be simulated. Another few examples are shown below:

Z80.js: Memory-handling instructions

    // Push registers B and C to the stack (PUSH BC)
    PUSHBC: function() {
        Z80._r.sp--;                               // Drop through the stack
        MMU.wb(Z80._r.sp, Z80._r.b);               // Write B
        Z80._r.sp--;                               // Drop through the stack
        MMU.wb(Z80._r.sp, Z80._r.c);               // Write C
        Z80._r.m = 3; Z80._r.t = 12;               // 3 M-times taken
    },

    // Pop registers H and L off the stack (POP HL)
    POPHL: function() {
        Z80._r.l = MMU.rb(Z80._r.sp);              // Read L
        Z80._r.sp++;                               // Move back up the stack
        Z80._r.h = MMU.rb(Z80._r.sp);              // Read H
        Z80._r.sp++;                               // Move back up the stack
        Z80._r.m = 3; Z80._r.t = 12;               // 3 M-times taken
    },

    // Read a byte from absolute location into A (LD A, addr)
    LDAmm: function() {
        var addr = MMU.rw(Z80._r.pc);              // Get address from instr
        Z80._r.pc += 2;                            // Advance PC
        Z80._r.a = MMU.rb(addr);                   // Read from address
        Z80._r.m = 4; Z80._r.t = 16;               // 4 M-times taken
    }

Dispatch and reset

With the instructions in place, the remaining pieces of the puzzle for the CPU are to reset the CPU when it starts up, and to feed instructions to the emulation routines. Having a reset routine allows for the CPU to be stopped and "rewound" to the start of execution; an example is shown below.

Z80.js: Reset

    reset: function() {
        Z80._r.a = 0; Z80._r.b = 0; Z80._r.c = 0; Z80._r.d = 0;
        Z80._r.e = 0; Z80._r.h = 0; Z80._r.l = 0; Z80._r.f = 0;
        Z80._r.sp = 0;
        Z80._r.pc = 0;      // Start execution at 0

        Z80._clock.m = 0; Z80._clock.t = 0;
    }

In order for the emulation to run, it has to emulate the fetch-decode-execute sequence detailed earlier. "Execute" is taken care of by the instruction emulation functions, but fetch and decode require a specialist piece of code, known as a "dispatch loop". This loop takes each instruction, decodes where it must be sent for execution, and dispatches it to the function in question.

Z80.js: Dispatcher

while(true) {
    var op = MMU.rb(Z80._r.pc++);              // Fetch instruction
    Z80._map[op]();                            // Dispatch
    Z80._r.pc &= 65535;                        // Mask PC to 16 bits
    Z80._clock.m += Z80._r.m;                  // Add time to CPU clock
    Z80._clock.t += Z80._r.t;
}

Z80._map = [
    Z80._ops.NOP,
    Z80._ops.LDBCnn,
    Z80._ops.LDBCmA,
    Z80._ops.INCBC,
    Z80._ops.INCr_b,
    ...
];
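The dispatcher can be tried out in isolation with a cut-down sketch: a bounded loop, a blank memory array, and a table in which only NOP (opcode 0x00) is filled in. The names memory, cpu, ops, map and run are illustrative and don't appear in the published core, and the step limit exists only so that the example terminates.

```javascript
// Sketch: a bounded dispatch loop over blank memory.
// All names here are illustrative, not the original API.
const memory = new Uint8Array(65536);          // All zeroes, so every fetch yields opcode 0x00
const cpu = { pc: 0, m: 0, t: 0, clock: { m: 0, t: 0 } };

const ops = {
  NOP() { cpu.m = 1; cpu.t = 4; },             // 1 M-time taken
  UNIMPLEMENTED() { throw new Error("Opcode not implemented in this sketch"); }
};

// A real table maps all 256 opcodes; only 0x00 is filled in here
const map = new Array(256).fill(ops.UNIMPLEMENTED);
map[0x00] = ops.NOP;

function run(steps) {
  for (let i = 0; i < steps; i++) {
    const op = memory[cpu.pc++];               // Fetch instruction
    map[op]();                                 // Decode via the table, then execute
    cpu.pc &= 65535;                           // Mask PC to 16 bits
    cpu.clock.m += cpu.m;                      // Add time to CPU clock
    cpu.clock.t += cpu.t;
  }
}

run(3);
```

After three steps, the program counter and the M-clock have each advanced by three, and the T-clock by twelve.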

Usage in a system emulation

Implementing a Z80 emulation core is useless without an emulator to run it. In the next part of this series, the work of emulating the Gameboy begins: I'll be looking at the Gameboy's memory map, and how a game image can be loaded into the emulator over the Web.

The complete Z80 core is available at: http://imrannazar.com/content/files/jsgb.z80.js; please feel free to let me know if you encounter any bugs in the implementation.

Imran Nazar <tf@imrannazar.com>, Jul 2010.

Memory Usage of Constants in PHP

Thu, 24 Jun 2010 21:11:30 +0000

A common requirement in the development of web applications is that the interface is portable between languages: any phrases which appear within the interface must be in the viewer's preferred language. The consequence of this is that, instead of the phrases themselves, definitions which refer to the phrases must be used throughout the website.

In PHP, one of the more common ways to implement this is to generate multiple "phrase files", each of which contains all the phrases required for a given language. Each phrase is defined as a constant; an example of such a phrase file could run as follows.

phrases.tr.php: Turkish phrase file

define('_WELCOME',       'Hoşgeldiniz!');
define('_FREE_DELIVERY', 'Ücretsiz dağıtma');
define('_BRANDS',        'Markalar');
define('_COMPUTERS',     'Bilgisayarlar');

The object-oriented approach yields another method of holding these translation phrases, as class constants within a phrases class. This alternative approach would be performed as below.

phrases.tr.php: Turkish phrase file

class Phrases
{
    const WELCOME       = 'Hoşgeldiniz!';
    const FREE_DELIVERY = 'Ücretsiz dağıtma';
    const BRANDS        = 'Markalar';
    const COMPUTERS     = 'Bilgisayarlar';
}

Since there are probably a large number of these phrases, it makes sense to seek the most memory-efficient of these two constructs: in other words, the one which takes the least amount of memory to be held. It might be expected that both methods of defining the phrase file would cause the same amount of memory to be taken up; this is not the case.

(A note: the analysis presented below is based on the definitions held in PHP 5.2.10; the values produced may differ in future versions.)

Storage of a global constant

The first of the two methods above defines a series of constants in the global scope; internally to PHP, these were initially stored as an array of struct zend_constant. The layout of this data structure, and the memory used, are derived from the basic zval structure used by PHP to hold values.

Figure 1: Original storage scheme for global constants

This data structure is memory-efficient, being able to store the name and value of the constant, as well as the other data required by PHP for a value, such as the reference count. The problem with defining zend_constants as an array is that the time taken to find a particular constant grows linearly with the number of constants that are defined; since the PHP interpreter itself sets constants such as PHP_VERSION, there will always be a disadvantage for user code in terms of speed.

To resolve this, a HashTable data structure was introduced with PHP 3, so that a constant can be found in near-constant time on average, regardless of how many constants are defined. The HashTable uses Buckets to store its data entries, each of which has a name attached.

Figure 1a: Hashtable storage scheme for global constants

The trade-off in using a hash structure like this is that more memory is taken up by the hashes and key lengths. In total, storage of a global constant takes 42 bytes before the strings for name and value are counted.

Storage of a class constant

With the introduction of objects in PHP 4, a new line of thinking was employed for class-level constants: if a HashTable is used, and a hash of the constant name is stored, the name itself doesn't need to be stored alongside it. This allows a good chunk of space to be saved, and the structure of the storage to be simplified somewhat.

Figure 2: Storage scheme for class constants

As can be seen here, the name of the constant isn't stored at all once its hash has been calculated. This means that the zend_constant structure can be eliminated, leaving only the zval. As a result, a class constant needs 26 bytes to be stored, before the value is counted.
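Taking the two figures above at face value (and ignoring allocator overhead, which isn't measured here), the difference for a phrase file of N constants can be worked out directly. The value N = 5000 is an arbitrary example, not a measurement.

```latex
\Delta = (42 - 26)\,N = 16N \ \text{bytes}
\qquad\Longrightarrow\qquad
N = 5000:\quad \Delta = 80\,000 \ \text{bytes}
```

In addition, each global constant keeps its name string in memory, so the real gap would be somewhat wider than this fixed-overhead estimate.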

Conclusions

Two conclusions can be drawn from this analysis:

If you're putting together a script that demands a long series of constants, such as a translation table, it may be advisable to use a class to hold the constants instead of defining them in the global scope;
If Derick Rethans informs you that class constants take up less memory than global ones, you can take him at his word.

Imran Nazar <tf@imrannazar.com>, June 2010.

Venn Diagrams in PHP with imagick

Thu, 20 May 2010 08:26:00 +0000

One common problem in data visualisation is the representation of two sets of data, which have a common subset of elements: a percentage of their contents which are present in both. The obvious solution to this issue is to draw a Venn diagram: two overlapping circles, where the overlap represents the percentage of common elements.

Figure 1: A sample Venn diagram, with data sets of green and blue, and common elements in cyan

The main issue in drawing a Venn diagram is, given the percentage of overlap, determining the placement of the circles such that it visually matches the stated overlap. Once the circle dimensions and placements have been worked out, the image manipulation is relatively straightforward.

In this article, I'll be using PHP to demonstrate the implementation, and the imagick interface to ImageMagick in order to draw and output the image.

Geometry: Overlapping circles

Mathematically, two overlapping circles will cross each other at two points: a line between these two points is a chord of the circles, and the area contained within each chord segment by this line is 50% of the total overlap.

Figure 2: Circle segment dimensions

In order to correctly place the circles, it's important to find out what x and h are in the above diagram; knowing these values will allow for easy calculation of the horizontal positions. By using the standard formulae for area and angle of a circle segment, the following equation can be obtained.

Formula 1: Length of the sagitta

By solving this equation, we can get the length of the sagitta, x. The problem presented by this equation, however, is that it cannot be solved analytically by working with the equation terms. A numerical approach will need to be used, to find a solution.

The Newton-Raphson iterative method

One of the most common numerical algorithms for solving an equation is the Newton-Raphson method, also known as Newton's method. It uses the gradient of the function at a particular point, to guess the next point. By picking a good starting point, it's possible to quickly narrow down a solution to the function (the point at which it crosses the x-axis).

Figure 3: An example of the Newton-Raphson method

As can be seen in the above figure, the algorithm follows the gradient line down to the x-axis, and uses the crossing point there as its next guess for the solution. Taking another gradient from the function at that point, the algorithm homes in on the solution within (in the above case) 4 or 5 iterations. When used on a formula, the gradient is represented by the differential of the formula in question; for the sagitta length formula, the differential is:

Formula 2: Differential of the sagitta length

With both formulae to hand (the function itself and the differential), the iteration process is a simple calculation:

Formula 3: The Newton-Raphson algorithm

This calculation can be repeated until the answer is close to the expected solution: in other words, when successive iterations don't result in a significant change to the answer. The definition of "significant change" depends on the problem: in this case, I'll be using "the same to four decimal places".
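Read back from the PHP implementation further down (the functions f and fp), the three formulae can be written out as follows. This is a reconstruction from the code rather than the original typesetting; r is the circle radius, and P is the fraction of one circle's area taken up by its segment (half of the overall overlap).

```latex
f(x) = \arccos\!\left(\frac{x}{r}\right) - \pi P - \frac{x}{r^{2}}\sqrt{(r+x)(r-x)}

f'(x) = \frac{\dfrac{x^{2}}{r^{3}} - \dfrac{1}{r}}{\sqrt{1 - \dfrac{x^{2}}{r^{2}}}} - \frac{1}{r}\sqrt{1 - \frac{x^{2}}{r^{2}}}

x_{n+1} = x_{n} - \frac{f(x_{n})}{f'(x_{n})}
```

As a check, starting from x = 0 with r = 150 and P = 0.1 gives f(0) = π/2 − 0.1π ≈ 1.2566 and f'(0) = −2/150, so the first step lands on 0 + 1.2566 × 75 ≈ 94.248.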

As an example, suppose that the Venn diagram in Figure 1 is being generated: a diagram showing 20% overlap, where each circle has a radius of 150 pixels. Plugging these values into the Newton-Raphson solver shows the following values for each iteration.

Iteration results for an example diagram

0.0000
94.2478
102.7742
103.0570
103.0573
103.0573

As can be seen, the solver quickly converges on the answer for the length x. From here, h can be calculated as the difference between x and the radius, and the angle θ as:

Formula 4: Angle for circle segment
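The whole chain from overlap to x, h and θ can be sketched in a few lines. The version below is written as a loop in JavaScript rather than as the recursive PHP of the next section; the functions f and fp are direct transcriptions of the PHP ones, while the function name solve and the constants are my own choices for the example.

```javascript
// The length formula, transcribed from the PHP function f()
function f(x, r, P) {
  return Math.acos(x / r) - Math.PI * P - (x / (r * r)) * Math.sqrt((r + x) * (r - x));
}

// Its differential, transcribed from the PHP function fp()
function fp(x, r, P) {
  const s = Math.sqrt(1 - (x * x) / (r * r));
  return (((x * x) / (r * r * r)) - (1 / r)) / s - (s / r);
}

// Newton-Raphson as a loop, with a cap on the number of steps
function solve(x0, r, P, maxSteps = 10, precision = 0.0001) {
  let x = x0;
  for (let i = 0; i < maxSteps; i++) {
    const xn = x - f(x, r, P) / fp(x, r, P);
    if (Math.abs(xn - x) <= precision) return xn;   // Successive guesses agree
    x = xn;
  }
  return x;
}

const radius = 150;
const overlap = 0.2;
const x = solve(0, radius, overlap / 2);            // Each circle holds half the overlap
const h = radius - x;                               // Difference between x and the radius
const thetaDegrees = Math.acos(x / radius) * (180 / Math.PI);
```

For the example diagram this settles on the same x as the iteration table, roughly 103.06 pixels.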

Implementing the solver

In PHP, the solver can be implemented by defining the sagitta formula and its differential as two functions, and using a recursive function to run through their values. The following implementation contains a "safety valve" for the solver, for the general case where the equations may cause a divergence if the solver starts at x=0. In the case of this equation, the safety valve is unnecessary, since the algorithm will always converge if it starts at 0; it is included below for completeness.

Iterative solver implementation

class Sagitta
{
    // The sagitta length formula
    static function f($x, $r, $P)
    {
        return acos($x/$r) - (M_PI*$P) - (($x/($r*$r)) * sqrt(($r+$x)*($r-$x)));
    }

    // Differential of the length
    static function fp($x, $r, $P)
    {
        $s = sqrt(1 - (($x*$x)/($r*$r)));
        return (((($x*$x)/($r*$r*$r)) - (1/$r)) / $s) - ($s/$r);
    }

    // Recursive solver
    // Built-in safety valve at 10 levels down
    static function solve($x, $r, $P, $level=10, $precision=0.0001)
    {
        $xn = $x - (self::f($x,$r,$P) / self::fp($x,$r,$P));

        // Recurse while the new guess still differs from the previous one
        if($level && (abs($xn - $x) > $precision))
            return self::solve($xn, $r, $P, $level-1, $precision);
        else
            return $xn;
    }
}

$radius = 150;
$overlap = 0.2;

// Each circle contains half of the overlap; use this to calculate x
$x0 = 0;
$x = Sagitta::solve($x0, $radius, $overlap/2);

Drawing the Venn diagram

Using PHP and imagick, the Venn diagram can be drawn quickly and efficiently based on the value for x obtained above. There are, however, a few issues that must be resolved:

Circle placement: In order to plot a circle in imagick, its centre coordinate must be given, and this must be calculated horizontally for both circles. For the left-hand circle, this is simply one radius in from the left of the image. The right-hand circle would be three radii from the left edge, if there were no overlap; from Figure 2, it can be seen that there is 2h of overlap, so this must be subtracted from the horizontal coordinate of the right-hand circle.
Intersection: The two circles can be drawn easily, but the circle segments representing the intersection could cause difficulty. Fortunately, imagick provides a construct for drawing an ellipse segment: given two angles, it will plot the arc and chord between them, and fill the space in the "fill colour" defined beforehand.
Image size: As with the circle placement, the horizontal dimension of the image is lower than may be expected. Without overlap, the image would be as wide as both circle diameters put together; the overlap of 2h must again be subtracted from this if the image is not to be too wide.
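The three points above boil down to a handful of sums. As a sketch (the function name layout and the returned property names are made up for this example), the positions and image size can be computed from the radius, x and the padding like so:

```javascript
// Hypothetical helper: work out circle centres and image size for the diagram
function layout(r, x, padding) {
  const overlapWidth = 2 * (r - x);                 // 2h, from Figure 2
  return {
    leftCentreX: r + padding,                       // One radius in from the left
    rightCentreX: 3 * r - overlapWidth + padding,   // Three radii, less the overlap
    centreY: r + padding,                           // Both circles share a vertical centre
    width: 4 * r - overlapWidth + 2 * padding,      // Two diameters, less the overlap
    height: 2 * r + 2 * padding                     // One diameter
  };
}
```

With r = 150, x = 100 and 5 pixels of padding, the right-hand circle's centre sits at 355 and the image is 510 pixels wide, so the right edge of that circle (355 + 150 = 505) lands exactly one padding width in from the edge.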

Having taken these issues into account, the following code will generate the Venn diagram given x.

Image rendering implementation

$r = $radius;
$h = $r - $x;
$theta = acos($x/$r) * (180 / M_PI);

// 5 pixels of padding around the Venn
$padding = 5;
$overlap_width = 2*$h;

$im = new Imagick();
$im->newImage($r*4 - $overlap_width + ($padding*2),
              $r*2 + ($padding*2),
              new ImagickPixel('white'));

$draw = new ImagickDraw();

// Left-hand circle, in green
$draw->setFillColor(new ImagickPixel('#88ff88'));
$draw->ellipse($r + $padding, $r + $padding, $r, $r, 0, 360);

// Right-hand circle, in blue
$draw->setFillColor(new ImagickPixel('#8888ff'));
$draw->ellipse($r*3 - $overlap_width + $padding, $r + $padding, $r, $r, 0, 360);

// Intersection, in cyan
// Angles are specified in degrees, from the rightmost point of the circle
$draw->setFillColor(new ImagickPixel('#88ffff'));

// Left-hand segment (right half of intersection)
// -theta is in the top right, +theta in the bottom right
$draw->ellipse($r + $padding, $r + $padding, $r, $r, -$theta, $theta);

// Right-hand segment (left half of intersection)
// 180-theta is in the bottom left, 180+theta in the top left
$draw->ellipse($r*3 - $overlap_width + $padding, $r + $padding, $r, $r, 180-$theta, 180+$theta);

// Image bounding rectangle, matching the image dimensions given above
$draw->setStrokeColor(new ImagickPixel('black'));
$draw->setStrokeWidth(1);
$draw->setFillOpacity(0);
$draw->rectangle(0, 0,
                 $r*4 - $overlap_width + ($padding*2) - 1,
                 $r*2 + ($padding*2) - 1);

// Output image
$im->drawImage($draw);
$im->setImageFormat('png');
header('Content-type: image/png');
echo $im;

The above code results in Figure 1.

Issues and enhancements

One problem that remains with this implementation is the range of overlap percentages. If an overlap of less than 0% is given (if, in other words, the sets don't overlap), the equations above result in complex roots and PHP crashes while attempting to calculate them. Similarly, if the overlap is specified as more than 100%, this should reverse the positions of the sets in the Venn diagram; instead, the equations will produce a small section of one circle which is rendered as all intersection. A simple range check on the overlap percentage can alleviate these issues, and prevent them from being passed through to the script.
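A range check of this kind might look like the following sketch. The function name checkOverlap is hypothetical, and rejecting exactly 0 and exactly 1 is my own assumption: at those values the circles only touch or coincide, and there is no partial intersection to draw.

```javascript
// Hypothetical guard, to be run before the solver is called.
// The overlap is a fraction (0.2 for 20%), as in the solver example.
function checkOverlap(overlap) {
  if (!Number.isFinite(overlap) || overlap <= 0 || overlap >= 1) {
    throw new RangeError("Overlap must be a fraction between 0 and 1, exclusive");
  }
  return overlap;
}
```

Failing early with an error keeps out-of-range values away from the acos and sqrt calls, where they would otherwise surface as a broken image rather than as a clear message.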

Another limitation is the inherent tie of two sets that is implied by this script; it is not possible to specify an overlap between three sets using this model. The geometry to allow for three intersecting circles to be specified is left as an exercise for the reader.

Imran Nazar <tf@imrannazar.com>, May 2010.

Sci-fi Shorts: Light

Sun, 02 May 2010 17:39:32 +0000

Out here, you can draw a square a hundred miles on a side, and find no-one living inside. The whole area had been evacuated when the nuclear pile under Cheyenne Mountain had grown to a critical size, and the meltdown irradiated the whole county. All the towns stood empty, cars had been left abandoned: nothing moved.

Apart from a motorcycle on I-70. The road is rutted, but she makes her way around the holes, weaving across the lanes towards Denver. She'd found out about Cheyenne, and had cut across fields to avoid the cordon; there was something under the mountain, and she was here to find out what.

She rolls up to the tunnel through the mountain, filled in by the collapse after the meltdown. The infill has been smoothed over, and a porthole window fitted at around waist height. She peers in, expecting to see only rock and stone, and sees —

Light. Spirals of light against a perfectly dark background, as if there were galaxies of stars through that porthole. A whole universe, underneath the mountain.

They say the universes in this ring start in the same place, and grow outward. They don't say where that place is.

Sci-fi Shorts: New Science

Mon, 15 Mar 2010 16:28:20 +0000

When approbation was finally given for the project, Ryan was immediately ready to put together the prototype. Just two weeks later, the team met in the portico of University College, to witness the results.

Ryan explained that, for testing purposes, the subject would be a starfish from the Bond Street aquarium. He wheeled the laser in, and fired; the starfish promptly vanished.

"I thought you were demonstrating levitation to six feet, Dr Ryan," said one of his assistants, as they gathered over a star-shaped hole in the pavement.

"The laser must be upside down; give me a minute," Ryan answered.

Discordian Dates in Java

Mon, 08 Mar 2010 18:03:28 +0000

The representation of dates is something that goes back many thousands of years: each tradition and religion has its own way of representing and calculating the calendar, and there is seldom an easy way to move between the calendars. Calculating a date in one calendar, given the same date in another calendar, can involve a laborious set of operations based on the phase of the moon, the orbital inclination and other such things.

The Discordian calendar

Discordianism uses a calendar inspired by the well-established Gregorian calendar, with prominence given to the number five. As opposed to twelve months, there are five seasons with a regular number of days in each season:

Season	Days
Chaos	73
Discord	73
Confusion	73
Bureaucracy	73
The Aftermath	73

Table 1: Seasons of the Discordian calendar

This results in a year of 365 days, in alignment with the Gregorian calendar; as a result, a given day in the Discordian calendar always corresponds to the same day in Gregorian.

In addition to there being five seasons, each week consists of five days: Sweetmorn, Boomtime, Pungenday, Prickle-Prickle and Setting Orange. Because the calendar is aligned to Gregorian, each Discordian year consists of 73 weeks of 5 days; because of this, each day in the calendar always has both the same day name and the same date.

In the Gregorian calendar, leap days are added in 97 out of 400 years, on a 4-yearly cycle. The same process applies in Discordianism, with St. Tib's day inserted between Chaos the 59th and 60th (February 28th and March 1st).

A final detail is that the Discordian calendar begins in 1166 BCE; years are counted in step with Gregorian since that time, and marked anno discordia, or "Years of Our Lady of Discord". A few examples of dates in both calendars follow.

Example dates in Discordian and Gregorian

Chaos 1st, 3000 = January 1st, 1834
Bureaucracy 70th, 3155 = October 16th, 1989
St. Tib's Day, 3178 = February 29th, 2012

Converting between the calendars

Because the Discordian calendar is very regular, conversion between Discordian and Gregorian dates is relatively simple. All that is required is to calculate the offsets for year, day and month.

Discordian date calculation, given year and day of year

; Day is the zero-based day of the year (January 1st = 0)
DYear = Year + 1166

; Handle leap years
IF Year is-a-leap-year THEN
    IF Day = 59 THEN
        DSeasonday = "St. Tib's Day"
    ELSE IF Day > 59 THEN
        ; Days after Feb 29th need to be shifted back by one to make this
        ; year into a regular 365-day year, for calculation purposes
        Day = Day - 1
    END IF
END IF

SeasonNames = ["Chaos", "Discord", "Confusion",
               "Bureaucracy", "The Aftermath"]
DayNames = ["Sweetmorn", "Boomtime", "Pungenday",
            "Prickle-Prickle", "Setting Orange"]

IF DSeasonday is-not-already-set THEN
    DSeason = SeasonNames[Day DIV 73]      ; Integer division
    DWeekday = DayNames[Day MOD 5]
    DSeasonday = (Day MOD 73) + 1          ; Days of the season count from 1
END IF

The above is a pseudocode sample for converting a date into Discordian, and takes account of the special case for leap years. Converting from Discordian back into Gregorian dates is similarly simple. The only complication involved is the leap-year case, where the date as reported by the Discordian calendar is one day ahead of where it would be in other years.

Gregorian day/year calculation from Discordian

; Day is the one-based day of the year (January 1st = 1)
Year = DYear - 1166
Day = (DSeasonNum - 1) * 73 + DSeasondayNum

IF Year is-a-leap-year THEN
    IF DSeasonday = "St. Tib's Day" THEN
        Day = 60
    ELSE IF Day >= 60 THEN
        Day = Day + 1
    END IF
END IF
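As a quick check of the forward conversion, here is a sketch that follows the first piece of pseudocode and can be run against the example dates given earlier. The function names toDiscordian and isLeapYear, and the shape of the returned object, are choices made for this example only.

```javascript
const SEASONS = ["Chaos", "Discord", "Confusion", "Bureaucracy", "The Aftermath"];
const WEEKDAYS = ["Sweetmorn", "Boomtime", "Pungenday", "Prickle-Prickle", "Setting Orange"];
const MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

// Gregorian leap year rule: 97 leap years in every 400
function isLeapYear(y) {
  return (y % 4 === 0 && y % 100 !== 0) || y % 400 === 0;
}

// month is 1-12, day is 1-31
function toDiscordian(year, month, day) {
  const leap = isLeapYear(year);

  // Zero-based day of the year, counting February 29th in leap years
  let yd = day - 1;
  for (let m = 0; m < month - 1; m++) yd += MONTH_LENGTHS[m];
  if (leap && month > 2) yd += 1;

  const dyear = year + 1166;
  if (leap && yd === 59) return { year: dyear, stTibs: true };
  if (leap && yd > 59) yd -= 1;                // Fold back into a 365-day year

  return {
    year: dyear,
    stTibs: false,
    season: SEASONS[Math.floor(yd / 73)],
    weekday: WEEKDAYS[yd % 5],
    seasonDay: (yd % 73) + 1
  };
}
```

St. Tib's Day is returned as a flag rather than as a season and day, since it sits outside the regular five-day week.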

Java implementation

Writing the above algorithms in Java is made very simple by the existence of java.util.Calendar, the date/calendar calculation class; in particular, the GregorianCalendar subclass allows for calculation of leap years in a quick and efficient manner. The following code implements conversions from one calendar to the other, providing a readable representation of the date in either case.

ddate.java: Discordian/Gregorian date conversion

import java.util.Date;
import java.util.Calendar;
import java.util.GregorianCalendar;

public class ddate {
    private int _year, _season, _yearDay, _seasonDay, _weekDay;
    private boolean _isLeap;
    private String[] _seasonNames = {"Chaos","Discord","Confusion","Bureaucracy","The Aftermath"};
    private String[] _dayNames = {"Sweetmorn","Boomtime","Pungenday","Prickle-Prickle","Setting Orange"};

    public ddate(Date d) {
        GregorianCalendar gc = new GregorianCalendar();
        gc.setTime(d);

        _year = gc.get(Calendar.YEAR) + 1166;
        _yearDay = gc.get(Calendar.DAY_OF_YEAR);    // 1-based: January 1st is day 1
        _isLeap = gc.isLeapYear(gc.get(Calendar.YEAR));

        int yd = _yearDay - 1;                      // 0-based, for the calculations
        if(_isLeap && yd > 59) yd--;                // Skip over February 29th

        _season = (yd / 73) + 1;
        _weekDay = (yd % 5) + 1;
        _seasonDay = (yd % 73) + 1;
    }

    public int getYear() { return _year; }
    public int getSeason() { return _season; }
    public int getYearDay() { return _yearDay; }
    public int getSeasonDay() { return _seasonDay; }

    public String getSeasonName() { return _seasonNames[_season-1]; }
    public String getDayName() { return _dayNames[_weekDay-1]; }

    public String toString() {
        // DAY_OF_YEAR is 1-based, so February 29th is day 60 of a leap year
        if(_isLeap && _yearDay == 60) {
            return "St. Tib's Day, " + Integer.toString(_year);
        } else {
            return _dayNames[_weekDay-1] + ", " +
                   _seasonNames[_season-1] + " " +
                   Integer.toString(_seasonDay) + ", " +
                   Integer.toString(_year);
        }
    }

    public Date getTime() {
        GregorianCalendar gc = new GregorianCalendar();
        gc.set(Calendar.YEAR, _year - 1166);
        gc.set(Calendar.DAY_OF_YEAR, _yearDay);
        return gc.getTime();
    }
}

Sample usage of ddate.java

// Calendar is abstract, so an instance is obtained through getInstance()
Calendar foo = Calendar.getInstance();
foo.setTime(new Date());
foo.add(Calendar.DAY_OF_MONTH, -1);

ddate bar = new ddate(foo.getTime());
System.out.println("Yesterday was " + bar.toString());

Extending the conversion

The conversion process detailed above doesn't include the ten Holy Days of the Discordian calendar: a Holy Day falls on the 5th and 50th of each season. Since these Days occur with such regularity, it poses no extra difficulty to provide an interface for this, and such an interface has been left out of the above code in the interest of brevity.
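As an indication of how little is involved, a Holy Day lookup could be sketched as below. The function name holyDay is hypothetical; the season number is 1-based, matching the value returned by getSeason() in the Java class, and the day names are those commonly given for the Discordian Holy Days.

```javascript
// Holy Days on the 5th of each season, in season order
const APOSTLE_DAYS = ["Mungday", "Mojoday", "Syaday", "Zaraday", "Maladay"];
// Holy Days on the 50th of each season, in season order
const SEASON_DAYS = ["Chaoflux", "Discoflux", "Confuflux", "Bureflux", "Afflux"];

// season is 1-5, seasonDay is 1-73; returns the Holy Day's name, or null
function holyDay(season, seasonDay) {
  if (season < 1 || season > 5) return null;
  if (seasonDay === 5) return APOSTLE_DAYS[season - 1];
  if (seasonDay === 50) return SEASON_DAYS[season - 1];
  return null;
}
```

Returning null for an ordinary day lets the caller decide whether to print anything extra alongside the date.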

Imran Nazar <tf@imrannazar.com>, Mar 2010.

linecolor.pl: Rule-based Line Colours in irssi

Mon, 01 Feb 2010 14:27:30 +0000

As any user of IRC will tell you, the ignore command is a gift from the Gods: it allows you to ignore any output being generated by a particular person or nickname, whether it's plain noise or something downright malicious. In the irssi client, you can go one further and specify ignore for any messages which match a given regular expression, from whichever nick they originate.

In some cases, you may not wish to ignore a person entirely; they may have the occasional insight into a topic, but simply act foolishly for most of the day. Alternatively, a particularly chatty bot may have some useful features, but will simply get in the way of channel discussion most of the time. In these situations, a halfway house between full participation and total ignorance is required.

Line coloring

If the channel window of a given IRC client has a standard high-contrast colour scheme (either white text on a black background, or vice versa), it's trivial to define a halfway point between text being fully visible and text being altogether hidden from view: grey text. Similarly, any lines of text that need particular attention paid to them can be highlighted in red, as an example. What is required is a way to denote lines worthy of attention, and lines eligible for half-ignore.

For the irssi client, the nickcolor extension comes close: it is able to assign a specific colour to a given nick, and highlight the rest in random colours. Unfortunately, there are a few drawbacks with nickcolor as it stands:

Only the nick is coloured, leaving the line in its normal colour; this is insufficient for our purposes.
Other nicks get coloured in, aside from the ones required; ideally, these should be left at their normal state.
As the name implies, nickcolor only works by nickname; it cannot apply regex matching on messages for the purpose of highlighting.

To address these issues, an adapted version of the nickcolor script is put forward in this article, which I've renamed as linecolor. An example of its usage would be as follows.

linecolor: Sample usage

/color set Bucket 14
/color rset ^Bucket 14

The above rules specify that any output from "Bucket" is to be marked as colour #14 (grey), and that any message matching the regular expression ^Bucket (in other words, any line starting with the word "Bucket", from whichever nick) is also to be marked grey. The resulting output is shown below.

Figure 1: Colouring IRC lines for half-ignore
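The order in which the rules are consulted can be summarised in a short sketch, written in JavaScript purely for illustration: a colour set for the exact nick wins, then each regex rule is tried against the message without regard to case, and a result of 0 means the line is left alone. The function name findColor and the rules object are invented for this example, and the per-session colours kept by the real script are left out.

```javascript
// Illustrative lookup: nick rules first, then regex rules against the message
function findColor(rules, nick, msg) {
  if (rules.nicks.has(nick)) return rules.nicks.get(nick);

  for (const [pattern, color] of rules.regexes) {
    if (new RegExp(pattern, "i").test(msg)) return color;   // Case-insensitive match
  }
  return 0;                                                  // No rule applies
}

// The two rules from the sample usage
const rules = {
  nicks: new Map([["Bucket", 14]]),
  regexes: new Map([["^Bucket", 14]])
};
```

With these rules, a line from Bucket is coloured 14 whatever it says, and a line from anyone else is coloured 14 only if it begins with "Bucket".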

The code is shown below, and is also available from http://imrannazar.com/content/img/linecolor.txt.

linecolor: An irssi script for rule-based line colouring

# Line Color - Assign colours to lines from specific nicks, or matching patterns
# Adapted from "Nick Color" by Timo Sirainen, as modified by Ian Petersi

use strict;
use Irssi 20020101.0250 ();
use vars qw($VERSION %IRSSI);
$VERSION = "1.2";
%IRSSI = (
    authors     => "Timo Sirainen, Ian Petersi, Imran Nazar",
    contact     => "tss\@iki.fi",
    name        => "Line Color",
    description => "assign colours to lines through nick/regex rules",
    license     => "Public Domain",
    url         => "http://irssi.org/",
    changed     => "2010-01-28T18:30+0000"
);

# hm.. i should make it possible to use the existing one..
Irssi::theme_register([
    'pubmsg_hilight', '{pubmsghinick $0 $3 $1}$2'
]);

my %saved_colors;
my %saved_regex_colors;
my %session_colors = ();
my @colors = qw/2 3 4 5 6 7 9 10 11 12 13/;

sub load_colors {
    # Nothing to load if the file hasn't been saved yet
    open COLORS, "$ENV{HOME}/.irssi/saved_colors" or return;

    while (<COLORS>) {
        # I don't know why this is necessary only inside of irssi
        my @lines = split "\n";
        foreach my $line (@lines) {
            my($type, $nick, $color) = split ":", $line;
            if ($type eq "NICK") {
                $saved_colors{$nick} = $color;
            } elsif ($type eq "REGEX") {
                $saved_regex_colors{$nick} = $color;
            }
        }
    }

    close COLORS;
}

sub save_colors {
    open COLORS, ">$ENV{HOME}/.irssi/saved_colors";

    foreach my $nick (keys %saved_colors) {
        print COLORS "NICK:$nick:$saved_colors{$nick}\n";
    }
    foreach my $regex (keys %saved_regex_colors) {
        print COLORS "REGEX:$regex:$saved_regex_colors{$regex}\n";
    }

    Irssi::print("Saved colors to $ENV{HOME}/.irssi/saved_colors");

    close COLORS;
}

# If someone we've colored (either through the saved colors, or the hash
# function) changes their nick, we'd like to keep the same color associated
# with them (but only in the session_colors, ie a temporary mapping).
sub sig_nick {
    my ($server, $newnick, $nick, $address) = @_;
    my $color;

    $newnick = substr ($newnick, 1) if ($newnick =~ /^:/);

    if ($color = $saved_colors{$nick}) {
        $session_colors{$newnick} = $color;
    } elsif ($color = $session_colors{$nick}) {
        $session_colors{$newnick} = $color;
    }
}

sub find_color {
    my ($server, $msg, $nick, $address, $target) = @_;
    my $chanrec = $server->channel_find($target);
    return if not $chanrec;
    my $nickrec = $chanrec->nick_find($nick);
    return if not $nickrec;
    my $nickmode = $nickrec->{op} ? "@" : $nickrec->{voice} ? "+" : "";

    # Has the user assigned this nick a color?
    my $color = $saved_colors{$nick};

    # Have -we- already assigned this nick a color?
    if (!$color) {
        $color = $session_colors{$nick};
    }

    # Does the message match any color regexen?
    if (!$color) {
        foreach my $r (keys %saved_regex_colors) {
            if ($msg =~ m/($r)/i) {
                $color = $saved_regex_colors{$r};
                last;
            }
        }
    }

    if (!$color) {
        $color = 0;
    }

    return $color;
}

# FIXME: breaks /HILIGHT etc.
sub sig_public {
    my ($server, $msg, $nick, $address, $target) = @_;
    my $color = find_color(@_);

    if ($color) {
        $color = "0".$color if ($color < 10);
        $server->command('/^format pubmsg {pubmsgnick $2 {pubnick '.
                         chr(3).$color.'$0'.
                         chr(3).'15}}'.chr(3).$color.'$1');
    } else {
        $server->command('/^format pubmsg {pubmsgnick $2 {pubnick $0}}$1');
    }
}

sub sig_action {
    my ($server, $msg, $nick, $address, $target) = @_;
    my $color = find_color(@_);

    if($color) {
        $server->command('/^format action_public {pubaction '.
                         chr(3).$color.'$0'.
                         chr(3).'15}'.chr(3).$color.'$1');
    } else {
        $server->command('/^format action_public {pubaction $0}$1');
    }
}

sub cmd_color {
    my ($data, $server, $witem) = @_;
    my ($op, $nick, $color) = split " ", $data;

    $op = lc $op;

    if (!$op || $op eq "help") {
        Irssi::print ("Supported commands:
    preview (list possible colors and their codes)
    list (show current entries in saved_colors)
    set (associate a color to a nick)
    rset (colorize messages matching a regex)
    clear (delete color associated to nick)
    rclear (delete color associated to regex)
    save (save colorsettings to saved_colors file)");
    } elsif ($op eq "save") {
        save_colors;
    } elsif ($op eq "set") {
        if (!$nick) {
            Irssi::print ("Nick not given");
        } elsif (!$color) {
            Irssi::print ("Color not given");
        } elsif ($color < 2 || $color > 14) {
            Irssi::print ("Color must be between 2 and 14 inclusive");
        } else {
            # Only report success once the rule has actually been stored
            $saved_colors{$nick} = $color;
            Irssi::print ("Added ".chr (3) . "$saved_colors{$nick}$nick" .
                          chr (3) . "1 ($saved_colors{$nick})");
        }
    } elsif ($op eq "rset") {
        if (!$nick) {
            Irssi::print ("Regex not given");
        } elsif (!$color) {
            Irssi::print ("Color not given");
        } elsif ($color < 2 || $color > 14) {
            Irssi::print ("Color must be between 2 and 14 inclusive");
        } else {
            $saved_regex_colors{$nick} = $color;
            Irssi::print ("Added ".chr (3) . "$saved_regex_colors{$nick}$nick" .
                          chr (3) . "1 ($saved_regex_colors{$nick})");
        }
    } elsif ($op eq "clear") {
        if (!$nick) {
            Irssi::print ("Nick not given");
        } else {
            delete ($saved_colors{$nick});
            Irssi::print ("Cleared ".$nick);
        }
    } elsif ($op eq "rclear") {
        if (!$nick) {
            Irssi::print ("Regex not given");
        } else {
            delete ($saved_regex_colors{$nick});
            Irssi::print ("Cleared ".$nick);
        }
    } elsif ($op eq "list") {
        Irssi::print ("\nSaved colors:");
        foreach my $nick (keys %saved_colors) {
            Irssi::print ("Nick: ".chr (3) . "$saved_colors{$nick}$nick" .
                          chr (3) . "1 ($saved_colors{$nick})");
        }
        foreach my $r (keys %saved_regex_colors) {
            Irssi::print ("Regex: ".chr (3) . "$saved_regex_colors{$r}$r" . chr (3) .
"1 ($saved_regex_colors{$r})"); } } elsif ($op eq "preview") { Irssi::print ("\nAvailable colors:"); foreach my $i (2..14) { Irssi::print (chr (3) . "$i" . "Color #$i"); } } } load_colors; Irssi::command_bind('color', 'cmd_color'); Irssi::signal_add('message public', 'sig_public'); Irssi::signal_add('message irc action', 'sig_action'); Irssi::signal_add('event nick', 'sig_nick');

Imran Nazar <tf@imrannazar.com>, Feb 2010.

Audio Captchas in PHP

Sat, 09 Jan 2010 17:43:48 +0000

One of the problems most often cited with a Captcha-based submission verification system is the lack of accessible options; a plain-text alternative can't be provided, for the simple reason that it gives an easy route to circumvent the Captcha and thus defeat the point of putting such a system in place.

Any alternative means of finding out what the Captcha image shows must be accessible for people who can't view the image, while also presenting a level of difficulty for automated and spam submissions; an option that meets both of these criteria is the audio Captcha.

The concept

The idea behind an audio Captcha is simple: in addition to providing the Captcha image on-screen, a sound file representing the image is made available. This caters for most users who would otherwise be unable to enter the Captcha text. This sound file can be a simple RIFF wave file, but is more often encoded into a speech codec or the ubiquitous MP3 format.

In this article, I'll be looking at the implementation of a simple MP3 audio Captcha, which takes a short string of a few characters and creates a sound file. I'll assume for this article that it's only made up of lowercase letters; there are no digits or uppercase letters, and no punctuation, in order to keep things at a minimal level. The audio Captcha algorithm is based on a series of sound files, each representing one letter, which can then be concatenated into a representation of the whole string.

Figure 1: The concatenation algorithm

In the ideal case, it would be simple to take the contents of each file and run them together into one large file, by writing out the contents of the files one after the other. This would be a trivial concatenation process, but will unfortunately not work. For the reason behind that, it's important to look at what makes up a RIFF wave file.

The RIFF file format

A RIFF wave file is more than a basic recording of the digitised waveform; in addition to the waveform data, metadata is attached regarding the size of the data and its origin.

Byte 1	Byte 2	Byte 3	Byte 4
Chunk header: "RIFF"
RIFF chunk size (file size-8)
Chunk header: "WAVE"
Subchunk header: "fmt "
Format chunk size
Format (1=PCM)		Channel count
Sampling rate (Hz)
Bytes per second
Block alignment value		Bits per sample
Subchunk header: "data"
Data size
File data
File data

Table 1: RIFF wave file format

The table above shows the format of the simplest RIFF wave file. The format is capable of holding information about wave files intended for MIDI samplers, cue points for mixing, and various other additions; most wave files will not contain these, and will simply be a record of the waveform data with a header attached.

As can be seen, the wave file specifies not only the length of the digitised waveform, but also its sampling rate and channel count. A telephone-level wave file can easily be distinguished from a CD-quality file, by simply checking the sampling rate; in a similar manner, stereo and mono files can be differentiated by the channel count. The provision of this metadata about the file is the reason for the attachment of the header, since otherwise a sound player application would have no idea of the process for playing the sound file.

Unfortunately, this means that simple concatenation of two RIFF files won't result in a longer RIFF file. A sound player will read the headers at the start of the file, which indicate the length of the first segment to be concatenated, and play that segment; at this point, a reasonable player will deduce that the end of file has been reached, since its record of played samples is the same as the number indicated in the file header, and won't play any more of the file.
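The effect can be reproduced without a sound player. The sketch below is in Python rather than the PHP used for the Captcha itself: it builds two small PCM wave files in memory with the standard wave module, joins them byte-for-byte, and reads the header back. The helper names make_wav and read_header are made up for this illustration, and the fixed offsets assume the minimal 44-byte header of Table 1.

```python
import io
import struct
import wave


def make_wav(n_frames, rate=8000):
    """Return the bytes of a mono, 16-bit PCM wave file holding n_frames of silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(rate)
        writer.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def read_header(data):
    """Unpack the fields of Table 1, assuming the minimal 44-byte header layout."""
    riff_id, riff_size, wave_id = struct.unpack_from("<4sI4s", data, 0)
    (fmt_id, fmt_size, audio_format, channels,
     rate, byte_rate, block_align, bits) = struct.unpack_from("<4sIHHIIHH", data, 12)
    data_id, data_size = struct.unpack_from("<4sI", data, 36)
    return {
        "riff_size": riff_size,
        "format": audio_format,
        "channels": channels,
        "rate": rate,
        "bits": bits,
        "data_size": data_size,
    }


first = make_wav(100)    # 100 frames = 200 bytes of sample data
second = make_wav(50)    # 50 frames = 100 bytes of sample data
joined = first + second  # naive byte-for-byte concatenation

# The header at the front of the joined file still describes only the first segment
with wave.open(io.BytesIO(joined), "rb") as reader:
    frames_seen = reader.getnframes()

print(read_header(joined)["data_size"], frames_seen)  # 200 100
```

The second file's 50 frames are physically present in the joined data, but nothing in the leading header accounts for them.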

The solution to this problem is to use a more complex concatenation: instead of simply throwing the files together, they will need to be run through an external sound processor.

External sound processors

The sox command is a simple interface to an audio concatenation and processing tool, which can be used for this audio Captcha. If each letter's wave file is passed into sox, a wave file can be output consisting of all the input files together, with an updated format header containing the total data size and overall sampling rates. An example invocation would run as follows:

Invocation of sox: An example concatenation

    sox a.wav x.wav m.wav b.wav -t .wav axmb.wav

Since each letter is contained in its own wave file, it's a trivial matter to break up the Captcha text string and build a command line for sox to use. The following example assumes that the Captcha script is written in PHP, and the text is held in the session data after generation.

Building the concatenated wave file

    $parts = array();
    for($i = 0; $i < strlen($_SESSION['captcha']); $i++)
        $parts[] = $_SESSION['captcha'][$i] . '.wav';

    exec(sprintf('sox %s -t .wav %s.wav',
        join(' ', $parts),
        $_SESSION['captcha']));
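The same command-line construction can be sketched in Python, purely as an illustration of the idea. The function name build_sox_command is hypothetical; the sox arguments are the ones from the invocation shown earlier. The check for lowercase letters mirrors the assumption made at the start of the article, and has the side benefit of keeping unexpected characters away from the command line.

```python
import re


def build_sox_command(captcha):
    """Return the sox argument list that joins one wave file per letter."""
    # The article restricts Captcha text to lowercase letters; anything else is rejected
    if not re.fullmatch(r"[a-z]+", captcha):
        raise ValueError("captcha must contain only lowercase letters")
    parts = [letter + ".wav" for letter in captcha]
    return ["sox", *parts, "-t", ".wav", captcha + ".wav"]


command = build_sox_command("axmb")
print(" ".join(command))  # sox a.wav x.wav m.wav b.wav -t .wav axmb.wav

# To actually run it (requires sox and the per-letter files to be present):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list, rather than as one formatted string, means no shell is involved in interpreting them.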

What this doesn't do is generate an MP3 representing the Captcha text; for that, an MP3 encoder is required. lame allows for the encoding of MP3s at various sampling rates, but will normally take its sampling information from the input file. Since, as detailed above, a wave file contains detailed information about sampling and formatting, lame is able to use this to generate an MP3 file.

The example below is a slight modification of the sox invocation above, in order to pipe the output to lame and encode an MP3 file, and then to serve the MP3 out as a downloadable file.

Building a Captcha MP3

    $parts = array();
    $c = $_SESSION['captcha'];
    for($i = 0; $i < strlen($c); $i++)
        $parts[] = $c[$i] . '.wav';

    // Pipe the concatenated wave data from sox straight into lame
    exec(sprintf('sox %s -t .wav - | lame - %s.mp3',
        join(' ', $parts),
        $c));

    header('Content-type: audio/mpeg');
    header('Content-length: '.filesize("{$c}.mp3"));
    header('Content-disposition: attachment; filename="'.$c.'.mp3"');

    // Stream the encoded file's contents to the client
    readfile("{$c}.mp3");

An example of this script's usage in a Captcha would be as follows.

Figure 2: Example Captcha with audio download option

Possible enhancements

In the above example, clearly voiced phrases have been used for the constituent letters of the audio Captcha. This provides a good level of accessibility, but compromises the security of the audio Captcha: any automatic circumventions will easily be able to work out the letters that make up the audio file. One solution to this is to overlay a level of noise on the audio file, to provide some level of obfuscation to the output; in addition to this, periods of silence can be inserted between the letter waveforms, making the output less regular.
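As a rough illustration of both ideas, the Python sketch below works on lists of 16-bit sample values, one list per letter. It places a random stretch of silence before each letter and then adds low-level random noise to every sample. The function name obfuscate, the noise level and the maximum gap are all arbitrary choices made for this example; how much noise is enough to hinder automated recognition without hurting accessibility is not something the sketch tries to answer.

```python
import random


def obfuscate(letter_samples, rate=8000, noise_level=500, max_gap=0.3, seed=None):
    """Join per-letter 16-bit sample lists, adding random gaps and noise."""
    rng = random.Random(seed)
    joined = []
    for samples in letter_samples:
        # Up to max_gap seconds of silence before each letter
        gap = rng.randint(0, int(rate * max_gap))
        joined.extend([0] * gap)
        joined.extend(samples)

    noisy = []
    for sample in joined:
        sample += rng.randint(-noise_level, noise_level)
        # Clamp to the signed 16-bit range so the result is still valid PCM
        noisy.append(max(-32768, min(32767, sample)))
    return noisy


letters = [[1000] * 10, [2000] * 10]  # stand-ins for two decoded letter files
output = obfuscate(letters, seed=1)
print(len(output) >= 20)  # True: the letters survive, plus any inserted silence
```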

Another enhancement that can be made to the audio Captcha output is to provide more formats for the file. At present, the audio Captcha is generated in RIFF wave and MP3 formats; provision for Windows audio and Ogg formats would allow for more widespread usage of the output file.

Imran Nazar <tf@imrannazar.com>, Jan 2010.

Binary-Coded Decimal Addition on Atmel AVR

Sat, 19 Dec 2009 18:04:45 +0000

UPDATE: Nov 2021

Petter Källström emailed me, saying that there are a couple of weirdnesses and inefficiencies in the implementation given at the bottom of this article, and provided alternative code as follows:

Petter's more better implementation

    -- input = r18, C and H flag
    -- output = r18 and C flag
    DAA:
        push r19
        in r19, SREG            ; Let r19 contain the SREG
        cpi r18, $9A            ; Set C flag in r19 if r18 >= 9A, and H flag in SREG if lower nibble is < 10
        brlo DAA_endif
        sbr r19, (1<<SREG_C)
    DAA_endif:
        sbrs r19, SREG_H        ; If input H flag was set, then skip the H-flag test
        brhs DAA_hi             ; If H indicate lower nibble is < 10, then jump over...
        subi r18, -$06          ; adjust (adjust if r19.H-flag set, or if lower nibble >= 10)
    DAA_hi:
        sbrc r19, SREG_C        ; If output C=1
        subi r18, -$60          ; ...adjust
        out SREG, r19
        pop r19
        ret

Back to our previously scheduled article:

The Atmel AVR series of microcontrollers can be used in a wide variety of applications, from radios right through to inkjet printers, but a popular application in hobbyist projects is for digital clocks and counters. There are two main portions to any clock or counter program: a piece of code to increment the internal counter, and another piece of code to format and output the counter on a display.

A problem arises where these two portions of code need to interact. If one of these tasks is made less taxing for the microcontroller, the other is made more complicated; as a result, there are two prevailing schools of thought on how to achieve this interaction.

Easier to calculate: A simple binary number can be held by the controller, which is very easy to increment. This forces the display code to iteratively divide the number down by ten, in order to extract the digits for display.
Easier to display: A packed binary-coded decimal (BCD) number can be used instead to hold the counter; this simplifies the display logic, but incrementing the number requires an adjustment process to be run in order to correctly align the BCD segments.

This article will examine the implications of choosing the second method: the use of a BCD number to hold the counter.

Packed BCD digits

The concept behind BCD is a simple one: instead of using a byte to represent any value between 0 and 255, a byte is used to represent the decimal digits only: 0 to 9. This allows for each segment of a multiple-digit display to be tied directly to a byte in the number to show, which greatly simplifies the logic behind showing the number.

Figure 1: Byte-wise (full) BCD display

The disadvantage of using a full byte to represent each digit is the waste produced: over 95% of the usable range of numbers in a byte is lost, and a large number of bytes have to be stored for a number of significant size. In a microcontroller environment, where memory space is often at such a premium that one extra byte is significant, this wastage is simply untenable.

An alternative scheme is to use each nybble of a byte to store a BCD digit: in this manner, two digits can be stored inside a byte, increasing the range of values available for storage ten-fold. The code required to pull out digits for display is still very straightforward, since simple boolean operations will yield the required result.

Figure 2: Packed BCD display

A packed BCD number can be held in half the space of the equivalent full-BCD value, and is a viable compromise between the full range of binary numbers and the ease of display of full-BCD. In addition to this, packed BCD (hereafter referred to as simply "BCD") can be trivially conceptualised, through conversion to hexadecimal: as an example, the BCD value 0x93 represents decimal 93.
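The difference between the two schemes can be seen in a few lines of Python (used here only for illustration; the function names are invented for this sketch). Extracting digits from a plain binary number needs repeated division by ten, whereas a packed BCD byte gives up its digits with one shift and one mask.

```python
def binary_digits(value):
    """Easier-to-calculate scheme: repeated division by ten extracts the digits."""
    digits = []
    while True:
        value, digit = divmod(value, 10)
        digits.append(digit)
        if value == 0:
            break
    return digits[::-1]


def packed_bcd_digits(byte):
    """Easier-to-display scheme: a shift and a mask split the byte into two digits."""
    return [(byte >> 4) & 0x0F, byte & 0x0F]


def to_packed_bcd(value):
    """Pack a decimal value from 0 to 99 into a single BCD byte."""
    if not 0 <= value <= 99:
        raise ValueError("one packed BCD byte holds 0..99")
    tens, units = divmod(value, 10)
    return (tens << 4) | units


print(binary_digits(93))             # [9, 3]
print(packed_bcd_digits(0x93))       # [9, 3]
print(hex(to_packed_bcd(93)))        # 0x93
```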

BCD addition: the problem

Using BCD to display a decimal number may simplify the display logic a great deal compared to the alternative, but a problem arises when calculations need to be done on the numbers. A microcontroller, much like any other computer of the modern age, is a binary machine with a binary arithmetic unit: it has no understanding of BCD, and will dutifully treat each number coming into it as a plain binary number.

Example additions of BCD numbers

    0x15 + 0x03 = 0x18
    0x72 + 0x07 = 0x79
    0x38 + 0x02 = 0x3A

It is in additions that cause a carry between digits that the problem appears. In the above example, the BCD numbers 0x38 and 0x02 should add to 0x40, but the addition has operated instead on the plain numbers and produced the wrong answer. What is required is a method of adjusting the value after addition, to account for the fact that the values being operated on are BCD.

The Intel IA-32 series of microprocessors contains such a method as part of the base instruction set: Decimal Adjust after Addition (DAA). If this instruction is run after an addition, the result stored in the accumulator will be adjusted.

DAA usage on Intel x86

    mov al, 38h
    add al, 03h     ; At this point, al = 0x3B
    daa             ; al = 0x41

The Atmel AVR doesn't contain such a convenient instruction as DAA, but the algorithm behind the DAA instruction is documented as part of the Intel IA-32 Reference manual, and is simple both to understand and to re-implement.

The decimal adjustment algorithm

DAA will adjust a BCD value that has had a carry occur between digits. There are two situations where this applies:

BCD carry: This occurs in a situation much like the one detailed above, where a result is too large to fit into a BCD digit but is still large enough for a binary nybble. Checking for this is simple: if the nybble has a value over 9, BCD carry has occurred.
Binary carry: This will happen if a BCD digit addition result is not just larger than a BCD digit, but larger than 15: the nybble containing the BCD digit will itself carry, and end up with a value lower than 9. Results of this type would not be caught by the check for values over 9.

Most processor architectures maintain a status flag denoting when a byte has carried past its maximum value; many architectures also maintain a half-carry flag, that is set when the lower nybble of a byte carries into the upper nybble. The half-carry flag will be set by a BCD addition that causes a binary carry in the lower digit, so checking for this will satisfy the other half of the DAA check.
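To make the two flags concrete, here is a small Python model of an 8-bit binary addition. It is an illustrative sketch with a made-up function name, not a description of any particular processor's internals: the carry is set when the whole byte overflows, and the half-carry when the lower nybble overflows into the upper one.

```python
def add8(a, b, carry_in=0):
    """Add two bytes as a binary ALU would; return (result, carry, half_carry)."""
    total = a + b + carry_in
    # Half-carry: the sum of the two lower nybbles does not fit in four bits
    half_carry = ((a & 0x0F) + (b & 0x0F) + carry_in) > 0x0F
    # Carry: the sum of the two bytes does not fit in eight bits
    carry = total > 0xFF
    return total & 0xFF, carry, half_carry


print(add8(0x38, 0x02))  # (58, False, False): 0x3A, a BCD carry with no flag set
print(add8(0x09, 0x08))  # (17, False, True): 0x11, a binary carry out of the low nybble
```

The first case is only detectable by checking the nybble's value; the second only by checking the half-carry flag.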

If the DAA check finds a digit that needs adjusting, the fix is simple: a further addition onto the nybble in question.

Adjustment of a nybble

    0x08 + 0x03 = 0x0B      ; Should be 0x11
    0x09 + 0x05 = 0x0E      ; Should be 0x14
    0x09 + 0x08 = 0x11      ; Should be 0x17

In every case, the value is six away from where it should be, so the adjustment adds six to bring the value back into BCD. Applying this process to both nybbles yields the final DAA algorithm.

2-digit BCD decimal adjustment after addition

    OLD_value = Value
    OLD_carry = Carry from addition

    # Check lower nybble
    IF (Half-carry set by addition) OR (Lower nybble of Value > 9)
        ADD 6 to Value
    FI

    # Check upper nybble
    # Upper nybble will be over 9 if original Value was over 0x99
    IF (OLD_carry) OR (OLD_value > 0x99)
        ADD 0x60 to Value
        Carry = 1       # BCD value carry occurred
    ELSE
        Carry = 0
    FI
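The pseudocode above translates almost line for line into Python, which makes it easy to check against the earlier examples. This is a sketch for experimentation only; the function name and the way the flags are passed in as booleans are choices made for this example.

```python
def daa(value, carry, half_carry):
    """Decimal-adjust one byte after a binary addition; return (value, carry)."""
    old_value = value

    # Lower nybble: adjust on a half-carry, or if the digit is out of BCD range
    if half_carry or (value & 0x0F) > 9:
        value += 0x06

    # Upper nybble: adjust on a carry, or if the original value was over 0x99
    if carry or old_value > 0x99:
        value += 0x60
        carry_out = True
    else:
        carry_out = False

    return value & 0xFF, carry_out


# 0x38 + 0x02 = 0x3A in binary; adjusted to BCD 40
print(daa(0x3A, carry=False, half_carry=False))  # (64, False), i.e. 0x40
# 0x09 + 0x08 = 0x11 with half-carry set; adjusted to BCD 17
print(daa(0x11, carry=False, half_carry=True))   # (23, False), i.e. 0x17
```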

The DAA algorithm sets the carry flag based on whether the upper nybble overflowed; this allows DAA to be used on BCD values across multiple bytes, by employing addition-with-carry on any higher denominations.

Implementing DAA on AVR

Translating DAA from the algorithm detailed above results in the following AVR code.

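AVR implementation of DAA

    ; Parameters: R16 = value to adjust
    ; Returns: R16 = Adjusted value
    ;          Carry flag set if adjustment caused BCD carry
    DAA:
        push r17
        push r18
        push r19

        mov r17, r16                ; r17 = scratch copy for the lower-nybble test
        mov r18, r16                ; r18 = original value, for the upper-nybble test
        in r19, SREG
        andi r19, (1<<SREG_C)       ; r19 = carry from the addition (H is untouched)

        brhs DAA_adjlo              ; Half-carry set by the addition
        andi r17, 0x0F
        cpi r17, 10
        brlo DAA_hi                 ; Lower nybble is 9 or below: no adjustment

    DAA_adjlo:
        ldi r17, 6
        add r16, r17

    DAA_hi:
        tst r19
        brne DAA_adjhi              ; Carry was set by the addition
        cpi r18, 0x9A
        brlo DAA_nadjhi             ; Original value was 0x99 or below

    DAA_adjhi:
        ldi r17, 0x60
        add r16, r17
        sec                         ; BCD value carry occurred
        rjmp DAA_end

    DAA_nadjhi:
        clc

    DAA_end:
        pop r19                     ; pop leaves the carry flag intact
        pop r18
        pop r17
        ret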

Usage of the DAA routine for a two-byte BCD value stored in SRAM would work like this:

Calling DAA across two BCD bytes

    ; Add BCD 57 to the value stored at SRAM:0x100
    ldi xl, 0x00
    ldi xh, 0x01

    ; Read in low byte, add 57 BCD, and store
    ld r16, x
    ldi r17, 0x57
    add r16, r17
    call DAA
    st x+, r16

    ; Read in high byte, add carry from low byte, and store
    ld r16, x
    clr r17
    adc r16, r17
    call DAA
    st x, r16
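The same two-byte sequence can be modelled in Python to check the arithmetic without a microcontroller to hand. This is a simulation sketch rather than a translation of the AVR code: add8, daa and bcd_add are names invented for the example, and the BCD number is held as a list of bytes with the low byte first, matching the SRAM layout above.

```python
def add8(a, b, carry_in=0):
    """Binary byte addition; return (result, carry, half_carry)."""
    total = a + b + carry_in
    half_carry = ((a & 0x0F) + (b & 0x0F) + carry_in) > 0x0F
    return total & 0xFF, total > 0xFF, half_carry


def daa(value, carry, half_carry):
    """Decimal adjustment after addition; return (value, carry)."""
    old_value = value
    if half_carry or (value & 0x0F) > 9:
        value += 0x06
    if carry or old_value > 0x99:
        return (value + 0x60) & 0xFF, True
    return value & 0xFF, False


def bcd_add(number, addend):
    """Add a one-byte BCD addend to a low-byte-first multi-byte BCD number."""
    result = []
    carry = False
    for index, byte in enumerate(number):
        # Only the low byte receives the addend; higher bytes receive the carry
        operand = addend if index == 0 else 0
        value, carry, half_carry = add8(byte, operand, int(carry))
        value, carry = daa(value, carry, half_carry)
        result.append(value)
    return result, carry


# BCD 1268 stored as [0x68, 0x12]; adding BCD 57 should give BCD 1325
total, overflow = bcd_add([0x68, 0x12], 0x57)
print([hex(b) for b in total], overflow)  # ['0x25', '0x13'] False
```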

Other decimal adjustment routines

The Intel IA-32 instruction set also contains a routine for decimal adjustment after subtraction, which operates in a slightly different manner to that detailed above. Development of such a routine for AVR is beyond the scope of this article, but can be done in a similar vein to DAA by pulling the algorithm from the IA-32 Reference manual.

If this routine proves useful, or if you come across any bugs with its operation, please feel free to let me know.

Imran Nazar <tf@imrannazar.com>, Dec 2009. Code released into the public domain.

Gameboy Z80 Opcode Map

Mon, 16 Nov 2009 20:56:50 +0000

The following is a full opcode map of instructions for the Nintendo-spec modified Z80 included in the GameBoy CPU.

Updated: [2010-07-28] Details added for SWAP, [2010-07-29] XOR n reinstated

	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0x	NOP	LD BC,nn	LD (BC),A	INC BC	INC B	DEC B	LD B,n	RLC A	LD (nn),SP	ADD HL,BC	LD A,(BC)	DEC BC	INC C	DEC C	LD C,n	RRC A
1x	STOP	LD DE,nn	LD (DE),A	INC DE	INC D	DEC D	LD D,n	RL A	JR n	ADD HL,DE	LD A,(DE)	DEC DE	INC E	DEC E	LD E,n	RR A
2x	JR NZ,n	LD HL,nn	LDI (HL),A	INC HL	INC H	DEC H	LD H,n	DAA	JR Z,n	ADD HL,HL	LDI A,(HL)	DEC HL	INC L	DEC L	LD L,n	CPL
3x	JR NC,n	LD SP,nn	LDD (HL),A	INC SP	INC (HL)	DEC (HL)	LD (HL),n	SCF	JR C,n	ADD HL,SP	LDD A,(HL)	DEC SP	INC A	DEC A	LD A,n	CCF
4x	LD B,B	LD B,C	LD B,D	LD B,E	LD B,H	LD B,L	LD B,(HL)	LD B,A	LD C,B	LD C,C	LD C,D	LD C,E	LD C,H	LD C,L	LD C,(HL)	LD C,A
5x	LD D,B	LD D,C	LD D,D	LD D,E	LD D,H	LD D,L	LD D,(HL)	LD D,A	LD E,B	LD E,C	LD E,D	LD E,E	LD E,H	LD E,L	LD E,(HL)	LD E,A
6x	LD H,B	LD H,C	LD H,D	LD H,E	LD H,H	LD H,L	LD H,(HL)	LD H,A	LD L,B	LD L,C	LD L,D	LD L,E	LD L,H	LD L,L	LD L,(HL)	LD L,A
7x	LD (HL),B	LD (HL),C	LD (HL),D	LD (HL),E	LD (HL),H	LD (HL),L	HALT	LD (HL),A	LD A,B	LD A,C	LD A,D	LD A,E	LD A,H	LD A,L	LD A,(HL)	LD A,A
8x	ADD A,B	ADD A,C	ADD A,D	ADD A,E	ADD A,H	ADD A,L	ADD A,(HL)	ADD A,A	ADC A,B	ADC A,C	ADC A,D	ADC A,E	ADC A,H	ADC A,L	ADC A,(HL)	ADC A,A
9x	SUB A,B	SUB A,C	SUB A,D	SUB A,E	SUB A,H	SUB A,L	SUB A,(HL)	SUB A,A	SBC A,B	SBC A,C	SBC A,D	SBC A,E	SBC A,H	SBC A,L	SBC A,(HL)	SBC A,A
Ax	AND B	AND C	AND D	AND E	AND H	AND L	AND (HL)	AND A	XOR B	XOR C	XOR D	XOR E	XOR H	XOR L	XOR (HL)	XOR A
Bx	OR B	OR C	OR D	OR E	OR H	OR L	OR (HL)	OR A	CP B	CP C	CP D	CP E	CP H	CP L	CP (HL)	CP A
Cx	RET NZ	POP BC	JP NZ,nn	JP nn	CALL NZ,nn	PUSH BC	ADD A,n	RST 0	RET Z	RET	JP Z,nn	Ext ops	CALL Z,nn	CALL nn	ADC A,n	RST 8
Dx	RET NC	POP DE	JP NC,nn	XX	CALL NC,nn	PUSH DE	SUB A,n	RST 10	RET C	RETI	JP C,nn	XX	CALL C,nn	XX	SBC A,n	RST 18
Ex	LDH (n),A	POP HL	LDH (C),A	XX	XX	PUSH HL	AND n	RST 20	ADD SP,d	JP (HL)	LD (nn),A	XX	XX	XX	XOR n	RST 28
Fx	LDH A,(n)	POP AF	XX	DI	XX	PUSH AF	OR n	RST 30	LDHL SP,d	LD SP,HL	LD A,(nn)	EI	XX	XX	CP n	RST 38

Table 1: Base opcode map

	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0x	RLC B	RLC C	RLC D	RLC E	RLC H	RLC L	RLC (HL)	RLC A	RRC B	RRC C	RRC D	RRC E	RRC H	RRC L	RRC (HL)	RRC A
1x	RL B	RL C	RL D	RL E	RL H	RL L	RL (HL)	RL A	RR B	RR C	RR D	RR E	RR H	RR L	RR (HL)	RR A
2x	SLA B	SLA C	SLA D	SLA E	SLA H	SLA L	SLA (HL)	SLA A	SRA B	SRA C	SRA D	SRA E	SRA H	SRA L	SRA (HL)	SRA A
3x	SWAP B	SWAP C	SWAP D	SWAP E	SWAP H	SWAP L	SWAP (HL)	SWAP A	SRL B	SRL C	SRL D	SRL E	SRL H	SRL L	SRL (HL)	SRL A
4x	BIT 0,B	BIT 0,C	BIT 0,D	BIT 0,E	BIT 0,H	BIT 0,L	BIT 0,(HL)	BIT 0,A	BIT 1,B	BIT 1,C	BIT 1,D	BIT 1,E	BIT 1,H	BIT 1,L	BIT 1,(HL)	BIT 1,A
5x	BIT 2,B	BIT 2,C	BIT 2,D	BIT 2,E	BIT 2,H	BIT 2,L	BIT 2,(HL)	BIT 2,A	BIT 3,B	BIT 3,C	BIT 3,D	BIT 3,E	BIT 3,H	BIT 3,L	BIT 3,(HL)	BIT 3,A
6x	BIT 4,B	BIT 4,C	BIT 4,D	BIT 4,E	BIT 4,H	BIT 4,L	BIT 4,(HL)	BIT 4,A	BIT 5,B	BIT 5,C	BIT 5,D	BIT 5,E	BIT 5,H	BIT 5,L	BIT 5,(HL)	BIT 5,A
7x	BIT 6,B	BIT 6,C	BIT 6,D	BIT 6,E	BIT 6,H	BIT 6,L	BIT 6,(HL)	BIT 6,A	BIT 7,B	BIT 7,C	BIT 7,D	BIT 7,E	BIT 7,H	BIT 7,L	BIT 7,(HL)	BIT 7,A
8x	RES 0,B	RES 0,C	RES 0,D	RES 0,E	RES 0,H	RES 0,L	RES 0,(HL)	RES 0,A	RES 1,B	RES 1,C	RES 1,D	RES 1,E	RES 1,H	RES 1,L	RES 1,(HL)	RES 1,A
9x	RES 2,B	RES 2,C	RES 2,D	RES 2,E	RES 2,H	RES 2,L	RES 2,(HL)	RES 2,A	RES 3,B	RES 3,C	RES 3,D	RES 3,E	RES 3,H	RES 3,L	RES 3,(HL)	RES 3,A
Ax	RES 4,B	RES 4,C	RES 4,D	RES 4,E	RES 4,H	RES 4,L	RES 4,(HL)	RES 4,A	RES 5,B	RES 5,C	RES 5,D	RES 5,E	RES 5,H	RES 5,L	RES 5,(HL)	RES 5,A
Bx	RES 6,B	RES 6,C	RES 6,D	RES 6,E	RES 6,H	RES 6,L	RES 6,(HL)	RES 6,A	RES 7,B	RES 7,C	RES 7,D	RES 7,E	RES 7,H	RES 7,L	RES 7,(HL)	RES 7,A
Cx	SET 0,B	SET 0,C	SET 0,D	SET 0,E	SET 0,H	SET 0,L	SET 0,(HL)	SET 0,A	SET 1,B	SET 1,C	SET 1,D	SET 1,E	SET 1,H	SET 1,L	SET 1,(HL)	SET 1,A
Dx	SET 2,B	SET 2,C	SET 2,D	SET 2,E	SET 2,H	SET 2,L	SET 2,(HL)	SET 2,A	SET 3,B	SET 3,C	SET 3,D	SET 3,E	SET 3,H	SET 3,L	SET 3,(HL)	SET 3,A
Ex	SET 4,B	SET 4,C	SET 4,D	SET 4,E	SET 4,H	SET 4,L	SET 4,(HL)	SET 4,A	SET 5,B	SET 5,C	SET 5,D	SET 5,E	SET 5,H	SET 5,L	SET 5,(HL)	SET 5,A
Fx	SET 6,B	SET 6,C	SET 6,D	SET 6,E	SET 6,H	SET 6,L	SET 6,(HL)	SET 6,A	SET 7,B	SET 7,C	SET 7,D	SET 7,E	SET 7,H	SET 7,L	SET 7,(HL)	SET 7,A

Table 2: Two-byte instruction codes (CB-prefix table)
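Table 2 is regular enough that it can be generated rather than looked up, which is handy when writing a disassembler or emulator. The Python sketch below is one way of doing so, with names chosen for this example: the low three bits of the second byte select the register column, the top two bits select the group (rotates and shifts, BIT, RES or SET), and the middle three bits select either the operation or the bit number.

```python
REGISTERS = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]
ROTATES = ["RLC", "RRC", "RL", "RR", "SLA", "SRA", "SWAP", "SRL"]
BIT_OPS = {1: "BIT", 2: "RES", 3: "SET"}


def decode_cb(opcode):
    """Return the Table 2 mnemonic for the byte following a CB prefix."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must be a single byte")
    register = REGISTERS[opcode & 0x07]   # column within each half-row
    selector = (opcode >> 3) & 0x07       # operation, or bit number
    group = opcode >> 6                   # 0 = rotates/shifts, 1..3 = BIT/RES/SET
    if group == 0:
        return f"{ROTATES[selector]} {register}"
    return f"{BIT_OPS[group]} {selector},{register}"


print(decode_cb(0x37))  # SWAP A
print(decode_cb(0x7E))  # BIT 7,(HL)
print(decode_cb(0xC0))  # SET 0,B
```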

Implementing PayPal Website Payments Pro UK

Tue, 15 Sep 2009 17:52:31 +0000

One of the most popular methods for a website to take payments for goods and services is through PayPal, the online payment service. In particular, it's possible to authenticate credit and debit cards directly through PayPal without the need for the customer to have a PayPal account; this service is referred to as Website Payments Pro.

In the UK, Website Payments Pro is implemented as a REST or SOAP API, to which a website can connect and request transactions. There are, however, some pitfalls to implementing these requests:

Inadequate documentation: PayPal provide at least three conflicting versions of the documentation for the API, each covering a different revision of the API itself; none are marked as current relative to the others, and the API as used in production incorporates elements of each. As such, it can be difficult to produce a working API request.
The sandbox: A "sandbox" environment is provided for testing of API requests before they are used to process live transactions. The principle is well thought-out, but the sandbox has issues: it provides a different revision of the service to the live process, and the server hosting the sandbox is often unresponsive. This makes it, at best, time-consuming to use the sandbox.
Two transaction methods: In order to provide the ability to process credit or debit cards directly (referred to as Direct Payment), PayPal stipulate that a three-step process called Express Checkout also be available for PayPal users without a card to hand. The matter of maintaining state within an Express Checkout transaction can be troublesome, and the process must be kept distinct and separate from Direct Payments.

I fell into all of the above traps when implementing WPP, so I produced the following article for reference purposes, and to serve as a coherent source of documentation for implementation of Website Payments Pro. Please note that throughout, the API revision used is 56.0, and REST calls will be used.

The Basics: API calls

The PayPal REST API is called by POSTing a request to a secure (HTTPS) URI, with credentials generated by the holder of a "business" PayPal account. The credentials consist of an API "user" based on the account holder's email address, a credential password, and an encoded "signature" used as an additional checksum by the API. An example set of credentials would look as follows:

API credentials (example):

    USER: tf_api1.imrannazar.com
    PWD: QF63NP99NPER3V7A
    SIGNATURE: AZM6n0EcNmR0AQYsCf0s1VrwkV10AlKArJ7a8X4YHG-R2oFkOwGqVrJZ
    VERSION: 56.0

These variables are passed, along with the transaction request parameters, in a standard POST-formatted query to the previously mentioned HTTPS URI. The API will return a POST-formatted string with the result of the transaction request, which can be split manually or with a scripting language's built-in functions. The examples in this article are written in PHP, which provides functions to build and to break up POST-format strings. The following code will produce and send a request to the PayPal API, parsing and returning the result.

PHP cURL code to call the PayPal API

    class PaymentPaypal
    {
        const API_URI = 'https://api-3t.paypal.com/nvp';

        // PayPal API credential configuration
        private $config;

        // Data for this transaction (cart ID/contents/amount)
        private $data;

        function send($params)
        {
            $params['USER']      = $this->config['USER'];
            $params['PWD']       = $this->config['PWD'];
            $params['SIGNATURE'] = $this->config['SIGNATURE'];
            $params['VERSION']   = '56.0';

            // Fire up a POST request to PayPal
            $c = curl_init();
            curl_setopt_array($c, array(
                CURLOPT_URL            => self::API_URI,
                CURLOPT_FAILONERROR    => true,
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_FOLLOWLOCATION => true,
                CURLOPT_POST           => true,
                CURLOPT_POSTFIELDS     => http_build_query($params)
            ));

            $result = curl_exec($c);

            if(!$result)
            {
                // Request failed at HTTP time; return the cURL error
                return array('ERROR' => curl_error($c));
            }
            else
            {
                // Request returned; break out the response into an array
                curl_close($c);
                $r = array();
                parse_str($result, $r);
                return $r;
            }
        }
    }

The config member variable for this class can be filled at construction time, and its handling is not shown here.
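For readers working outside PHP, the building and splitting of POST-format strings can be sketched with Python's standard urllib.parse module. This is an illustration of the string handling only, under the assumption that the response is an ordinary URL-encoded string; it sends nothing over the network, the function names are made up, and the credentials shown are obvious placeholders.

```python
from urllib.parse import parse_qsl, urlencode


def build_request(params, credentials, version="56.0"):
    """Merge credentials into the request parameters and encode them as a POST body."""
    fields = dict(params)
    fields.update(credentials)
    fields["VERSION"] = version
    return urlencode(fields)


def parse_response(body):
    """Split a POST-format response string into a dictionary of decoded values."""
    return dict(parse_qsl(body, keep_blank_values=True))


# Placeholder credentials for illustration; real ones come from the PayPal account
credentials = {"USER": "example_api1.example.com", "PWD": "secret", "SIGNATURE": "sig"}
body = build_request({"AMT": "10.00", "CURRENCYCODE": "GBP"}, credentials)
print(body)

print(parse_response("ACK=Success&TOKEN=EC%2d123"))
# {'ACK': 'Success', 'TOKEN': 'EC-123'}
```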

Error handling

As shown above, a rudimentary catch can be made if PayPal refuses to respond properly, since cURL will produce an error in such a circumstance. However, if the request succeeds but PayPal returns an error, these errors must also be accounted for. PayPal allows for this by sending error messages back with the response as appropriate.

If any error messages are provided in the response, they will be provided in three formats:

L_ERRORCODEx: An error code for referencing against PayPal's list of error codes and messages;
L_SHORTMESSAGEx: A short technical description of the error;
L_LONGMESSAGEx: A long description which can be printed direct to the user.

PayPal recommend that if a message is to be printed to the user when you trap an error, it should be the long message, since the short message may be difficult to understand. An example of such a situation is error code 10537, "Risk Control Country Filter Failure". The long message for this is currently documented as: "The transaction was refused because the country was prohibited as a result of your Country Monitor Risk Control Settings", which is eminently more understandable than the short message or the error code itself.

Since a PayPal transaction may result in more than one error, each set of error code and messages is suffixed with a number, denoted with "x" above. The numbers start at 0, which means that any failure in an API call is usually documented in the L_LONGMESSAGE0 part of the response. If L_LONGMESSAGE0 doesn't sufficiently explain the cause of the error, your code can check for any other errors that may have arisen.
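Walking the numbered suffixes is a short loop. The Python sketch below assumes the response has already been split into a dictionary, and that the suffixes run consecutively from 0; the function name collect_errors is invented for this example. The first sample error is the one quoted above, while the second is a made-up placeholder included only to show more than one entry being collected.

```python
def collect_errors(response):
    """Gather every numbered error in a parsed API response, starting at suffix 0."""
    errors = []
    index = 0
    while f"L_ERRORCODE{index}" in response:
        errors.append({
            "code": response[f"L_ERRORCODE{index}"],
            "short": response.get(f"L_SHORTMESSAGE{index}", ""),
            "long": response.get(f"L_LONGMESSAGE{index}", ""),
        })
        index += 1
    return errors


sample = {
    "ACK": "Failure",
    "L_ERRORCODE0": "10537",
    "L_SHORTMESSAGE0": "Risk Control Country Filter Failure",
    "L_LONGMESSAGE0": "The transaction was refused because the country was prohibited "
                      "as a result of your Country Monitor Risk Control Settings",
    # Placeholder second error, not a real PayPal code
    "L_ERRORCODE1": "99999",
    "L_SHORTMESSAGE1": "Example",
    "L_LONGMESSAGE1": "A second, made-up error for illustration",
}

for error in collect_errors(sample):
    print(error["code"], error["long"])
```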

Before the request: User input

As previously mentioned, PayPal allows users of Website Payments Pro to use either Express Checkout (EC) or Direct Payments (DP) to authenticate a transaction. In order to provide the choice, and to take credit card details if DP is to be used, a form must be presented to the user stating how much is to be paid, and showing the two methods.

PayPal stipulate the following requirements for this input form:

The form must state the cost to the user for this transaction (the amount to be paid).
An "Express Checkout" button must be provided, which will start the EC process.
A form to take credit card details must be provided if DP is to be used by the website, which must provide to PayPal the billing name and address as well as card details.

The form may be rendered however you wish, and the billing name and address can be taken beforehand from a previously registered account or provided directly with the card details in the DP form. An example would look as follows.

Figure 1: Payment input form

Express Checkout

The Express Checkout process runs in three stages, allowing a user to log in to PayPal and use their account balance to pay for the transaction. Once they've logged in, they can confirm their information such as a shipping address if the website requires it, and then the transaction is sent through the API to be charged against the PayPal account.

Figure 2: Express Checkout process

The first step, referred to as SetExpressCheckout in the documentation, is to generate a transaction "token" that will allow both your site and PayPal to track through the three steps: this is done by calling PayPal through the API and requesting an EC token. Once the token has been generated, the user is forwarded to PayPal so that they can log in, and PayPal will then return the user to your site.

Req/Ret	Name	Value
Request	TRXTYPE	S
Request	ACTION	S
Request	AMT	Amount of the transaction
Request	CURRENCYCODE	Currency of the transaction (GBP)
Request	RETURNURL	URL for PayPal to direct to for stage 2
Request	CANCELURL	URL for PayPal to direct to if cancelled
Return	ACK	Success or Failure
Return	TOKEN	Generated token

Table 1: API parameters for Express Checkout stage 1

If the API return comes back with an ACK value of "Failure", the return will not contain a TOKEN; this can be used to check whether the request for a token succeeded.

The URL parameters to the stage-1 request allow a website to know whether a user has proceeded to step 2, or has cancelled the transaction at the PayPal side. By passing transaction information like the ID through these URLs, the website can be informed and take the appropriate action, such as marking a transaction as "Cancelled" if the CANCELURL is triggered. The URLs shown for this purpose below link into a fictitious routing framework, but they can be modified to match your configuration.

Request code for Express Checkout stage 1

    class PaymentPaypal
    {
        function express_stg1()
        {
            $params = array(
                'TRXTYPE'      => 'S',
                'ACTION'       => 'S',
                'AMT'          => $this->data['transaction_amount'],
                'CURRENCYCODE' => 'GBP',
                'RETURNURL'    => ($this->config['SITEBASE'].'/checkout/ec2/'.$this->data['transaction_id']),
                'CANCELURL'    => ($this->config['SITEBASE'].'/checkout/cancel/'.$this->data['transaction_id'])
            );

            $response = $this->send($params);

            if($response['ACK'] == 'Failure' || !isset($response['TOKEN']))
            {
                // Request failed; return error
                return array(
                    'status' => 'FAIL',
                    'msg'    => $response['L_LONGMESSAGE0']
                );
            }
            else
            {
                // Request successful; forward user to PayPal and end script
                header('Location: https://www.paypal.com/cgi-bin/webscr?cmd=_express-checkout&token='.$response['TOKEN']);
                die('FORWARD');
            }
        }
    }
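The decision made on the stage-1 response can also be expressed as a small pure function, which is easier to test than code that sends headers and exits. The Python sketch below is an illustrative variant, not the article's implementation: the function name is invented, it returns the forwarding URL instead of redirecting, and it URL-encodes the token as a precaution.

```python
from urllib.parse import urlencode

# Forwarding address as used in the PHP listing above
PAYPAL_EC_URL = "https://www.paypal.com/cgi-bin/webscr"


def stage1_outcome(response):
    """Turn a parsed stage-1 response into either a failure or a forwarding URL."""
    if response.get("ACK") == "Failure" or "TOKEN" not in response:
        return {
            "status": "FAIL",
            "msg": response.get("L_LONGMESSAGE0", "Unknown error"),
        }
    query = urlencode({"cmd": "_express-checkout", "token": response["TOKEN"]})
    return {"status": "FORWARD", "url": f"{PAYPAL_EC_URL}?{query}"}


print(stage1_outcome({"ACK": "Success", "TOKEN": "EC-123"})["url"])
# https://www.paypal.com/cgi-bin/webscr?cmd=_express-checkout&token=EC-123
```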

Once the user has logged into PayPal and returned to your site, PayPal allow for the retrieval of the token and further redirection for entering a shipping address and related details, through the step named GetExpressCheckout. This can be advantageous if your site doesn't have the ability to tie a billing and shipping address against a given user's profile, but PayPal state that this second step can be skipped if these details are already on file against a profile on your site. I've found it simpler to deal with these parts of the profile directly, than forward through PayPal again, so stage 2 of Express Checkout is simply a button to take the user to stage 3. This can be implemented, of course, as a direct link to stage 3 using the RETURNURL parameter to the stage-1 call.

Either way that this is implemented, the URL returned to will be provided with two parameters on the GET line: the original EC token generated in stage 1, and a PayerID which corresponds to the login given by the user to PayPal. The third stage of Express Checkout, DoExpressCheckout, does the actual work of firing a transaction through the API using the authenticated token provided by the PayPal user, and uses both of these as parameters to the API call.
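As a rough illustration of picking those two values off the GET line, here is a minimal sketch in JavaScript rather than PHP. The helper name parseReturnUrl, the example.com address and the parameter names "token" and "PayerID" are assumptions made for the example; check the names against what actually arrives at your RETURNURL.

```javascript
// Hypothetical helper: extract the EC token and PayerID from a return URL.
// The query parameter names "token" and "PayerID" are assumed for illustration.
function parseReturnUrl(returnUrl) {
  const query = new URL(returnUrl).searchParams;
  const token = query.get("token");
  const payerId = query.get("PayerID");

  if (!token || !payerId) {
    // Without both values, stage 3 cannot be attempted
    return { status: "FAIL", msg: "Missing token or PayerID" };
  }
  return { status: "PASS", token: token, payerId: payerId };
}

// Example with a made-up return address
const returned = parseReturnUrl(
  "https://example.com/checkout/ec2/1234?token=EC-ABC123&PayerID=XYZ789"
);
console.log(returned.status, returned.token, returned.payerId);
```

Rejecting the request early when either value is missing keeps a half-finished checkout from ever reaching the stage-3 call.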

Req/Ret	Name	Value
Request	TRXTYPE	S
Request	ACTION	D
Request	AMT	Amount of the transaction
Request	CURRENCYCODE	Currency of the transaction (GBP)
Request	TOKEN	EC token generated in stage 1
Request	PAYERID	PayerID returned to stage 2
Request	PAYMENTACTION	Sale or Authorization
Return	RESULT	0 if successful, negative for comms error, positive for declined or other error
Return	RESPMSG	Short description of EC result

Table 2: API parameters for Express Checkout stage 3

The only new parameter here is PAYMENTACTION; this allows for a PayPal account to be checked for authorisation without proceeding with a full sale, and can be useful for testing purposes as well as advanced purposes such as recurring billing and invoicing. Such features of the Express Checkout are beyond the scope of this article, but PayPal provide a PDF describing EC integration which goes into some detail about these. (Note, however, that this documentation is outdated in its description of the basic sending of API requests; the method described in this article is more current.) For the moment, it's sufficient to set this to Sale and request a full transaction every time.

In addition to checking the standard error codes for a PayPal response, it's prudent to check the RESULT of the stage-3 call, to ensure that it comes back as a successful transaction (zero). If another value is set, the RESPMSG will describe what happened to the transaction, such as "Declined".
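The RESULT check can be sketched on its own as follows, in JavaScript rather than PHP. The function name checkStage3 and the plain-object shape of the response are assumptions for the example; the only rule it encodes is the one above, that zero means success and anything else is explained by RESPMSG.

```javascript
// Hypothetical check of a stage-3 response, passed in as a plain object
// of name/value pairs (an assumed shape, for illustration only).
function checkStage3(response) {
  const raw = response.RESULT;

  // A missing or empty RESULT is treated as a failure, not as zero
  if (raw === undefined || raw === null || String(raw).trim() === "") {
    return { status: "FAIL", msg: "No RESULT in response" };
  }

  // Anything other than zero is described by RESPMSG
  if (Number(raw) !== 0) {
    return { status: "FAIL", msg: response.RESPMSG || "Unknown error" };
  }

  return { status: "PASS", msg: "Transaction complete" };
}

console.log(checkStage3({ RESULT: "0", RESPMSG: "Approved" }).status);
console.log(checkStage3({ RESULT: "12", RESPMSG: "Declined" }).msg);
```

Guarding against an empty RESULT matters because an empty string converts to the number zero, which would otherwise look like a successful charge.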

Request code for Express Checkout stage 3

    class PaymentPaypal
    {
        function express_stg3($token, $payerid)
        {
            $params = array(
                'TRXTYPE'       => 'S',
                'ACTION'        => 'D',
                'AMT'           => $this->data['transaction_amount'],
                'CURRENCYCODE'  => 'GBP',
                'TOKEN'         => $token,
                'PAYERID'       => $payerid,
                'PAYMENTACTION' => 'Sale'
            );

            $response = $this->send($params);

            if(isset($response['L_ERRORCODE0']) || $response['RESULT'] != 0 || !isset($response['TOKEN']))
            {
                return array(
                    'status' => 'FAIL',
                    'msg'    => $response['RESPMSG']
                );
            }
            else
            {
                return array(
                    'status' => 'PASS',
                    'msg'    => 'Transaction complete'
                );
            }
        }
    }

Direct Payment

The alternative to Express Checkout is Direct Payment, the immediate charging of a credit or debit card without the user needing to sign up to PayPal. DP takes a set of card details, along with a billing name and address, and sends them through PayPal; the response will either be a successful charge against the card, or a failure with one of a number of reasons: mismatched billing name, invalid card number, and so forth. Because a good deal of information is asked for by DP, all the request fields below are required unless marked otherwise.

Req/Ret	Name	Value
Transaction details
Request	TRXTYPE	S
Request	TENDER	C
Request	AMT	Amount of the transaction
Request	CURRENCYCODE	Currency of the transaction (GBP)
Request	METHOD	DoDirectPayment
Request	PAYMENTACTION	Sale
Request	IPADDRESS	User's remote IP
Card details
Request	CREDITCARDTYPE	Type of card (Visa, MasterCard, Amex, Maestro, Solo)
Request	ACCT	Card number (12-20 digits)
Request	EXPDATE	Expiry date (MMYYYY, month is 01-12)
Request	STARTDATE	Start date (MMYYYY, month is 01-12) Required for Maestro and Solo cards only
Request	CVV2	Card security code (3-6 digits)
Request	FIRSTNAME	Cardholder's forename
Request	LASTNAME	Cardholder's surname
Request	STREET	Billing house number/name and street
Request	STREET2	[Optional] Second billing address line
Request	CITY	Billing address town/city
Request	ZIP	Billing address postcode
Request	COUNTRYCODE	Card issuing country (GB)
Return
Return	ACK	Success or Failure
Return	TRANSACTIONID	Alphanumeric ID given by PayPal
Return	TIMESTAMP	ISO-formatted time and date of transaction
Return	AMT	Amount of the transaction
Return	CURRENCYCODE	Currency of the transaction (GBP)

Table 3: API parameters for Direct Payment

PayPal will request address and CVV matching against the card when you send it through for processing; the results of these matches will be provided in the response along with the other parameters above, as the AVSADDR, AVSZIP and CVV2MATCH response values. Each of these will have one of the following characters as its value:

Y: Value was matched against that held at the bank;
N: Value was checked and found to be incorrect;
X: Value was not checked.

PayPal won't decline a transaction if these fail, but you can record these values against the transaction in case anything comes of them. The important flag, as with Express Checkout, is ACK; this will indicate whether the charge succeeded.
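If you do want to record the match results in a readable form, something like the following would do; it's a sketch in JavaScript rather than PHP, and the function name describeChecks and the wording of the descriptions are my own assumptions. Only the three field names and the Y/N/X values come from the description above.

```javascript
// Meanings of the match flags, as listed above
const MATCH_MEANINGS = { Y: "matched", N: "incorrect", X: "not checked" };

// Hypothetical helper: turn the three match flags of a response (a plain
// object of name/value pairs, an assumed shape) into readable descriptions.
function describeChecks(response) {
  const described = {};
  ["AVSADDR", "AVSZIP", "CVV2MATCH"].forEach(function (field) {
    const flag = response[field];
    const known = Object.prototype.hasOwnProperty.call(MATCH_MEANINGS, flag);
    described[field] = known ? MATCH_MEANINGS[flag] : "unknown";
  });
  return described;
}

console.log(describeChecks({ ACK: "Success", AVSADDR: "Y", AVSZIP: "N", CVV2MATCH: "X" }));
```

Anything outside the three documented characters, including a missing field, is reported as "unknown" rather than guessed at.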

Request code for Direct Payment

    class PaymentPaypal
    {
        function direct($cc)
        {
            $params = array(
                'TRXTYPE'        => 'S',
                'TENDER'         => 'C',
                'AMT'            => $this->data['transaction_amount'],
                'CURRENCYCODE'   => 'GBP',
                'METHOD'         => 'DoDirectPayment',
                'PAYMENTACTION'  => 'Sale',
                'IPADDRESS'      => $_SERVER['REMOTE_ADDR'],
                'CREDITCARDTYPE' => $cc['type'],
                'ACCT'           => $cc['number'],
                'EXPDATE'        => sprintf('%02d%04d', $cc['expmonth'], $cc['expyear']),
                'CVV2'           => $cc['cvv'],
                'FIRSTNAME'      => $this->data['user_fname'],
                'LASTNAME'       => $this->data['user_sname'],
                'STREET'         => $this->data['user_adstreet'],
                'CITY'           => $this->data['user_adtown'],
                'ZIP'            => $this->data['user_adpostcode'],
                'COUNTRYCODE'    => 'GB',
            );

            // Fill in the start date if required
            if($cc['type'] == 'Maestro' || $cc['type'] == 'Solo')
            {
                $params['STARTDATE'] = sprintf('%02d%04d', $cc['startmonth'], $cc['startyear']);
            }

            $response = $this->send($params);

            if(isset($response['L_ERRORCODE0']) || $response['ACK'] == 'Failure')
            {
                return array(
                    'status' => 'FAIL',
                    'msg'    => $response['L_LONGMESSAGE0']
                );
            }
            else
            {
                return array(
                    'status' => 'PASS',
                    'msg'    => 'Transaction complete'
                );
            }
        }
    }

Troubleshooting

A few issues may come up while using the above routines; a couple of the most common ones I came across are documented below.

Direct Payment transactions fail on small amounts of money: PayPal will only accept a credit or debit card for charging if the transaction is more than £1.00; it's for this reason that the cost of the test transaction in Figure 1 is £1.20, to get over this hurdle and allow PayPal to process the transaction.

PayPal returns an "invalid merchant configuration" on Direct Payment: This most often comes up if the PayPal account holder has upgraded their account to Website Payments Pro, and then not paid the monthly £20.00 fee; if this fee isn't paid, the Pro functionality and API access will be suspended, and the routines above won't have any valid credentials to connect with.

In Closing

That's pretty much all you need to know in order to send a transaction through PayPal Website Payments Pro. I haven't covered the more advanced aspects, such as recurring billing and refunds, but these are little more than different TRXTYPEs and are adequately documented as such in PayPal's own documentation. Just be aware that the authentication methods have changed between various revisions of the API: if the documentation asks that you send a VENDOR value, you can safely ignore it.

Imran Nazar <tf@imrannazar.com>, Sep 2009

Sci-fi Shorts: Prime Point

Wed, 19 Aug 2009 08:39:22 +0000

It had been a long time coming, but they were almost ready.

The launch had happened a few months before; a Paludis III rocket had blasted away from Guinea base, carrying a comms satellite and the HT probe. The probe had unfurled right on schedule, and started on its push to the Lagrange point that would be its destination. It was now reporting that it was approaching L5, and that it was ready to spool up the experiment.

James Kent was quite excited. Even though it was 4am local time, he hadn't been able to sleep knowing that HT was getting close to starting its job; he'd driven in to the lab to watch the data as it came in. As he arrived at the lab, it was obvious that no-one else involved on the project had been able to get any rest, either.

"Morning", called a voice over his shoulder as James sat down. It was Mike Rampton, the flight director; Mike would be overseeing operations in the command centre, while James was in charge of crunching the numbers to make sure the project ran smoothly.

"It's not morning yet, Mike. Status?" asked James.

"Looking good. HT's braking in towards L5 right now; obviously, our telemetry's about a minute delayed, so I'd say it's holding in place right now. We've secured the lunar scopes you wanted, so we should be able to see things as they happen."

James had asked for a day's worth of time on the new lunar telescopes sent up by Europe, mostly because it was difficult to see the Terran L5 point from Earth itself. It had set the project back a fair bit of money, but the telescopes had been contracted out to them for today, and both James and Mike intended to make good use of them.

"Good thing about the 'scopes is, we can see HT clear as day. With no atmosphere to cover up the view, our results will show up pretty well", Mike concluded.

"Sounds good, Mike. If we've arrived at L5, I'll start calibrating", James said. He sipped at a mug of coffee, and started inputting models into the computer on his desk.

While James crunched the coordinates, Mike confirmed that the probe had reached the Lagrange point. It had been quite easy for HT to slingshot across to the fifth point, where the Earth had been two months ago; little fuel had been expended on the journey, and the majority was going to be used in braking towards the gravitational stability point.

Theory held that at L5 (and conversely, at L4), one could plant a probe or satellite and have it stay there, rather than floating in towards the Earth or Sun. It would, of course, be moving at a speed equal to the Earth, since the Lagrange points moved with the Earth as it orbited; however, this would be the only speed required for calculations. That was why James had originally asked for one of these points to be used as the basis of the experiment.

Before too long, James indicated that he had calibrated things, and they were ready. Mike had adjusted one of the lunar 'scope feeds to show HT, in close-up, as it was parked at L5; the other 'scope was pointed at L4, on the other side of Earth's orbit, where it would be in two months' time. Mike sent out the signal for the probe to begin spinning up.

"We'll be able to see the spin-up in about two minutes' time, given signal delays. Should be another minute before things are at full speed", Mike stated. "Let's hope you've got all the gravities right, James."

"We're lucky that Jupiter is on the other side of the Sun right now, otherwise things would've got quite complicated with the coordinate calculation. As it is, we should end up more or less as expected", James replied.

As James continued with his coffee, HT returned that it was fully spun up.

"Drive?" Mike asked over the comm.

"Reporting 100%, caps are full. Settings programmed in; we're ready down here."

"Alright; fire it up."

Nothing happened; not immediately, at least. The telescopes still showed HT hanging in place at L5, for a minute or so. The tension mounted in the command room; James had a niggling sense that something might be off on his calculation, and HT would be destroyed or vanish never to be seen again. As quickly as the thought surfaced, it was discounted: a bright point flashed on the screen, and HT was gone. It had taken a minute for light to come from the L5 point to the lunar telescopes, and another few seconds for the images to relay to the command centre. HT was away.

A bright flash on the other monitor, and HT appeared. The stars behind it were different, however: the probe was now at L4, a full twenty million miles away. These images were also coming from the Moon, which meant that the move between Lagrange points had been almost instantaneous.

"What's that, 0.2 AU in half a second? Pretty quick, James", Mike laughed.

"And right where we wanted it, Mike. Let's get HT home and have a look at the insides; I wanna see what hyperspace did to my bacterial samples."

Parsing the DIME Message Format

Wed, 29 Jul 2009 19:22:40 +0000

Talking to web interfaces with SOAP has been made very simple in PHP, thanks to the inclusion of a SOAP module in the standard PHP 5 build. Having the functionality of a SOAP client built-in means it's very quick and easy to call an interface through either the raw RPC or the provided WSDL, which can return XML or plain text values.

The trouble starts to come in when the interface wishes to return a complicated result: a file that isn't XML or plain text, or a series of files in the same return package. There are various encodings used in the transfer of complex SOAP results; one of the more common is Direct Internet Message Encapsulation, or DIME.

The DIME Format

DIME is essentially a wrapper over MIME, allowing multiple MIME parts to be sent in one package. The format was developed by Microsoft as a draft standard, and was adopted by a good number of SOAP interfaces before the official data standard was drawn up. The concept of the format is very simple: a series of files, either XML or binary data, with a short header on each file.

Each part can be marked as the first and/or last part of the message, and the type of data it contains can also be marked as XML or binary. Being a Microsoft standard, the definition of the header for each part involves bitfields and binary fiddling. There's also scope provided for extensions to the DIME format, but none of these were ever defined, so you're unlikely to find any messages with options filled in.

Field	Length	Description
Version	5 bits	DIME format version (always 1)
First Record	1 bit	Set if this is the first part in the message
Last Record	1 bit	Set if this is the last part in the message
Chunk Record	1 bit	This file is broken into chunked parts
Type Format	4 bits	Type of file in the part (1 for binary data, 2 for XML)
Reserved	4 bits	(Classic Microsoft)
The following fields are big-endian numbers
Options Length	2 bytes	Length of the "options" field
ID Length	2 bytes	Length of the "ID" or "name" field
Type Length	2 bytes	Length of the "type" field
Data Length	4 bytes	Size of the included file
The following fields are variable-length, and padded to the next 4-byte boundary
Options	Part-specific option data, if any is defined (safely answer "no")
ID	Name of the file/part
Type	If typeformat is 1: MIME type of the part data; if typeformat is 2: URI of the DTD file for the XML enclosed
Data	The file

Table 1: DIME part header

As you can see, it's possible for a DIME message to contain only one part, by marking it as both the first and the last. Each part follows directly on from the last, so it's easy enough to run through a DIME message in a loop, working out the position of each part by finding out where you end up after adding up the sizes of the four sections and the header (which is 12 bytes).
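To make the bitfields in Table 1 a little more concrete, here is a minimal sketch of decoding the first two header bytes, in JavaScript rather than PHP. The function name decodeFlags and the sample bytes 0x0E and 0x10 are made up for the example; the field widths are those from the table.

```javascript
// Hypothetical helper: split the first two bytes of a DIME part header
// into the fields from Table 1, using shifts and masks.
function decodeFlags(byte0, byte1) {
  return {
    version: (byte0 >> 3) & 0x1f,   // top 5 bits of the first byte
    first: (byte0 >> 2) & 1,        // first-record bit
    last: (byte0 >> 1) & 1,         // last-record bit
    chunked: byte0 & 1,             // chunk-record bit
    typeFormat: (byte1 >> 4) & 0x0f // top 4 bits of the second byte
  };
}

// 0x0E is 00001 1 1 0 in binary: version 1, first and last set, not chunked.
// 0x10 is 0001 0000: type format 1 (binary data with a MIME type).
console.log(decodeFlags(0x0e, 0x10));
// → { version: 1, first: 1, last: 1, chunked: 0, typeFormat: 1 }
```

A part with both the first and last bits set, as in this sample, is the single-part message described above.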

One complication of the format is that each section in a part (options, ID, type, data) is padded, so that it takes up a whole multiple of 4 bytes; this is generally done by filling the gap with zero bytes. For example, if the type of the file was given as "text/html", you'd end up with the following in the message:

Padded "type" field

    74 65 78 74 2F 68 74 6D 6C 00 00 00    -- text/html

The first nine bytes above are the field data itself, defined by the header as 9 bytes long. The next multiple of 4 from there is 12, so three bytes of padding are added to push the field to a 4-byte boundary; these bytes are not counted as part of the field data.
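The padding arithmetic is small enough to sketch directly; this is JavaScript rather than PHP, and the names paddedLength and recordSize, along with the sample lengths, are assumptions for the example. The 12-byte header and the four padded sections are as described above.

```javascript
// Round a section length up to the next multiple of 4 bytes
function paddedLength(length) {
  return Math.ceil(length / 4) * 4;
}

// Hypothetical helper: total size of one part in the message, given the
// four lengths read from its header (12 header bytes plus padded sections).
function recordSize(lengths) {
  return 12 +
    paddedLength(lengths.options) +
    paddedLength(lengths.id) +
    paddedLength(lengths.type) +
    paddedLength(lengths.data);
}

console.log(paddedLength(9)); // → 12, as for "text/html"
console.log(recordSize({ options: 0, id: 5, type: 9, data: 10 })); // → 44
```

Adding recordSize to the current position gives the start of the next part, which is all a loop over the message needs.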

Reading DIME messages in PHP

Using the structure pattern in PHP, it's quite a simple endeavour to build a class capable of reading in DIME messages and extracting the parts. The basis of this is the DIMERecord structure.

DIMERecord: Structure holding information about a DIME part

    class DIMERecord
    {
        public $version;
        public $first;
        public $last;
        public $chunked;
        public $type_format;
        public $options;
        public $id;
        public $type;
        public $data;
    }

Filling in this structure can be done from another class, acting as the DIME parser itself. It's this class which holds the array of DIMERecords referencing the parts.

DIME: Building the record array

    class DIME
    {
        const TYPE_BINARY = 1;
        const TYPE_XML = 2;

        public $records;

        function __construct($input)
        {
            $this->records = array();
            $pos = 0;

            do
            {
                $r = new DIMERecord;

                // Shift out bitfields for the first fields
                $b = ord($input[$pos++]);
                $r->version = ($b>>3) & 31;
                $r->first = ($b>>2) & 1;
                $r->last = ($b>>1) & 1;
                $r->chunked = $b & 1;
                $r->type_format = (ord($input[$pos++]) >> 4) & 15;

                // Fetch big-endian lengths
                $lengths = array();
                $lengths['options'] = ord($input[$pos++]) << 8;
                $lengths['options'] |= ord($input[$pos++]);

                $lengths['id'] = ord($input[$pos++]) << 8;
                $lengths['id'] |= ord($input[$pos++]);

                $lengths['type'] = ord($input[$pos++]) << 8;
                $lengths['type'] |= ord($input[$pos++]);

                $lengths['data'] = ord($input[$pos++]) << 24;
                $lengths['data'] |= (ord($input[$pos++]) << 16);
                $lengths['data'] |= (ord($input[$pos++]) << 8);
                $lengths['data'] |= ord($input[$pos++]);

                // Read in padded data
                foreach($lengths as $lk => $lv)
                {
                    $r->$lk = substr($input, $pos, $lv);
                    $pos += $lv;
                    if($lv & 3) $pos += (4-($lv & 3));
                }

                $this->records[] = $r;
            }
            while($pos < strlen($input));
        }
    }

Chunking: Breaking up files across parts

The DIME standard also accommodates the ability to break up a file across multiple parts, in case the client or server don't have the processing power to fill out a header for a big file all at once. Parsing a chunked file from its parts involves checking the "chunked" bit for the part being checked, and the part before it, making a decision based on the values:

This part chunked?	Previous part chunked?	Action
No	No	This is a normal file; save
Yes	No	This is the first chunk part; start a data buffer
Yes	Yes	This is a continuation chunk; append to the data buffer
No	Yes	This is the last chunk part; append to the data buffer and save

Table 2: Actions taken for a chunked part
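The decision table can be sketched as a small function, with a loop around it to stitch the chunks back together. This is JavaScript rather than PHP; the names chunkAction and unchunk, the action labels, and the cut-down part objects (just a chunked flag and a data string) are assumptions for the example.

```javascript
// Map the two "chunked" bits onto the four actions of Table 2
function chunkAction(chunked, previousChunked) {
  if (!chunked && !previousChunked) return "save";   // normal file
  if (chunked && !previousChunked) return "start";   // first chunk
  if (chunked && previousChunked) return "append";   // continuation
  return "append-and-save";                          // last chunk
}

// Hypothetical helper: join chunked parts back into whole files.
// Each part is assumed to look like { chunked: 0 or 1, data: "..." }.
function unchunk(parts) {
  const files = [];
  let buffer = "";
  let previousChunked = false;

  parts.forEach(function (part) {
    const chunked = Boolean(part.chunked);
    switch (chunkAction(chunked, previousChunked)) {
      case "save":
        files.push(part.data);
        break;
      case "start":
        buffer = part.data;
        break;
      case "append":
        buffer += part.data;
        break;
      case "append-and-save":
        files.push(buffer + part.data);
        buffer = "";
        break;
    }
    previousChunked = chunked;
  });

  return files;
}

console.log(unchunk([
  { chunked: 0, data: "a" },
  { chunked: 1, data: "b" },
  { chunked: 1, data: "c" },
  { chunked: 0, data: "d" }
]));
// → [ 'a', 'bcd' ]
```

Note that the last chunk of a file has its own chunked bit clear; it's only the previous part's bit that marks it as the end of a chunked file rather than a normal one.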

The type and id for the file are taken from the first chunk; any chunks after that have these fields set to zero, and have to be ignored. Implementing chunking involves extending the parser function, so that it holds a series of files as well as a series of records.

DIMEFile: Unchunking files

    class DIMEFile
    {
        public $type_format;
        public $type;
        public $id;
        public $data;
    }

    class DIME
    {
        const TYPE_BINARY = 1;
        const TYPE_XML = 2;

        public $records;
        public $files;

        function __construct($input)
        {
            $this->records = array();
            $this->files = array();
            $pos = 0;

            // Break out parts from the message string
            do
            {
                $r = new DIMERecord;

                // Shift out bitfields for the first fields
                $b = ord($input[$pos++]);
                $r->version = ($b>>3) & 31;
                $r->first = ($b>>2) & 1;
                $r->last = ($b>>1) & 1;
                $r->chunked = $b & 1;
                $r->type_format = (ord($input[$pos++]) >> 4) & 15;

                // Fetch big-endian lengths
                $lengths = array();
                $lengths['options'] = ord($input[$pos++]) << 8;
                $lengths['options'] |= ord($input[$pos++]);

                $lengths['id'] = ord($input[$pos++]) << 8;
                $lengths['id'] |= ord($input[$pos++]);

                $lengths['type'] = ord($input[$pos++]) << 8;
                $lengths['type'] |= ord($input[$pos++]);

                $lengths['data'] = ord($input[$pos++]) << 24;
                $lengths['data'] |= (ord($input[$pos++]) << 16);
                $lengths['data'] |= (ord($input[$pos++]) << 8);
                $lengths['data'] |= ord($input[$pos++]);

                // Read in padded data
                foreach($lengths as $lk => $lv)
                {
                    $r->$lk = substr($input, $pos, $lv);
                    $pos += $lv;
                    if($lv & 3) $pos += (4-($lv & 3));
                }

                $this->records[] = $r;
            }
            while($pos < strlen($input));

            // Unchunk records into files, as required
            $previous_chunk = 0;
            foreach($this->records as $r)
            {
                if(!$r->chunked)
                {
                    if(!$previous_chunk)
                    {
                        // Normal part
                        $f = new DIMEFile;
                        $f->type_format = $r->type_format;
                        $f->type = $r->type;
                        $f->id = $r->id;
                        $f->data = $r->data;
                        $this->files[] = $f;
                    }
                    else
                    {
                        // Final chunk
                        $f->data .= $r->data;
                        $this->files[] = $f;
                    }
                }
                else
                {
                    if(!$previous_chunk)
                    {
                        // First chunk
                        $f = new DIMEFile;
                        $f->type_format = $r->type_format;
                        $f->type = $r->type;
                        $f->id = $r->id;
                        $f->data = $r->data;
                    }
                    else
                    {
                        // Continuation
                        $f->data .= $r->data;
                    }
                }

                $previous_chunk = $r->chunked;
            }
        }
    }

Example: Requesting a Jasper report

The JasperServer reporting service uses SOAP to allow requests for reports, and a DIME-encoded message to return the status message XML and the report itself as one result. The details for our example JasperServer are as follows:

WSDL URI	http://localhost:8080/jasperserver/services/repository?wsdl
Namespace	http://www.jaspersoft.com/namespaces/php
Request	runReport
Report URI	/reports/inventory_list

Table 3: SOAP access details for an example JasperServer

Using these access details, and passing them through PHP's native SOAP client, it's a simple matter to retrieve the DIME-encoded return message.

SOAP client code to retrieve the report

    $request = ' XLS ';

    $c = new SoapClient(
        'http://localhost:8080/jasperserver/services/repository?wsdl',
        array('trace' => true));

    try
    {
        $c->__soapCall(
            'runReport',
            array('request' => $request),
            array('namespace' => 'http://www.jaspersoft.com/namespaces/php'));
    }
    catch(SoapFault $cf)
    {
        // A DIME-encoded message has no text, generating an exception
        // Parse out the traced response, and get the file from there
        // Response should be one XML file, and one binary
        $dp = new DIME($c->__getLastResponse());

        foreach($dp->files as $f)
        {
            if($f->type_format == DIME::TYPE_BINARY)
            {
                header('Content-type: '.$f->type);
                header('Content-disposition: attachment; filename="'.$f->id.'"');
                echo $f->data;
            }
        }
    }

That's how you can use the DIME parser I've introduced here, to pull data out of a DIME-encoded SOAP response. As can be seen from the sample invocation here, all that's needed is to make a new DIME object from the message string, and check the array of files that's generated as a result.

Imran Nazar <tf@imrannazar.com>, Jul 2009

Rebuilding Your Leg After Shooting Yourself in the Foot

Thu, 04 Jun 2009 14:02:19 +0000

C

You build a new leg.

ASM

You redefine every cell in the leg, and all the inter-cell bonds; when you're done, watch your leg vanish at the first segfault.

C++

Nobody understands how you did it, but you've created an iron plated leg that runs so fast that it's essentially useless, and you walk in circles for hours.

C#

You build a leg, and it works wonderfully, but only within the city limits of Seattle.

Ruby

You build 15 more legs, and four more bodies to carry them around. For redundancy.

Objective C

You buy a new chrome leg, at great expense, and find it won't work without a chrome hip.

Objective-C on iPhone

You write an application that heals your leg when you shake it and sell it for $5. It also acts as a flashlight and makes fart noises when you use it.

Perl

You spend two days attempting to work with the Leg::Prostethic module, before running your previous C code through encryption and executing the result. A leather couch appears.

Prolog

You define your leg as healed, and wait four years for the leg to build itself.

AppleScript

A basic, cylindrical leg appears, which just barely works through a series of jerky, childlike movements.

Brainfuck

You spend nearly a month creating the replacement leg; everyone pats you on the back and you feel pretty good about yourself, but you somehow wonder if there was an easier way.

Malbolge

You spend the rest of your life rebuilding half a leg, but you somehow have to divide it by 57 before it can be finished.

Piet

You paint a picture of a leg, and then execute it. A leather couch appears.

Haskell

You heal your leg in one line. You then write 10 pages of LaTeX documentation on how you could have healed your leg more efficiently.

Befunge

With a good editor, your leg takes mere minutes to prototype, however you'll be improving it for the rest of your life to get it down to small enough to be usable.

Java (modified 2009-06-05)

After discovering the lack of a built-in library for legs, you spend 8 hours trying to write factory patterns for human, animal and robot legs, then instantiate a new leg. The stump of the old leg loiters around for a while before being collected.

Java on Android (added 2015-04-25)

Google builds and attaches your leg for you, but collects and records all your vital signs as stipulated under section 83.5 of the EULA you signed.

Lisp

You attach the leg bone to the hip bone, which is attached to the leg bone, which is attached to the hip bone...

COBOL

PROCEDURE DIVISION. WRITE ESSAY IN REGARD TO CONSTRUCTION OF NEW LEG. RUN.

Javascript

You spend a few hours building an array of pieces of leg, and a few weeks playing with toolkits that can slide pieces of leg around in all sorts of ways.

PHP

You can't remember whether the function order requires (hip, leg, ankle) or (ankle, leg, hip). Eventually through trial and error, you have a leg which works fine on tarmac but doesn't understand grass.

HTML

You build a framework for a leg, and you wake up in the morning with it blinking.

CSS

You attach a new leg to the hip and float it in the proper direction, but your feet are also attached to your hips because they haven't been cleared below the legs.

Scheme

You can only build a leg-1, so you can either have a leg to stand on or have something to give you a leg up, but not both.

SQL

You attempt to clean up the stump of your leg by deleting it, but you forget to specify which limb to delete; now you have no arms or legs.

Bash

You try to rm -r /limbs/leg/right/stump, but hit enter instead of tab after rm -r / and disappear.

Pascal

You build a leg module, just in time to learn that Modula-2 has legs built into the standard library.

BASIC

Your new leg looks perfectly fine, but doesn't work until you tattoo numbers down the side of the shin.

JCL

You spend weeks crafting a build mechanism for your leg, and submit it to the mainframe, to find a syntax error. Only then do you find that you don't understand the code any more.

Erlang

You spawn a process to fix your leg, and a supervisor process to build more new legs just in case. The supervisor attaches all your new legs to your head, because your hip is an immutable structure.

Python (modified 2015-04-24)

from __future__ import leg

Fortran (modified 2009-06-17)

You build a new leg, and spend the rest of your life improving it in small increments until you can leap tall buildings. Unfortunately, no-one cares.

CUDA (added 2009-06-05)

You make 10,000 tiny legs which enable you to run at thirty miles an hour, but they only work on astroturf.

Lua (added 2009-06-05)

You find that you can only emulate a new leg as userdata with a metatable the size of the moon. Surprisingly, this works quite well if you upgrade your brain at the same time.

POVRay (added 2009-06-08)

You design a new leg from basic geometric shapes, and texture it in reflective Perlin noise. Because it looks cooler that way.

D (added 2009-06-08)

You build a new leg. You forget that the Makefile specifies omission of bounds checking, so attaching the new leg extends it into your torso, replacing many vital organs.

Matlab (added 2009-06-17)

You meticulously formulate the components of your new leg, and then build it. A graph of your leg's kinetic energy output appears.

Qubit C (added 2009-06-17)

You set out to create a new leg, which would explain the extra leg you sprouted last week. A leather couch appears.

Whitespace (added 2009-06-17)

Win32 API (added 2015-04-24)

Microsoft hands you a new leg, but it keeps randomly cramping up. Every diagnosis only comes up with a strange number and "The leg is incorrect". Your only option is to go home and sleep it off.

Visual Basic (added 2015-04-24)

You call the doctor's office to build you a new leg, but find out that it comes with an incantation tattooed onto it that causes it to never stop running. The unfamiliar phrase reads: On Error Resume Next

Rust (added 2015-04-24)

You make a new leg without any trouble, but when you try to attach it the compiler complains that it doesn't live long enough.

With thanks to: letusgothen, TehLaser, VoidBoi, jercos, matja, apathy, asonge, Ended, tobias104, Emu*, MHD, lulzfish, Ysn, maafy6, RoadieRich.

Vanilla JavaScript Slideshows

Wed, 03 Jun 2009 19:18:57 +0000

Many sites introduce a dynamic element to their content by including a "slide-show" subsection: a list of items or pictures which can be scrolled through and viewed in portions, and which repeats to the first item when the last item is visible. An example of such a slide-show is shown below.

Al-Qaeda 'kills British hostage'
03 June 2009 09:47
Downing Street says there is "strong reason to believe" that a British citizen has been killed by al-Qaeda militants in Africa.

Government still working - Harman
03 June 2009 10:07
Harriet Harman hits back at claims the government is in its "death throes" after ministers' decisions to stand down.

Broadband 'essential' to UK
03 June 2009 02:24
Consumers consider broadband internet as important a utility as water or electricity, a government advisory panel finds.

Obama embarks on Mid-East mission
03 June 2009 04:03
President Barack Obama heads to the Middle East on a visit aimed at increasing US engagement with the Islamic world.

Figure 1: Slide-show in use as a news ticker

In the above sample, the window is controlled by left and right anchors, which each give a direction for the window to move in by one step. It may seem complicated to implement such a slide-show, but this article will show you the basic pieces of code that go into making the slide-show work.

Viewing the world through a window

In order for the slide-show to operate, it's important to conceptualise what must be achieved. Each item in the section is joined horizontally to the next, to form a line of items. The user will see this line through a "window" which slides over the line, making a small portion visible.

Figure 2: Items seen through a window

The task of the slide-show is to control where this window lies, and when to move it. This can be achieved through simple use of HTML and JavaScript: a DIV block will act as the window, and the list of items is contained within the window.

HTML for window and list

The inline style placed on the window DIV demonstrates what will occur if the window is set to be a smaller width than the list contained within: the remainder of the list will be cut off, and hidden from view.
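As a rough sketch of markup with this shape, the following builds it as a string from JavaScript. The function names, the pixel width and the use of "overflow: hidden" are assumptions on my part; the "window" ID and the UL/LI structure are the ones the scripts in this article look for.

```javascript
// Escape text so it can be placed safely inside an HTML element
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical builder: a window DIV narrower than its list, so that the
// rest of the list is clipped. Width and overflow rule are assumed values.
function buildSlideshowHtml(items, windowWidth) {
  const listItems = items.map(function (text) {
    return "    <li>" + escapeHtml(text) + "</li>";
  }).join("\n");

  return '<div id="window" style="width: ' + windowWidth + 'px; overflow: hidden;">\n' +
    "  <ul>\n" +
    listItems + "\n" +
    "  </ul>\n" +
    "</div>";
}

console.log(buildSlideshowHtml(["First item", "Second item", "Third item"], 200));
```

With a 200-pixel window and items wider than that in total, only the first part of the list is visible until the window is moved.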

Moving the window

In order to move the window over the list of items, repeated small steps must be taken: moving the window in one operation will result in a simple change of view, as opposed to the desired slide. This effect of repeated steps can be achieved by the judicious use of timeouts: the function responsible for making a small step sets a timer which will, after a short while, run the same function again to make the next step.

For this to work properly, the function must be able to detect when a full item has been scrolled into view, and it's time to stop moving further. In this article, I'll assume that the items in the list are all the same width, and that this width has been defined for the JavaScript function; it's possible, though more time-consuming, to record the positions of the list items in relation to each other, and detect the edge of an item by checking the window position against these item positions.

Moving the window over the list

    timestep = 50;
    pos = 0;
    posstep = 4;
    curtravel = 0;

    step = function() {
        timer = setTimeout(function(){step();}, timestep);

        curtravel += posstep;
        if(curtravel >= itemwidth) {
            curtravel = 0;
            clearTimeout(timer);
        }

        pos += posstep;
        if(pos >= listwidth) pos = 0;

        document.getElementById('window').style.left = pos+'px';
    };

Note that in the above code, the window movement code will detect whether the window has reached the end of the list; if so, it will wrap to the start, and the first item will again be displayed.

Wrapping the list items

A problem presents itself with this simple approach: when approaching the end of the list, blank areas are shown before the first item appears once more. This is because the window is being positioned such that a part of it is past the end of the list, whereas the first item of the list is (as expected) at the start.

Figure 3: Window positioned past the end of the list

This issue can be alleviated by repositioning the items in the list, to allow for the first item to be rendered after the end of the list, and thus for the window to encapsulate the area while showing that the list is wrapping to the start. In order to do this, the items in the list must have their positions checked for each step through the movement of the window: if the item is positioned on the "wrong" end of the list to the window movement, it must be moved to the other side.

CSS to allow for movement of list items

    #window ul { position: relative; }
    #window li { position: absolute; }

Step function including item repositioning

    step = function() {
        /* Initialise a timer, to do the step after this one */
        timer = setTimeout(function(){step();}, timestep);

        /* Set the window's new position */
        pos += posstep;
        if(pos >= listwidth) pos = 0;

        /* Check each item in the list, to see if it's outside the
           bounds of the window and if so, move it to the other side
           of the window to allow for scroll-in */
        var items = document.getElementById('window').getElementsByTagName('ul')[0].getElementsByTagName('li');
        for(var i=0; i<items.length; i++) {
            var itempos = i*itemwidth;
            if(itempos+pos > windowwidth) itempos -= listwidth;
            if(itempos+pos < -(windowwidth+itemwidth)) itempos += listwidth;
            items[i].style.left = itempos+'px';
        }

        /* Check if we're at the end of a scroll; if so, stop the timer */
        curtravel += posstep;
        if(curtravel >= itemwidth) {
            curtravel = 0;
            clearTimeout(timer);
        }

        /* Set the new window position */
        document.getElementById('window').style.left = pos+'px';
    };

Moving both ways

The above code allows for the window to move rightwards over the list, which corresponds to a visual effect of pushing the list off-screen to the left, and bringing more items on-screen from the right. It's important that the code also allow for items to be scrolled to the right, which would involve moving the window leftwards over the list. This can be achieved by modifying the step function somewhat, to include clauses for the reverse direction of travel.

Step function allowing for reverse travel

    step = function(stepdir) {
        timer = setTimeout(function(){step(stepdir);}, timestep);

        var items = document.getElementById('window').getElementsByTagName('ul')[0].getElementsByTagName('li');
        var i, itempos;

        if(stepdir < 0) {
            /* Handle moving the window left; the same as before, but
               with all the comparison and addition signs flipped */
            pos -= posstep;
            if(pos <= -listwidth) pos = 0;

            for(i=0; i<items.length; i++) {
                itempos = i*itemwidth;
                if(itempos+pos < -windowwidth) itempos += listwidth;
                if(itempos+pos > windowwidth+itemwidth) itempos -= listwidth;
                items[i].style.left = itempos+'px';
            }
        } else {
            /* Handle moving the window right, as before */
            pos += posstep;
            if(pos >= listwidth) pos = 0;

            for(i=0; i<items.length; i++) {
                itempos = i*itemwidth;
                if(itempos+pos > windowwidth) itempos -= listwidth;
                if(itempos+pos < -(windowwidth+itemwidth)) itempos += listwidth;
                items[i].style.left = itempos+'px';
            }
        }

        curtravel += posstep;
        if(curtravel >= itemwidth) {
            curtravel = 0;
            clearTimeout(timer);
        }

        document.getElementById('window').style.left = pos+'px';
    };

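Now all that must be remembered is that a positive step direction moves the window rightwards over the list (scrolling the items to the left), and that a negative step direction moves the window leftwards (scrolling the items to the right).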

Caveats and extensions

In principle, it's quite easy to adapt this slideshow to travel vertically instead of horizontally. However, the fact that this system relies on a fixed width for each list item is a disadvantage when it comes to vertical travel: as can be seen in the above sample, items will differ in height depending on their content.

To alleviate this, the system previously mentioned can be employed: a list is maintained of the positions of the items in the list, and these are used to check the bounds of the window. If an item's initial position combined with the window's position is outside the window, this will force it to move to the other side of the list.
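As a rough sketch of that position list, assuming the item sizes have already been measured (in a page they might come from each item's offsetHeight), the edges can be recorded once and then consulted to find where the window should stop next. The names itemOffsets and nextStop are made up for this illustration, and the sizes are invented.

```javascript
// Hypothetical helper: record the starting edge of each item, given
// the measured sizes of the items (heights for vertical travel).
function itemOffsets(sizes) {
  const offsets = [];
  let total = 0;
  for (const size of sizes) {
    offsets.push(total);
    total += size;
  }
  return { offsets: offsets, total: total };
}

// Hypothetical helper: the first item edge strictly past the current
// window position; past the last item this is the end of the list,
// at which point the caller would wrap the position back to zero.
function nextStop(layout, pos) {
  for (const edge of layout.offsets) {
    if (edge > pos) return edge;
  }
  return layout.total;
}

// Three items of differing heights (assumed values)
const layout = itemOffsets([120, 80, 200]);
console.log(layout.offsets);        // [ 0, 120, 200 ]
console.log(nextStop(layout, 0));   // 120
console.log(nextStop(layout, 200)); // 400
```

The step function would then stop its timer when the window position reaches nextStop, rather than when a fixed item width has been travelled.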

This system can easily be extended to allow for image slideshows, by changing the contents of the list to a list of images of the same width; by combining the images with anchors, a thumbnail slideshow can be created. Feel free to extend the code to your heart's content.

Imran Nazar <tf@imrannazar.com>, 2009

Sci-fi Shorts: Highrise

Thu, 09 Apr 2009 20:32:21 +0000

Ryan had been here before.

The place was palatable enough, if a bit lightweight: bed, shower, toilet cubicle, microwave oven. The east-facing wall was a viewscreen, which he could switch to transparent if he wanted to see what was going on outside, or to any of the dozens of full-definition channel feeds. There was, however, no way for him to talk to the outside: no terminals, no data sockets. Even his cellphone didn't work in here: it didn't seem to be picking up a signal.

He was on the ground floor of an apartment block, in room 106. The block was eight stories high, and there were eight apartments on each floor. This was Borneo Block 1, based in the equatorial country of the same name, and was the ideal spot to put such a place as this.

Some of the apartments were occupied by families, some by lone residents such as himself; none were empty. And none could be opened from the inside; once they were sealed from the outside, only an extraordinary emergency could open the doors. That was hardly likely to happen here, with no other buildings within five miles of the block.

It was a prison. A comfortable prison, with a short stay, but a prison none the less.

Every resident of the block would be here for four days, after which time a new list of tenants would be drawn up and moved in. Ryan had just moved in, so he settled down on the bed, identical to the one in 303 where he'd been placed last time, and flipped the viewscreen to Pacific News.

Four days later, it was time to leave. As Ryan stepped out of the shower, the apartment door unsealed and opened by itself. He dressed quickly, and made to leave. As he stepped out of the apartment, the door sealed itself shut again. He was met by Paul, manager of the summer shift at Borneo Block.

"Welcome back, Ryan. How's Earth?"

"Still there," he replied.

The apartment block began to draw away from them both; as it retreated, Ryan could see a blue border encroach on the edges of the block. The border got larger as the block dwindled, and intrusions of white and brown appeared in spots and wisps.

Ryan kept one eye on the apartment block as it fell back towards Earth, his temporary home for what the space elevator engineers called Lift Time; four days was the quickest comfortable journey up to orbit, but the block would fall much more quickly with no residents to cater for.

"We've got some retensioning to do on Four, if you want to jump straight in," Paul stated. Ryan was a maintenance engineer for the Borneo elevator, and cable tension was a part of his job.

"Let's get started, then."

Sci-fi Shorts: Go Northeast

Sun, 05 Apr 2009 19:52:57 +0000

He found himself waking up in a field. There was nothing unusual about that; he'd camped up in fields many times during his travels. Something was different this morning, though. For one thing, he could feel the wind over his face, and that meant he was in the open.

He opened his eyes. Expecting to see the dark green of his tent over him, he found a blue sky, tinged with the orange of a rising sun. He was indeed in the open, so where was his tent?

He sat up, rubbing his eyes, trying to focus. Around him, there was just grass; it was an open field, and he was apparently asleep right in the middle. He couldn't remember finding this field; even if he had picked this place to sleep overnight, his tent would've been over him, and he'd be nearer the woods. Maybe the tent blew away last night, but he couldn't see it now. He'd have to find another at some point.

He looked behind him, and there was a house in the distance. With the sun behind it, lying in its own shadow, the house looked stark. He could see, though, that it was a wooden house. The walls were lime-washed, and it looked like some of the windows were broken. The front door had been boarded over at one point, but the board had fallen away on one side.

He felt himself being drawn to the house, for some reason. Maybe because the side window was open just enough for one person to get through, though anything useful was probably long gone. His plan was to head further south today; his old map showed a village by the road, which might prove a good source of food for the next couple of weeks.

He got up, and made ready to leave. Instead of heading south, he turned around to face the house. He found himself walking towards the open window, as though something was pushing him towards it; as though a command had been given.

> GO NORTHEAST

Vanilla JavaScript Tab Controls

Sun, 08 Mar 2009 23:10:21 +0000

The most basic display elements on an HTML page are easy to understand: paragraphs, headings and tables. Often, a page is broken up into sections where each section contains these elements; these sections can be defined in the HTML source as DIV blocks. The problem that arises from this is how to display the separate sections in an intuitive manner.

An example would be a login form, on which you can provide a username and password to log in. Another section of the page provides a "forgot-password" form, where an email address can be entered and a retrieval email sent. This page may be written in two sections, as below.

Two-section login form

    Login
    Provide login details
    Username:
    Password:

    Forgot Password
    Provide e-mail
    E-mail:

Having these two forms directly in line with each other could cause some confusion for the user. There are a few ways to alleviate this: bringing the two forms alongside each other, for example, would allow a visual separation of the two functions. The most effective display method in this situation, however, is tabbing.

The tab system

It's very likely that you've come across tabs before: they've been used by web browsers for many years as a way to display multiple pages in the same browser window. There are two components to a tabbing system: the tab list and the tab contents. Each entry in the tab list has an associated content block: when a given entry in the tab list is selected, the content block for that entry is displayed and the other content blocks are hidden away.

Figure 1: Graphical tabs, in a web browser

In the above example, four separate pages are open in the same web browser instance: the second is selected. As can be seen, it's obvious which tab is selected, and which contents are being displayed as a result. Tabs aren't just a graphical concept, however: they can be used equally well in a text-based environment.

Figure 2: Textual tabs, in a multi-terminal environment

In this example, four terminals are open in the multiple-terminal screen, and each one has an entry on the tab list at the bottom. The third terminal (an editor session) is currently selected, and the tab list reflects this by highlighting the third tab.

Tabs in a web page are visually very similar to these two interfaces; the above example of a login and forgot-password interface may be implemented as per the following diagrams.

Figure 3: Two tabs of a login page

Implementation: HTML

As stated above, the tab contents must be sectioned before tabbing can be applied; a tab list must also be present to allow switching between tabs. A simple way to break the content down is by placing each section in a DIV. The correspondence between tab list item and tab content is maintained by giving each DIV an id, which is used as the rel attribute on the list item. As described in the JavaScript section, the tab switcher will use this rel to determine which tab content to switch in.

Two-section login form, with tab code

    Login
    Forgot Password

    Provide login details
    Username:
    Password:

    Provide e-mail
    E-mail:

Implementation: CSS

Each tab in the tablist can be in one of two states: active (the currently selected tab) or inactive. In the above example, the tab list has been coded as an unordered list, which means that the list items must be floated next to each other if they are to appear on the same line.

The tab content DIV is a simple matter to style: a black border will suffice. The tab list, however, has to be positioned such that the "active" tab will visually merge with the tab contents. The easiest way to do this is to give the active tab and the tab content box the same background (in this case, white), and to set a bottom border on the active tab of white. From here, the tab list can be positioned to overlay the tab content, causing the active tab's white border to visually override the content's black border.

In CSS, the implementation could be as follows.

CSS for tab rendering

    /* Tab list: no bullets */
    ul.tablist {
        list-style: none inside;
        margin: 0;
        padding: 0;
    }

    /* Tab list item: floated, pushed down one pixel */
    ul.tablist li {
        display: block;
        float: left;
        background: #ddd;
        border-top: 1px solid #ddd;
        border-bottom: 1px solid black;
        position: relative;
        bottom: -1px;
        padding: 0.5em;
        margin-right: 2px;
        cursor: pointer;
    }

    /* Tab list item (active): white bottom border */
    ul.tablist li.active {
        background: white;
        border-left: 1px solid black;
        border-right: 1px solid black;
        border-top: 1px solid black;
        border-bottom: 1px solid white;
    }

    /* Tab: black border */
    div.tab {
        border: 1px solid black;
        clear: both;
        padding: 0.5em;
    }

Implementation: JavaScript

The most important part of the tabbing system is the active component: that part which switches in a tab and switches out the others, when an item on the tab list is clicked. In order to do this, a mapping must be maintained of which tab list items are in a particular list; this map can be created at the time the page is loaded.

At initialisation time, each item in a tab list is also given an onclick function, to activate the switching mechanism when the tab is clicked by the user. The mechanism is a simple loop, determining which tab content boxes are to be switched, and hiding every tab except the one requested.

JavaScript tab switcher

    tabSwitcher = {
        _map: {},

        init: function() {
            // Check each UL on the page, to see if it's a tablist
            var lists = document.getElementsByTagName('ul');
            for(var i=0; i<lists.length; i++) {
                if(lists[i].className.indexOf('tablist') >= 0) {
                    // If we find a tablist, put each item in the map
                    var items = lists[i].getElementsByTagName('li');
                    for(var j=0; j<items.length; j++) {
                        // Map the item's REL attribute to this tablist
                        tabSwitcher._map[items[j].getAttribute('rel')] = lists[i].id;

                        // When the user clicks this item, run switcher
                        items[j].onclick = function() {
                            tabSwitcher.action(this.getAttribute('rel'));
                            return false;
                        };
                    }

                    // Leave this tab list in a default state of
                    // first item active
                    tabSwitcher.action(items[0].getAttribute('rel'));
                }
            }
        },

        action: function(target) {
            // Fetch all the tab list items in the same list as the target
            var tablist = document.getElementById(tabSwitcher._map[target]);
            var listitems = tablist.getElementsByTagName('li');
            for(var k=0; k<listitems.length; k++) {
                // If this item's REL is the same as the clicked item,
                // activate the tab list item and show the content
                var rel = listitems[k].getAttribute('rel');
                if(rel == target) {
                    listitems[k].className = 'active';
                    document.getElementById(rel).style.display = 'block';
                }
                // Otherwise, make the tab list item inactive and hide the content
                else {
                    listitems[k].className = '';
                    document.getElementById(rel).style.display = 'none';
                }
            }
        }
    };

    window.onload = tabSwitcher.init;

Putting all these code sections together provides:

Figure 4: Tabbed interface example

Advanced usage: Multiple tab lists

Since the above JavaScript code is designed to map a tab list item to the list within which it's contained, it's possible to place multiple tab lists on the same page, and have each work independently; the tab switcher will maintain the relations to the appropriate tab lists in its internal map. This can be used for a detailed drill-down display, or any other point at which a tab list could be nested within another tab.
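To make the role of the internal map a little more concrete, here is a small sketch that leaves the DOM out entirely: the two tab lists are described as plain data, and the names accountTabs, reportTabs, buildMap and switchTo are all invented for the illustration. It shows only the bookkeeping, under the assumption that every rel value is unique across the page.

```javascript
// Hypothetical page with two independent tab lists, keyed by list id;
// each list holds the rel values of its tab items.
const tabLists = {
  accountTabs: ["login", "forgot"],
  reportTabs: ["summary", "detail"]
};

// Build the rel -> list id map, as the switcher does at load time
function buildMap(lists) {
  const map = {};
  for (const listId of Object.keys(lists)) {
    for (const rel of lists[listId]) map[rel] = listId;
  }
  return map;
}

// Work out which content blocks in the target's own list should be
// shown (true) or hidden (false); other lists are left untouched.
function switchTo(lists, map, target) {
  const visible = {};
  for (const rel of lists[map[target]]) visible[rel] = (rel === target);
  return visible;
}

const map = buildMap(tabLists);
console.log(map.detail);                        // reportTabs
console.log(switchTo(tabLists, map, "detail")); // { summary: false, detail: true }
```

Because the lookup goes through the map, switching a tab in one list never touches the tabs of the other.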

The styling of tabs can also be enhanced, to make judicious use of rounded tabs, colouring and the like; since the styling has been separated from the presentational HTML, restyling the tabs is merely a matter of changing the CSS used to define the tab styles.

Imran Nazar <tf@oopsilon.com>, 2009

NotPDO for PHP: Wrapping MySQL to look like PDO

Sat, 14 Feb 2009 19:39:20 +0000

Ever been in a situation where you need PDO-MySQL installed and running, but you don't have it? If it's imperative for your script to run database queries through PDO, and it's not installed on the server, there is a way to alleviate the problem: use a wrapper that looks like PDO.

The following is just such a wrapper, that I wrote to get an application running on a new host quickly. It's a hack, and functionality is missing that would otherwise be in PDO, but it covers the basics.

NotPDO.php: PDO lookalike wrapper for MySQL

    <?php
    class NotPDO {
        private $dbconn;

        function NotPDO($dsn, $user='', $pass='') {
            $dsnparts = explode(':', $dsn);
            switch($dsnparts[0]) {
                case 'mysql':
                    $dsnparams = explode(';', $dsnparts[1]);
                    foreach($dsnparams as $dsnp) {
                        $dsnpv = explode('=', $dsnp);
                        switch($dsnpv[0]) {
                            case 'host': $host = $dsnpv[1]; break;
                            case 'dbname': $dbname = $dsnpv[1]; break;
                        }
                    }

                    if(isset($host) && isset($dbname)) {
                        $this->dbconn = mysql_connect($host, $user, $pass);
                        if(!$this->dbconn)
                            die('NotPDO: Database connection failed.');
                        if(!mysql_select_db($dbname, $this->dbconn))
                            die('NotPDO: Could not select database.');
                    } else {
                        die('NotPDO: Database not specified.');
                    }
                    break;

                default:
                    die('NotPDO: Database type not supported.');
            }
        }

        function prepare($q) {
            return new NotPDOQuery($q, $this->dbconn);
        }

        function query($q) {
            $q = new NotPDOQuery($q, $this->dbconn);
            $q->execute();
            return $q;
        }
    };

    class NotPDOQuery {
        private $dbconn;
        private $q;
        private $r;

        function NotPDOQuery($query, $dbconn) {
            $this->dbconn = $dbconn;
            $this->q = $query;
        }

        function bindParam($param, $val) {
            if(is_numeric($val)) {
                $this->q = str_replace(
                    $param,
                    mysql_real_escape_string($val, $this->dbconn),
                    $this->q);
            } else {
                $this->q = str_replace(
                    $param,
                    "'".mysql_real_escape_string($val, $this->dbconn)."'",
                    $this->q);
            }
        }

        function execute() {
            $this->r = mysql_query($this->q);
            if($this->r) return true;
            else return false;
        }

        function fetch() {
            return mysql_fetch_assoc($this->r);
        }

        function fetchAll() {
            $arr = array();
            if(mysql_num_rows($this->r))
                mysql_data_seek($this->r, 0);
            while($row = mysql_fetch_assoc($this->r))
                $arr[] = $row;
            return $arr;
        }
    };
    ?>

Modified Preorder Tree Traversal

Mon, 09 Feb 2009 14:53:00 +0000

The MPTT algorithm is, as the name states, a modification of pre-order tree traversal; each node of the tree has two extra values associated with it, to assist in the traversal of the tree.

Data structure

Node ID	Name	Left MPTT value	Right MPTT value
1	(Root)	1	16
2	Articles	2	11
5	Fiction	3	8
7	Fantasy	4	5
8	Sci-fi	6	7
6	Reference	9	10
3	Portfolio	12	13
4	Contact	14	15

SELECT * from Pages order by mpttLeft asc

Figure 1: Sample MPTT tree

Numbers are assigned such that a path can be traced around the tree, taking in every node. The path here starts at (Root), flowing down the left, around the bottom of the Fiction subtree, then up to the Reference branch of Articles, and from there to the other branches of (Root), before flowing back to (Root).

Note that leaf nodes (those with no children) have Left and Right values immediately after each other; Portfolio, for example, is 12/13. Note also that the parent of a node has a smaller Left and a bigger Right; this can be used to trace up the tree finding parent nodes, until you hit a Left of 1 (meaning the root node). For example, Fantasy (4/5) has a parent of Fiction (3/8), which has a parent of Articles (2/11).
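The parent rule can be tried out on the sample table with a few lines of JavaScript. This is only a sketch over an in-memory copy of the rows; the ancestors function is a name invented here, and in practice the same comparison would more likely be made in the database query.

```javascript
// The sample tree from the table above: name, Left value, Right value
const pages = [
  ["(Root)", 1, 16], ["Articles", 2, 11], ["Fiction", 3, 8],
  ["Fantasy", 4, 5], ["Sci-fi", 6, 7], ["Reference", 9, 10],
  ["Portfolio", 12, 13], ["Contact", 14, 15]
].map(([name, left, right]) => ({ name, left, right }));

// Hypothetical helper: every row with a smaller Left and a bigger
// Right than the node is an ancestor; sorting by Left, largest first,
// puts the immediate parent at the front and the root at the end.
function ancestors(rows, node) {
  return rows
    .filter(row => row.left < node.left && row.right > node.right)
    .sort((a, b) => b.left - a.left);
}

const fantasy = pages.find(row => row.name === "Fantasy");
console.log(ancestors(pages, fantasy).map(row => row.name));
// [ 'Fiction', 'Articles', '(Root)' ]
```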

Operations

Addition

To add a node, simply add 2 to all the values after that node, then insert the node into place. As an example, inserting a "Horror" section under Fiction is a simple matter of:

Finding where to insert "Horror" (after 7)
Adding 2 to every Left value past 7
Adding 2 to every Right value past 7
Inserting a new row, "Horror", with Left=8 and Right=9

At the end of this, "Fiction" would have a Left of 3 and Right of 10.
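The steps above can be sketched against an in-memory copy of the sample table. The function name insertLeafAfter is an invention for this example, and a real implementation would run the two updates and the insert as database queries, ideally inside a transaction.

```javascript
// The sample tree from the table above: name, Left value, Right value
const rows = [
  ["(Root)", 1, 16], ["Articles", 2, 11], ["Fiction", 3, 8],
  ["Fantasy", 4, 5], ["Sci-fi", 6, 7], ["Reference", 9, 10],
  ["Portfolio", 12, 13], ["Contact", 14, 15]
].map(([name, left, right]) => ({ name, left, right }));

// Hypothetical helper: make room after the value `after`, then add a
// new leaf node occupying the two values that were freed up.
function insertLeafAfter(rows, after, name) {
  for (const row of rows) {
    if (row.left > after) row.left += 2;
    if (row.right > after) row.right += 2;
  }
  const leaf = { name: name, left: after + 1, right: after + 2 };
  rows.push(leaf);
  return leaf;
}

// Insert "Horror" under Fiction, after Sci-fi's Right value of 7
insertLeafAfter(rows, 7, "Horror");
const fiction = rows.find(row => row.name === "Fiction");
console.log(fiction.left, fiction.right); // 3 10
```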

Removal

A very similar process is used to delete a node from the tree. For example, to delete "Fantasy":

Fetch the Right value of the node (in this case, 5)
Subtract 2 from every Left value past 5
Subtract 2 from every Right value past 5
Remove the node row
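The removal steps can be sketched in the same way. Note that subtracting 2 is only right for a leaf node such as "Fantasy", so this illustrative removeLeaf function (a name made up here) refuses anything that still has children; removing a whole subtree would need a larger adjustment, which the steps above don't cover.

```javascript
// The sample tree from the table above: name, Left value, Right value
const rows = [
  ["(Root)", 1, 16], ["Articles", 2, 11], ["Fiction", 3, 8],
  ["Fantasy", 4, 5], ["Sci-fi", 6, 7], ["Reference", 9, 10],
  ["Portfolio", 12, 13], ["Contact", 14, 15]
].map(([name, left, right]) => ({ name, left, right }));

// Hypothetical helper: remove a leaf node and close the gap it leaves
function removeLeaf(rows, name) {
  const index = rows.findIndex(row => row.name === name);
  if (index < 0) throw new Error("No such node: " + name);
  const node = rows[index];
  if (node.right !== node.left + 1) throw new Error(name + " is not a leaf");

  rows.splice(index, 1);
  for (const row of rows) {
    if (row.left > node.right) row.left -= 2;
    if (row.right > node.right) row.right -= 2;
  }
}

removeLeaf(rows, "Fantasy");
const scifi = rows.find(row => row.name === "Sci-fi");
console.log(scifi.left, scifi.right); // 4 5
```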

CRC32 Calculation in 256 Bytes

Mon, 09 Feb 2009 14:22:44 +0000

The following program is in nasm format, and assembles to a DOS .com executable of 240 bytes.

The program is based on Horst Schäffer's 784-byte CRC32 calculator, which is part of the PBATS32 collection.

crc32.asm: A CRC32 calculator

    [bits 16]
    org 0x0100

    start:
        ; Zero the working buffers
        mov di,CMDBUF
        mov cx,0x7800
        xor ax,ax
        rep stosw

        ; Build the 256-entry lookup table
        mov di,LUTBUF
        xor cx,cx
    .lutolp:
        xor dx,dx
        mov ax,cx
        mov ch,8
    .lutlp:
        shr dx,1
        rcr ax,1
        jnc .xorskip
        xor dx,0xEDB8
        xor ax,0x8320
    .xorskip:
        dec ch
        jnz .lutlp
        stosw
        mov ax,dx
        stosw
        inc cx
        and ch,ch
        jz .lutolp

        ; Copy the filename argument from the command tail
        mov si,0x81
        mov di,CMDBUF
        mov dx,di
    .getarg:
        lodsb
        cmp al,' '
        je .getarg
        jb .aend
        cmp al,','
        je .aend
        cmp al,'/'
        je .aend
        stosb
        jmp short .getarg
    .aend:
        cmp dx,di
        je printhelp

        ; Open the file
        push di
        mov ax,0x3D40
        int 0x21
        jc printhelp
        mov bp,ax

        ; Read the file a block at a time, updating the CRC in DX:AX
        mov di,LUTBUF
        xor ax,ax
        dec ax
        cwd
    .crcolp:
        push ax
        push dx
        mov si,READBUF
        mov dx,si
        mov cx,READBLEN
        mov bx,bp
        mov ah,0x3F
        int 0x21
        mov cx,ax
        pop dx
        pop ax
        jc .crcend
        jcxz .crcdone
    .crclp:
        mov bx,ax
        lodsb
        xor bx,ax
        shl bx,2
        mov al,ah
        mov ah,dl
        mov dl,dh
        xor dh,dh
        xor ax,[bx+di]
        xor dx,[bx+di+2]
        loop .crclp
        cmp si,READBUF+READBLEN
        jnc .crcolp
    .crcdone:
        not ax
        not dx

        ; Close the file, then print the filename and the CRC in hex
    .crcend:
        push ax
        mov bx,bp
        mov ah,0x3e
        int 0x21
        pop bx
        pop di
        mov al,' '
        stosb
        mov cx,4
    .prn:
        mov al,dh
        mov dh,dl
        mov dl,bh
        mov bh,bl
        call byteascii
        loop .prn
        mov ax,0x0a0d
        stosw
        mov al,'$'
        stosb
        mov dx,CMDBUF
        jmp short printmsg

    printhelp:
        mov dx,msghelp
    printmsg:
        mov ah,9
        int 0x21
        ret

        ; Write AL to the output buffer as two hex digits
    byteascii:
        xor ah,ah
        div byte [divider]
        call .inner
        xchg ah,al
    .inner:
        cmp al,10
        sbb al,0x69
        das
        stosb
        ret

    divider: db 16
    msghelp: db "Usage: CRC32 ",13,10,'$'

    CMDBUF   equ 0x03C3
    LUTBUF   equ 0x0468
    READBUF  equ 0x0968
    READBLEN equ 0xC000
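For readers who would rather not trace 16-bit registers, the same table-driven technique can be sketched in JavaScript. This is not a translation of the assembly above, only the standard CRC32 algorithm it is built around, using the same polynomial constant (0xEDB8 and 0x8320 are the two halves of 0xEDB88320); the function names are my own.

```javascript
// Build the 256-entry lookup table: each entry is the result of
// pushing one byte value through eight shift-and-XOR rounds.
function makeCrcTable() {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let bit = 0; bit < 8; bit++) {
      c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
    }
    table[n] = c; // stored as an unsigned 32-bit value
  }
  return table;
}

const CRC_TABLE = makeCrcTable();

// Run a sequence of byte values through the table; the CRC starts as
// all ones and is inverted again at the end.
function crc32(bytes) {
  let crc = 0xFFFFFFFF;
  for (const b of bytes) {
    crc = CRC_TABLE[(crc ^ b) & 0xFF] ^ (crc >>> 8);
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// "123456789" is the usual check string for CRC implementations
const check = Array.from("123456789", ch => ch.charCodeAt(0));
console.log(crc32(check).toString(16)); // cbf43926
```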

List of countries and dependent territories

Wed, 31 Dec 2008 20:36:20 +0000

Below is a table of countries, with their ISO and UN codes, assigned top-level Internet domains and timezone relative to UTC. In all cases, there may be gaps where the ISO or UN codes, or Internet domains, have not yet been assigned. Dependent territories are listed as entries under the country recognised as responsible for them; entities which no longer exist are shown in grey.

Caveats apply especially in the cases of Serbia, Montenegro and Kosovo, where the codes may have changed since this list was produced.

This list is also available in SQL format, where the following fields are defined:

country_id: Sequential from 1 to 250;
name: Alpha characters, ASCII;
iso2: ISO alpha-2 code if available;
iso3: ISO alpha-3 code if available;
un3: UN number if available;
sovereign: Flag (0 if dependent territory);
extant: Flag (0 if no longer existing);
parent: Set to parent's country_id if dependent;
cctld: Two-character Internet domain;
time_offset: Number of hours from UTC.

http://imrannazar.com/content/files/countries.sql

Country	ISO alpha-2	ISO alpha-3	UN number	Internet TLD	Timezone
Afghanistan	AF	AFG	004	af	+4.50
Albania	AL	ALB	008	al	+1
Algeria	DZ	DZA	012	dz	+1
Andorra	AD	AND	020	ad	+1
Angola	AO	AGO	024	ao	+1
Antigua and Barbuda	AG	ATG	028	ag	-4
Argentina	AR	ARG	032	ar	-3
Armenia	AM	ARM	051	am	+4
Australia	AU	AUS	036	au	+10
Christmas Island				cx	+7
Cocos (Keeling) Island				cc	+6.50
Heard and McDonald Islands				hm	+5
Norfolk Island	NF	NFK	574	nf	+11.50
Austria	AT	AUT	040	at	+1
Azerbaijan	AZ	AZE	031	az	+4
Bahamas	BS	BHS	044	bs	-5
Bahrain	BH	BHR	048	bh	+3
Bangladesh	BD	BGD	050	bd	+6
Barbados	BB	BRB	052	bb	-4
Belarus	BY	BLR	112	by	+2
Belgium	BE	BEL	056	be	+1
Belize	BZ	BLZ	084	bz	-6
Benin	BJ	BEN	204	bj	+1
Bhutan	BT	BTN	064	bt	+6
Bolivia	BO	BOL	068	bo	-4
Bosnia and Herzegovina	BA	BIH	070	ba	+1
Botswana	BW	BWA	072	bw	+2
Brazil	BR	BRA	076	br	-3
Brunei Darussalam	BN	BRN	096	bn	+8
Bulgaria	BG	BGR	100	bg	+2
Burkina Faso	BF	BFA	854	bf	+0
Burundi	BI	BDI	108	bi	+2
Cambodia	KH	KHM	116	kh	+7
Cameroon	CM	CMR	120	cm	+1
Canada	CA	CAN	124	ca	-5
Cape Verde	CV	CPV	132	cv	-1
Central African Republic	CF	CAF	140	cf	+1
Chad	TD	TCD	148	td	+1
Chile	CL	CHL	152	cl	-4
China	CN	CHN	156	cn	+8
Hong Kong	HK	HKG	344	hk	+8
Macau	MO	MAC	446	mo	+8
Colombia	CO	COL	170	co	-5
Comoros	KM	COM	174	km	+3
Congo, Democratic Republic of	CD	COD	180	cd	+1
Congo, Republic of	CG	COG	178	cg	+1
Costa Rica	CR	CRI	188	cr	-6
Cote d'Ivoire	CI	CIV	384	ci	+0
Croatia	HR	HRV	191	hr	+1
Cuba	CU	CUB	192	cu	-5
Cyprus	CY	CYP	196	cy	+2
Czech Republic	CZ	CZE	203	cz	+1
Denmark	DK	DNK	208	dk	+1
Faroe Islands	FO	FRO	234	fo	+0
Greenland	GL	GRL	304	gl	-3
Djibouti	DJ	DJI	262	dj	+3
Dominica	DM	DMA	212	dm	-4
Dominican Republic	DO	DOM	214	do	-4
Ecuador	EC	ECU	218	ec	-5
Egypt	EG	EGY	818	eg	+2
El Salvador	SV	SLV	222	sv	-6
Equatorial Guinea	GQ	GNQ	226	gq	+1
Eritrea	ER	ERI	232	er	+3
Estonia	EE	EST	233	ee	+2
Ethiopia	ET	ETH	230	et	+3
Fiji	FJ	FJI	242	fj	+12
Finland	FI	FIN	246	fi	+2
Aland	AX	ALA	248		+2
France	FR	FRA	250	fr	+1
French Guiana	GF	GUF	254	gf	-3
French Polynesia	PF	PYF	258	pf	-10
French Southern Territories				tf	+5
Guadeloupe	GP	GLP	312	gp	-4
Martinique	MQ	MTQ	474	mq	-4
Mayotte	YT	MYT	175	yt	+3
New Caledonia	NC	NCL	540	nc	+11
Reunion	RE	REU	638	re	+4
Saint Pierre and Miquelon	PM	SPM	666	pm	+0
Wallis and Futuna Islands	WF	WLF	876	wf	+12
Gabon	GA	GAB	266	ga	+1
Gambia	GM	GMB	270	gm	+0
Georgia	GE	GEO	268	ge	+4
Germany	DE	DEU	276	de	+1
Ghana	GH	GHA	288	gh	+0
Greece	GR	GRC	300	gr	+2
Grenada	GD	GRD	308	gd	-4
Guatemala	GT	GTM	320	gt	-6
Guinea	GN	GIN	324	gn	+0
Guinea-Bissau	GW	GNB	624	gw	+0
Guyana	GY	GUY	328	gy	-4
Haiti	HT	HTI	332	ht	-5
Honduras	HN	HND	340	hn	-6
Hungary	HU	HUN	348	hu	+1
Iceland	IS	ISL	352	is	+0
India	IN	IND	356	in	+5.50
Indonesia	ID	IDN	360	id	+7
Iran	IR	IRN	364	ir	+3.50
Iraq	IQ	IRQ	368	iq	+3
Ireland	IE	IRL	372	ie	+0
Israel	IL	ISR	376	il	+2
Italy	IT	ITA	380	it	+1
Jamaica	JM	JAM	388	jm	-5
Japan	JP	JPN	392	jp	+9
Jordan	JO	JOR	400	jo	+0
Kazakhstan	KZ	KAZ	398	kz	+6
Kenya	KE	KEN	404	ke	+3
Kiribati	KI	KIR	296	ki	+12
Korea, North	KP	PRK	408	kp	+9
Korea, South	KR	KOR	410	kr	+9
Kosovo					+1
Kuwait	KW	KWT	414	kw	+3
Kyrgyzstan	KG	KGZ	417	kg	+6
Laos	LA	LAO	418	la	+7
Latvia	LV	LVA	428	lv	+2
Lebanon	LB	LBN	422	lb	+0
Lesotho	LS	LSO	426	ls	+2
Liberia	LR	LBR	430	lr	+0
Libya	LY	LBY	434	ly	+2
Liechtenstein	LI	LIE	438	li	+1
Lithuania	LT	LTU	440	lt	+2
Luxembourg	LU	LUX	442	lu	+1
Macedonia	MK	MKD	807	mk	+1
Madagascar	MG	MDG	450	mg	+3
Malawi	MW	MWI	454	mw	+2
Malaysia	MY	MYS	458	my	+8
Maldives	MV	MDV	462	mv	+5
Mali	ML	MLI	466	ml	+0
Malta	MT	MLT	470	mt	+1
Marshall Islands	MH	MHL	584	mh	+12
Mauritania	MR	MRT	478	mr	+0
Mauritius	MU	MUS	480	mu	+4
Mexico	MX	MEX	484	mx	-6
Micronesia	FM	FSM	583	fm	+10
Moldova	MD	MDA	498	md	+2
Monaco	MC	MCO	492	mc	+1
Mongolia	MN	MNG	496	mn	+8
Montenegro				me	+1
Morocco	MA	MAR	504	ma	+0
Western Sahara	EH	ESH	732	eh	+0
Mozambique	MZ	MOZ	508	mz	+2
Myanmar	MM	MMR	104	mm	+6.50
Namibia	NA	NAM	516	na	+2
Nauru	NR	NRU	520	nr	+12
Nepal	NP	NPL	524	np	+5.75
Netherlands	NL	NLD	528	nl	+1
Aruba	AW	ABW	533	aw	-4
Netherlands Antilles	AN	ANT	530	an	-4
New Zealand	NZ	NZL	554	nz	+12
Cook Islands	CK	COK	184	ck	-10
Niue	NU	NIU	570	nu	-11
Tokelau				tk	-10
Nicaragua	NI	NIC	558	ni	-6
Niger	NE	NER	562	ne	+1
Nigeria	NG	NGA	566	ng	+1
Norway	NO	NOR	578	no	+1
Bouvet Island				bv	+1
Svalbard and Jan Mayen Islands	SJ	SJM	744	sj	+1
Oman	OM	OMN	512	om	+4
Pakistan	PK	PAK	586	pk	+5
Palau	PW	PLW	585	pw	+9
Palestinian Territory, Occupied	PS	PSE	275	ps	+0
Panama	PA	PAN	591	pa	-5
Papua New Guinea	PG	PNG	598	pg	+10
Paraguay	PY	PRY	600	py	-4
Peru	PE	PER	604	pe	-5
Philippines	PH	PHL	608	ph	+8
Poland	PL	POL	616	pl	+1
Portugal	PT	PRT	620	pt	+0
Qatar	QA	QAT	634	qa	+3
Romania	RO	ROU	642	ro	+2
Russia	RU	RUS	643	ru	+3
Rwanda	RW	RWA	646	rw	+2
Saint Kitts and Nevis	KN	KNA	659	kn	-4
Saint Lucia	LC	LCA	662	lc	-4
Saint Vincent and the Grenadines	VC	VCT	670	vc	-4
Samoa	WS	WSM	882	ws	-11
San Marino	SM	SMR	674	sm	+1
Sao Tome and Principe	ST	STP	678	st	+0
Saudi Arabia	SA	SAU	682	sa	+3
Senegal	SN	SEN	686	sn	+0
Serbia	CS	SCG	891	rs	+1
Seychelles	SC	SYC	690	sc	+4
Sierra Leone	SL	SLE	694	sl	+0
Singapore	SG	SGP	702	sg	+8
Slovakia	SK	SVK	703	sk	+1
Slovenia	SI	SVN	705	si	+1
Solomon Islands	SB	SLB	090	sb	+11
Somalia	SO	SOM	706	so	+3
South Africa	ZA	ZAF	710	za	+2
Spain	ES	ESP	724	es	+1
Sri Lanka	LK	LKA	144	lk	+5.50
Sudan	SD	SDN	736	sd	+3
Suriname	SR	SUR	740	sr	+0
Swaziland	SZ	SWZ	748	sz	+2
Sweden	SE	SWE	752	se	+1
Switzerland	CH	CHE	756	ch	+1
Syria	SY	SYR	760	sy	+0
Taiwan	TW	TWN	158	tw	+8
Tajikistan	TJ	TJK	762	tj	+5
Tanzania	TZ	TZA	834	tz	+3
Thailand	TH	THA	764	th	+7
Timor-Leste	TL	TLS	626	tp	+9
Togo	TG	TGO	768	tg	+0
Tonga	TO	TON	776	to	+13
Trinidad and Tobago	TT	TTO	780	tt	-4
Tunisia	TN	TUN	788	tn	+1
Turkey	TR	TUR	792	tr	+2
Turkmenistan	TM	TKM	795	tm	+5
Tuvalu	TV	TUV	798	tv	+12
USSR	SU	SUN	810	su	+0
Uganda	UG	UGA	800	ug	+3
Ukraine	UA	UKR	804	ua	+2
United Arab Emirates	AE	ARE	784	ae	+4
United Kingdom	GB	GBR	826	uk	+0
Anguilla	AI	AIA	660	ai	-4
Ascension Island				ac	+0
Bermuda	BM	BMU	060	bm	-4
British Indian Ocean Territory				io	+6
British Virgin Islands	VG	VGB	092	vg	-4
Cayman Islands	KY	CYM	136	ky	-5
Falkland Islands (Malvinas)	FK	FLK	238	fk	-4
Gibraltar	GI	GIB	292	gi	+1
Guernsey	GG	GGY	831	gg	+0
Isle of Man	IM	IMN	833	im	+0
Jersey	JE	JEY	832	je	+0
Montserrat	MS	MSR	500	ms	-4
Pitcairn Island	PN	PCN	612	pn	-8
Saint Helena	SH	SHN	654	sh	+0
South Georgia and the South Sandwich Islands				gs	-2
Turks and Caicos Islands	TC	TCA	796	tc	-5
United States of America	US	USA	840	us	-5
American Samoa	AS	ASM	016	as	-11
Guam	GU	GUM	316	gu	+10
Northern Mariana Islands	MP	MNP	580	mp	+10
Puerto Rico	PR	PRI	630	pr	-4
US Minor Outlying Islands				um	-11
United States Virgin Islands	VI	VIR	850	vi	-4
Uruguay	UY	URY	858	uy	-3
Uzbekistan	UZ	UZB	860	uz	+5
Vanuatu	VU	VUT	548	vu	+11
Vatican City State (Holy See)	VA	VAT	336	va	+1
Venezuela	VE	VEN	862	ve	-4.50
Vietnam	VN	VNM	704	vn	+7
Yemen	YE	YEM	887	ye	+3
Yugoslavia	YU	YUG	890	yu	+0
Zambia	ZM	ZMB	894	zm	+2
Zimbabwe	ZW	ZWE	716	zw	+2

Asynchronous JavaScript and JSON

Tue, 30 Dec 2008 18:59:08 +0000

One of the most common uses of asynchronous JavaScript in web development is live page updates: loading new content for a portion of a page, without reloading the page in full. This is made very simple by the thousands of JavaScript examples available on the Web, but there is a deficiency inherent in this methodology.

If you need to update more than one portion of a page at the same time, the traditional asynchronous request can cause some pain. Since the methodology only allows for the update of one portion, multiple requests have to be generated; each of these requests will have a load time, involving processing and transfer. The end result is that the application works less efficiently than otherwise may be possible.

If instead, it were possible to retrieve all the updates in one request, that would allow for a quicker and more responsive application. This can be done relatively simply, by taking advantage of JavaScript Object Notation.

JSON encoding

By way of example, let's take a football results website that wishes to display scores in more-or-less real time. By the traditional method of asynchronous requests, this could be done one of two ways:

Requesting a full results update: By refreshing the results container in full, more transfer load is generated, especially if the results have not changed between requests.
Requesting updates for each game: Since the client cannot know which games have changed, every game must be updated. This generates even more transfer and processing load than the first option.

A good way to alleviate this problem would be to send only the games which have changed score, in some easy-to-transfer encoding. JSON provides that encoding, through a few simple rules:

A value can be a number, string, true, false, null, or one of the following two types.
An array is a series of values, indexed sequentially from zero; values can be of any type, including another array.
An object is a set of name/value pairs, where the name is a string and the value can be any type, including another object.

If the current list of football results is as follows:

HTML for multiple-score results page

    Manchester United: <span id="MAN">2</span> - <span id="AST">3</span> :Aston Villa
    Arsenal: <span id="ARS">1</span> - <span id="EVE">1</span> :Everton

A simplistic method of transferring some updates for the football results may, using JSON, generate the following response:

JSON-encoded multiple-score update

    {"MAN":3, "EVE":2}

As can be seen in the above response, the value for MAN is now 3, and the value for EVE is 2. These name/value pairs can be used to update the appropriate elements: for each pair in the response, update the element whose id is the name of the pair, with the new value.
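The server-side half of this idea, working out which scores have changed since the last snapshot, can be sketched in a few lines. This is only an illustration: the function name diffScores and the shape of the snapshot objects (element id mapped to score) are assumptions of mine, not something taken from a real results site.

```javascript
// Hypothetical helper: compare two score snapshots (element id -> score)
// and return only the entries whose value has changed.
function diffScores(previous, current) {
  const changed = {};
  for (const id of Object.keys(current)) {
    if (previous[id] !== current[id]) {
      changed[id] = current[id];
    }
  }
  return changed;
}

const before = { MAN: 2, AST: 3, ARS: 1, EVE: 1 };
const after = { MAN: 3, AST: 3, ARS: 1, EVE: 2 };

// Encode only the changed scores for transfer to the client
const payload = JSON.stringify(diffScores(before, after));
console.log(payload);
```

If nothing has changed between requests, the encoded update shrinks to an empty object, which is about as cheap as a response can get.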

Handling JSON responses

The name "JavaScript Object Notation" implies some native ability of JavaScript to understand it; and indeed, it's possible for a script to simply evaluate the JSON response and refer to its contents as if it were a normal variable. A script to update the page contents in the manner described above may look similar to this:

Decoding and performing a multiple-score update

    var updateScores = function(response)
    {
        // The parentheses make eval treat the braces as an object literal
        var r = eval('(' + response + ')');
        for (var k in r)
            document.getElementById(k).innerHTML = r[k];
    };

If this function is used as the response handler by the AJAX script, it will be able to parse the JSON-encoded strings returned from the server. One of the advantages of using JSON is the ability to send things other than a plain response; for example, an inline script could be sent with the update:

JSON-encoded update with inline JS

    {"MAN":3, "EVE":2, "script":"alert('There has been an update.');"}

A simple modification to the response handler will suffice to be able to use this new inline script:

Decoding a JSON update with a script

    var updateScores = function(response)
    {
        var r = eval('(' + response + ')');
        for (var k in r)
        {
            // The "script" entry is run; every other entry is a score
            if (k == 'script')
                eval(r[k]);
            else
                document.getElementById(k).innerHTML = r[k];
        }
    };

There are a couple of caveats involved with using this methodology:

Quoting:

The JSON standard dictates that strings must be enclosed in double-quotes; this means that any double-quotes in the string contents have to be escaped if JavaScript is to understand them. This should automatically be handled by the server script's JSON encoding mechanism, but it's something that the encoder may trip up on.
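As a small illustration of the escaping involved, here is a sketch using JSON.stringify and JSON.parse, which current JavaScript engines provide as standard; the article itself doesn't assume them, and the note text is made up for the example.

```javascript
// A string value containing double-quotes of its own
const update = { note: 'Final score: "unofficial" until confirmed' };

// The encoder escapes each embedded quote as \" inside the JSON string
const encoded = JSON.stringify(update);
console.log(encoded);

// Decoding restores the original text, quotes and all
const decoded = JSON.parse(encoded);
console.log(decoded.note === update.note);
```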

Security:

Many developers refuse to countenance the use of eval in JavaScript, since arbitrary code can be executed simply by returning that code in place of the JSON response. Some precautions have been taken by the response handler I've set out above, but the server should also be employed to ensure that the requesting session is valid.
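One way around eval altogether is to decode with JSON.parse, which accepts only JSON and throws on anything else. The sketch below is an alternative to the handlers above rather than a drop-in for them: the name parseScoreUpdate, the list of known ids, and the rule that a score must be a whole number are all assumptions made for the example.

```javascript
// Hypothetical helper: decode a score update without eval, keeping only
// entries that name a known element and carry a whole-number score.
function parseScoreUpdate(response, knownIds) {
  let data;
  try {
    data = JSON.parse(response); // throws on anything that is not JSON
  } catch (err) {
    return {}; // reject malformed or non-JSON responses
  }

  const updates = {};
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    return updates;
  }
  for (const id of Object.keys(data)) {
    if (knownIds.includes(id) && Number.isInteger(data[id])) {
      updates[id] = data[id];
    }
  }
  return updates;
}

const ids = ['MAN', 'AST', 'ARS', 'EVE'];
console.log(parseScoreUpdate('{"MAN":3, "EVE":2}', ids));
console.log(parseScoreUpdate("alert('gotcha')", ids));
```

The trade-off is that an inline script entry can no longer ride along with the update: anything that isn't a recognised score is simply dropped.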

If these issues are kept in mind, JSON-encoded AJAX responses are a very useful tool for live Website updates, and can be built upon to generate highly efficient and responsive tools.

Using Pointers in C#

Sat, 01 Nov 2008 01:30:19 +0000

Pointers are often used in lower-level languages, such as C, as an efficient way to access and perform operations on memory buffers. In C, as with many other languages which allow their use, a pointer is simply a variable with a value: that value is the address of a region of memory, and the memory can be accessed by dereferencing the pointer.

The current generation of languages has generally eschewed pointers, and most big thinkers in programming discourage their use. As The C Programming Language states:

Pointers have been lumped with the goto statement as a marvelous way to create impossible-to-understand programs. ... With discipline, however, pointers can be used to achieve clarity and simplicity.

It's for this reason that the team behind the development of C# went against the trend of removing pointer syntax, and included pointers in the language.

Pointers and unsafe blocks

C# is one of the languages in the .NET family, and as such is run under the Common Language Runtime (CLR). It's the runtime that takes care of memory operations and lower-level functionality on the program's behalf, so that the program doesn't normally have to worry about pointers or memory buffers.

The language does, however, provide a way for pieces of code to step outside the constraints of the CLR. Code inside a block marked as unsafe is not verified for memory safety by the runtime, and it's up to the programmer to test and ensure that the code works as expected; the compiler also has to be told to allow such blocks, with the /unsafe switch.

An unsafe block in a method

    static void foo()
    {
        int i = 10;
        unsafe
        {
            // p holds the address of i
            int *p = &i;
            System.Console.WriteLine("Value at p: " + *p);
            // Cast to long so the address isn't truncated on 64-bit systems
            System.Console.WriteLine("Address held in p: " + (long)p);
        }
    }

As can be seen in this example, pointer dereferencing and casts are both allowed to happen inside an unsafe block, such that pointer operations within these blocks proceed much as they would under C.

A graphic example: The fire effect

One of the first things a budding coder wants to learn is how to make a game, and the first stage in that journey is how to output graphics to the screen. Among the simplest demonstrations of graphical output are the two-dimensional colour effects: palette rotation, plasma, and the effect being explored here, fire.

The effect relies on the screen being represented as a memory buffer, running left-to-right and top-down; accessing consecutive memory locations will run through the whole buffer in sequence. An averaging algorithm is applied across the buffer, which runs as follows:

Fire algorithm, applied to a memory buffer

    // Introduce randomness to the averaging
    For each X-coordinate on the bottom line
        Fill in the pixel with a random value
    Next

    // Apply averaging filter
    For all other lines in the buffer
        For each X-coordinate on the line
            Total = Value of current pixel +
                    Value of pixel to the right +
                    Value of pixel underneath +
                    Value of pixel to the bottom left
            Avg = Total / 4

            // Decrement, so lines toward the top fade to 0
            Avg = Avg-1
            If (Avg < 0) Avg = 0

            Value of current pixel = Avg
        Next
    Next

The effect of this is that the larger values, from the line beneath, are transferred in lesser form up the screen; combined with a forced decrement on the averaged value, the effect is a randomised fading from high values at the bottom, to zero at the top. With the appropriate palette to define high values as white, going through yellow and red to black, it's easy to make this look like a burning fire.

It's a situation that's ideal for pointers: dereferencing a pointer to get the value at the current pixel, adding constants on to get to the pixels around the current one, and then pushing the pointer along to do the next byte.
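Before getting to the pointer version, the algorithm can be tried out on a plain byte array. The following is a minimal sketch in JavaScript, where an index into a Uint8Array stands in for the pointer; the function name fireStep and the injectable rand parameter are my own choices for illustration, not part of the C# program.

```javascript
// One frame of the fire effect on a width*height buffer of byte values.
// `rand` is assumed to return a number in [0, 1), like Math.random.
function fireStep(buf, width, height, rand) {
  const bottom = (height - 1) * width; // offset of the bottom line

  // Seed the bottom line with random values
  for (let x = 0; x < width; x++) {
    buf[bottom + x] = Math.floor(rand() * 256);
  }

  // Average each remaining pixel with its right, lower and lower-left
  // neighbours, then decrement so the values fade toward the top
  for (let i = 0; i < bottom; i++) {
    const total = buf[i] + buf[i + 1] + buf[i + width] + buf[i + width - 1];
    buf[i] = Math.max(0, (total >> 2) - 1);
  }
  return buf;
}

const frame = new Uint8Array(4 * 3); // 4 pixels wide, 3 lines high
fireStep(frame, 4, 3, () => 0.5); // fixed "random" source for repeatability
console.log(Array.from(frame));
```

Because the buffer is linear, the "pixel to the right" of the last pixel on a line is the first pixel of the next line; the effect is forgiving enough that this wrap-around isn't visible.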

Putting the effect on-screen

The simplest way of using this effect to get something on the screen is to use a Windows Form; the Form base class will handle all the window instantiation and mouse events, leaving us to concentrate on putting data into the window. By holding a handle to an 8-bit bitmap image, and drawing that image into the window, the Bitmap class will do all the work of calculating palette colours and translating them from the bitmap indices.

fire.cs: Rendering the fire

    using System;
    using System.Drawing;
    using System.Drawing.Imaging;
    using System.Windows.Forms;

    class FirstForm : Form
    {
        private Bitmap buf;    // Graphic buffer
        private Random rnd;    // RNG source

        private const int width = 320;
        private const int height = 240;

        public FirstForm()
        {
            // Initialise an RNG for later use
            rnd = new Random();

            // Set the initial properties of the form
            this.Text = "Fire #1";
            this.ClientSize = new Size(width, height);
            this.MaximizeBox = false;
            this.BackColor = Color.Black;
            SetStyle(ControlStyles.Opaque, true);

            // Nominate the paint function
            this.Paint += new PaintEventHandler(this.DoFire);

            // Generate a 320x240 bitmap, fire palette
            buf = new Bitmap(width, height, PixelFormat.Format8bppIndexed);
            ColorPalette pal = buf.Palette;

            // Fill the palette with the following 64-colour blocks:
            // Black to red, Red to yellow, Yellow to white, White
            // Since each range is 64 colours, and RGB spans 256 values,
            // utilise the left shift to multiply up
            for(int i=0; i<64; i++)
            {
                pal.Entries[i]     = Color.FromArgb(i<<2, 0, 0);
                pal.Entries[i+64]  = Color.FromArgb(255, i<<2, 0);
                pal.Entries[i+128] = Color.FromArgb(255, 255, i<<2);
                pal.Entries[i+192] = Color.FromArgb(255, 255, 255);
            }
            buf.Palette = pal;
        }

        // The paint function delegated to handle drawing the fire
        private void DoFire(object src, PaintEventArgs e)
        {
            // Lock the bitmap so we can write to it direct
            BitmapData buflock = buf.LockBits(
                new Rectangle(Point.Empty, buf.Size),
                ImageLockMode.ReadWrite,
                PixelFormat.Format8bppIndexed);

            // Write a fire
            // This section uses pointers, and is thus deemed "unsafe"
            unsafe
            {
                // Fetch a pointer to the top scanline of the image
                Byte *bufdata = (Byte*)buflock.Scan0;
                Byte *bufbottom = bufdata + ((height-1) * width);
                Byte *i;
                int v;

                // Write a random bottom line as source of the fire
                for(int x=0; x<width; x++)
                {
                    bufbottom[x] = (Byte)rnd.Next(0, 255);
                }

                // For each pixel in the image, average the values of
                // the pixel, the one to the right, the one underneath
                // and the one to the bottom left. Decrement as in the
                // algorithm above, threshold to 0, and write to the
                // current position.
                for(i=bufdata; i<bufbottom; i++)
                {
                    v = ((*i + *(i+1) + *(i+width) + *(i+width-1)) / 4) - 1;
                    if(v<0) v=0;
                    *i = (Byte)v;
                }
            }

            // Unlock ourselves out from the image and blit it to the Form
            buf.UnlockBits(buflock);
            e.Graphics.DrawImageUnscaled(buf, 0, 0);

            // Ensure that we'll be drawing another frame real soon, by
            // forcing a repaint
            this.Invalidate();
        }

        public static void Main(string[] args)
        {
            Application.Run(new FirstForm());
        }
    }

In the full example above, the unsafe block in the paint handler is where the averaging algorithm is applied across the bitmap buffer, through the use of pointers. After this code has run for a few seconds, the following is produced on-screen.

Figure 1: The fire algorithm applied to a bitmap buffer

So that's how it works. Some languages in the current crop forbid the use of pointers altogether: C# allows their use, but only if you promise to keep things clean, because the runtime won't do it for you.

Copyright Imran Nazar <tf@imrannazar.com>, 2008.

Automated Deployment with Subversion

Tue, 21 Oct 2008 21:37:43 +0000

When introducing version control into a web development environment, the process often happens gradually. Generally, one developer will set up a repository, throw all the code in there, and maintain two working copies: one to develop with, and one to act as the production codebase. On occasion, these will even be the same working copy.

At this point, it's simple to work the version control: the developer commits from his local working copy, and updates the "live" working copy to the new revision. There are, of course, a couple of drawbacks to this approach:

No testing phase:

Unless the developer has a testing environment set up on his local machine, there's no way to test code that's being written until it's sitting on the production server. If the code is bug-ridden this presents a problem, since the live dataset may be corrupted before the bugs are found.

Single-developer testing:

Even if the developer in question has a test environment set up, a new developer assigned to the product will not be able to test his work without a replica of that setup on his own machine. Ideally, a testing machine would be made available so that any developer could test their work without recourse to a specialised setup on their own computer.

Manual deployment:

Once a developer is satisfied that their work will stand up in production, whether this be tested with a common testbed or on their own workstation, the repository now has to be exported, and copied to the live server. Depending on the convolutions required to connect to the live server, this can be a tedious and/or complicated process.

This article discusses the next phase of repository setup: separation of the testing and production environments, and automated updating of both.

Subversion Hooks

If a common testing server is introduced into the process, and a working copy of the codebase is placed into testing, it's quite simple for a developer to test code: simply commit their working copy to the repository, and update the copy on the testing server.

As with a standard production server, however, a manual step remains: the testing copy has to be updated before it can be used. The commit process itself must somehow automatically update the testing copy if the test server is to be of any real use. Fortunately, Subversion provides a facility to perform actions as part of a commit, with the hook scripting system.

A Subversion hook is a script that is run by the Subversion server whenever a particular thing happens. There are a few actions which can trigger a hook, but we're only interested in the commit hook scripts:

start-commit: Run before Subversion opens a database transaction to store the updated work. If this script returns an error, the commit fails before it even began.

pre-commit: Run just before Subversion writes the commit into the database. If this script returns an error, the transaction is abandoned and the commit is not saved.

post-commit: Run just after a revision has been committed and has a revision number. Any errors produced by this script are ignored by the Subversion server.

For the automated deployment process, the post-commit hook will be used to ensure that only successful commits are replicated to the testing and/or live servers.

Updating the testing environment

The post-commit hook script is given two parameters by the Subversion server: the path to the repository that's just been updated, and the revision number of the update. Since the hook script is specific to a repository, it can be customised:

Figure 1: Control flow for a commit with automatic testing environment update

The hook scripts can be in any language usable by the server, including Bash, Python or Perl; I've used Bash for the purposes of this article, so the above control flow would translate into the following post-commit hook:

Automated update of a test-server working copy

    #!/bin/bash

    REPO="$1"
    REV="$2"
    TEST_SERVER="192.168.1.55"

    # Update the working copy on the test server
    ssh -l root $TEST_SERVER -t "cd /var/www && svn up"

Note in the above example that the testing environment is accessed by ssh, which means that the root user on the testing server must know the public key of the Subversion server's user account. For example, if the Subversion repositories are accessed by WebDAV through Apache, and the Apache process is running as user nobody, the user calling the post-commit hook is nobody@SVN_SERVER, and a public/private ssh key pair must be prepared for this user and copied to the testing server.

Deployment notification

The control flow in the above example updates the testing environment every time an update is committed to the repo. What's needed next is a method of automatically pushing committed updates to the production system, using some part of the commit action as a trigger. The ideal vector for this is the commit message: the description entered by the developer as the reason for this update.

A command is available as part of the Subversion distribution to examine the properties of a repository: svnlook. This can be used to check the commit message for the latest revision, and look for a signal inserted by the developer to indicate a request for deployment.

Figure 2: Control flow for a commit with deployment request

The deployment signal can be as simple as a block of text inside the commit message: if the post-commit hook detects this block of text in the message, it will perform the deployment. As stated above, svnlook can be used to look at the commit message:

Checking the commit message: svnlook

    PROD_SERVER="172.16.16.1"

    if ( svnlook log -r $REV $REPO | grep "~~DEPLOY~~" )
    then
        /usr/local/bin/svn-deploy $REPO $REV "root@${PROD_SERVER}:/var/www"
    fi

By asking for a specific revision using the -r flag, we can ensure that the revision number passed into the hook script is the one that gets checked. Even though this number should be the latest revision in the repo, it's best to make use of the revision number when it's given.
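Since a hook can be written in any language the server can run, the signal check itself is easy to pull out into a small function and test on its own. The sketch below does so in JavaScript, on the assumption that Node is available on the Subversion server; the names DEPLOY_SIGNAL and wantsDeployment are illustrative, and the commit message is hard-coded where a real hook would take it from svnlook.

```javascript
// Hypothetical helper: does a commit message carry the deployment signal?
const DEPLOY_SIGNAL = '~~DEPLOY~~';

function wantsDeployment(commitMessage) {
  return commitMessage.includes(DEPLOY_SIGNAL);
}

// In a real hook the message would be the output of
// `svnlook log -r REV REPO`; here it is hard-coded for the example.
const message = '- Admin: Orders: Status dropdown now autosaves\n~~DEPLOY~~\n';
console.log(wantsDeployment(message));
console.log(wantsDeployment('Fix typo in footer'));
```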

Performing a deployment

One way to deploy a Subversion repo is to simply keep a working copy as the production environment: in this situation, deployment is as easy as updating the production working copy. The disadvantage of this is that the Subversion control files and directories will be available in production; since these control files include the text base of the working copy, this exposes the backend code and database interfaces in plain text files.

Subversion provides a command targeted to producing a "clean" copy of the repository: a dump of the contents, without .svn directories littering the structure. That command is svn export:

Exporting the contents of a repo: svn export

    svn export -r $REV "file://$REPO" /destination/path

By asking for a specific revision, as with svnlook, we make sure that the revision passed to the hook script is the one exported. Once the export has occurred, this can be uploaded to the production server, or synchronised with rsync in whatever way is required.

Putting things together

With these components, we can put together the hook and deployment scripts:

post-commit: Hook script

    #!/bin/bash

    REPO="$1"
    REV="$2"
    TEST_SERVER="192.168.1.55"
    PROD_SERVER="172.16.16.1"

    # Update the working copy on the test server
    ssh -l root $TEST_SERVER -t "cd /var/www && svn up"

    # Check for a deployment signal
    if ( svnlook log -r $REV $REPO | grep "~~DEPLOY~~" )
    then
        /usr/local/bin/svn-deploy $REPO $REV "root@${PROD_SERVER}:/var/www"
    fi

svn-deploy: Example deployment script

    #!/bin/bash

    REPO="$1"
    REV="$2"
    TARGET="$3"

    # Connect to datacentre VPN; updetach keeps pppd in the foreground
    # only until the link is up, so the script can carry on
    sudo pppd call datacentre updetach
    sudo route add -net 172.16.16.0/24 dev ppp0

    # Export the repo
    rm -rf /tmp/export
    svn export -r $REV "file://$REPO" /tmp/export

    # Synchronise with production; the trailing slash copies the
    # directory's contents, dotfiles included
    rsync -az -e ssh /tmp/export/ $TARGET

In this particular case, the production server is behind a VPN at the datacentre, which must be tunneled through for the deployment to occur.

Running a deployment: The developer's view

Once the post-commit hook has been put into place by a repository administrator, any developer with a checked-out copy of the repo is free to commit updates; any update will cause the testing environment copy to be updated, allowing for a common testing point.

Deployment is signalled as part of the commit message for a revision, as below:

Sample commit message with deployment

    - Frontend: Checkout process: CC payment handling added
    - Admin: Orders: Status dropdown now autosaves on change
    ~~DEPLOY~~

As can be seen above, the deployment code "~~DEPLOY~~" must be present in the commit message for deployment to be signalled. Any files changed as part of the commit will be saved in the new revision, before deployment; the copy to production will include all files in the repository that have changed since the last deployment.

Nice-to-haves: possible advancements

There are a few ways in which the above simple scripts could be enhanced.

Branch handling:

The scripts assume that the codebase is stored in its entirety in the repository, and only the trunk of the codebase is in the repo. If the repo is structured in a trunk-and-branch fashion, everything will be exported and deployed. By using a commit-message signal similar to the deployment signal, it should be possible to test a particular branch, by updating the trunk on the testing server and then copying the contents of the signalled branch over the top.

Changelog production:

By checking the commit messages using svnlook, it's possible to generate a Changelog for the repository: a list of revisions ordered by date, showing what changes were made to the codebase at each point. It's also possible to email the Changelog to the developers, if this is desired.

Database structure updates:

svnlook also allows the hook script to look at which files were modified with the commit. If a pre-determined SQL file is modified, this can be used to signal a change in database structure, and the changes can be applied to the production database through an ssh connection.

Each of these possible changes would introduce complexity into the automated deployment system; for now, the scripts presented here are a simple way to speed up the testing and deployment process.

Copyright Imran Nazar <tf@oopsilon.com>, 2008

Building Complex Emails with PHP

Sun, 10 Aug 2008 15:17:50 +0000

One of the first things that a newcomer to PHP learns is how to send a simple email: using the mail function to send a few paragraphs of text to an email address. This is an easy and functional way to send status messages and other small emails, but there are disadvantages:

Unformatted text:

With plain text emails, only the most rudimentary structure can be given to a message; there's no inherent way to insert a heading or a bullet-point list. If HTML were allowed in the email, for example, this formatting could be provided to the message.

No attachments:

Because the email is a single plain-text message, there is no way to provide additional documents or other files in the body of the message. A compromise is to place the files in question on the public Web, and provide links to the files, but this also compromises the security of the documents.

These problems are not just limitations of PHP's mailing routines: they are limitations of the email transport mechanism. In order to get around them, a devious scheme was standardised in the 1990s.

Multipurpose Internet Mail Extensions

The MIME standard was designed to work inside the existing email transport system; as such, it doesn't need any special connection methods, and no complicated networking is required on the part of the developer. Instead, MIME allows for multi-part messages by inserting all the parts into a plain-text email, and separating the parts by a boundary.

Structure: Basic boundaries

    Text outside the boundary (part #0)
    --BOUNDARY
    Part #1
    --BOUNDARY
    Part #2
    --BOUNDARY--

As can be seen in the example above, two hyphens precede all instances of the boundary, and one boundary forms the end of one part and the start of the next. The end of the last part is denoted by two hyphens after the boundary closing that part.

The basic structure outlined above allows for the separation of parts, but all it can provide is multiple plain-text messages combined into one. To allow for more complex information to be encoded, headers must be provided in association with each part.
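The boundary rules can be checked from the reading side with a small splitter. This is only a sketch of the idea in JavaScript, not how any particular mail client does it, and the function name splitParts is my own.

```javascript
// Split a message body into its parts, given the boundary string.
function splitParts(body, boundary) {
  const delimiter = '--' + boundary;
  const parts = [];
  let current = [];

  for (const line of body.split(/\r?\n/)) {
    if (line === delimiter + '--') {
      // Closing boundary: the last part ends here
      parts.push(current.join('\n'));
      return parts;
    }
    if (line === delimiter) {
      // Ordinary boundary: one part ends and the next begins
      parts.push(current.join('\n'));
      current = [];
    } else {
      current.push(line);
    }
  }

  // No closing boundary was found; keep whatever was collected
  parts.push(current.join('\n'));
  return parts;
}

const body = [
  'Text outside the boundary (part #0)',
  '--BOUNDARY',
  'Part #1',
  '--BOUNDARY',
  'Part #2',
  '--BOUNDARY--',
].join('\r\n');

console.log(splitParts(body, 'BOUNDARY'));
```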

MIME Headers

An issue arises with the boundary structure: how is the email reading client to know which lines denote the boundary for a part, and which are simply part of the message? The client can be informed of which boundary is being used by providing a header for the message in total. Headers are often used to denote the originator of the message, the software version of the sending server, and other such information which may be pertinent to the client; the MIME boundary can be added to this.

Headers: MIME message boundary

    From: "Imran Nazar"
    MIME-Version: 1.0
    Content-type: multipart/mixed; boundary="BOUNDARY"

    Message body

The Content-type header tells the email client what kind of data is provided in the message; the text following Content-type is known as a MIME type. The concept of MIME types has been extended for use beyond email, and is now commonly provided by Web and file servers in response to a request for data.

The MIME type provided with a chunk of data can be used to identify the data in question. There are various classes of data that have MIME types associated with them, and subdefinitions for each class. The class and subclass of data are given in major/minor format; a few examples are provided below.

Major	Minor	Full type	Data
Text documents
text	plain	text/plain	Plain text documents
text	html	text/html	HTML documents
text	csv	text/csv	Comma-separated data files
Images
image	jpeg	image/jpeg	JPEG-formatted images
image	png	image/png	PNG-formatted images
Application-specific types
application	pdf	application/pdf	Portable Document Format (PDF)
application	zip	application/zip	PKZIP compressed archives
application	msword	application/msword	MS Word documents
Types with multiple components
multipart	form-data	multipart/form-data	Web forms with uploaded files
multipart	mixed	multipart/mixed	Messages with many types of component

Table 1: Sample MIME types

As can be seen above, the multipart/mixed MIME type tells the email reader that each part of the message can be of a different type. Just as with the message, each part can have a header and a body. Taking this into account, a fuller MIME-compliant message can be built.

Multipart emails with headers

    This is part #0.
    --BOUNDARY
    Content-type: text/html

    This is part #1.
    --BOUNDARY
    Content-type: text/csv

    id,content,date
    "1","This is part #2.","2008-08-10"
    --BOUNDARY--

Attachments and Content Headers

We've seen how to put multiple types of message into one email, but this is not sufficient for attaching documents and other files to an email message. There are two major problems with inserting documents into an email:

Naming:

As can be seen above, files can be inserted into an email as a MIME part, but they are not given a filename, and are not treated as attachments. This problem is solved by inserting another header along with the part's Content-type, called Content-disposition.

Encoding:

An email message has to be readable in its entirety by any mailserver that happens across it. Because mailservers may run in many places, under many languages and character sets, a binary data file is not guaranteed to make it to the destination intact: it has to be encoded into a more basic character set, and the email client has to be told how to decode the resultant email part. This is done with a third header, called Content-transfer-encoding.

The Content-disposition attached to a message part can be one of two types: inline, meaning this part is to be shown as part of the email, and attachment, which denotes a file attached for download. If it's an attachment, a filename can be provided as a parameter to the Content-disposition header. Using this header, we can make the CSV data file in the above example into an attachment:

Multipart emails with disposition

    This is part #0.
    --BOUNDARY
    Content-type: text/html
    Content-disposition: inline

    This is part #1.
    --BOUNDARY
    Content-type: text/csv
    Content-disposition: attachment; filename="data.csv"

    id,content,date
    "1","This is part #2.","2008-08-10"
    --BOUNDARY--

This takes care of the first problem with attaching files to an email, but the second remains: encoding the attachment into a transferable format. There are two major encoding methods allowed by the MIME standard:

quoted-printable: a discriminate encoding, which allows standard text through without encoding, but translates non-standard characters into their hexadecimal ordinal values;

base64: an indiscriminate encoding, which takes the whole stream of data as one number, and translates it a chunk at a time, three bytes translating into a 4-character block.

The base64 encoding is generally easier to produce, since the quoted-printable encoding requires specialised translation tables. With base64, the data is broken up into 48-byte "lines", and encoded into 64-character lines before insertion into the email.
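The 48-to-64 relationship is easy to see in a few lines of code. The sketch below uses Node's Buffer to do the encoding, which is an assumption about the environment; the function name base64Lines and the 100 bytes of sample data are made up for the example.

```javascript
// Encode binary data as base64 and wrap it into 64-character lines,
// so that each full line carries 48 bytes of the original data.
function base64Lines(data) {
  const encoded = Buffer.from(data).toString('base64');
  const lines = [];
  for (let i = 0; i < encoded.length; i += 64) {
    lines.push(encoded.slice(i, i + 64));
  }
  return lines;
}

const sample = Buffer.alloc(100, 0xff); // 100 bytes of arbitrary data
const lines = base64Lines(sample);

// Two full lines of 48 bytes each, and a short line for the last 4 bytes
console.log(lines.map((line) => line.length));
```

Joining the lines back together and decoding them returns the original bytes, which is all the receiving mail client has to do.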

Once an encoding has been picked, it should be provided in the header for the message part, as shown in the below example.

Attaching a binary file in base64 encoding

    Content-type: image/gif
    Content-disposition: attachment; filename="text-icon.gif"
    Content-transfer-encoding: base64

    R0lGODlhIAAgAKIEAISEhMbGxgAAAP///////wAAAAAAAAAAACH5BAEAAAQALAAA
    AAAgACAAAAOaSKoi08/BKeW6Cgyg+e7gJwICRmjOM6hs6q5kUF7o+rZ2vgkypq3A
    oHA4kPVoxCTROFv8lNAir5mxNa7ESorpi0a5yMg15QU7vVBzFZ1Un9jtaVeMRbuf
    8OA9P9zTx4CAK358QH6BiIJSR2eFhnJhiZJbkI2Oi1Rvf5N1hI6ehYeKZZVrl6Jj
    bKB8q3luJwGxsrO0taUXnLkXCQA7

Now we have all the pieces of the puzzle: the ability to create an email message with multiple parts, and a way to encode and attach files to the email. It's just a matter of implementation.
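Before turning to PHP, the assembly step on its own can be sketched in a language-neutral way: write a boundary, the part's headers, a blank line and the part's content, then close with the final boundary. The JavaScript below is an illustration only; the function name buildMultipart and the shape of the part objects are assumptions for the example.

```javascript
// Hypothetical sketch: assemble a multipart body from a list of parts.
// Each part is { headers: { name: value }, content: string }.
function buildMultipart(preamble, parts, boundary) {
  const out = [preamble];
  for (const part of parts) {
    out.push('--' + boundary);
    for (const [name, value] of Object.entries(part.headers)) {
      out.push(name + ': ' + value);
    }
    out.push(''); // a blank line separates the headers from the content
    out.push(part.content);
  }
  out.push('--' + boundary + '--'); // closing boundary
  return out.join('\r\n') + '\r\n';
}

const message = buildMultipart('This is part #0.', [
  {
    headers: { 'Content-type': 'text/html', 'Content-disposition': 'inline' },
    content: 'This is part #1.',
  },
  {
    headers: {
      'Content-type': 'text/csv',
      'Content-disposition': 'attachment; filename="data.csv"',
    },
    content: 'id,content,date\r\n"1","This is part #2.","2008-08-10"',
  },
], 'BOUNDARY');

console.log(message);
```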

Using PHP to send MIME-compliant emails

With the information above, implementation is no issue. The only problem presented is how to define the MIME type of an arbitrary attachment. Fortunately, UNIX systems provide the file command, which can read any file and work out the MIME type of its contents. On a Windows server, no such analogue exists, but it is possible to obtain file through Microsoft Services for Unix, or UnxUtils.

A MIME-compliant email solution is provided below, making use of this tactic and the information presented in this article.

mimemail.php: Mail-building class for PHP

    <?php
    define('MIMEMAIL_HTML', 1);
    define('MIMEMAIL_ATTACH', 2);
    define('MIMEMAIL_TEXT', 3);

    class MIMEMail
    {
        private $plaintext;
        private $output;
        private $headers;
        private $boundary;

        public function __construct()
        {
            $this->output = '';
            $this->headers = '';
            $this->boundary = md5(microtime());
            $this->plaintext = 0;
        }

        // add: Add a part to the email
        // Parameters: type (Constant) - MIMEMAIL_TEXT, MIMEMAIL_HTML, MIMEMAIL_ATTACH
        //             name (String)   - Contents of email part if TEXT or HTML
        //                             - Attached name of file if ATTACH
        //             value (String)  - Source name of file if ATTACH
        public function add($type, $name, $value='')
        {
            switch($type)
            {
                case MIMEMAIL_TEXT:
                    // The message stays plain text only if nothing else has been added
                    $this->plaintext = (strlen($this->output))?0:1;
                    $this->output = "{$name}\r\n" . $this->output;
                    break;

                case MIMEMAIL_HTML:
                    $this->plaintext = 0;
                    $this->writePartHeader($type, "text/html");
                    $this->output .= "{$name}\r\n";
                    break;

                case MIMEMAIL_ATTACH:
                    $this->plaintext = 0;
                    if(is_file($value))
                    {
                        // If the file exists, get its MIME type from `file`
                        // NOTE: This will only work on systems which provide `file`: Unix, Windows/SFU
                        $mime = trim(exec('file -bi '.escapeshellarg($value)));
                        if($mime) $this->writePartHeader($type, $name, $mime);
                        else $this->writePartHeader($type, $name);

                        $b64 = base64_encode(file_get_contents($value));

                        // Cut up the encoded file into 64-character pieces
                        $i = 0;
                        while($i < strlen($b64))
                        {
                            $this->output .= substr($b64, $i, 64);
                            $this->output .= "\r\n";
                            $i += 64;
                        }
                    }
                    break;
            }
        }

        // addHeader: Provide additional message headers (Cc, Bcc)
        public function addHeader($name, $value)
        {
            $this->headers .= "{$name}: {$value}\r\n";
        }

        // send: Complete and send the message
        public function send($from, $to, $subject)
        {
            $this->endMessage($from);
            return mail($to, $subject, $this->output, $this->headers);
        }

        // writePartHeader: Helper function to add part headers
        private function writePartHeader($type, $name, $mime='application/octet-stream')
        {
            $this->output .= "--{$this->boundary}\r\n";
            switch($type)
            {
                case MIMEMAIL_HTML:
                    $this->output .= "Content-type: {$name}; charset=\"iso-8859-1\"\r\n";
                    break;

                case MIMEMAIL_ATTACH:
                    $this->output .= "Content-type: {$mime}\r\n";
                    $this->output .= "Content-disposition: attachment; filename=\"{$name}\"\r\n";
                    $this->output .= "Content-transfer-encoding: base64\r\n";
                    break;
            }
            $this->output .= "\r\n";
        }

        // endMessage: Helper function to build message headers
        private function endMessage($from)
        {
            if(!$this->plaintext)
            {
                $this->output .= "--{$this->boundary}--\r\n";
                $this->headers .= "MIME-Version: 1.0\r\n";
                $this->headers .= "Content-type: multipart/mixed; boundary=\"{$this->boundary}\"\r\n";
                $this->headers .= "Content-length: ".strlen($this->output)."\r\n";
            }
            $this->headers .= "From: {$from}\r\n";
            $this->headers .= "X-Mailer: MIME-Mail v0.03, 20070419\r\n\r\n";
        }
    }

Example usage of mimemail

    <?php
    include('mimemail.php');

    $m = new MIMEMail();

    // Provide the message body
    $m->add(MIMEMAIL_TEXT, 'An example email message.');

    // Attach file 'icons/txt.gif', and call it 'text-icon.gif' in the email
    $m->add(MIMEMAIL_ATTACH, 'text-icon.gif', '/var/www/icons/txt.gif');

    // Send to the author
    $m->send('noreply@oopsilon.com', '"Imran Nazar" ', 'Test message');

Download the script: mimemail.php

Imran Nazar <tf@oopsilon.com>, 2008

Extended Text Mode on the Commodore 64

Sun, 03 Aug 2008 00:00:00 +0000

I sometimes wonder what it would take for a computer of 80s vintage to be a viable work terminal in today's world. After some thought, I came up with two things that an old computer would need:

Internet access:

Getting data to an old computer without the Internet is a thankless task, involving microcassette recordings and funky formats of floppy disc which are unreadable in a PC. Getting the data back off the computer in question is even more of a problem; it's simply orders of magnitude easier to connect to a standard network and transfer through that.

Display compatibility:

Computers of such vintage as the Commodore 64 and the Spectrum have a variety of display modes, but none of them put a great deal of information on the screen. To viably communicate with a computer of a different type, such as a Linux server, it's a prerequisite to extend the text mode capabilities of the computer, and get something approaching a usable terminal.

This article is intended to be part 1 of a series, in which a Commodore 64 will be set up to act as a standard Unix-compatible terminal. The first step in that ambitious program is to provide a reasonable text display on the Commodore 64.

The C64's video modes

The Commodore 64 has a display resolution of 320 by 200 pixels, a resolution which will be familiar to any PC programmer who has dealt with the VGA display modes. The C64 provides two basic types of display mode: bitmapped and tiled. Each of these has the ability to display pixels in one colour against a background of another colour. In both modes, the display is broken up logically into 8x8-pixel "tiles"; the difference is in how these are handled and how graphics are drawn in the two modes.

In tiled mode, also called "text mode", the screen has a tile-resolution of 40 wide by 25 high, and the display is built in the following manner.

Figure 1a: Display in tiled mode

The tile-address buffer, often called "Screen Memory", is a 1000-byte region of memory where each byte refers to an 8x8 block on screen. In order to get the bitmap data for the display, the video circuit uses the value in screen memory as a pointer into the tile-data buffer, called "Character Memory". When the computer is first started, this memory contains the shapes of letters and numbers which can be used to draw text on the screen; for this reason, tiled mode is often referred to as "40x25 text mode".

Tiled mode allows for each 8x8 block of pixels to have a different foreground colour; any bits in the tile which are set to "1" will be drawn in the foreground colour for that tile. Just like screen memory, a 1000-byte region is set aside for the tile-colour buffer, called "Colour Memory", which provides the foreground colours for each tile. The background is the same across the whole screen, and any "0" bits will be drawn in the global background colour.

Bitmapped mode skips the tile-addressing step in the display process, opting instead for a unified buffer of bitmap data. An 8000-byte region is set aside for the display of 320x200 bits, with blocks of 8x8 still being addressed as a tile.

Figure 1b: Drawing in bitmapped mode

Just as with tiled mode, each 8x8 block can have a different foreground colour. In the case of bitmapped mode, however, a block can also have its own background colour, which will be used instead of the global background if any "0" bits are encountered in the bitmap.
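
To make the bitmap layout a little more concrete, here's a minimal sketch in Python of how a pixel position maps onto the 8000-byte buffer. The function names are my own, and the sketch assumes that the leftmost pixel of each byte is its highest bit; treat it as an illustration of the tile-ordered layout rather than as code from the implementation.

```python
BITMAP_SIZE = 8000     # 320x200 pixels at one bit per pixel
TILES_PER_ROW = 40     # 320 pixels / 8 pixels per tile

def pixel_location(x, y):
    """Return (byte offset, bit mask) for pixel (x, y) in a tile-ordered bitmap."""
    tile = (y // 8) * TILES_PER_ROW + (x // 8)   # which 8x8 block holds the pixel
    offset = tile * 8 + (y % 8)                  # each block stores its 8 rows as 8 bytes
    mask = 0x80 >> (x % 8)                       # assumption: bit 7 is the leftmost pixel
    return offset, mask

def set_pixel(bitmap, x, y):
    """Draw a foreground pixel by setting its bit to 1."""
    offset, mask = pixel_location(x, y)
    bitmap[offset] |= mask

bitmap = bytearray(BITMAP_SIZE)
set_pixel(bitmap, 9, 0)   # second pixel of the second tile on the top row
```

Note that moving one pixel down within a tile moves one byte forward, while moving one tile to the right moves 8 bytes forward: the buffer is not laid out as simple scanlines.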

The options

So we have two options for drawing to the C64's screen. Either of these could be used for the rendering of an 80x25 text mode, but there are a few arguments against the use of tiled mode:

It's more complex:

Drawing to a bitmapped screen involves writing the appropriate bits to a piece of the bitmap buffer. In tiled mode, a tile has to be written into Character Memory, and Screen Memory has to be updated to reflect the new tile.

It's slower to work with:

Because of the above complexity of tiled mode, more memory has to be worked with to set a tile up correctly, which means it takes more time to write text out to screen.

It's not big enough:

By placing an 80x25 "extended text mode" into 40x25 tiled mode, each tile can hold two characters. In theory, there are over 65,000 combinations of two-character tiles, any of which could show up on screen; tiled mode can only deal with 256 of these combinations on screen at the same time, before some hacks have to be employed.

For these reasons, it's simpler to use bitmapped mode to draw the characters. What's required now is a readable font that can be used by the rendering system.

The font

Most terminal systems use a mono-spaced font, primarily because it makes calculations easier regarding text placement and size. This extended text mode will be no exception: in order to fit an 80x25 text screen into a 320x200 graphical display, each character must be 4x8 pixels. In other words, each of the 8x8 tiles must be cut down the middle and a character placed in each half.

What this doesn't take into account is the need to separate characters: if the font is made up of 4x8-pixel glyphs, each character in a line of text will be joined to the next, without any separation. What is instead needed is a pixel of separation between characters: this means that the font will consist of 3x7-pixel glyphs in 4x8 boxes.

On such a small scale, designing a legible font is tricky: distinguishing between zero (0) and capital O is difficult at the best of times, and the difference between one (1), small L and the vertical pipe (|) can be even more of a problem. I'm not a font designer, so I opted instead to use the font glyphs from Novaterm, a terminal program for the C64.

Figure 2: 3x7 pixel font, Novaterm's "ansi81"

In order to use this font programmatically, each character has to be broken down into its constituent bits, and reconstituted as data. Because the glyphs are 4 pixels wide, the resultant data will be 4 bits wide.
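
As a rough sketch of that breaking-down step, the Python below turns a glyph drawn as text into eight 4-bit values, one per pixel row. The letter shape here is a hypothetical "H" that I've drawn for the example, not Novaterm's actual glyph, and the function name is likewise my own.

```python
def encode_glyph(rows):
    """Turn 8 rows of 4-character art ('#' = set pixel) into 8 values of 4 bits each."""
    data = []
    for row in rows:
        value = 0
        for ch in row:
            # Shift the bits gathered so far and append this pixel
            value = (value << 1) | (1 if ch == '#' else 0)
        data.append(value)
    return data

# A hypothetical 3x7 "H" inside its 4x8 box: the rightmost column and
# the bottom row are left empty to separate it from its neighbours.
GLYPH_H = [
    "#.#.",
    "#.#.",
    "#.#.",
    "###.",
    "#.#.",
    "#.#.",
    "#.#.",
    "....",
]

font_data_h = encode_glyph(GLYPH_H)
```

Each resulting value fits in the lower half of a byte, which is what allows two of them to be packed into one byte of a tile later on.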

The process

With a font and a bitmap mode, we can now draw text to the bitmap. Unfortunately, it's not quite as easy as writing one character to each tile, because there are only 40 tiles' worth of space across the screen. Instead, two characters have to be put inside one tile-space. This involves shifting the bitmap values for the "left" character across, and combining them with the "right" character.

Figure 3: Drawing two characters in one tile

In BASIC code, this could be represented as follows, assuming that the FONT two-dimensional array represents the 3x7 Novaterm font:
Rendering "He" in BASIC
    LET CH1 = 72: REM "H"
    LET CH2 = 101: REM "e"
    FOR A = 0 TO 7
    OUT(A) = (FONT(CH1, A) * 16) + FONT(CH2, A)
    NEXT A
By using a "cursor" position to keep track of where on the screen tiles must be filled, it's relatively straightforward to use the technique above for rendering text two characters at a time. The problem arises when a string of text contains an odd number of characters: not only does the renderer have to fill half a tile instead of a full tile, but the next string will start halfway through the tile in question. Because of this, the rendering function becomes more complex:

1. Check whether we're starting halfway through a tile. If so, fill in the right half of the existing on-screen tile with the first character bitmap. If not, skip this step altogether.

2. The main rendering loop: For each pair of characters in the string (starting with the second character if step 1 was performed), build a tile and draw it to screen. Do this until there are either 1 or 0 characters left to render.

3. If there's one character left, fill in the left half of a blank tile, and draw that to screen. If there are no characters left to draw, skip this step.

The top and bottom pieces of the algorithm are extensions of the main rendering loop, and won't be covered here in much detail. Instead, I'll provide an interpretation of the insides of the main loop, in pseudo-C++.
Rendering a two-character tile
    BYTE *bitmap;   // 8000-byte bitmap to render to
    BYTE *font;     // 2048 bytes font data, 8 bytes per char
    char t1, t2;    // Text to render (two characters long)
    int X, Y;       // Current cursor position, in characters

    // Calculate position of destination tile in the bitmap
    // Each tile is 8 bytes long and holds two characters,
    // so an 80-character line is 40 tiles across
    BYTE *tile = bitmap + ((Y * 40 + X / 2) * 8);

    for(int i=0; i<8; i++)
    {
        // Retrieve font data for this line of the bitmap
        BYTE ch1 = font[t1 * 8 + i];
        BYTE ch2 = font[t2 * 8 + i];

        // Calculate final tile contents
        tile[i] = (ch1 * 16) + ch2;
    }
In the case of the top and bottom parts of the algorithm, either ch1 or ch2 is not used in the final tile; otherwise, the code for these parts is as above.
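
To show how the three steps fit together, here's a sketch of the whole string renderer in Python. It makes a few assumptions that aren't part of the algorithm as described: the font is a mapping from character code to a list of 8 four-bit values, the cursor column is measured in characters (0 to 79), and the string is taken to fit on the current line. The names tile_offset and draw_text are invented for the example.

```python
TILES_PER_ROW = 40

def tile_offset(col, row):
    """Byte offset of the tile holding character cell (col, row)."""
    return (row * TILES_PER_ROW + col // 2) * 8

def draw_text(bitmap, font, text, col, row):
    """Draw text starting at character cell (col, row); return the new column."""
    chars = [ord(c) for c in text]

    # Step 1: starting halfway through a tile, so fill only its right half
    if chars and col % 2 == 1:
        base = tile_offset(col, row)
        glyph = font[chars.pop(0)]
        for i in range(8):
            bitmap[base + i] = (bitmap[base + i] & 0xF0) | glyph[i]
        col += 1

    # Step 2: main loop, building one full tile from each pair of characters
    while len(chars) >= 2:
        left, right = font[chars[0]], font[chars[1]]
        del chars[:2]
        base = tile_offset(col, row)
        for i in range(8):
            bitmap[base + i] = (left[i] << 4) | right[i]
        col += 2

    # Step 3: one character left over, placed in the left half of a blank tile
    if chars:
        base = tile_offset(col, row)
        glyph = font[chars[0]]
        for i in range(8):
            bitmap[base + i] = glyph[i] << 4
        col += 1

    return col

# Stand-in font: three "glyphs" whose rows are all the same value
font = {ord("A"): [1] * 8, ord("B"): [2] * 8, ord("C"): [3] * 8}
bitmap = bytearray(8000)
col = draw_text(bitmap, font, "ABC", 0, 0)   # leaves the cursor mid-tile
col = draw_text(bitmap, font, "AB", col, 0)  # so this call begins with step 1
```

The second call shows why step 1 has to preserve the left half of the tile: the "C" drawn by the first call is still there after the "A" is placed beside it.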

The implementation

There are a few things that need to be considered when taking this algorithm to the Commodore 64, in order to cope with the restrictions of the platform.

Font data:

The algorithm outlined above takes two characters from the font data, and shifts the "left" one over by 4 bits before tacking it to the "right" character. This step can be eliminated by keeping a pre-shifted copy of the font data as a separate buffer to the original, which means that building a cell is merely a matter of finding one character in the original font, and the other in the shifted font, then adding the two values.

Initial screen colour:

As mentioned above, each tile in bitmap mode can maintain its own foreground and background colour. In bitmap mode, the values for each tile's colours are stored in Screen Memory, which is no longer needed for tile addresses, one byte for each tile: the background colour code (between 0 and 15) is stored as the lower half of the byte, and the foreground code as the upper half.

For the purposes of this article, we won't be dealing with different colours of text or other attributes, so all that's required is to initialise these 1000 bytes: setting all of them to reflect "grey on black" allows for a simple monochromatic output.

Multiplication:

The 6510 CPU used by the Commodore 64 doesn't have a multiply instruction, which means we can't simply "multiply by 8" to get a font-data position. Luckily, we can use a basic property of binary powers to perform the calculation:
Multiplication by binary powers
    x * 8 = x * (2 ** 3)
    x * 8 = x << 3
By shifting the value left, we can simulate multiplication. In this case, however, that's not quite enough. Shifting a value left pushes the left-most bits off the end of the register, discarding the higher portion of the result: we need that higher portion, so some more calculation is required:
Multiplication by binary powers into a double-width result
    x = 01101101b
    LOBYTE(x * 8) = 01101101 << 3 = [011]01101000
    HIBYTE(x * 8) = [011] = 01101101 >> (8-3)

    LOBYTE(x * 8) = x << 3
    HIBYTE(x * 8) = x >> 5
The above sample demonstrates a more general rule: multiplying a 1-byte value by a power of two can generate a 2-byte result, both parts of which can be calculated by appropriate shifting. This rule can be used by the 6510 code of the implementation.
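
The shifting rule is easy to check away from the C64. This Python sketch mimics an 8-bit register by masking the low byte to 8 bits; the function name times8 is my own, and the code is a check of the arithmetic rather than a translation of the 6510 routine.

```python
def times8(x):
    """Multiply an 8-bit value by 8, returning (low byte, high byte)."""
    lo = (x << 3) & 0xFF   # the top three bits fall off the end, as in an 8-bit register
    hi = x >> 5            # those same three bits, recovered from the original value
    return lo, hi

lo, hi = times8(0b01101101)
# Every possible byte value gives the right 16-bit product when recombined
all_correct = all(
    times8(x)[0] + 256 * times8(x)[1] == x * 8
    for x in range(256)
)
```

For the value used in the sample above, this gives a low byte of 01101000 and a high byte of 011, matching the bracketed bits.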

The result

After putting these algorithms into code, something like the following will be produced:

Figure 5: Lorem ipsum on the C64

In the example above, the additional algorithms for handling newline characters have been added to the 80x25 display system, allowing the text to contain line breaks. This is simply a matter of moving the cursor down to the start of the next line when a newline character is encountered.

The system does not currently handle scrolling of the text buffer: if text is to be drawn below line 25, it will not appear on the display. Scrolling, and other control sequences including character colour, will be covered in part 2 of this series.

80x25.s: 6510 assembly source
ansi.font: Encoded font data
80x25.prg: Assembled binary, emulation-ready

Imran Nazar (tf@oopsilon.com)

The Structure Pattern in PHP

Sun, 20 Jul 2008 00:00:00 +0000

Many file formats contain data in a packed manner: a series of values which encode information, placed into the file one after another. As an example, TrueType font files encode information about the font represented in the file, such as the name of the font and how many characters are contained within the font face. As another example, Windows BMP files encode the dimensions and formatting of the image within the format.

The Need: Structures in C/C++

In most instances, the information in a file is encoded as a series of structures, grouping related information such that a programming interface can retrieve them easily. In C and C++, a special type is set aside for just such a reason: the struct.
BMP palette entry structure, in C
    typedef struct
    {
        unsigned char b; /* Blue */
        unsigned char g; /* Green */
        unsigned char r; /* Red */
        unsigned char reserved;
    } RGBQUAD;
The above is the C representation of one palette entry in a Windows BMP. Once this structure has been defined as a type with typedef, using it is very simple:
Using a structure, in C
    RGBQUAD palette[256];
    fread(palette, sizeof(RGBQUAD), 256, file_handle);

    printf("Colour #0 is %02X%02X%02X.\n",
           palette[0].r, palette[0].g, palette[0].b);
It can be seen above that the data contained within a struct can be accessed in much the same way as members can be accessed within a class, in C++ or any other object-oriented programming language. Indeed, in C++ the keywords class and struct are nearly equivalent: they differ only in the default visibility of members, which is public for a struct and private for a class.

The Problem: Structures in PHP

When using languages such as PHP, a problem arises: PHP does not support a native struct type. Further, since PHP is a loosely-typed language, it's not possible to read packed data from a file directly into typed variables, in the way that fread fills a C struct. This can be alleviated by using PHP's class keyword to build a class containing the structure members:
A PHP class with structure members
    class RGBQUAD
    {
        // Byte size of the structure
        const SIZE = 4;

        // Structure members
        public $r;
        public $g;
        public $b;
        public $reserved;

        // Initialise members given packed data
        public function __construct($data)
        {
            if($data)
            {
                // 'C4': four unsigned bytes, stored blue-first in a BMP.
                // unpack() numbers its results from 1, so reindex for list()
                list($this->b, $this->g, $this->r, $this->reserved)
                    = array_values(unpack('C4', $data));
            }
        }
    }
This representation of the structure makes use of the unpack function, to take a string of binary data and load it into the class members. This can be used in a similar fashion to the C representation:
Using a structure, in PHP
    $palette = array();
    for($i=0; $i<256; $i++)
        $palette[$i] = new RGBQUAD(fread($file_handle, RGBQUAD::SIZE));

    printf("Colour #0 is %02X%02X%02X.\n",
           $palette[0]->r, $palette[0]->g, $palette[0]->b);
This approach has distinct advantages over the C struct type. In particular, the PHP implementation is a class, and can contain methods other than the simple constructor; furthermore, complex types can be contained within the structure in a way that C cannot accomplish.

Advanced Usage: TrueType File Format

An example of complex usage of the structure pattern is in parsing of the TrueType file format. A TrueType file defines a vector or bitmap font face, and contains a series of data tables: a table of names, a table of Windows-specific font information, and tables of glyph definitions. Also contained in the file structure is a header defining the table "directory", which allows a parser to find these tables within the file.

A TrueType file begins with information about the version of the TrueType specification, followed by the table directory. This can be represented in PHP in a simple manner:
TrueType file header and table directory, in PHP
    // File header. This contains the number of tables in the TTF.
    class ttfHeader
    {
        const SIZE = 12;

        public $majorVersion;
        public $minorVersion;
        public $tableCount;
        public $searchRange;
        public $entrySelector;
        public $rangeShift;

        public $tableDirectory;

        public function __construct($file)
        {
            $this->tableDirectory = array();

            $header = fread($file, self::SIZE);
            // unpack() numbers its results from 1, so reindex for list()
            list($this->majorVersion, $this->minorVersion, $this->tableCount,
                 $this->searchRange, $this->entrySelector, $this->rangeShift)
                = array_values(unpack('n*', $header));

            for($i=0; $i<$this->tableCount; $i++)
            {
                $this->tableDirectory[$i] = new ttfTableDirectoryEntry(
                    fread($file, ttfTableDirectoryEntry::SIZE));
            }
        }
    }

    // Table directory. Describes the location and size of a table in the TTF.
    class ttfTableDirectoryEntry
    {
        const SIZE = 16;

        public $tag;
        public $checksum;
        public $offset;
        public $length;

        public function __construct($data)
        {
            list($tag, $this->checksum, $this->offset, $this->length)
                = array_values(unpack('N*', $data));

            // Build a string tag from the numeric value
            $this->tag = chr(($tag>>24)&255).
                         chr(($tag>>16)&255).
                         chr(($tag>> 8)&255).
                         chr(($tag>> 0)&255);
        }
    }
This example shows how it's possible for a structure to contain more information than the sum of its members. In the case of the ttfHeader, the constructor can pull in the structure of an entry in the table directory, and build its own array to represent the directory.
All in all, the Structure pattern makes it possible for PHP to represent structures in a similar fashion to C structs; it also allows PHP to be more versatile, and represent more complex information related to the data, which would normally have to be held outside the structure.
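
For readers more at home in another language, the same idea carries over. The sketch below is in Python rather than PHP, with the standard struct module standing in for unpack; the names TableEntry and parse_font_header are my own, and it assumes the header and directory have already been read into a byte string. The sample bytes are made up for the demonstration and don't come from a real font.

```python
import struct
from collections import namedtuple

# One entry of the table directory: four big-endian 32-bit fields
TableEntry = namedtuple("TableEntry", "tag checksum offset length")

HEADER_SIZE = 12
ENTRY_SIZE = 16

def parse_font_header(data):
    """Parse the 12-byte header and the table directory that follows it."""
    (major, minor, table_count,
     search_range, entry_selector, range_shift) = struct.unpack(">6H", data[:HEADER_SIZE])

    directory = []
    for i in range(table_count):
        start = HEADER_SIZE + i * ENTRY_SIZE
        tag, checksum, offset, length = struct.unpack(
            ">4sIII", data[start:start + ENTRY_SIZE])
        # The tag is four ASCII characters, so it can be read as text directly
        directory.append(TableEntry(tag.decode("ascii"), checksum, offset, length))

    return (major, minor), directory

# Made-up sample: version 1.0, one table called "name"
sample = struct.pack(">6H", 1, 0, 1, 16, 0, 0) + struct.pack(">4sIII", b"name", 0x1234, 28, 100)
version, directory = parse_font_header(sample)
```

As with the PHP classes, the parser for the header is the natural place to build the directory, since only the header knows how many entries follow.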

© Imran Nazar <tf@oopsilon.com>, 2008

Intranet DNS Resolution with BIND Views

Mon, 02 Jun 2008 12:01:48 +0000

From time to time, I write an article for the Oopsilon web site. While the article is being written, I like to check that it's displaying correctly on the web site, which involves loading it up in a browser. This is perfectly fine, except that my working computer is on the same network as the web server, and the network is internal.

Anyone coming from outside the network (outside my house, in other words) will be able to view the article without a problem: a request is made to Oopsilon, which resolves (currently) to 87.194.101.173, and a connection request is made to that IP address. My firewall translates that to the web server's internal IP of 192.168.0.1, and maintains the translation both ways.

From inside the network, it's another story. Oopsilon resolves to 87.194.101.173 as before, but when the firewall receives that connection request, it sees a connection from the internal network to the outside world, and immediately back into the network. As a result, the firewall refuses to connect the request, and I end up unable to see my article.

Standard BIND Configuration

The problem inside the network is caused by Oopsilon resolving to an external IP. This happens because BIND is configured with a simple DNS zone, as follows:
oopsilon.zone: External zone file
                    IN SOA   adhocbox.oopsilon.com. tf.oopsilon.com. (
                             2008042701 ; Serial
                             28800      ; Refresh
                             14400      ; Retry
                             604800     ; Expire
                             86400 )    ; Minimum
                    NS       adhocbox.oopsilon.com.
                    MX       10 oopsilon.com.
    oopsilon.com.   IN A     87.194.101.173
    adhocbox        IN CNAME oopsilon.com.
    www             IN CNAME oopsilon.com.
BIND is then told to use this zone for requests relating to the domain in question, as follows:
named.conf: BIND master configuration
    zone "oopsilon.com" IN {
        type master;
        file "oopsilon.zone";
        allow-update { 88.192.91.15; };
        notify yes;
    };
The configuration above states that any requests for the domain will be serviced by the zone file given. This includes requests from inside the LAN, which should resolve to the LAN address of the web server. This can be fixed by using not one, but two zone files.

View-Specific BIND Configuration

For external requests, the zone file above is sufficient: serving the external IP is what these clients will expect. For internal requests, a separate zone can be used:
oopsilon.zone.int: Internal zone file
                    IN SOA   adhocbox.oopsilon.com. tf.oopsilon.com. (
                             2008042701 ; Serial
                             28800      ; Refresh
                             14400      ; Retry
                             604800     ; Expire
                             86400 )    ; Minimum
                    NS       adhocbox.oopsilon.com.
                    MX       10 oopsilon.com.
    oopsilon.com.   IN A     192.168.0.1
    adhocbox        IN CNAME oopsilon.com.
    www             IN CNAME oopsilon.com.
In order to select between the two zone files, a series of "views" can be set up in the configuration file, where each view is matched against a series of IP addresses. This is done by nesting zones inside view blocks:
named.conf: Master configuration with views
    view "internal" {
        match-clients { 192.168.0.0/24; };
        zone "oopsilon.com" IN {
            type master;
            file "oopsilon.zone.int";
            allow-update { none; };
            notify no;
        };
    };

    view "external" {
        match-clients { any; };
        zone "oopsilon.com" IN {
            type master;
            file "oopsilon.zone";
            allow-update { 88.192.91.15; };
            notify yes;
        };
    };
In the above configuration, there are two views: internal for clients from the internal network (192.168.0.x), and external for everyone else. A view can have any number of zones inside, but in this case I only need one zone in each.

Once this configuration has been put in place, its operation is automatic: anyone from the LAN will receive the LAN IP of the web server, and will be able to view the web site. Clients outside the network will receive an external IP, and also be able to see the web site. Everyone wins.
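
The selection that BIND performs can be modelled in a few lines. The following Python sketch is only a model of the behaviour, not anything BIND itself runs: views are tried in the order they're written, and the first one whose client list matches the source address is used. The VIEWS table and the resolve function are names invented for the example, and "any" is approximated by the IPv4 network 0.0.0.0/0.

```python
import ipaddress

# (view name, client networks, address served for oopsilon.com)
VIEWS = [
    ("internal", ["192.168.0.0/24"], "192.168.0.1"),
    ("external", ["0.0.0.0/0"], "87.194.101.173"),   # stands in for "any"
]

def resolve(client_ip):
    """Return (view name, answer) for the first view matching the client."""
    addr = ipaddress.ip_address(client_ip)
    for name, networks, answer in VIEWS:
        if any(addr in ipaddress.ip_network(net) for net in networks):
            return name, answer
    return None

lan_result = resolve("192.168.0.42")
wan_result = resolve("8.8.8.8")
```

The ordering matters: if the catch-all external view were listed first, it would match LAN clients too, and the internal view would never be reached.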

Copyright Imran Nazar <tf@oopsilon.com>, 2008

An Introduction to Compression

Thu, 22 May 2008 21:24:30 +0000

I got to thinking recently, about the difference between the GIF and JPEG image formats: why is it that some images are larger on disk when saved as GIF, while others are larger as JPEG? It turns out that the different image formats use different methods of compression.

Compression is simply the name for a set of procedures that allow data to be packed into a smaller space, and yet allow the data to be retrieved from the compressed encoding. It's a two-way process: an input file can yield compressed output, but putting the compressed output back into the algorithm should give you a copy of the input.

Redundancy: Run-Length Encoding

The concept that makes compression possible is redundancy: the fact that most data repeats itself in some fashion. A document may use the same word many times, for example, or a picture will contain the same colour in many places. A very simple example of a redundant piece of data could be something like the following.
Redundancy: Before compression
    AAAAABBWWWWWWWWWPPPPQZMMMMVVV
In this case, the redundancy is obvious; repeated series of letters present themselves throughout the sample. An easy way to compress this would be to represent the repeated letters by the number of repeats, thus cutting down on the total length of the sample.
Redundancy: After compression
    A5B2W9P4Q1Z1M4V3
An algorithm reading this encoded version of the sample will be able to perfectly retrieve the original data: "A" five times, "B" twice, and so on. This simple algorithm is used extensively, and is called run-length encoding (RLE): writing down how long each run of characters is. An example of a widely used standard employing RLE is the venerable PCX image format.
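
A run-length encoder for text like the sample above takes only a few lines. This Python sketch uses the same letter-then-count notation as the example; it assumes the input contains no digits, since a digit in the data would be mistaken for part of a count, and the function names are my own. Real formats such as PCX pack their runs into bytes differently.

```python
from itertools import groupby

def rle_encode(text):
    """Replace each run of a repeated character with the character and its count."""
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(text))

def rle_decode(encoded):
    """Expand letter-and-count pairs back into the original runs."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        i += 1
        # Gather all the digits of the count, which may be more than one
        j = i
        while j < len(encoded) and encoded[j].isdigit():
            j += 1
        out.append(ch * int(encoded[i:j]))
        i = j
    return "".join(out)

sample = "AAAAABBWWWWWWWWWPPPPQZMMMMVVV"
packed = rle_encode(sample)
restored = rle_decode(packed)
```

Notice that the lone "Q" and "Z" each grow from one character to two: RLE only pays off where runs are common.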

Figure 1: Stripes (Gottschal/Schuster)

In Figure 1, there are many solid blocks of single colours. This image is 500 pixels wide, and 190 high; as a raw bitmap, using one byte to represent a pixel, this image would constitute 95kB of data. The PCX algorithm calculates run lengths for each line of pixels in the image, and then saves the run length for consecutive pixels of the same colour: in this way, the size of the image is reduced to 52kB.

Frequency: Huffman Encoding

One of the major problems with RLE is that it acts on consecutive values of data: in Figure 1, the RLE algorithm will treat each horizontal line of the image separately, whereas all the lines are the same as each other. This can be alleviated by looking at the data in the whole, and building a table of how often each value occurs in the entire data set.

Huffman encoding is a method of using this "frequency table", which denotes the frequency of occurrence for each value, and assigning each entry a code. The most frequent entries are given shorter codes, and rarer entries are relegated to receiving long codes. In computing, these codes are invariably binary codes, which can then be combined into bytes for file storage.

Using the example above, a sample Huffman encoding process may run as follows:
Huffman encoding: Before compression
    AAAAABBWWWWWWWWWPPPPQZMMMMVVV

Value  Frequency  Code
Q      1          01110
Z      1          01111
B      2          0110
V      3          010
P      4          100
M      4          101
A      5          00
W      9          11

Table 1: Frequency and Huffman table

Huffman encoding: After compression
    00 00 00 00 00 0110 0110 11 11 11 11 11 11 11 11 11 100 100 100 100 01110 01111 101 101 101 101 010 010 010
    Packed into bytes (hex): 00 19 BF FF F9 24 73 ED B5 24
Using Huffman encoding, the data has been whittled down from 29 characters to 10 bytes. This does not include the frequency and coding table, which has to be stored with the compressed data for it to make any sense; in this example, the frequency table is larger than the compressed data, but the size of the frequency table is negligible in most cases.
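
For the curious, here's a minimal sketch in Python of how such a code table can be built: the two least frequent entries are repeatedly merged, with one side gaining a leading 0 and the other a leading 1. The function names are my own, and the exact codes it produces depend on how ties between equal frequencies are broken, so they may differ from any particular table while giving the same total length. It also assumes the input has at least two distinct values.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a table mapping each character to its binary code string."""
    counts = Counter(text)
    # Heap entries are (weight, tiebreak, {symbol: code so far});
    # the unique tiebreak stops Python from ever comparing the dictionaries
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)   # the two rarest groups...
        w2, _, b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in a.items()}
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))   # ...become one group
        tiebreak += 1
    return heap[0][2]

def encode(text, code):
    """Concatenate the code for each character into one string of bits."""
    return "".join(code[ch] for ch in text)

sample = "AAAAABBWWWWWWWWWPPPPQZMMMMVVV"
code = huffman_code(sample)
bits = encode(sample, code)
byte_count = (len(bits) + 7) // 8   # round up to whole bytes
```

For this sample the encoded stream comes to 79 bits, which rounds up to the 10 bytes quoted above. An important property of the resulting table is that no code is the beginning of another, which is what lets a decoder tell where one code ends and the next starts.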

It is, of course, possible to combine RLE and Huffman encoding, performing RLE first and then running the compressed result through the Huffman algorithm. Exploiting repetition in this way produces especially good results on simple images: Figure 1 above can be compressed from a 95kB bitmap to a 4kB file by using the GIF file format, which relies on LZW, a dictionary-based algorithm that replaces repeated sequences of pixels with short codes.

Perception: Lossy Encoding

The methods outlined above can be used to compress data in such a manner that it can be perfectly reproduced. Examples of this usage of compression include documents and software programs, where the loss or corruption of one value may render the file worthless.

In certain circumstances, a perfect reproduction of the data in question is not necessary: a close approximation is sufficient. Generally, these circumstances arise in multimedia applications: sounds beyond the range of human hearing need not be recorded, and subtleties of colour and gradient beyond the discernment of the human eye need not be reproduced.

A classic example of this is the MPEG Audio standard, which attempts to reduce the size of audio files by removing extraneous information regarding high-frequency sounds. The Layer-3 specification of this standard allows for various settings of removal, by which progressively more information will be removed from the audio sample.

Figure 2: Yardım Et (Mor ve Ötesi, "Dünya Yalan Söylüyor")
Encoded with MPEG Audio Layer-3

In Figure 2 above, two waveforms are superimposed: the original song waveform in red, and a highly compressed variant overlaid in blue. The sample shown above is 1.5 seconds long; as a section in the original waveform file, this sample is stored using 160kB of data. The compressed variant shown is of the same length, occupying only 48kB of space.

This has been achieved by the MPEG Audio compression algorithm, by transforming the sound into its frequency components, and removing those components that the ear is unlikely to notice, such as those at the very top of the range of human hearing (which ends at approximately 20kHz). By doing this, the resultant waveform is not significantly affected, as can be seen above, and thus the compressed sound is not perceptibly different from that of the original source.

Throwing Data Away: Visual Lossy Encoding

Just as a sound file has high-frequency components that can't be discerned by the ear, a picture has high-frequency components: shades of colour that aren't different enough for the eye to distinguish, or gradients that run from black to white so quickly that there's no space for the gradient to be seen. Just as with sound, these components can be removed from a picture; this is the premise of the JPEG image format.

JPEG performs a variant of the same algorithm used in MPEG Audio, to retrieve a two-dimensional map of the frequency components contained within an image; the algorithm then proceeds to cut the components down, and recombine the image. An example of this process is shown below.

Figure 3: JPEG compression applied to a diagram

In Figure 3, an image composed of four 16x16-pixel squares is compared against the JPEG-encoded variant of the same file. A sharp change in colour or luminance is defined as an event of high visual frequency, and it is here where JPEG performs its removal. As a result, the encoded image has a lower definition to its edges, and the meeting point of the four squares is especially blurred.
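
To get a feel for why removing high-frequency components blurs a sharp edge, here's a minimal sketch in Python. It's a one-dimensional stand-in for what JPEG does in two dimensions, using a discrete cosine transform on a single row of 8 pixels, and it simply zeroes the higher components where a real encoder would store them less precisely. The function names and the sample row are assumptions made for the example.

```python
import math

def dct(samples):
    """Split a row of samples into frequency components (unnormalised DCT-II)."""
    n = len(samples)
    return [sum(s * math.cos(math.pi * (i + 0.5) * k / n) for i, s in enumerate(samples))
            for k in range(n)]

def idct(coeffs):
    """Recombine frequency components into samples (the matching inverse)."""
    n = len(coeffs)
    return [(coeffs[0] / 2 +
             sum(coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(1, n)))
            * 2 / n
            for i in range(n)]

def lossy(samples, keep):
    """Keep only the lowest `keep` frequency components and rebuild the row."""
    coeffs = dct(samples)
    coeffs[keep:] = [0.0] * (len(coeffs) - keep)
    return idct(coeffs)

edge = [0, 0, 0, 0, 255, 255, 255, 255]   # a sharp black-to-white edge
exact = lossy(edge, 8)                    # nothing removed: the row survives intact
blurred = lossy(edge, 2)                  # high frequencies removed: the edge smears
```

With all eight components kept, the row comes back unchanged; with only two, the pixels either side of the edge drift towards mid-grey, which is the softening visible in the encoded diagram.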

The strength of JPEG is not in encoding images of sharp edges and corners, but instead in images of low visual frequency; photographs are a prime example of such.

Figure 4: JPEG compression applied to a photograph

In Figure 4, a 300x300 image of Antalya Harbour is encoded by JPEG. The original bitmap is 270kB, whereas by removal of the sharp edges and colour changes, JPEG is able to produce a 22kB image. As far as the human eye is concerned, very little has changed in the image; the features shown in the image survive intact, even if the pixels have changed somewhat.

This is the main concept behind lossy encoding: that the exact data is not as important as the information presented by the data. Using the JPEG algorithm to encode a software program would be unwise, but in cases where the information is more than the sum of the data, lossy encoding is ideal.

Perceptive Redundancy: Video Encoding

When it comes to video clips, it's possible to compress the data involved yet further, by combining the principles behind lossless and lossy encoding. The simplest and most naive method of building a video clip is to tack together consecutive pictures and refer to them as frames: the MJPEG video file format does this by treating a series of JPEG images as individual frames.

What this approach ignores is the inherent redundancy in a video clip: most of the information contained in a given frame is also in the previous frame. Only a small percentage of any particular frame is new information; by calculating where that percentage of information lies, and storing only that amount, it's possible to drastically cut down the data size of the frame.

Figure 5: Consecutive frames of video, and their difference (NASA JPL)

In Figure 5, the second frame of video shows very little change relative to the first: only in the Shuttle's exhaust plume is there significant motion. Indeed, the output of the SRBs and the sky behind the launch tower are entirely unchanged between frames. Instead of storing these portions of the image in their entirety, it's possible to store a single value: "No change".

The MPEG Video standard makes use of this inherent redundancy as a part of its compression algorithm. In theory, only the initial frame of a shot is required in full: any movement as part of the shot can be stored as a difference from the previous frame. The initial frame, known as an Intra-frame, is stored as a standard JPEG image, and the subsequent difference frames are called inter-frames, or Predicted frames.

In practice, the MPEG Video standard was designed with "streaming" in mind: the ability to begin viewing a video clip halfway through a shot. If only one Intra-frame (I-frame) is provided for the shot, it's not possible for the Predicted frames (P-frames) to interpolate their differences. For this reason, I-frames are commonly inserted at regular intervals into the video clip, regardless of whether a shot is in progress.
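
The arrangement of full frames and difference frames can be sketched in Python as follows. This is a deliberately simplified model with invented names: each frame is a flat list of pixel values, a difference frame records only the pixels that changed, and the differences are exact. The real MPEG Video standard is considerably more involved, and its difference frames are themselves stored lossily.

```python
def encode_video(frames, i_interval):
    """Store a full frame every i_interval frames, and differences in between."""
    encoded = []
    for n, frame in enumerate(frames):
        if n % i_interval == 0:
            encoded.append(("I", list(frame)))
        else:
            previous = frames[n - 1]
            # Record only the positions whose value differs from the previous frame
            delta = {i: v for i, (p, v) in enumerate(zip(previous, frame)) if p != v}
            encoded.append(("P", delta))
    return encoded

def decode_video(encoded):
    """Rebuild every frame by applying each difference to the frame before it."""
    frames = []
    for kind, payload in encoded:
        if kind == "I":
            frame = list(payload)
        else:
            frame = list(frames[-1])
            for i, v in payload.items():
                frame[i] = v
        frames.append(frame)
    return frames

frames = [[1, 1, 1, 1], [1, 2, 1, 1], [1, 2, 3, 1], [1, 2, 3, 1]]
encoded = encode_video(frames, 2)
decoded = decode_video(encoded)
```

A frame that hasn't changed at all is stored as an empty difference, and a viewer joining partway through only has to wait for the next "I" entry before decoding can begin.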

Figure 6: Frame sizes for a 4-second MPEG clip (BBC News)

In Figure 6 above, the video clip has I-frames inserted at 25-frame intervals, or once a second. The subsequent P-frames are each much smaller in size than the I-frame, since politicians tend not to move around very much when interviewed, thus causing a lower amount of difference between frames.

The example used for Figure 6 was a 400x224 video clip of 4 seconds. In raw bitmap form, the size of the resultant file would be 26.7MB; by using the combined techniques of lossy encoding and redundancy, the MPEG Video standard is able to reduce this to 300kB, a reduction of 99%.

Conclusion: Where To Go Lossy

The examples of lossy encoding presented in this article are employed in special circumstances: audio, video, pictures. It's only in these instances, and others related to these, that perception is the important factor in the compression process. For other compression targets, such as documents and software programs, it's important to preserve the data exactly as-is.

More advanced specialisations of compression are being developed all the time, but most common implementations of compression are based on the techniques in this article: eliminating redundant and duplicate information. Compression works best when there's a lot of redundant data, so don't try to compress a compressed file.

Imran Nazar <tf@oopsilon.com>, 2008

Whitelisting SSH Access with OpenWRT

Fri, 09 May 2008 18:29:17 +0000

Sometimes, I take the liberty of looking through the log files of my server. Invariably, there's something like the following at the bottom:
Sample SSH log file
    May 8 01:46:43 adhocbox sshd[28514]: Invalid user tanta from 61.100.x.x
    May 8 01:46:46 adhocbox sshd[28516]: Invalid user cornel from 61.100.x.x
    May 8 01:46:49 adhocbox sshd[28518]: Invalid user ronaldo from 61.100.x.x
    May 8 01:46:51 adhocbox sshd[28520]: Invalid user wave from 61.100.x.x
    May 8 01:46:54 adhocbox sshd[28522]: Invalid user vanilla from 61.100.x.x
    May 8 01:46:57 adhocbox sshd[28524]: Invalid user ice from 61.100.x.x
    May 8 01:47:02 adhocbox sshd[28526]: Invalid user mason from 61.100.x.x
This is repeated for a few hundred lines, and is followed a few hours later by another batch of attacks from another IP. It gets tiresome for one's bandwidth to be taken up by these attempts at logins, especially when the server only has one valid user, as is the case for my setup.
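To get a feel for how many attempts each address is responsible for, the log can be summarised with standard tools. This sketch assumes the sshd message format shown above, where the offending address is the last field on the line; the function name is made up, and the location of the log file varies between distributions.

```shell
#!/bin/sh
# failed_logins_by_ip
# Reads sshd log lines on stdin and prints "count address" pairs,
# busiest address first. Assumes the address is the last field of
# each "Invalid user" line.
failed_logins_by_ip() {
    grep 'Invalid user' |
        awk '{ print $NF }' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}

# Example (the log path is an assumption; adjust for your system):
#   failed_logins_by_ip < /var/log/auth.log
```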

I decided to put an end to this, by implementing a whitelist: allowing specific IPs through at the firewall, and blocking all others. Fortunately, I run an OpenWRT installation on my Internet router, which provides Linux's iptables infrastructure for the manipulation of firewall rules. In this article, I'll detail how I set up my whitelist system, and how you can do the same.

What you'll need

My network consists of a Linksys WRT54G wireless router hosting the firewall, and a webserver running a distribution of Linux. For the purposes of this setup, the particulars of the webserver aren't an issue, but you will need:

OpenWRT:

My Linksys router has been reflashed with OpenWRT Whiterussian, which provides the iptables firewall, along with simplification scripts for the firewall rules. Any Linux box can act as the firewall, as long as you can pull the appropriate formatting together for the rules.

PHP with PECL-SSH2:

OpenWRT provides SSH access to the router, which allows for direct editing of the firewall configuration file. We'll be using this to our advantage, by programmatically adding IPs to the whitelist using PHP.

An external computer:

The easiest way to test the whitelist setup is by using a computer that's outside the LAN; this will allow you to check that packets are being appropriately blocked at the router, which will not necessarily be the case if you're going between computers inside the LAN.

The OpenWRT Firewall

OpenWRT provides a simple wrapper over the Linux iptables interface, using awk to rewrite the contents of a configuration file into filtering and NAT rules, which are then applied by an init script. There's also a wrapper on top of that, which constitutes the Web interface to the firewall; it is this interface that most people associate with the OpenWRT firewall.

The major issue with the Web interface is that it's relatively clunky, especially when it comes to changing the order of firewall rules; new rules are added to the bottom of the list, and moving them to the top involves an arduous series of clicks and page loads. For most purposes, direct editing of the configuration file makes more sense.

A simple configuration may contain among its rules the following:
OpenWRT's /etc/config/firewall: A sample
    accept:dport=113 src=192.168.0.0/24
    forward:proto=tcp dport=22:192.168.0.1:22
    forward:proto=tcp dport=80:192.168.0.1:80
    drop
This sample script will allow the firewall to accept Ident requests from inside the LAN, forward SSH and HTTP to a server at 192.168.0.1, and drop everything else. The parameters to each rule are parsed out by the init script, and built into iptables rules.

Just as with iptables, these rules are processed in order, and the first rule to match the incoming packet is applied. By using this principle, it's simple to put together a ruleset which will act as a whitelist for SSH:
Whitelisting SSH: firewall ruleset
    forward:proto=tcp src=[IP #1] dport=22:192.168.0.1:22
    forward:proto=tcp src=[IP #2] dport=22:192.168.0.1:22
    drop:proto=tcp dport=22
In this example, any SSH packets coming from specific external IPs will be forwarded to the SSH server, and any other SSH packets will be dropped at the firewall. This is the behaviour which allows a whitelist: the next problem is how to add IPs to the list.
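Since every whitelisted address produces the same rule with only the source changed, the ruleset can be generated from a plain list of addresses. A sketch, using the rule syntax from the example above; the function name is hypothetical, and the server address and port are simply the ones used throughout this article.

```shell
#!/bin/sh
# whitelist_rules IP...
# Prints one forwarding rule per address given, followed by the
# catch-all rule that drops SSH traffic from everyone else.
whitelist_rules() {
    for ip in "$@"; do
        echo "forward:proto=tcp src=$ip dport=22:192.168.0.1:22"
    done
    echo "drop:proto=tcp dport=22"
}

# Example addresses taken from the ranges reserved for documentation
whitelist_rules 192.0.2.10 198.51.100.7
```

The drop rule has to come last: rules are matched in order, so placing it first would block the whitelisted addresses along with everyone else.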

Adding IPs to the Whitelist

There are two ways to add addresses to this firewall ruleset. The first is to SSH into the OpenWRT router, edit the configuration file to add the appropriate rule, and then restart the firewall service:
Manually updating the whitelist
    ssh -l root 192.168.0.254
    vim /etc/config/firewall
    /etc/init.d/S45firewall restart
The problems with this method are two-fold:

Ease of use:

This manual method of updating the list doesn't constitute the most user-friendly interface to addition of IPs, and it can get tiresome to add IPs months or years after the whitelist is initially put into place.

Access:

Almost without exception, the router's SSH port is only reachable from inside the LAN. From outside, the only way to get to it is by first connecting to the SSH server residing on the LAN; but access to that server is governed by the whitelist, which is held on the router. It's a textbook Catch-22.

Instead of using a manual process to update the list, it's possible to provide an externally-accessible interface to add IPs. In my case, I have a Web server (which happens to be my SSH server), so I can use a Web script to provide this interface; for the purposes of this article, PHP has been used as the language doing the work.

PHP doesn't have an interface to SSH version 2 by default: this is provided by a PECL extension named ssh2. Once this has been put in place, a variety of methods are exposed to allow for SSH connections to be made. These can be used to perform work on the OpenWRT router:
Use PECL_ssh2 to connect to the router
    $ssh = ssh2_connect('192.168.0.254');
    if($ssh && ssh2_auth_password($ssh, 'root', '[router root passwd]'))
    {
        $stream = ssh2_shell($ssh);
        // The trailing newline is what makes the remote shell run the command
        fwrite($stream, "touch /tmp/newfile\n");
    }
As an aside, if you don't like having the router's root password lying around in a PHP file, the PECL ssh2 extension also provides a public key authentication mechanism, and the SSH server on an OpenWRT installation allows addition of public keys in the same manner as OpenSSH.

Using PHP to automatically add IPs

Opening an interactive shell with ssh2_shell allows more than one command to be executed, which means we can do the file manipulation required to add an address to the list. We can combine everything, to produce the following script.
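ssh.php: Add an IP to the router's whitelist
    <?php
    if(isset($_POST['add'])):
        // The address is written into a command run as root on the router,
        // so accept nothing but a well-formed IPv4 address
        $ip = isset($_POST['ip'])
            ? filter_var($_POST['ip'], FILTER_VALIDATE_IP, FILTER_FLAG_IPV4)
            : false;
        $ssh = $ip ? ssh2_connect('192.168.0.254') : false;
        if($ssh && ssh2_auth_password($ssh, 'root', '[router root passwd]'))
        {
            $fp = ssh2_shell($ssh);
            fwrite($fp, 'echo "forward:proto=tcp src='.$ip.' dport=22:192.168.0.1:22" > /tmp/1'."\n");
            fwrite($fp, "cp /etc/config/firewall /tmp/2\n");
            fwrite($fp, "cat /tmp/1 /tmp/2 > /etc/config/firewall\n");
            fwrite($fp, "/bin/sh /etc/init.d/S45firewall\n");

            // Provide enough time for the firewall to restart
            sleep(10);
            echo "Done.";
        }
        else echo "Failed.";
    else:
    ?>
    <form method="post">
        <input type="text" name="ip">
        <input type="submit" name="add" value="Add">
    </form>
    <?php endif; ?>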
All that's required now is to navigate to this script, put an IP into the box, and wait 10 seconds. When this process has completed, the IP has automatically been added to the top of the firewall script, and the firewall restarted.
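The same prepend step can be expressed as a shell function, which is handy for testing the logic without going through the Web form. The sketch below also checks that the value really is a dotted-quad IPv4 address before writing it, since anything typed into the form ends up in a file read by a script running as root. The function names are made up for this example, mktemp is assumed to be available, and restarting the firewall is left to the caller.

```shell
#!/bin/sh
# is_ipv4 STRING
# Succeeds only if STRING is four decimal numbers (0-255) joined by dots.
is_ipv4() {
    printf '%s\n' "$1" | awk -F. '
        NF != 4 { exit 1 }
        {
            for (i = 1; i <= 4; i++)
                if ($i !~ /^[0-9]+$/ || $i > 255) exit 1
        }'
}

# prepend_whitelist_rule FILE IP
# Puts a forwarding rule for IP at the top of FILE, so that it is
# matched before the final drop rule.
prepend_whitelist_rule() {
    is_ipv4 "$2" || { echo "not an IPv4 address: $2" >&2; return 1; }
    tmp=$(mktemp) || return 1
    {
        echo "forward:proto=tcp src=$2 dport=22:192.168.0.1:22"
        cat "$1"
    } > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}

# On the router itself, this might be used as:
#   prepend_whitelist_rule /etc/config/firewall 192.0.2.10
```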

That should be everything you need to set up your own whitelist for SSH. No more brute-force attacks against your server!

Copyright Imran Nazar <tf@oopsilon.com>, 2008

Making PDO Look Like ADODB

Mon, 21 Apr 2008 20:56:38 +0000

Many developers who come to PHP from the Visual Basic set of languages wish to take their experience of database interfacing with them. In the VB world, ActiveX Data Objects (ADO) is used to interact with a database; since PHP has no native library for ADO, this can present a problem.

With this in mind, an ADO library was written for use by PHP developers many years ago, which would allow developers to directly port their existing code and interfaces to PHP. There are a few problems with this approach, as can be expected:

The ADO library is written in PHP, and because it was written some time ago, the language used within the library is specific to PHP 4. Since this version of PHP has reached the end of its support lifecycle, it's highly unlikely that any issues that arise with the ADO interface will be fixed.

Further, because the library is written in PHP, it's incredibly slow to load and execute, introducing multiple additional layers between the PHP business logic layer and the database.

With the advent of PHP 5, a native database access layer was introduced to the core language: PHP Data Objects (PDO). Since this layer interfaces directly with the PHP core, it can operate on a much more efficient level, and therefore loads and runs much more quickly. Furthermore, since it's a current extension to PHP, it is maintained and kept secure.

In the ideal case, any applications using ADO under PHP would be redeveloped to use PDO. For large applications, however, this is infeasible: some kind of layer must be introduced over PDO, to "fake" the functionality of ADO on behalf of the application. The following is just such a layer.
ADODB-PDO.php: PDO wrapper to provide an ADODB interface
    <?php
    define('ADODB_FETCH_NUM', PDO::FETCH_NUM);
    define('ADODB_FETCH_ASSOC', PDO::FETCH_ASSOC);

    /**
     * Connection and query wrapper
     */
    class ADODB_PDO
    {
        /** PDO connection to wrap */
        private $_db;

        /** Connection information (database name is public) */
        private $connector;
        private $dsn;
        private $host;
        private $user;
        private $pass;
        public $database;

        /** Debug flag, publicly accessible */
        public $debug;

        /** PDO demands fetchmodes on each resultset, so define a default */
        private $fetchmode;

        /** Number of rows affected by the last Execute */
        private $affected_rows;

        /**
         * Constructor: Initialise connector
         * @param connector String denoting type of database
         */
        public function __construct($connector='mysql')
        {
            $this->connector = $connector;
        }

        /**
         * Connect: Establish connection to a database
         * @param host String
         * @param user String [optional]
         * @param pass String [optional]
         * @param database String [optional]
         */
        public function Connect($host, $user='', $pass='', $database='')
        {
            $this->host = $host;
            $this->user = $user;
            $this->pass = $pass;
            $this->database = $database;

            switch($this->connector)
            {
                case 'mysql':
                    $this->dsn = sprintf('%s:host=%s;dbname=%s',
                                         $this->connector,
                                         $this->host,
                                         $this->database);
                    $this->_db = new PDO($this->dsn, $this->user, $this->pass);
                    $this->_db->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, true);
                    $this->fetchmode = ADODB_FETCH_ASSOC;
                    break;
            }
        }

        /**
         * SetFetchMode: Change the fetch mode of future resultsets
         * @param fm Integer specified by constant
         */
        public function SetFetchMode($fm)
        {
            $this->fetchmode = $fm;
        }

        /**
         * Insert_ID: Retrieve the ID of the last insert operation
         * @return String containing last insert ID
         */
        public function Insert_ID()
        {
            return $this->_db->lastInsertId();
        }

        /**
         * GetAll: Retrieve an array of results from a query
         * @param sql String query to execute
         * @param vars Array of variables to bind [optional]
         * @return Array of results
         */
        public function GetAll($sql, $vars=null)
        {
            $st = $this->DoQuery($sql, $vars);
            return $st?$st->fetchAll():false;
        }

        /**
         * CacheGetAll: Wrapper to emulate cached GetAll
         * @param timeout int count of seconds for cache expiry
         * @param sql String query to execute
         * @param vars Array of variables to bind [optional]
         * @return Array of results
         */
        public function CacheGetAll($timeout, $sql, $vars=null)
        {
            return $this->GetAll($sql, $vars);
        }

        /**
         * Execute: Retrieve a resultset from a query
         * @param sql String query to execute
         * @param vars Array of variables to bind [optional]
         * @return ADODB_PDO_ResultSet object
         */
        public function Execute($sql, $vars=null)
        {
            $st = $this->DoQuery($sql, $vars);
            // Only ask for a row count if the query actually succeeded
            $this->affected_rows = $st ? $st->rowCount() : 0;
            return $st?new ADODB_PDO_ResultSet($st):false;
        }

        /**
         * CacheExecute: Wrapper to emulate cached Execute
         * @param timeout int count of seconds for cache expiry
         * @param sql String query to execute
         * @param vars Array of variables to bind [optional]
         * @return ADODB_PDO_ResultSet object
         */
        public function CacheExecute($timeout, $sql, $vars=null)
        {
            return $this->Execute($sql, $vars);
        }

        /**
         * Affected_Rows: Retrieve the number of rows affected by Execute
         * @return The number of affected rows
         */
        public function Affected_Rows()
        {
            return $this->affected_rows;
        }

        /**
         * GetRow: Retrieve the first row of a query result
         * @param sql String query to execute
         * @param vars Array of variables to bind [optional]
         * @return Array of data from first result
         */
        public function GetRow($sql, $vars=null)
        {
            $st = $this->DoQuery($sql, $vars);
            return $st?$st->fetch():false;
        }

        /**
         * GetOne: Retrieve the first value in the first row of a query
         * @param sql String query to execute
         * @param vars Array of variables to bind [optional]
         * @return String data of the requested value
         */
        public function GetOne($sql, $vars=null)
        {
            $st = $this->DoQuery($sql, $vars);
            return $st?$st->fetchColumn():false;
        }

        /**
         * GetAssoc: Retrieve data from a query mapped by value of first column
         * @param sql String query to execute
         * @param vars Array of variables to bind [optional]
         * @return Array of mapped data
         */
        public function GetAssoc($sql, $vars=null)
        {
            $out = array();
            $st = $this->DoQuery($sql, $vars);
            if($st)
            {
                if($st->columnCount() > 2)
                {
                    // Three or more columns: map the first to the rest of the row
                    while($row = $st->fetch())
                    {
                        $rowidx = array_shift($row);
                        $out[$rowidx] = $row;
                    }
                }
                else if($st->columnCount() == 2)
                {
                    // Exactly two columns: map the first to the second
                    while($row = $st->fetch())
                    {
                        $rowidx = array_shift($row);
                        $out[$rowidx] = array_shift($row);
                    }
                }
                else $out = false;
            }
            else $out = false;

            return $out;
        }

        /**
         * GetCol: Retrieve the values of the first column of a query
         * @param sql String query to execute
         * @param vars Array of variables to bind [optional]
         * @return Array of column data
         */
        public function GetCol($sql, $vars=null)
        {
            $out = array();
            $st = $this->DoQuery($sql, $vars);
            if($st)
            {
                // Compare against false explicitly, so that values such as
                // 0 or the empty string don't end the loop early
                while(($val = $st->fetchColumn()) !== false)
                    $out[] = $val;
                return $out;
            }
            else return false;
        }

        /**
         * MetaColumns: Retrieve information about a table's columns
         * @param table String name of table to find out about
         * @return Array of ADODB_PDO_FieldData objects
         */
        public function MetaColumns($table)
        {
            $out = array();
            $st = $this->DoQuery('select * from '.$table);
            for($i=0; $i<$st->columnCount(); $i++)
                $out[] = new ADODB_PDO_FieldData($st->getColumnMeta($i));
            return $out;
        }

        /**
         * qstr: Quote a string for use in database queries
         * @param in String parameter to quote
         * @return String quoted by database
         */
        public function qstr($in)
        {
            return $this->_db->quote($in);
        }

        /**
         * quote: Quote a string for use in database queries
         * @param in String parameter to quote
         * @return String quoted by database
         */
        public function quote($in)
        {
            return $this->_db->quote($in);
        }

        /**
         * DoQuery: Private helper function for Get*
         * @param sql String query to execute
         * @param vars Array of variables to bind [optional]
         * @return PDOStatement object of results, or false on fail
         */
        private function DoQuery($sql, $vars=null)
        {
            $st = $this->_db->prepare($sql);
            $st->setFetchMode($this->fetchmode);
            // No variables means nothing to bind; a single scalar is
            // treated as a one-element list
            if($vars === null) $vars = array();
            else if(!is_array($vars)) $vars = array($vars);
            return $st->execute($vars)?$st:false;
        }
    }

    /**
     * Resultset wrapper
     */
    class ADODB_PDO_ResultSet
    {
        /** PDO resultset to wrap */
        private $_st;

        /** One-time resultset information */
        private $results;
        private $rowcount;
        private $cursor;

        /** Publicly accessible row values */
        public $fields;

        /** Public end-of-resultset flag */
        public $EOF;

        /**
         * Constructor: Initialise resultset and first results
         * @param st PDOStatement object to wrap
         */
        public function __construct($st)
        {
            $this->_st = $st;
            $this->results = $st->fetchAll();
            $this->rowcount = count($this->results);
            $this->cursor = 0;
            $this->MoveNext();
        }

        /**
         * RecordCount: Retrieve number of records in this RS
         * @return Integer number of records
         */
        public function RecordCount()
        {
            return $this->rowcount;
        }

        /**
         * MoveNext: Fetch next row and check if we're at the end
         */
        public function MoveNext()
        {
            // EOF is only raised once the cursor has moved past the last
            // row, so the final row is still seen by a while(!EOF) loop
            $this->EOF = ($this->cursor >= $this->rowcount) ? 1 : 0;
            $this->fields = $this->EOF ? false : $this->results[$this->cursor++];
        }
    }

    /**
     * Table field information wrapper
     */
    class ADODB_PDO_FieldData
    {
        public $name;
        public $max_length;
        public $type;

        /**
         * Constructor: Map PDO meta information to object field data
         * @param meta Array from PDOStatement::getColumnMeta
         */
        public function __construct($meta)
        {
            $lut = array(
                'LONG'       => 'int',
                'VAR_STRING' => 'varchar'
            );

            $this->name = $meta['name'];
            $this->max_length = $meta['len'];
            // Fall back to the native type name for anything not in the table
            $this->type = isset($lut[$meta['native_type']])
                        ? $lut[$meta['native_type']]
                        : $meta['native_type'];
        }
    }

    /**
     * NewADOConnection: Thin wrapper to generate a new ADODB_PDO object
     * @param connector String denoting type of database
     * @return ADODB_PDO object
     */
    function NewADOConnection($connector)
    {
        return new ADODB_PDO($connector);
    }

Booting Linux from a USB Flash Drive

Sun, 07 Oct 2007 21:25:47 +0000

I like my computers to be as quiet as possible: if a computer is emitting very little sound, that makes it easier to live with. The noise a computer makes is especially important when it's in a home theatre situation, or acting as a media server; if a HTPC is itself throwing out lots of noise, that detracts from the sound of the movie being played.

The ideal case, of course, is for there to be no moving parts: things that move or spin inevitably have friction, and that causes noise. I've tried to eliminate everything that spins from my setup: I have a CPU that doesn't need a fan on the heatsink, and a fanless power supply. However, there's one thing left that is spinning, and that's the hard disk which hosts the Linux installation. Dropping that would make my system truly silent, so I started looking into how that could be done.

The only viable choice for a non-spinning medium to host the operating system is a USB Flash drive: most motherboard BIOSes have the ability to boot from a Flash device, and there's no other medium which is easily available in the sizes required. So the choice was obvious: copy the operating system from hard disk to Flash, and boot it from there.

Unfortunately, it's not quite that easy. In order for the BIOS to understand the Flash disk and boot from it, the disk must be formatted in a very simple format: specifically, good old FAT32. It's quite unacceptable for a Linux root filesystem to be based on FAT32, so a certain process has to be run through:

Compress the root filesystem down into a disk image

Boot the Linux kernel from a FAT-formatted Flash disk

Get the kernel to mount the compressed image

Continue booting from that image

What you'll need

Each step will require its own tools to get the job done, and I'll be covering the details behind each of these tools when they're used. In the meantime, here's a short list of everything that will be employed:

SquashFS tools:

We'll be using SquashFS to compress the root filesystem into an image, and the SquashFS make-filesystem tool will be needed for that.

SysLinux:

Getting Linux to boot from a FAT-formatted disk is no mean feat, unless you use SysLinux; it makes everything so much easier.

BusyBox:

In order for Linux to boot the filesystem image, it'll need to mount, cd and pivot_root, among other things. BusyBox provides all these tools in one go, as I'll explain in more detail later.

Stage 1a: Copying the existing root

The first step is to build the disk image that will eventually act as the root filesystem. This will most likely come from a system that's already running: in my case, I'll be using the media server, which runs on Gentoo. We can't compress / as-is, since other filesystems are mounted into it, so we have to remove those mounts. A simple way of doing this is to remount / somewhere else, by binding it with mount:
Bindmounting / to remove mount points
    # mkdir /root/bindmount
    # mount -o bind / /root/bindmount
The new bindmounted root is a plain representation of what's on the disk, with no additional mounts: that means there should be no files in /proc or /sys, and the bare minimum in /dev. If there are any files in these three directories, you can safely get rid of them:
Cleaning out the mount shadows
    # cd /root/bindmount
    # rm -rf proc/* dev/* sys/*
    # cp -a /dev/console /dev/null /dev/initctl dev
The last line above copies over the three devices which are initially needed by the boot process: the character devices for the console and null, and the FIFO used by init. Every other device is filled in by the kernel when udev automatically populates /dev.
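To see which filesystems are mounted somewhere beneath /, and so would have been swept up by a naive copy, the kernel's mount table can be filtered. A small sketch, assuming the usual /proc/mounts layout in which the mount point is the second field; the function name is made up:

```shell
#!/bin/sh
# mounts_below_root
# Reads a mount table in /proc/mounts format on stdin and prints every
# mount point other than / itself.
mounts_below_root() {
    awk '$2 != "/" { print $2 }'
}

# Typical use on a running system:
#   mounts_below_root < /proc/mounts
```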

We're still operating on the root filesystem itself; it's now safe to make a copy, to which we can make any further changes:
Copying the bindmount
    # mkdir /root/fscopy
    # cp -av /root/bindmount/* /root/fscopy
Once the copy is complete, any files in the copy's tmp and var/tmp directories can safely be cleaned out, since they're temporaries and won't be needed.
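That clean-up might look like the following sketch. It takes the root of the copy as a parameter rather than hard-coding it, to make it harder to empty the live system's /tmp by accident; the function name is hypothetical, and the -mindepth and -delete options are assumed to be supported by the installed find (GNU find provides both).

```shell
#!/bin/sh
# clean_temporaries ROOT
# Empties ROOT/tmp and ROOT/var/tmp (including hidden files) while
# leaving the two directories themselves in place.
clean_temporaries() {
    [ -n "$1" ] && [ "$1" != "/" ] || { echo "refusing to clean '$1'" >&2; return 1; }
    for dir in "$1/tmp" "$1/var/tmp"; do
        [ -d "$dir" ] || continue
        find "$dir" -mindepth 1 -delete
    done
}

# For the copy made above:
#   clean_temporaries /root/fscopy
```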

Stage 1b: Compressing the existing root

In an ideal world, you'd be able to take the filesystem copy and dump it to Flash, from which it would run. Unfortunately, there are two reasons why it's not that simple:

FAT32:

As I mentioned earlier, the filesystem employed by Flash drives is FAT, which isn't conducive to holding a Linux root filesystem. No problem; we simply hold an uncompressed disk image on the drive and read/write to that. That causes us to run into a second problem:

Write Limits:

Flash memory has a limit on how many times it can be written to before bits start sticking and data gets corrupted. Sure, it's a figure with six zeros, and hard drives have a similar limit, but I don't want this setup to work for just a year or two; I'd like it to run for many years yet. Therefore, we need a way to store the filesystem that won't involve writing changes to Flash.

Fortunately, people more clever than I have devised a way of doing this: keep a compressed image on the Flash as a read-only starting point for the filesystem, and hold the changes in a RAM drive. I'll be setting up the RAM drive later, but the disk image can be done right now, using SquashFS.

There are two redeeming factors to SquashFS. Firstly, it has a very high compression ratio; the 2GB filesystem on my media server compresses down to about 500MB, which makes it ideal for the smaller USB Flash drives. Secondly, it's read in blocks (64k blocks by default), and the kernel will only cache a few blocks at a time; as a result, SquashFS doesn't take up much memory at all when it's running.

Setting up the image is a simple matter of calling the compressor:
Making the SquashFS image
    # cd /root/fscopy
    # mksquashfs * ../filesystem.squash

Stage 2a: Patching the kernel

Now, if you're lucky, you'll already have kernel support for SquashFS, and for UnionFS which we'll be using later. Those of you running Ubuntu Feisty, for example, will already have the modules, and don't need to do anything further. You can find out if you're one of the lucky people quite easily:
Checking for pre-existing modules
    # modprobe squashfs
    # modprobe unionfs
    # mount -o loop /root/filesystem.squash /mnt
If these commands all run fine, and you can see your squashed filesystem in /mnt, then you don't need to do any more work on the kernel. If, however, you're a masochist like myself, you'll have to compile your own kernel containing SquashFS and UnionFS. Unfortunately, these eminently useful filesystems aren't in the default kernel package, so you'll have to patch the kernel tree:
Downloading and applying patches
    # cd /usr/src
    # wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.22.9.tar.bz2
    # tar xjf linux-2.6.22.9.tar.bz2
    # wget http://switch.dl.sourceforge.net/sourceforge/squashfs/squashfs3.2-r2.tar.gz
    # tar xzf squashfs3.2-r2.tar.gz
    # cd linux-2.6.22.9
    # patch -p1 < ../squashfs3.2-r2/kernel-patches/linux-2.6.20/squashfs3.2-patch
    # wget -O- http://download.filesystems.org/unionfs/unionfs-2.1/unionfs-2.1.6_for_2.6.22.9.diff.gz | gunzip | patch -p1
Of course, these packages are current as of October '07, and will change with time. At the moment, the above lines will complete successfully.

Once the patches have been applied successfully, you can configure and build your kernel as normal, including all the usual drivers and modules, but also enabling the two newly patched filesystems:
Selecting the patches
    File systems -> Layered filesystems -> Union file system
    Miscellaneous filesystems -> SquashFS 3.2 - Squashed file system support

Stage 2b: Booting the kernel

Whether you've built your own kernel or are using the one shipped with your distribution, the next step in the process is to get it booting. Since the USB Flash disk has a filesystem, it should be a simple matter of copying the kernel file over and applying some magic; that magic comes in the form of SysLinux, a small package which boots a kernel from a variety of situations. In order to use SysLinux, it first has to be installed; on Gentoo Linux, the installation would run as follows:
Installing SysLinux
    # emerge syslinux
SysLinux provides a boot system from which the Linux kernel can be started; it does this by adding a bootsector and system file to the FAT partition on which it's installed. Performing this installation couldn't be simpler:
Placing SysLinux on the Flash drive
    # syslinux /dev/sda1
When SysLinux starts booting, it will look to a configuration file called syslinux.cfg, in order to find out what to do. You can make the configuration file complex, with multiple kernels and various options between which you can select at boot-time; in this case, we need the simplest possible configuration:
syslinux.cfg: Booting the kernel
    default kernel.img
This configuration will force SysLinux to look for a Linux kernel file, called kernel.img, and load it. Now all that's required is to copy the kernel over: depending on which route you took above, the kernel will either be inside the kernel source tree, or sitting in /boot. I compiled my kernel, so I ran the following:
Copying the kernel image
    # mount /dev/sda1 /mnt/flash
    # cp /usr/src/linux-2.6.22.9/arch/i386/boot/bzImage /mnt/flash/kernel.img
Now you'll be able to boot Linux from the Flash drive. Unfortunately, you won't get very far before you hit a kernel panic; no root device was specified, so the boot falls over. To allow the rest of the operating system to boot, a temporary root filesystem is required: that's what I'll build next.

Stage 3a: Building an initial root

Now, I could just use the SquashFS image as the root disk, and let it be. There are problems with that, though: namely that the SquashFS image is read-only, and we'll need a system that can accommodate changes to stuff like the log files. Fortunately, there is a way to allow this, which is to overlay a temporary file system on the SquashFS image, and write changes to that.

In order to set this up, we need an intermediate step between the kernel and the filesystem image, which can perform the overlay and then "pivot" over to the actual root image. The kernel has a feature which allows it to do this: the initrd, or initial root disk, which can be loaded into RAM. The initrd can contain anything you like, as long as it's relatively small: as the whole disk image is loaded into RAM, it has to leave room for the root filesystem proper.

Building the initial root disk is a less complicated affair than putting together the SquashFS image, and it starts by allocating some space for the image. I've used 8MB, since it's ample room for everything we'll need:
Initialising the initrd
    # dd if=/dev/zero of=/root/initrd bs=1M count=8
    # mke2fs -F /root/initrd
    # mkdir -p /mnt/initrd
    # mount -o loop /root/initrd /mnt/initrd
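It's worth noting the size of this image in kilobytes, because the kernel has to be told how large a RAM disk to allocate when the initrd is loaded (the ramdisk_size boot parameter, which is given in kilobytes). A small sketch for working that out from the image file itself; the function name is made up, and the path is the one used above:

```shell
#!/bin/sh
# image_size_kb FILE
# Prints the size of FILE in kilobytes (1024-byte units), rounded up.
image_size_kb() {
    bytes=$(wc -c < "$1") || return 1
    echo $(( (bytes + 1023) / 1024 ))
}

# For the 8MB image created above this prints 8192:
#   image_size_kb /root/initrd
```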
Once the initrd has been filled in with a filesystem, and mounted somewhere, we can add stuff to it. The first things that are required are a basic directory structure, and a few device nodes:
Filling out the initrd
    # cd /mnt/initrd
    # mkdir -p bin dev etc lib/modules mnt proc sbin tmp usr/bin usr/sbin var/lib
    # for a in {0,1,2,3,4,5,6,7}; do mkdir mnt/$a; done
    # for a in {tty*,console,null}; do cp -a `find /dev -maxdepth 1 -name "$a" -type c` dev; done
    # for a in {hd*,sd*,fd*}; do cp -a `find /dev -maxdepth 1 -name "$a" -type b` dev; done

Stage 3b: Filling in the initrd

Once the initrd has a basic structure, it needs some executable utilities in order to do anything: commands like sh, mount and cp all have to be provided. Fortunately, there's a nifty package called BusyBox which compiles all the utilities one could possibly need into one binary file. The configuration process for BusyBox is something I won't be covering in great detail, but it's very similar to the Linux kernel configuration: a system of menus is provided, from which selections can be made. Be aware that you'll have to compile BusyBox as a statically-linked binary, otherwise the initrd will require not just the executable, but the libraries on which it depends.
Compiling BusyBox
    # wget http://busybox.net/downloads/busybox-1.7.2.tar.bz2
    # tar xjf busybox-1.7.2.tar.bz2
    # cd busybox-1.7.2
    # make menuconfig
    # make
I've provided a copy of the initrd below, for those who are less willing to run through the above process. It contains the basic directory structure, along with the device nodes and a statically compiled copy of BusyBox:

http://oopsilon.com/software/linux-initrd.gz [1.1MB]

What's missing from the initrd is a copy of the modules associated with the kernel. You can retrieve these from /lib/modules and simply copy them over to the initrd; in the example below, I'm copying the modules from the 2.6.22.9 kernel I compiled earlier:
Copying kernel modules to the initrd
    # cp -a /lib/modules/2.6.22.9 /mnt/initrd/lib/modules

Stage 3c: Mounting the real root

The final step in the initrd is to give it a purpose: at present, it's a collection of binaries and device nodes with nothing to do. We have an initial root disk and a kernel, but no way to link the two: the kernel should load and boot the initrd. This is done by giving parameters to the kernel when it boots, and that's done from the SysLinux configuration:
syslinux.cfg: Including the initrd
    default kernel.img initrd=initrd.gz root=/dev/ram0 ramdisk_size=8192 rw init=/linuxrc
As can be seen in this configuration, the kernel is told to run a file called linuxrc on the initrd, as the initialisation script. I've provided a simple linuxrc on the initrd image linked above; all it does is load a shell:
linuxrc: Sample init file
    #!/bin/msh
    mount -t proc proc /proc
    clear
    exec /bin/msh
You may have guessed that this file can contain just about anything, as long as it uses commands supplied by BusyBox. We can mount the SquashFS root filesystem, overlay a temporary RAM disk on top and start up the new root, with the following script:
linuxrc: Pivoting to the real root
    #!/bin/msh
    echo Initial root disk loaded. Proceeding.

    # Mount the proc filesystem, and the Flash disk
    mount -t proc proc /proc
    mount /dev/sda1 /mnt/0

    # Find the SquashFS image on the Flash disk, and mount it
    mount -t squashfs -o loop /mnt/0/newroot.sfs /mnt/1

    # Mount a temporary filesystem, to use as the overlay
    mount -t tmpfs -o size=100M tmpfs /mnt/2

    # Perform the overlay with UnionFS, with tmpfs as read/write
    # and the SquashFS as read-only
    mount -t unionfs -o dirs=/mnt/2=rw:/mnt/1=ro /mnt/1 /mnt/3

    # Pivot to the new root
    cd /mnt/3
    mkdir initrd
    pivot_root . initrd

    # Enter the new root, and run init
    exec chroot . /sbin/init </dev/console >/dev/console 2>&1
You may have to change the device reference for the Flash disk, depending on the kernel you use: if you boot the initrd with the simple shell-exec script I provided above, and check the output of dmesg, you should be able to see which device name the kernel has given to the Flash disk.

Stage 4: Finalising and testing

With the new linuxrc, the initrd is complete, and can now be unmounted and compressed:
Compressing the initrd
    # umount /mnt/initrd
    # gzip -9 /root/initrd
    # cp /root/initrd.gz /mnt/flash
    # umount /mnt/flash
And that should be that. Throw the Flash disk into a spare computer, and watch it boot: it should look just like your hard disk's boot process. If it doesn't, the cause may be one of a few things:

Kernel panic: No root or no init

Remember that in order to use initial root disks, initrd support must be compiled into the kernel itself, not just built as a module; furthermore, the initrd options must be specified on the SysLinux configuration line.

The initrd shows a message, but nothing else happens

Ensure that the correct device name is being used by linuxrc for the Flash disk; if the script can't find the Flash disk, things will fall over.

init fails to read /dev/initctl

The SysV init program, as used by most Linux distributions, uses a FIFO called /dev/initctl to communicate and change runlevels; if this node doesn't exist on the SquashFS root, init will fail.

If you have any more obscure errors, feel free to get in touch with either myself or your local Linux support channel; also, please let me know if you manage to get this setup working. The procedure above was quite smooth for me, but in the eternal clause of the technical tutorial, your mileage may vary.

Copyright Imran Nazar <tf@oopsilon.com>, 2007

ARMv4/5 Opcode Map

Thu, 04 Oct 2007 20:13:03 +0000

Update, Nov 2023: Cliff Biffle has put together a Thumb-2 opcode map for M-profile ARM cores (Google Sheet), which may be more relevant to modern interests than the ARMv4/5 opcode map; my map remains below for posterity.

The following is a full opcode map of instructions for the ARM7 and ARM9 series of CPU cores. Instructions added for ARM9 are highlighted in blue, and instructions specific to the M-extension are shown in green. The Thumb instruction set is also included, in Table 2.
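To look an instruction up in Table 1, the row is bits 27-20 of the instruction word and the column is bits 7-4. The following Python sketch shows that extraction; the function name is invented for illustration, and the three sample words are standard ARM encodings with the "always" condition code.

```python
def arm_map_index(instr):
    """Return (row, column) into Table 1 for a 32-bit ARM instruction word."""
    row = (instr >> 20) & 0xFF   # bits 27-20 select the row
    col = (instr >> 4) & 0xF     # bits 7-4 select the column
    return row, col


# Sample encodings, all with condition code AL (0xE in bits 31-28).
SAMPLES = {
    "ADD r0, r1, r2": 0xE0810002,
    "MUL r0, r1, r2": 0xE0000291,
    "BX lr": 0xE12FFF1E,
}

for text, word in SAMPLES.items():
    row, col = arm_map_index(word)
    print(f"{text:16} {word:08X} -> row {row:02X}, column {col:X}")
```

The condition code in bits 31-28 plays no part in the lookup, which is why the map needs only 256 rows.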

Table 1. ARM Opcode Map.

Rows: bits 27-20. Columns: bits 7-4.

0 1 2 3 4 5 6 7 8 9 A B C D E F

00 AND lli AND llr AND lri AND lrr AND ari AND arr AND rri AND rrr AND lli MUL AND lri STRH ptrm AND ari LDRD ptrm AND rri STRD ptrm

01 ANDS lli ANDS llr ANDS lri ANDS lrr ANDS ari ANDS arr ANDS rri ANDS rrr ANDS lli MULS ANDS lri LDRH ptrm ANDS ari LDRSB ptrm ANDS rri LDRSH ptrm

02 EOR lli EOR llr EOR lri EOR lrr EOR ari EOR arr EOR rri EOR rrr EOR lli MLA EOR lri STRH ptrm EOR ari LDRD ptrm EOR rri STRD ptrm

03 EORS lli EORS llr EORS lri EORS lrr EORS ari EORS arr EORS rri EORS rrr EORS lli MLAS EORS lri LDRH ptrm EORS ari LDRSB ptrm EORS rri LDRSH ptrm

04 SUB lli SUB llr SUB lri SUB lrr SUB ari SUB arr SUB rri SUB rrr SUB lli SUB lri STRH ptim SUB ari LDRD ptim SUB rri STRD ptim

05 SUBS lli SUBS llr SUBS lri SUBS lrr SUBS ari SUBS arr SUBS rri SUBS rrr SUBS lli SUBS lri LDRH ptim SUBS ari LDRSB ptim SUBS rri LDRSH ptim

06 RSB lli RSB llr RSB lri RSB lrr RSB ari RSB arr RSB rri RSB rrr RSB lli RSB lri STRH ptim RSB ari LDRD ptim RSB rri STRD ptim

07 RSBS lli RSBS llr RSBS lri RSBS lrr RSBS ari RSBS arr RSBS rri RSBS rrr RSBS lli RSBS lri LDRH ptim RSBS ari LDRSB ptim RSBS rri LDRSH ptim

08 ADD lli ADD llr ADD lri ADD lrr ADD ari ADD arr ADD rri ADD rrr ADD lli UMULL ADD lri STRH ptrp ADD ari LDRD ptrp ADD rri STRD ptrp

09 ADDS lli ADDS llr ADDS lri ADDS lrr ADDS ari ADDS arr ADDS rri ADDS rrr ADDS lli UMULLS ADDS lri LDRH ptrp ADDS ari LDRSB ptrp ADDS rri LDRSH ptrp

0A ADC lli ADC llr ADC lri ADC lrr ADC ari ADC arr ADC rri ADC rrr ADC lli UMLAL ADC lri STRH ptrp ADC ari LDRD ptrp ADC rri STRD ptrp

0B ADCS lli ADCS llr ADCS lri ADCS lrr ADCS ari ADCS arr ADCS rri ADCS rrr ADCS lli UMLALS ADCS lri LDRH ptrp ADCS ari LDRSB ptrp ADCS rri LDRSH ptrp

0C SBC lli SBC llr SBC lri SBC lrr SBC ari SBC arr SBC rri SBC rrr SBC lli SMULL SBC lri STRH ptip SBC ari LDRD ptip SBC rri STRD ptip

0D SBCS lli SBCS llr SBCS lri SBCS lrr SBCS ari SBCS arr SBCS rri SBCS rrr SBCS lli SMULLS SBCS lri LDRH ptip SBCS ari LDRSB ptip SBCS rri LDRSH ptip

0E RSC lli RSC llr RSC lri RSC lrr RSC ari RSC arr RSC rri RSC rrr RSC lli SMLAL RSC lri STRH ptip RSC ari LDRD ptip RSC rri STRD ptip

0F RSCS lli RSCS llr RSCS lri RSCS lrr RSCS ari RSCS arr RSCS rri RSCS rrr RSCS lli SMLALS RSCS lri LDRH ptip RSCS ari LDRSB ptip RSCS rri LDRSH ptip

10 MRS rc QADD SMLABB SWP SMLATB STRH ofrm SMLABT LDRD ofrm SMLATT STRD ofrm

11 TSTS lli TSTS llr TSTS lri TSTS lrr TSTS ari TSTS arr TSTS rri TSTS rrr TSTS lli TSTS lri LDRH ofrm TSTS ari LDRSB ofrm TSTS rri LDRSH ofrm

12 MSR rc BX BLX reg QSUB BKPT SMLAWB SMULWB STRH prrm SMLAWT LDRD prrm SMULWT STRD prrm

13 TEQS lli TEQS llr TEQS lri TEQS lrr TEQS ari TEQS arr TEQS rri TEQS rrr TEQS lli TEQS lri LDRH prrm TEQS ari LDRSB prrm TEQS rri LDRSH prrm

14 MRS rs QDADD SMLALBB SWPB SMLALTB STRH ofim SMLALBT LDRD ofim SMLALTT STRD ofim

15 CMPS lli CMPS llr CMPS lri CMPS lrr CMPS ari CMPS arr CMPS rri CMPS rrr CMPS lli CMPS lri LDRH ofim CMPS ari LDRSB ofim CMPS rri LDRSH ofim

16 MSR rs CLZ QDSUB SMULBB SMULTB STRH prim SMULBT LDRD prim SMULTT STRD prim

17 CMNS lli CMNS llr CMNS lri CMNS lrr CMNS ari CMNS arr CMNS rri CMNS rrr CMNS lli CMNS lri LDRH prim CMNS ari LDRSB prim CMNS rri LDRSH prim

18 ORR lli ORR llr ORR lri ORR lrr ORR ari ORR arr ORR rri ORR rrr ORR lli ORR lri STRH ofrp ORR ari LDRD ofrp ORR rri STRD ofrp

19 ORRS lli ORRS llr ORRS lri ORRS lrr ORRS ari ORRS arr ORRS rri ORRS rrr ORRS lli ORRS lri LDRH ofrp ORRS ari LDRSB ofrp ORRS rri LDRSH ofrp

1A MOV lli MOV llr MOV lri MOV lrr MOV ari MOV arr MOV rri MOV rrr MOV lli MOV lri STRH prrp MOV ari LDRD prrp MOV rri STRD prrp

1B MOVS lli MOVS llr MOVS lri MOVS lrr MOVS ari MOVS arr MOVS rri MOVS rrr MOVS lli MOVS lri LDRH prrp MOVS ari LDRSB prrp MOVS rri LDRSH prrp

1C BIC lli BIC llr BIC lri BIC lrr BIC ari BIC arr BIC rri BIC rrr BIC lli BIC lri STRH ofip BIC ari LDRD ofip BIC rri STRD ofip

1D BICS lli BICS llr BICS lri BICS lrr BICS ari BICS arr BICS rri BICS rrr BICS lli BICS lri LDRH ofip BICS ari LDRSB ofip BICS rri LDRSH ofip

1E MVN lli MVN llr MVN lri MVN lrr MVN ari MVN arr MVN rri MVN rrr MVN lli MVN lri STRH prip MVN ari LDRD prip MVN rri STRD prip

1F MVNS lli MVNS llr MVNS lri MVNS lrr MVNS ari MVNS arr MVNS rri MVNS rrr MVNS lli MVNS lri LDRH prip MVNS ari LDRSB prip MVNS rri LDRSH prip

20 AND imm

21 ANDS imm

22 EOR imm

23 EORS imm

24 SUB imm

25 SUBS imm

26 RSB imm

27 RSBS imm

28 ADD imm

29 ADDS imm

2A ADC imm

2B ADCS imm

2C SBC imm

2D SBCS imm

2E RSC imm

2F RSCS imm

30

31 TSTS imm

32 MSR ic

33 TEQS imm

34

35 CMPS imm

36 MSR is

37 CMNS imm

38 ORR imm

39 ORRS imm

3A MOV imm

3B MOVS imm

3C BIC imm

3D BICS imm

3E MVN imm

3F MVNS imm

40 STR ptim

41 LDR ptim

42 STRT ptim

43 LDRT ptim

44 STRB ptim

45 LDRB ptim

46 STRBT ptim

47 LDRBT ptim

48 STR ptip

49 LDR ptip

4A STRT ptip

4B LDRT ptip

4C STRB ptip

4D LDRB ptip

4E STRBT ptip

4F LDRBT ptip

50 STR ofim

51 LDR ofim

52 STR prim

53 LDR prim

54 STRB ofim

55 LDRB ofim

56 STRB prim

57 LDRB prim

58 STR ofip

59 LDR ofip

5A STR prip

5B LDR prip

5C STRB ofip

5D LDRB ofip

5E STRB prip

5F LDRB prip

60 STR ptrmll STR ptrmlr STR ptrmar STR ptrmrr STR ptrmll STR ptrmlr STR ptrmar STR ptrmrr

61 LDR ptrmll LDR ptrmlr LDR ptrmar LDR ptrmrr LDR ptrmll LDR ptrmlr LDR ptrmar LDR ptrmrr

62 STRT ptrmll STRT ptrmlr STRT ptrmar STRT ptrmrr STRT ptrmll STRT ptrmlr STRT ptrmar STRT ptrmrr

63 LDRT ptrmll LDRT ptrmlr LDRT ptrmar LDRT ptrmrr LDRT ptrmll LDRT ptrmlr LDRT ptrmar LDRT ptrmrr

64 STRB ptrmll STRB ptrmlr STRB ptrmar STRB ptrmrr STRB ptrmll STRB ptrmlr STRB ptrmar STRB ptrmrr

65 LDRB ptrmll LDRB ptrmlr LDRB ptrmar LDRB ptrmrr LDRB ptrmll LDRB ptrmlr LDRB ptrmar LDRB ptrmrr

66 STRBT ptrmll STRBT ptrmlr STRBT ptrmar STRBT ptrmrr STRBT ptrmll STRBT ptrmlr STRBT ptrmar STRBT ptrmrr

67 LDRBT ptrmll LDRBT ptrmlr LDRBT ptrmar LDRBT ptrmrr LDRBT ptrmll LDRBT ptrmlr LDRBT ptrmar LDRBT ptrmrr

68 STR ptrpll STR ptrplr STR ptrpar STR ptrprr STR ptrpll STR ptrplr STR ptrpar STR ptrprr

69 LDR ptrpll LDR ptrplr LDR ptrpar LDR ptrprr LDR ptrpll LDR ptrplr LDR ptrpar LDR ptrprr

6A STRT ptrpll STRT ptrplr STRT ptrpar STRT ptrprr STRT ptrpll STRT ptrplr STRT ptrpar STRT ptrprr

6B LDRT ptrpll LDRT ptrplr LDRT ptrpar LDRT ptrprr LDRT ptrpll LDRT ptrplr LDRT ptrpar LDRT ptrprr

6C STRB ptrpll STRB ptrplr STRB ptrpar STRB ptrprr STRB ptrpll STRB ptrplr STRB ptrpar STRB ptrprr

6D LDRB ptrpll LDRB ptrplr LDRB ptrpar LDRB ptrprr LDRB ptrpll LDRB ptrplr LDRB ptrpar LDRB ptrprr

6E STRBT ptrpll STRBT ptrplr STRBT ptrpar STRBT ptrprr STRBT ptrpll STRBT ptrplr STRBT ptrpar STRBT ptrprr

6F LDRBT ptrpll LDRBT ptrplr LDRBT ptrpar LDRBT ptrprr LDRBT ptrpll LDRBT ptrplr LDRBT ptrpar LDRBT ptrprr

70 STR ofrmll STR ofrmlr STR ofrmar STR ofrmrr STR ofrmll STR ofrmlr STR ofrmar STR ofrmrr

71 LDR ofrmll LDR ofrmlr LDR ofrmar LDR ofrmrr LDR ofrmll LDR ofrmlr LDR ofrmar LDR ofrmrr

72 STR prrmll STR prrmlr STR prrmar STR prrmrr STR prrmll STR prrmlr STR prrmar STR prrmrr

73 LDR prrmll LDR prrmlr LDR prrmar LDR prrmrr LDR prrmll LDR prrmlr LDR prrmar LDR prrmrr

74 STRB ofrmll STRB ofrmlr STRB ofrmar STRB ofrmrr STRB ofrmll STRB ofrmlr STRB ofrmar STRB ofrmrr

75 LDRB ofrmll LDRB ofrmlr LDRB ofrmar LDRB ofrmrr LDRB ofrmll LDRB ofrmlr LDRB ofrmar LDRB ofrmrr

76 STRB prrmll STRB prrmlr STRB prrmar STRB prrmrr STRB prrmll STRB prrmlr STRB prrmar STRB prrmrr

77 LDRB prrmll LDRB prrmlr LDRB prrmar LDRB prrmrr LDRB prrmll LDRB prrmlr LDRB prrmar LDRB prrmrr

78 STR ofrpll STR ofrplr STR ofrpar STR ofrprr STR ofrpll STR ofrplr STR ofrpar STR ofrprr

79 LDR ofrpll LDR ofrplr LDR ofrpar LDR ofrprr LDR ofrpll LDR ofrplr LDR ofrpar LDR ofrprr

7A STR prrpll STR prrplr STR prrpar STR prrprr STR prrpll STR prrplr STR prrpar STR prrprr

7B LDR prrpll LDR prrplr LDR prrpar LDR prrprr LDR prrpll LDR prrplr LDR prrpar LDR prrprr

7C STRB ofrpll STRB ofrplr STRB ofrpar STRB ofrprr STRB ofrpll STRB ofrplr STRB ofrpar STRB ofrprr

7D LDRB ofrpll LDRB ofrplr LDRB ofrpar LDRB ofrprr LDRB ofrpll LDRB ofrplr LDRB ofrpar LDRB ofrprr

7E STRB prrpll STRB prrplr STRB prrpar STRB prrprr STRB prrpll STRB prrplr STRB prrpar STRB prrprr

7F LDRB prrpll LDRB prrplr LDRB prrpar LDRB prrprr LDRB prrpll LDRB prrplr LDRB prrpar LDRB prrprr

80 STMDA

81 LDMDA

82 STMDA w

83 LDMDA w

84 STMDA u

85 LDMDA u

86 STMDA uw

87 LDMDA uw

88 STMIA

89 LDMIA

8A STMIA w

8B LDMIA w

8C STMIA u

8D LDMIA u

8E STMIA uw

8F LDMIA uw

90 STMDB

91 LDMDB

92 STMDB w

93 LDMDB w

94 STMDB u

95 LDMDB u

96 STMDB uw

97 LDMDB uw

98 STMIB

99 LDMIB

9A STMIB w

9B LDMIB w

9C STMIB u

9D LDMIB u

9E STMIB uw

9F LDMIB uw

A0 B

A1

A2

A3

A4

A5

A6

A7

A8

A9

AA

AB

AC

AD

AE

AF

B0 BL

B1

B2

B3

B4

B5

B6

B7

B8

B9

BA

BB

BC

BD

BE

BF

C0 STC ofm

C1 LDC ofm

C2 STC prm

C3 LDC prm

C4 STC ofm

C5 LDC ofm

C6 STC prm

C7 LDC prm

C8 STC ofp

C9 LDC ofp

CA STC prp

CB LDC prp

CC STC ofp

CD LDC ofp

CE STC prp

CF LDC prp

D0 STC unm

D1 LDC unm

D2 STC ptm

D3 LDC ptm

D4 STC unm

D5 LDC unm

D6 STC ptm

D7 LDC ptm

D8 STC unp

D9 LDC unp

DA STC ptp

DB LDC ptp

DC STC unp

DD LDC unp

DE STC ptp

DF LDC ptp

E0 CDP MCR CDP MCR CDP MCR CDP MCR CDP MCR CDP MCR CDP MCR CDP MCR

E1 MRC MRC MRC MRC MRC MRC MRC MRC

E2 MCR MCR MCR MCR MCR MCR MCR MCR

E3 MRC MRC MRC MRC MRC MRC MRC MRC

E4 MCR MCR MCR MCR MCR MCR MCR MCR

E5 MRC MRC MRC MRC MRC MRC MRC MRC

E6 MCR MCR MCR MCR MCR MCR MCR MCR

E7 MRC MRC MRC MRC MRC MRC MRC MRC

E8 MCR MCR MCR MCR MCR MCR MCR MCR

E9 MRC MRC MRC MRC MRC MRC MRC MRC

EA MCR MCR MCR MCR MCR MCR MCR MCR

EB MRC MRC MRC MRC MRC MRC MRC MRC

EC MCR MCR MCR MCR MCR MCR MCR MCR

ED MRC MRC MRC MRC MRC MRC MRC MRC

EE MCR MCR MCR MCR MCR MCR MCR MCR

EF MRC MRC MRC MRC MRC MRC MRC MRC

F0 SWI

F1

F2

F3

F4

F5

F6

F7

F8

F9

FA

FB

FC

FD

FE

FF

Table 2. Thumb Opcode Map.

Rows: bits 15-12. Columns: bits 11-8.

0 1 2 3 4 5 6 7 8 9 A B C D E F

0 LSL imm LSR imm

1 ASR imm ADD reg SUB reg ADD imm3 SUB imm3

2 MOV i8r0 MOV i8r1 MOV i8r2 MOV i8r3 MOV i8r4 MOV i8r5 MOV i8r6 MOV i8r7 CMP i8r0 CMP i8r1 CMP i8r2 CMP i8r3 CMP i8r4 CMP i8r5 CMP i8r6 CMP i8r7

3 ADD i8r0 ADD i8r1 ADD i8r2 ADD i8r3 ADD i8r4 ADD i8r5 ADD i8r6 ADD i8r7 SUB i8r0 SUB i8r1 SUB i8r2 SUB i8r3 SUB i8r4 SUB i8r5 SUB i8r6 SUB i8r7

4 DP g1 DP g2 DP g3 DP g4 ADDH CMPH MOVH BX reg LDRPC r0 LDRPC r1 LDRPC r2 LDRPC r3 LDRPC r4 LDRPC r5 LDRPC r6 LDRPC r7

5 STR reg STRH reg STRB reg LDRSB reg LDR reg LDRH reg LDRB reg LDRSH reg

6 STR imm5 LDR imm5

7 STRB imm5 LDRB imm5

8 STRH imm5 LDRH imm5

9 STRSP r0 STRSP r1 STRSP r2 STRSP r3 STRSP r4 STRSP r5 STRSP r6 STRSP r7 LDRSP r0 LDRSP r1 LDRSP r2 LDRSP r3 LDRSP r4 LDRSP r5 LDRSP r6 LDRSP r7

A ADDPC r0 ADDPC r1 ADDPC r2 ADDPC r3 ADDPC r4 ADDPC r5 ADDPC r6 ADDPC r7 ADDSP r0 ADDSP r1 ADDSP r2 ADDSP r3 ADDSP r4 ADDSP r5 ADDSP r6 ADDSP r7

B ADDSP imm7 PUSH PUSH lr POP POP pc BKPT

C STMIA r0 STMIA r1 STMIA r2 STMIA r3 STMIA r4 STMIA r5 STMIA r6 STMIA r7 LDMIA r0 LDMIA r1 LDMIA r2 LDMIA r3 LDMIA r4 LDMIA r5 LDMIA r6 LDMIA r7

D BEQ BNE BCS BCC BMI BPL BVS BVC BHI BLS BGE BLT BGT BLE SWI

E B BLX off

F BL setup BL off

Table 2A. Thumb Opcode Map - Register/Register Data Processing.

Rows: bits 9-8. Columns: bits 7-6.

     0    1    2    3
0    AND  EOR  LSL  LSR
1    ASR  ADC  SBC  ROR
2    TST  NEG  CMP  CMN
3    ORR  MUL  BIC  MVN
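The Thumb tables are indexed the same way, with narrower fields: Table 2 uses bits 15-12 and 11-8, and the "DP" entries in row 4 lead into Table 2A, which uses bits 9-8 and 7-6. Here is a small Python sketch of both lookups. The function names are invented for illustration, and the mnemonic grid follows the Thumb ALU encoding in the ARM architecture manual, where opcodes 5 and 6 are ADC and SBC.

```python
# Table 2A as a grid: rows are bits 9-8, columns are bits 7-6.
THUMB_ALU = [
    ["AND", "EOR", "LSL", "LSR"],
    ["ASR", "ADC", "SBC", "ROR"],
    ["TST", "NEG", "CMP", "CMN"],
    ["ORR", "MUL", "BIC", "MVN"],
]


def thumb_map_index(instr):
    """Return (row, column) into Table 2 for a 16-bit Thumb instruction."""
    instr &= 0xFFFF
    return (instr >> 12) & 0xF, (instr >> 8) & 0xF


def thumb_alu_op(instr):
    """Return the Table 2A mnemonic, or None if bits 15-10 are not 010000."""
    instr &= 0xFFFF
    if (instr >> 10) != 0b010000:
        return None
    return THUMB_ALU[(instr >> 8) & 0x3][(instr >> 6) & 0x3]


print(thumb_map_index(0x4348), thumb_alu_op(0x4348))  # MUL r0, r1
print(thumb_map_index(0x4708), thumb_alu_op(0x4708))  # BX r1 is not a DP op
```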

Tetris in Vanilla JavaScript

Thu, 04 Oct 2007 19:27:49 +0000

This page demonstrates an implementation of the classic game 'Tetris', in JavaScript. The major purpose is to demonstrate how addition to and manipulation of the DOM can be performed easily: the Tetris display consists of two hundred DIVs in a 10x20 grid, which are given different class names to represent the blocks in the well.

JavaScript also makes it easy to retrieve events from the keyboard; the 'window' object provides an 'event' subsystem, which in combination with the setting of event handlers for the keyboard allows the script to capture and process keys. This implementation of Tetris uses four keys:

Left: move the current piece left one block

Right: move the current piece right one block

Up: rotate the piece by 90 degrees

Down: push the piece down quickly
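The game logic behind those keys is independent of the DOM, so it can be sketched without a browser. The Python below is a rough model only, not the page's JavaScript: it assumes the well is stored as 20 rows of 10 cells and that a piece is a small matrix of 0s and 1s, and every name in it is invented for illustration.

```python
WIDTH, HEIGHT = 10, 20  # the 10x20 grid described above


def empty_well():
    """Return a well of HEIGHT rows, each with WIDTH empty cells."""
    return [[0] * WIDTH for _ in range(HEIGHT)]


def rotate_cw(piece):
    """Rotate a rectangular piece matrix by 90 degrees clockwise."""
    return [list(row) for row in zip(*piece[::-1])]


def fits(well, piece, top, left):
    """True if the piece can sit at (top, left) without leaving the
    well or overlapping a filled cell."""
    for dy, row in enumerate(piece):
        for dx, cell in enumerate(row):
            if not cell:
                continue
            y, x = top + dy, left + dx
            if x < 0 or x >= WIDTH or y < 0 or y >= HEIGHT or well[y][x]:
                return False
    return True


t_piece = [[1, 1, 1],
           [0, 1, 0]]
well = empty_well()
print(rotate_cw(t_piece))
print(fits(well, t_piece, 0, 0), fits(well, t_piece, 0, 9))
```

In this model, a key handler would try the moved or rotated piece with fits() and only keep the change if it returns True.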

You're free to take a look at the source, and explore the game for any bugs which may exist (I already know of a couple). Enjoy.

Running a Windows Partition in VMware

Sat, 20 Jan 2007 21:54:02 +0000

I have my system partitioned into two: one part of the hard drive hosts a Windows XP partition, and the other runs Gentoo Linux. About a month ago, I was just about tired of having to reboot to switch between the two, so I decided to set up a VM for Windows.

There was, however, a snag to this: I wanted to use the existing Windows installation, because I'd tuned it up and installed the software I always use. I expressly didn't want a virtual disk image duplicating my Windows drive, since I didn't have the space for that. So, that was the task: running the Windows partition in a VM.

I hunted around the 'Net, and found surprisingly little information on this: the procedure I finally threw together was sourced from many disparate places. So, in one place, I've put together the steps you'll need to take in order to get a Windows partition running inside a VM.

What you'll need

You'll need a few tools in order to pull the information you need, and to run the finished VM:

VMware Player:

The easiest way to get a finished VM running; all you need to do is point vmplayer at the VM information file, and you're good to go.

VMware kernel modules for Linux:

All VMware products on Linux use kernel modules to get monitoring and networking functionality working, so you'll need those.

A text editor:

We'll be doing some hacking of configuration and VM information files, which means your favourite text editor.

GNU Parted:

In order for us to tell VMware about the hard disk, we need to know a few things about the disk; we'll also be doing a little hacking of the partition table. Parted allows us to do this from the command line, and quite easily too.

On a Gentoo Linux installation, you can get the software you need on the Linux side from the following command (on other distributions, check your associated documentation):
emerge vmware-player vmware-modules parted
Stage 1: Setting up a virtual hard disk

VMware needs a virtual disk descriptor file, telling it how the disk is set up and structured. So, dump the following into a file called WindowsXP.vmdk:
Disk description: WindowsXP.vmdk
    # Disk DescriptorFile
    version=1
    CID=9428f535
    parentCID=ffffffff
    createType="fullDevice"

    # Extent description
    RW 63 FLAT "WindowsXP.mbr" 0
    RW 23579072 FLAT "/dev/hda" 63

    # The Disk Data Base
    #DDB
    ddb.toolsVersion = "6530"
    ddb.adapterType = "ide"
    ddb.virtualHWVersion = "4"
    ddb.geometry.sectors = "63"
    ddb.geometry.heads = "240"
    ddb.geometry.cylinders = "1559"
The values you'll need to change are the ones that depend on the characteristics of your hard disk: the size and device name of the second extent, and the three geometry values. The ones shown here describe my disk quite well.

Now you can fire up Parted against the disk you want to use. If you have Windows on one hard disk and Linux on another, use the Windows disk in the command below, otherwise just use the disk device containing the Windows partition.
Fetching disk information: Parted output
    # parted /dev/hda
    GNU Parted 1.7.1
    Using /dev/hda
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) unit s
    (parted) print

    Disk /dev/hda: 23579135s
    Sector size (logical/physical): 512B/512B
    Partition Table: msdos

    Number  Start      End        Size       Type     File system  Flags
     1      63s        9223199s   9223137s   primary  ntfs
     2      9223200s   22679999s  13456800s  primary  ext3         boot
     3      22680000s  23572079s  892080s    primary  linux-swap

    (parted) unit cyl
    (parted) print

    Disk /dev/hda: 1559cyl
    Sector size (logical/physical): 512B/512B
    BIOS cylinder,head,sector geometry: 1559,240,63. Each cylinder is 7741kB.
    Partition Table: msdos

    Number  Start    End      Size    Type     File system  Flags
     1      0cyl     609cyl   609cyl  primary  ntfs
     2      610cyl   1499cyl  890cyl  primary  ext3         boot
     3      1500cyl  1558cyl  59cyl   primary  linux-swap
Note the unit s command, which tells Parted to print out its values in terms of disk sectors: we'll be using these values in the VMDK file. Also note the second unit command, to provide the values in cylinders; that allows us to fetch the disk geometry in CHS format.

The values we'll be using are the total size of the disk in sectors, and the cylinder, head and sector geometry. But not so fast; before you plug the values in, we'll need to do some calculation. Instead of using the hard disk's standard boot sector, which allows you to boot Windows or Linux, we ideally want the VM to boot only Windows. We'll be doing that by telling the Windows partition to boot, ignoring the Linux boot menu, and then making a copy of that bootsector for the VM to use.

All that means we need two lines in the VMDK, as shown above: a line for the bootsector copy, starting at sector 0 and stretching for 63 sectors; and the rest of the hard disk, starting at sector 63. And that means a little calculation: the value in the VMDK for the size of the disk is 63 less than the actual disk.
Size of the virtual disk: Subtracting the MBR
    23579135 - 63 = 23579072
Note that this is the value I've got in my VMDK, shown above.
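If you'd rather not do the subtraction by hand, it can be wrapped in a few lines of Python that build both extent lines from the sector count Parted reports. This is only a sketch: the function name and its default arguments are made up for illustration, and only the 63-sector split and the line format come from the descriptor above.

```python
MBR_SECTORS = 63  # sectors covered by the bootsector copy


def vmdk_extents(total_sectors, device="/dev/hda", mbr_file="WindowsXP.mbr"):
    """Return the two extent lines for the VMDK descriptor.

    total_sectors is the disk size printed by Parted after 'unit s',
    without the trailing 's'.
    """
    rest = total_sectors - MBR_SECTORS
    return [
        f'RW {MBR_SECTORS} FLAT "{mbr_file}" 0',
        f'RW {rest} FLAT "{device}" {MBR_SECTORS}',
    ]


for line in vmdk_extents(23579135):
    print(line)
```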

I mentioned above that we'll be using a copy of the disk's bootsector for the VM, with Windows set to boot. We'll need to set that up first:
Setting Windows to boot: Parted output
    (parted) set 1 boot on
    (parted) print

    Disk /dev/hda: 1559cyl
    Sector size (logical/physical): 512B/512B
    BIOS cylinder,head,sector geometry: 1559,240,63. Each cylinder is 7741kB.
    Partition Table: msdos

    Number  Start    End      Size    Type     File system  Flags
     1      0cyl     609cyl   609cyl  primary  ntfs         boot
     2      610cyl   1499cyl  890cyl  primary  ext3
     3      1500cyl  1558cyl  59cyl   primary  linux-swap

    (parted) quit
    #
Now to make a copy of the bootsector, using the infamous dd utility:
Copying the bootsector
    # dd if=/dev/hda of=WindowsXP.mbr bs=512 count=63
    63+0 sectors in
    63+0 sectors out
    #
Stage 2: The VMware information file

Once the VMDK is set up, we need to tell VMware what exactly it'll be booting, and which hardware to emulate. This is done by the information file, which in this case is WindowsXP.vmx:
The information file: WindowsXP.vmx
    config.version = "8"
    virtualHW.version = "4"
    uuid.location = "56 4d 56 4a 7b d7 4c 30-f5 80 d6 8b c4 59 aa eb"
    uuid.bios = "56 4d 56 4a 7b d7 4c 30-f5 80 d6 8b c4 59 aa eb"
    uuid.action = "create"
    checkpoint.vmState = ""
    displayName = "Windows XP Professional"
    annotation = ""
    guestinfo.vmware.product.long = ""
    guestinfo.vmware.product.url = ""
    guestOS = "winxppro"
    numvcpus = "1"
    memsize = "128"
    paevm = "FALSE"
    sched.mem.pshare.enable = "TRUE"
    MemAllowAutoScaleDown = "FALSE"
    MemTrimRate = "-1"
    nvram = "WindowsXP.nvram"
    mks.enable3d = "FALSE"
    vmmouse.present = "FALSE"
    vmmouse.fileName = "auto detect"
    tools.syncTime = "TRUE"
    tools.remindinstall = "FALSE"
    isolation.tools.hgfs.disable = "FALSE"
    isolation.tools.dnd.disable = "FALSE"
    isolation.tools.copy.enable = "TRUE"
    isolation.tools.paste.enabled = "TRUE"
    gui.restricted = "FALSE"
    ethernet0.present = "TRUE"
    ethernet0.connectionType = "nat"
    ethernet0.addressType = "generated"
    ethernet0.generatedAddress = "00:0c:29:59:aa:eb"
    ethernet0.generatedAddressOffset = "0"
    usb.present = "TRUE"
    usb.generic.autoconnect = "TRUE"
    sound.present = "TRUE"
    sound.virtualdev = "sb16"
    ide0:0.present = "TRUE"
    ide0:0.fileName = "WindowsXP.vmdk"
    ide0:0.mode = "independent-persistent"
    ide0:0.deviceType = "rawDisk"
    ide0:0.redo = ""
    ide0:0.writeThrough = "FALSE"
    ide0:0.startConnected = "TRUE"
    ide1:0.present = "TRUE"
    ide1:0.fileName = "/dev/cdrom"
    ide1:0.deviceType = "atapi-cdrom"
    ide1:0.writeThrough = "FALSE"
    ide1:0.startConnected = "TRUE"
    floppy0.present = "TRUE"
    floppy0.fileName = "/dev/fd0"
    floppy0.startConnected = "TRUE"
    serial0.present = "FALSE"
    serial1.present = "FALSE"
    parallel0.present = "FALSE"
There are a couple of values which you might want to change: the location of the CD drive in the device tree (ide1:0.fileName), and the amount of memory you want to allocate to the VM (memsize, in megabytes).

Stage 3: Setting up Windows

We're not done yet; the Linux side of setting up the VM is running, but we now need to tell Windows that it'll be booting into a VM. The problem is rooted in the fact that the VM's hardware is different to your physical computer; thus, we need to add a hardware profile to the Windows partition.

We can also take this opportunity to test that the Windows partition boots up immediately, without a boot menu in the way. Reboot, and if the boot process doesn't run straight to Windows, you may need to tweak the partition boot settings and recreate the bootsector copy.

Once you're booted into Windows, pull open the System properties (Control Panel -> System, or right-click My Computer -> Properties):

The default setup for this Hardware Profiles display is one profile, called "Default". Click on "Copy", to create a new profile, and call that one "VMware", then move it to the top of the list with the arrow buttons. You can see the settings I use in the image above, and we'll see exactly what that means for the Windows boot process in a little while.

While you have the System Properties open, pull open the Driver Signing properties, and set the action Windows should take to "Ignore"; this allows any drivers to be installed automatically if devices are picked up by Windows.

This is also a good time to set up a helper script, which will install the VMware Tools: a set of drivers for the VMware emulated devices, and some services to help the Windows VM along. This could be done after the VM is set up and running, but I had issues with that, as detailed later. I've decided to put it here in stage 3, to catch the problem before it begins.

The VMware tools are provided by VMware as a CD ISO image, buried within the VMware Workstation software. It's relatively easy to find: first of all, download Workstation from VMware, or any Linux mirror (I've given a sample mirror below):

http://ftp.snt.utwente.nl/pub/os/linux/gentoo/distfiles/VMware-workstation-5.5.3-34685.tar.gz

Once you've got the file, extract its contents (you may need third-party tools to do this in Windows), and look for a file called windows.iso; this is the Tools CD image. If you feel like wasting a CD, burn the image to one, or you can mount the image as a drive using Daemon Tools or similar software.

When you can see the contents of the image, copy the files to your hard disk, in a directory called C:\VMTools or something similar. Then dump the following into a .cmd file:
VMware Tools install helper: C:\VMTools\ToolsHelper.cmd
    if exist C:\VMTools\ToolsHelperLock.txt msiexec -i "C:\VMTools\VMware Tools.msi" /qn
    if exist C:\VMTools\ToolsHelperLock.txt shutdown -r -f -t 30
    del C:\VMTools\ToolsHelperLock.txt
Also, put a small message (any you like) into ToolsHelperLock.txt in the same directory. The lock file will be used by the helper script to work out if it needs to reboot. Once you've done this, add the script to your Start Menu's Startup folder, and you'll be away.

Log yourself out of Windows and reboot; that should be the last time it needs to be started physically. Boot yourself into Linux; it's time to test this thing.

Stage 4: Virtual Windows

When you fire up the VM, it should now automatically pick up the Windows partition, and begin booting it. The presence of two hardware profiles means that Windows will ask which one you wish to use:

Pick the VMware profile, and Windows should boot, into a very minimal VGA-colour mode: it doesn't have the drivers for the "VMware SVGA" yet. At this point, the helper script should kick in, install the VMware tools, and then reboot the VM. Make sure to remove the helper script from your Startup folder upon the next reboot; even though it won't run, it'll still flash a Command-Prompt window up for a short while.

Snags: Getting the network running

I mentioned earlier that I had a problem with my keyboard and mouse not being detected by Windows in the VM, thus bringing about the helper script to allow the VMware tools to install the requisite drivers. I did have another issue with the VM, and that was getting it to talk to the outside world.

Initially, I set the VM's network card to a "bridged" type, allowing it to reside on the same network as the host machine; this usually works fine. In the case of a laptop with wireless, however, it didn't so much: no communication. After some experimentation, I resorted to the NAT method: one network that the host computer sits on, and another between the host and VM. This also involves a bit of iptables trickery, so I'm putting it in as a part of this guide.

In my case, the normal wireless network resides on 192.168.1.0/24, so I decided to put the virtual network at 192.168.58.0/24. This means the host gets an address of 192.168.58.1 on the NAT network, and the VM's network connection gets a static IP of 192.168.58.2; it also means the following changes to the Linux host's configuration:
Allowing IP forwarding: /etc/sysctl.conf
    net.ipv4.ip_forward = 1
Initialising iptables NAT: /etc/conf.d/local.start
    ifconfig ra0:1 192.168.1.252
    iptables -t nat -A PREROUTING -i ra0 -d 192.168.1.252 -j DNAT --to-destination 192.168.58.2
    iptables -t nat -A POSTROUTING -o ra0 -s 192.168.58.2 -j SNAT --to-source 192.168.1.252
    iptables -A INPUT -i ra0 -d 192.168.1.252 -p tcp -j ACCEPT
    iptables -A INPUT -i ra0 -d 192.168.1.252 -p udp -j ACCEPT
    iptables -A INPUT -i ra0 -d 192.168.1.252 -p icmp -j ACCEPT
Note how the wireless interface ra0 has had a virtual interface added, dedicated to the transfer of traffic for the virtual machine NAT. The particular configuration files you need to change may differ depending on the distribution; the changes above are from my Gentoo system.
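If you pick different networks to mine, it's worth checking the addressing plan before touching iptables. The Python sketch below does that with the standard ipaddress module, using the addresses from my setup; the constant and function names are invented for illustration, and it only checks the plan on paper without configuring anything.

```python
import ipaddress

# Addresses from the setup described above; change them to suit your network.
LAN = ipaddress.ip_network("192.168.1.0/24")       # wireless network
VM_NET = ipaddress.ip_network("192.168.58.0/24")   # host <-> VM network
HOST_ON_VM_NET = ipaddress.ip_address("192.168.58.1")
GUEST = ipaddress.ip_address("192.168.58.2")
ALIAS = ipaddress.ip_address("192.168.1.252")      # address given to ra0:1


def plan_problems():
    """Return a list of human-readable problems with the addressing plan."""
    problems = []
    if LAN.overlaps(VM_NET):
        problems.append("the LAN and the VM network overlap")
    if HOST_ON_VM_NET not in VM_NET or GUEST not in VM_NET:
        problems.append("host or guest address is outside the VM network")
    if HOST_ON_VM_NET == GUEST:
        problems.append("host and guest share an address")
    if ALIAS not in LAN:
        problems.append("the alias address is not on the LAN")
    return problems


print(plan_problems() or "addressing plan looks consistent")
```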

In the End: Windows in a VM

After all's said and done, you should have something like this:

Let me know if you get it working, or if you want to shout at me about something.

Imran Nazar (tf@oopsilon.com)

Trainfuck: Networking Extensions for Brainfuck

Tue, 26 Sep 2006 15:45:23 +0000

The Trainf*ck project is an attempt to extend the capabilities of the esoteric Brainf*ck language to include viable outside interaction, including file operations and TCP networking. In this respect, the goals of the project are similar to those of Brainf*ck++; Trainf*ck differs in the implementation of these goals.

As is documented elsewhere, Brainf*ck has eight simple operations, which may act on a data buffer of a given size. Trainf*ck extends the list of operations to eighteen, while leaving the other aspects as is. The new instructions are as follows (P refers to the current address in the data space).

Instructions

File I/O

# - Open or close a file. Takes a null-terminated filename string starting at P. If called again, closes the currently open file; no parameter required for close.

; - Read a byte from file. Saves the byte to P.

: - Write a byte to file. Fetches the byte from P, and writes.

( - Rewind one byte. Modifies P: 0 if start of file was reached, 1 otherwise.

) - Move forward a byte. Modifies P: 0 if end of file was reached, 1 otherwise.

Networking

% - Connect to an address/port. Takes two parameters: big-endian IPv4 address starting at P, and big-endian TCP port number starting at P+4. If called again, closes currently open socket; no parameters required.

$ - Listen on an address/port. Takes address and port as detailed for %. If called again, closes currently open socket; no parameters required.

@ - Accepts an incoming connection. If called again, closes the currently accepted connection.

` - Receive a byte from the network stream. Saves the byte to P, or zero if connection was closed.

' - Send a byte. Fetches the byte from P, and sends.

Examples

The canonical "Hello World" example as written in Brainf*ck will run under Trainf*ck exactly as it does under Brainf*ck. An example implementation follows, which prints out "Hello World!".
Trainf*ck sample: Hello World
    >+++++++++[<++++++++>-]<.>
    +++++++[<++++>-]<+.+++++++..+++.>>>++++++++[<++++>-]<.
    >>>++++++++++[<+++++++++>-]<---.<<<<.+++.------.--------.>>+.
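To follow what the sample does step by step, a tiny interpreter for the core operations is enough. The one below is a Python sketch, not the C++ interpreter described later: it handles seven of the eight Brainf*ck operations (input is left out), ignores every other character, and assumes 8-bit wrapping cells and a 30000-cell buffer, neither of which is specified above.

```python
def run_bf(program, tape_size=30000):
    """Run the core Brainf*ck operations and return the output as a string."""
    # Pre-compute the matching position for every bracket.
    jumps, stack = {}, []
    for i, ch in enumerate(program):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i

    tape = [0] * tape_size
    out = []
    p = pc = 0
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            p += 1
        elif ch == "<":
            p -= 1
        elif ch == "+":
            tape[p] = (tape[p] + 1) % 256
        elif ch == "-":
            tape[p] = (tape[p] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[p]))
        elif ch == "[" and tape[p] == 0:
            pc = jumps[pc]   # skip the loop body
        elif ch == "]" and tape[p] != 0:
            pc = jumps[pc]   # go back to the start of the loop
        pc += 1
    return "".join(out)


HELLO = (
    ">+++++++++[<++++++++>-]<.>"
    "+++++++[<++++>-]<+.+++++++..+++.>>>++++++++[<++++>-]<."
    ">>>++++++++++[<+++++++++>-]<---.<<<<.+++.------.--------.>>+."
)
print(run_bf(HELLO))
```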
A simple example demonstrating Trainf*ck's file handling capabilities is that of finding the size of a file. The following example calculates the size of the file "abcd", and outputs it as a character code value. The caveat with this simple example is that it will wrap the counted value after 255.
Trainf*ck sample: File size
    >++++++++++++++++[<++++++>-]<    Generate backtick in 0
    [>+>+>+>+<<<<-]                  Copy 4 times
    >+>++>+++>++++                   Generate "abcd"
    <<<#<+                           Open file and init counter
    >[<+:>)]                         Count up file size
    <.#                              Output file size and close
Networking is where Trainf*ck's capabilities shine: the language provides the ability to perform networking tasks easily and efficiently. The following example is an implementation of an 'echo' daemon, which listens on port 20480 and repeats any text sent to it.
Trainf*ck sample: Echo daemon
    >>>                          Address 0x00000000
    >>++++++++[<++++++++++>-]    Port 0x5000
    <<<<<$                       Listen
    +[                           Continual loop
    @                            Accept a connection
    [`']                         Loop echoing
    @                            Close connection
    +]                           Continue endless loop
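The parameter layout used by % and $ (four address bytes at P, then two port bytes at P+4, all big-endian) can be checked with a short Python sketch. The helper name is invented for illustration; the sample buffer holds the six cells that the echo daemon sets up before calling $.

```python
import ipaddress
import struct


def decode_endpoint(tape, p):
    """Decode tape[p:p+6] as (IPv4 address string, TCP port).

    Bytes p..p+3 hold the big-endian address and bytes p+4..p+5 the
    big-endian port, as described for the % and $ instructions.
    """
    addr, port = struct.unpack(">IH", bytes(tape[p:p + 6]))
    return str(ipaddress.IPv4Address(addr)), port


# Cells 0-3 are left at zero; cell 4 is set to 80 (0x50) and cell 5 to 0.
tape = [0, 0, 0, 0, 0x50, 0x00]
print(decode_endpoint(tape, 0))
```

An all-zero address is the usual "any interface" wildcard, and 0x5000 is the 20480 mentioned above.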
Implementation

The Trainf*ck interpreter is written in C++, and has been tested on Linux. The code has been designed with portability in mind, and as such there should be no major issues with execution on any system.

The interpreter is currently incomplete: keyboard input is not yet implemented. It is anticipated that this will be added at a later date.

You can obtain the interpreter and the sample files detailed above from the following address.

http://oopsilon.com/software/Trainf_ck.tar.gz

Sci-fi Shorts: Hideout

Fri, 22 Sep 2006 22:49:30 +0000

OPEN: A display of a complex graphical shape, with numbers scrolling down one side. A phone rings.

ABDUL
Yeah?

CHRIS (phone)
We gotta talk about these test runs.

ABDUL
Yeah, I'm in the middle of one right now. Can't we talk a little later?

CHRIS
Seriously, I gotta talk to you.

ABDUL (sighs)
Alright, where are you?

EXT: A shadowed street corner, looking up the road to Abdul's apartment. Abdul and Chris are standing behind a building, at the left of screen.

CHRIS
I think they're onto us.

ABDUL
That's the second time this week, Chris. We can go to the chemist tomorrow if the pills aren't working any more.

CHRIS
Nah, I saw a van a few blocks down. They're gonna move in.

A black VW van, with lights off, rushes down the road, headed in the direction of Abdul's apartment.

ABDUL
Alright. I can take care of this. You made the backups, right?

CHRIS (holding up a data tape)
Yeah, last night.

ABDUL
Put it away already, we don't want to lose all the test runs.

CUT TO: Horizontal-split screen

[TOP] Abdul pulls out a PDA, and commences to tapping on the screen.

[BOTTOM] Green text, typed:

TEXT
> barctl --pressure 0.75

[TOP] CUT TO: INT: Abdul's apartment. A mess of computer boxes and stacks of paper, with wires running across the floor. PAN TO: A meter at the right side of the door, marked "Air", delineated from 0 to 2 bar, with a needle dropping to 0.75 before stopping.

[BOTTOM] Green text, typed:

TEXT
> sendserial 0b11110000
Connection lost

[TOP] Screen whites out.

CUT TO: EXT: Street corner

ABDUL
That's all four explosive lines triggered. I've dropped the air pressure, so the place doesn't blow out immediately. As long as they don't open the door, they'll be fine.

CUT TO: INT: Corridor outside Abdul's apartment. A group of six men, dressed as a police squad, is striding down the corridor, camera tracking from in front. They arrive at a particular door, and stop.

OFFICER (pounding on door)
Police! Open up!

The police officer considers for a second, then puts the palm of his hand against the door.

OFFICER
Shit, it's warm. Shouldn't we get the fire service or something?

CAPTAIN
Fuck the fire service, we need that data. Open the damn door!

The officer joins another with a door-battering ram, and slams it against the door twice.

CUT TO: EXT: Street corner. A few blocks behind Abdul and Chris, an apartment block is torn by an explosion, blowing out the windows of Abdul's place. A fireball emanates from the window, before flames start spewing. A car alarm is triggered nearer Abdul and Chris.

ABDUL
They never learn.

Abdul and Chris get into Chris's car parked a few yards off, and drive away.

Sci-fi Shorts: Thirteen

Fri, 22 Sep 2006 22:49:19 +0000

Despite John's remonstrations, the bank of computer displays kept stolidly to its text: "13 minutes remaining."

By all accounts, John had a simple enough job in the organisation: monitor the computers, as they went about their task. He had been told by the previous holder of the post that the software had an "issue" with a certain process; it tended to get "stuck", as he put it, near the end. He'd referred to it as the "13-minute gap", because the software always stuck when it displayed that figure as the remaining time to completion.

This was John's second day, and already he'd seen the software stick twice: once on the process last night, and again right now. Both times, it had taken a good span of time before the computers were moving again, and it seemed that John could do nothing to speed it up. He was a little worried that his job seemed redundant; the setup never went wrong, since it was able to cut out any computer units that went bad for whatever reason, and compensate for it.

At a wild guess, it was this compensation that the system had been doing for the last half hour, since the displays still stubbornly read "13 minutes". John decided it was time to leave the system be, and do some homework. He was enrolled on a part-time course with the local university, and this job had already given him ample time for study, even in his short time in the post. He settled down with his dense book on legal theory, and set to.

John woke up, probably a few hours later, to find the computer room much quieter. Obviously, in the intervening period taken by his nap, the task allocated to the system had been completed. One of the computers had a few sheafs of paper freshly printed in its out-tray, and he guessed he'd have to pick those up and take them to the manager. It could wait a few more hours though, John thought, and he did need more sleep.

Abdul studied the report produced by the overnights in more detail. The first time he'd read it, the result had looked a bit surprising. It seemed that the task had finally hit the right nail with the right hammer.

It had been Moassim's idea; he tended to come up with such crazy theories, most of which had no possible basis in fact. Abdul had listened to this one like most that Moassim had come out with before, with a healthy dose of scepticism, but as he listened, the idea toyed with him: it seemed to gain substance, to become a possibility. So Abdul had decided to try modelling the plan, to plug the specifications into a computer and see what would happen.

The idea ran thus. The Earth's crust was a pretty robust thing, and could take small perturbations like volcano eruptions, or climate shifts, in its stride. This was only the case when these shocks were delivered in solitary, however: one large eruption or one sudden cooling was no real problem, but a prolonged series could become a problem.

That was the root of Moassim's theory: apply a large enough number of significant but relatively small shocks to the crust, and it would dissolve over a large area. Moassim saw it as the ideal strike: highly sophisticated, beyond the comprehension of the enemy. Abdul hadn't been so sure, which was why he'd decided to try it out mathematically first.

Abdul had set up a bank of computers to model the chain of detonation as per Moassim's theory, looking for specific places and times where the chain could best be maintained. Each night, a build process would run the tweaked parameters, and change them a little, heading towards the ideal point: the longest possible chain for a given amount of energy input.

Last night, the computer bank had hit that ideal point. Small amounts of antimatter, placed at the coordinates indicated by the printout and then detonated in sequence, would cause the crust of the European plate to resonate; more detonations, as detailed by the model, would mean total dissolution of the crust across the European area.

It was a vindication: the plan would work. There was just one snag: Abdul didn't know of anyone in the world who could produce antimatter in any quantity, and he was going to need a lot.

Sci-fi Shorts: Tau

Fri, 22 Sep 2006 22:49:05 +0000

It didn't look much like a laboratory.

The room was lit by fluorescent white lamps from the ceiling, a few of which were flickering occasionally, emitting small pops as they fired on and then went out. Shelves lined the walls, littered with sheafs of paper and well-thumbed books, on such dry subjects as "Electromagnetic Fields at the Subatomic Scale"; stacks of paper and books lay haphazardly at the foot of a few of the shelves, as if abandoned.

In the corners lay relics of what could be old pieces of machinery, or perhaps half-finished projects. A set of thin windows was set high in the walls, not designed for looking out upon the world; the night made them appear as black slots in walls of grey. Only one area of the room was relatively clean: a desk, up against one set of bookshelves, on which were eight identical boxes. Wires trailed away to power sockets, and more wires connected the boxes together. They seemed to be computers, since there was a ninth, smaller box with a screen and keyboard attached.

There was a small thump at the door to the room, and then a lock turned. The door opened, and a man stumbled in, evidently the owner of the property. He was dressed in clothes that had probably seen better days: a plain blue tee-shirt, fading black jeans. Indeed, he had probably seen better days, since his face told a story of tipsiness, confirmed by the can of beer he held in one hand. He blinked at the sudden fluorescent light, and flicked a switch to douse the room in shade; a dim ambience filtered down from the slitted windows, so he could at least see.

His boots cleared a path through the detritus on the floor, as he made his way to the desk; setting his can down on the desk, he pulled a chair from nearby and sat down, before hitting a key on the keyboard. The screen flashed to life, and he again shrank away before his eyes adjusted. On the screen was depicted some sort of process: a block on the left side was slowly transferring itself to the right, as if through a horizontal hourglass. A column of numbers scrolled down the far right, continuously updating with new values as the block changed in size. The man seemed pleased with what he saw, and took a sip from his can.

The phone rang. An instinctive jerk away from the source of the noise nearly toppled the chair, before he realised that the phone was in his pocket. He fished it out, and set to finding the Answer button.

"Yuh?" he blurted when he finally found it.

"Abdul, you sound like crap. Get online."

"Wha- who's that?"

"Just connect, yeah. I've got something you might wanna see."

And with that, the phone hung up. Abdul stuffed the phone back into his pocket and took another sip from his can, before hitting a few strokes on the keyboard. The visualisation vanished, and another area appeared: the screen went black, apart from lines of grey text near the bottom.

[Joined #physics-discuss] the man himself. morning ye, what, i gotta be up at 6am yeah, why are you still awake? what you been up to drink. now come on, why you want me here i found someone working on producing tau neutrinos. interested?

That made Abdul sit up. His latest project needed a cheap and easy way to produce neutrinos: sub-atomic particles that could pass through matter without affecting it, only interacting one time in a billion. From the models Abdul had been running on the computers, neutrinos would speed his process up by many hundreds of times, and those specifically of the tau type were the best of the lot. He was definitely interested.

for sure. throw me an email addr, ill get in touch tomorrow but fer now, im going bed. gotta get _some_ sleep

Abdul's friend, who went by the name of Qubic, had helped a lot with this particular project: he had set up the little cluster of computers sitting on the desk, and written the software to get them talking and working on the modelling system. Now, he'd tracked down someone whom Abdul would need if he wanted to try the model in practice, someone who could provide these particles to speed up the process. Abdul resolved to buy the man a drink sometime.

Not now, though. For the moment, he wanted sleep. He stood up, nearly overbalancing but managing to correct himself. Stumbling over to the mattress which lay on the floor across the room, he fell into it, sending a few stray papers scattering across the floor and up a couple of feet into the air before they swung back down.

Sci-fi Shorts: Microwave

Fri, 22 Sep 2006 22:48:53 +0000

The problem had always been one of efficiency: getting electrical power into the home while wasting as little of it as possible in the transfer. At the far extreme of the home itself, the simplest way had always been to throw a wire at the property, charged with an electrical current; no-one had really found an easier way of getting electricity to the final usage points. There were inevitable losses involved in transfer over wires, but that wasn't where the major efficiency gains were to be made.

The way it had been before was this: a patchwork of locally-administered generation plants, perhaps a few serving each city, mostly using methods which hadn't changed for hundreds of years. By far the most common had been the heating of water in order to spin magnets, which had compound problems: use of a dirty fuel to heat the water, resulting in massive pollution, and the inherent inefficiency of the process. People in the power business once talked about 30% as if it was a good figure.

Of course, major problems arose if a city came close to running out of power. Because each of these local generation plants was fixed, and it was hugely expensive to build another one, the residents often faced such concepts as the rolling blackout instead of having a reliable supply. Such an idea sounded like a ridiculous way to run a city.

It took the invention of two seemingly separate technologies to change the status quo. The first was practicable nuclear fusion: the creation of a little star in order to feed off its output. The Earth as a whole seemed to get along just fine with fusion, lapping up the rays of the Sun; if the planet could do it, the human race could adapt the technique.

The final race to build a viable fusion plant had been between China and the United States, but the breakthrough came from neither quarter: it came from a research laboratory in France. The lab had been approached by both sides, but refused to give the plant away in exclusivity; instead, the technology was released for the world to use.

There was just one problem: size. The fusion plant was incredibly large, and no city could viably set aside such a huge chunk of land to house what was essentially a giant metal sphere. It took another invention before that issue could be alleviated.

It was called the HCD by the research team who came up with it: the High-power Collimating Diffractor. Its original target was satellite TV, where large amounts of power were wasted by spreading a signal over spaces which would never have a reception dish. Instead, the HCD split the signal into focussed beams, which could then be directed every which way, towards a specific dish on the ground or to another HCD for more refined splitting.

The real breakthrough hadn't come until Rihanna Johnson had her Eureka moment: generating radio waves from a fusion plant, and splitting with a HCD. The generation of radio from fusion was a simple enough matter, and had been done before; the plant gave out incredibly intense light, which could be shifted down to radio using technologies 50 years old. That wasn't the main thrust of the idea, though; that lay in a single word, the word which made the world sit up and take notice. That word was: space.

If fusion plants were so massively large that there was no way to house them on land, they could be hoisted into space instead. From there, they could generate high-powered microwaves, split into millions of fractions by a network of HCDs in orbit around the Earth, which could then beam the power down to reception dishes in every town and village.

It was a radical idea, not least because it meant abolishing the local generation infrastructure that had been painstakingly put into place over hundreds of years. The item which eventually forced the issue was, of course, cost: the price of all fossil fuels was steadily rising, and at some point in the late 21st century, all the accountants had worked out at the same time that it would in fact be cheaper to set up the fusion network than to keep the old plants running.

That was the gist of the history lesson Ryan gave in his unofficial capacity as tour guide for Fusion Pacific. He was a microwave researcher by profession, but most people up here had to take multiple jobs, simply due to a lack of personnel; it fell to him to show the tourists around.

Hawaii was the tethering point for Fusion Pacific: a huge cable, stronger than diamond, attached the land mass of the Earth to the giant ball of the power plant thousands of miles above. The space elevator had already been in construction when the fusion network had been floated as an idea; since the original plan had been to simply leave the end of the cable free in space, the fusion plant was deemed to add only a small percentage to the total mass of the system were it coupled to the endpoint.

There were two other stations in the fusion network: America, attached somewhere in the Amazon, and Eurasia which was fixed to an island off the coast of India. These had been the original three elevators, and three fusion plants was more than enough for all the world's needs.

The tour normally consisted of a trip around the plant, starting at Earth-side and working around to star-side while Ryan explained the history of the network. While working back to Earth-side, the tourists could examine output graphs from the fusion plant if they so desired, or fiddle with a sample light-frequency HCD that Ryan had put together. Instead of splitting microwaves, it split red light into hundreds of thin beams, programmatically directed at will; it made for interesting lighting, if nothing else.

Sci-fi Shorts: Sand

Fri, 22 Sep 2006 22:48:41 +0000

The cold seeps through him as he sits on the sand, making a slow march from his back down through his legs, and upwards towards his head. For the time being at least, he is frontally protected from the insidious penetration of the cold by a small fire, rapidly expending the little wood and few green branches he had managed to gather.
Winter wasn't supposed to be this cold, he was sure.
As the flames struggle upward before him, fighting a stiff breeze which threatens to drown everything in cold, he thinks about what happened to bring his life to this point. Of course, the bomb had been the start of it; as it had so abruptly ended so many lives, so it had ended any chance he had at life. Little Steph, one of the things that had been right about this world, his loving daughter: vaporised, along with the rest of Seattle, when the CAM warhead had struck all that time ago.
His brain tells him it has been but three weeks since that day when the War began, but every other part of him feels aged; he had lost any part of his mind that felt anything, otherwise he knows that would feel the pain too. He had been out of town that day, some kind of business at the Portland branch; as he sat at his desk facing the window, a sudden blinding flash had him covering his eyes for a moment. Every piece of electronic equipment in the office had been rendered dead by the flash, and a few people knew what that might mean.
Compressed anti-matter. Those bastards across the Atlantic had perfected their new weapon, and felt that somewhere in the States was a good place to test it.
He knew immediately that he'd never be able to get back into the state, but he tried anyway; his car wouldn't let him in (probably the electronics in there had gone too), so he ended up jumping on a bus. He was promptly deposited, about an hour later, at the state line, stopped by a police cordon. That confirmed it; nothing else would have brought out such a response. It could only be a CAM bomb.
So, that was it, then. The War with the Kingdom had started; the madman Thompson had decided it was time to drop the bomb that would end the world. The States would retaliate, of course; leaked documents had been splashed all over the news a few months ago about a secret research project into some new compressed form of anti-matter. "CAM", they called it, and everyone knew that the President was just insane enough to use it in a pre-emptive strike on the Kingdom. Looks like they had a project into this CAM stuff, as well.
And indeed, the bombs had flown across the Atlantic, vaporising Manchester and Leeds; the reply had been to annihilate much of the eastern seaboard, and so the two mad leaders successively ordered the destruction of each other's demesnes. Eventually, probably because the CAM had run out, the missiles stopped flying, and the smoking ruins of two countries were all that remained.
After about two weeks, he had finally been able to walk and hitch back to where Seattle had been. Those documents had been right about the power of CAM; nothing was left. Buildings, trees, people: in their place, nothing more than sand. The shop where Steph had worked was gone, along with the block and the streets around it; there wasn't even a strip of tarmac until a few miles out of downtown.
He didn't know why he'd come back. Perhaps he had some small, insane hope of seeing Steph again; deep in his mind, he knew that Steph was gone, but at that time he'd never admit to that. He had scavenged the suburbs for a week, seeing a few other people doing the same in that time, but he had spent most of that time alone.
The fire had burned down while he was lost in thinking. Now, a small pile of smouldering ashes sits in front of him; an occasional flicker of flame spurts up when an unburned piece of wood gathers enough heat to ignite. It's time to move on, find a warmer place to get some sleep, perhaps some food.
Sand. All that was left of his home.

Sci-fi Shorts: A Change of Clothing

Fri, 22 Sep 2006 22:43:10 +0000

He never thought there would be so much blood.

Of course, he had been working towards this night. He had been through many a simulation preparing, each one a virtual replica of the experiences and emotions that would pass through his mind; each one rendered by a network of computers, specifically ordered by the Organisation to reproduce the feel of the moment down to the smallest detail.

Whenever he would step into the immersion tank to enter the simulation, and don the mask which would feed his senses with virtual information, another world would flood into his brain: sights, sounds, touches all provided by the tank, on behalf of the simulation network. And it was a perfect world in many ways.

The chase would always be reproduced perfectly. The pretty young thing he would find in a deep level of a car park perhaps, or maybe in a back alley, near-deserted in the small hours of the night. Her eyes would widen in fear, the black pupils expanding as her flight response took hold, and then she would run. And he would follow, purposefully and without excess haste, for he always knew he would catch his prey before long.

The simulated victim would twist and turn through the streets of a deserted city, seeking a way to escape her inevitably approaching predator, but always he would be there: just behind, waiting for the slip to happen. And it would happen. She would stumble and fall, maybe, or take a bad turning and face a blank wall, and then the prey would be trapped; the implacable hunter on one side, the immovable stone on the other.

Tonight, he had been judged by the lead committee of the Organisation to be ready for a live chase. He had been transposed to a particularly run-down inner suburb of the city, its glory days long since blown into the winds by time and changing fashion. There he had seen his prey: a shapely young one, perhaps nineteen, with auburn hair spilling down towards her waist. She had sensed him somehow, turning back and showing those same wide eyes, dark pupils bordered by hazel; and then she had run.

The chase had been especially satisfying, lasting just long enough for him to be aroused, yet not dragging on until he would lose the urgent need to catch his prey. It had also been convoluted, mapping out almost every alley and street of that part of the city; he was sure that this one knew the area well, and was confident of shaking him off. At least to begin with.

But he knew that the chase would end in his favour, and towards the end she seemed to sense it too, her energy flagging, reserves failing as her body finally gave in to the panic rushing through her. And as he grabbed her by that length of auburn, a surge unlike any that had happened through the simulations passed through him. He knew that this was real.

The Organisation had trained him not to be one of those animals dispensing rape and murder on the world; the committee felt there were enough of those rabid dogs without their contribution. Instead, he was one of the artists: his speciality was the infliction of pain, delicate ribbons of flesh being cut slowly from his prey with surgical precision, as she writhed beneath.

At first, he had been too quick to dispense the pain, his victim lapsing into unconsciousness almost immediately. But through the immersion tank, he had learned where to provide pressure, when to cut, to keep her hovering just conscious, but still able to feel every ounce of the hurt he was working to inflict. And now his initiation had come, his arts being worked on a live subject for the first time, and he knew that this performance would be judged favourably by the committee.

But incongruously, there was always one thing the simulations lacked, and that was the flow of blood. The technicians said it was a simple problem of physics; the processing power to calculate the flow of liquids on such a precise scale simply wasn't there. How that could be, when the system gave such accurate impressions of the fear in his prey's eyes, he'd never be able to understand, but that was the simple fact of the matter.

And so it was that he was surprised by the sheer amount of blood released by his art tonight. It covered him and the stone floor of the final alley of the chase; it flowed freely from the limp body before him, which all the time was discharging more out of itself. The very air was tinged with salt, and he could taste metal in his breath if he opened his mouth. He was glad he had taken notice of his trainer's final remark before this night.

Take a change of clothing.

Collapsible Nested Lists in Vanilla JavaScript

Fri, 22 Sep 2006 21:40:53 +0000

We all know that a tree, like the one seen in Windows' File Explorer, is nothing more than a nested list. But is it possible to code a tree up in HTML/CSS as a nested list?

Let's start off with the list. This is a snippet of a standard file tree, organised in "folders".
<ul>
 <li>Graphics
  <ul>
   <li>gpu.h: Function prototypes</li>
   <li>gpu.c: Graphic output implementation</li>
  </ul>
 </li>
 <li>Debug Output
  <ul>
   <li>dbgout.h: Output prototypes</li>
   <li>dbgout.c: Output fixed-width font drawing</li>
   <li>font5x7.h: Fixed-width font definitions</li>
  </ul>
 </li>
</ul>
And this is what we get from that code.

Graphics

gpu.h: Function prototypes

gpu.c: Graphic output implementation

Debug Output

dbgout.h: Output prototypes

dbgout.c: Output fixed-width font drawing

font5x7.h: Fixed-width font definitions

Having a tree means that each branching node can expand or collapse, to show or hide the elements of the tree within it. The showing and the hiding isn't so difficult; the display property in CSS allows us to do this pretty quickly, if we define two classes:
ul.hide { display: none; } ul.show { display: block; }
Of course, it's not quite that simple. You can't change the state of a UL very easily (clicking on it won't do); but you can change the state of an LI. So we move the two classes to the enclosing LI:
li.hide ul { display: none; } li.show ul { display: block; }
So, we have the CSS for hiding the tree. But how do we switch states? How can we show and hide the nodes at will? That's where the DOM comes in. If we put the description of the tree item ("Debug Output", for example) in an active element (DIV or A maybe), we can attach DOM events to it.

I've decided not to use A, because an anchor requires a href, and using a link of # will clutter up your browser's History facility. So, let's use a DIV.

What do we want to happen when we click the DIV? Basically, just flip the state of the parent LI, such that the ULs underneath are visible.
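<ul>
 <li class="hide"><div onclick="toggle(this.parentNode)">Graphics</div>
  <ul>
   <li>gpu.h: Function prototypes</li>
   <li>gpu.c: Graphic output implementation</li>
  </ul>
 </li>
 <li class="hide"><div onclick="toggle(this.parentNode)">Debug Output</div>
  <ul>
   <li>dbgout.h: Output prototypes</li>
   <li>dbgout.c: Output fixed-width font drawing</li>
   <li>font5x7.h: Fixed-width font definitions</li>
  </ul>
 </li>
</ul>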
So when you click the "Graphics" DIV, toggle() runs and flips the top LI from hide to show. And of course, if you click it again, it flips back to hide. We'll need some JavaScript to do this; fortunately, JS gives us the ternary operator, where we can select between two options based on a condition.
function toggle(x) { x.className = (x.className=='show') ? 'hide' : 'show'; }
What this means is: "If the className is show, set it to hide, otherwise [i.e. if it's not show] set it to show". Since there're only two possibilities for the class name, you can see that this toggles between the two.

Just before we get to an actual working example, you should remember that you'll have to define CSS for each level of menu that we go down, since the properties won't inherit between ULs if there's an LI in the way (which there always is).

So now, we can put it all together, and come up with a simple tree that is collapsible/expandable with a bit of DOM fiddling.
Graphics gpu.h: Function prototypes gpu.c: Graphic output implementation Debug Output dbgout.h: Output prototypes dbgout.c: Output fixed-width font drawing font5x7.h: Fixed-width font definitions
Here, I've just added some styling to the text DIV, which can change along with the parent LI state just as the UL does. Again, the inheritance of properties will be lost between levels, so just put in an extra line for each level down.

Now we have just one problem. As you can see, the page loads with the Graphics item collapsed. What if you don't have JS running? Click on the DIV and nothing happens; you can't get to the list underneath! Obviously a problem. The way to alleviate this is to have everything expanded by default instead of collapsed; if you need to, use an onload tree collapse so that the tree will collapse if you run JS, and stay expanded if you don't.
function treeCollapse(){ var list = document.getElementById('yourtree').getElementsByTagName('li'); for(var i=0;i<list.length;i++) list[i].className='hide'; }
This'll just get a list of all the LIs in the tree, and set them to class hide.

So, that's how to make a nested list into an expandable tree.

An Introduction to Bitwise Operators

Fri, 22 Sep 2006 21:39:57 +0000

Computers work with binary numbers. The binary system allows for some cool manipulations to be performed on the per-digit level; the problem is that relatively few people learn about these operations, since they seem at first glance to be quite complicated. This document will show that they're not, really.

For this quick run-through, I'll be assuming you know what the binary numbering system is, and how it works; furthermore, I'll assume a little familiarity with working in binary. Everyone who uses a computer, whether it be to connect to the Internet, for video games, or simply to edit a Word document, will have experienced the binary numbering system, as this is how all computers function internally. However, to make binary manipulations, you will need a slightly more advanced idea of how this system operates. I'll also be presenting any code examples using the syntax of C and its syntactical derivatives (C++, PHP, Java and the like). Don't worry if you don't know the C syntax for the operators; I'll be putting a small table at the end of the document.

AND: Putting on the mask

The first operation to look at is called AND. It's called that because that's exactly what it does: take two inputs, and only return any output if both input 1 AND input 2 are on.

in1  in2  in1 AND in2
0    0    0
0    1    0
1    0    0
1    1    1

That table is an example of a truth table; it plots out all the possible combinations of inputs, and what the operator will do with them. As you can see, the AND operator only returns 1 if both inputs are 1. But how does that help us in the real world, of numbers bigger than one bit?
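The table is easy enough to check in C. What follows is only a minimal sketch, and the names and_bit, and_table_holds and and_demo are made up for illustration. It also hints at the answer to that last question: on a wider value, & simply works through the table once per bit position.

```c
#include <assert.h>

/* AND of two single bits: 1 only when both inputs are 1. */
static unsigned and_bit(unsigned in1, unsigned in2)
{
    return in1 & in2;
}

/* Walk all four rows of the truth table; returns 1 if every row matches. */
static int and_table_holds(void)
{
    return and_bit(0, 0) == 0
        && and_bit(0, 1) == 0
        && and_bit(1, 0) == 0
        && and_bit(1, 1) == 1;
}

/* On wider values the same table is applied to each bit position:
 *   1100 (12)
 *   1010 (10) [AND]
 *   ----
 *   1000 (8)
 */
static void and_demo(void)
{
    assert(and_table_holds());
    assert((12u & 10u) == 8u);
}
```

Calling and_demo() from a main() of your own runs the checks; if nothing aborts, the table holds.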

Using AND to mask values

The major thing that can be done with AND is masking: only using the part of a number that you want to use. For example, let's say you have a 32-bit variable, and you're incrementing it in a loop. But you want to wrap the value after 255, back to 0. A simple case of using an if statement, you may think. But think again.
AND: Masking a 32-bit value
while(1)
{
    i++;
    // How we'd do this with an if clause
    // if(i > 255) i = 0;
    // How to do it with AND
    i = i & 255;
}
What's happening here? Let's have a look at a normal case first; say i is at a value of 77. The AND operation looks like this:
AND: Masking at i=77
00000000 00000000 00000000 01001101
00000000 00000000 00000000 11111111 [AND]
-----------------------------------
00000000 00000000 00000000 01001101
The AND "mask" is essentially passing through the low 8 bits of i into the result. In this case, 77 fits fine into 8 bits, so no change. What happens at the borderline case: when 255 becomes 256?
AND: Masking at i=256
00000000 00000000 00000001 00000000
00000000 00000000 00000000 11111111 [AND]
-----------------------------------
00000000 00000000 00000000 00000000
As stated above, the low 8 bits of i are being passed through. That's 0. The rest of the value, including the '256' bit, is essentially ignored by the mask, meaning the value automagically wraps from 255 to 0. Quite useful, you'll admit.
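If you'd like to convince yourself that the mask really does the same job as the if, here's a small sketch that puts the two side by side. The function names (next_with_if, next_with_and and friends) are mine, not part of any library, and the counter is assumed to be an unsigned 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Step a counter and wrap it after 255 with a conditional. */
static uint32_t next_with_if(uint32_t i)
{
    i++;
    if (i > 255) i = 0;
    return i;
}

/* Step a counter and wrap it after 255 by keeping only the low 8 bits. */
static uint32_t next_with_and(uint32_t i)
{
    i++;
    return i & 255;
}

/* Returns 1 if both versions agree for every counter value from 0 to 255. */
static int wrap_versions_agree(void)
{
    uint32_t i;
    for (i = 0; i <= 255; i++) {
        if (next_with_if(i) != next_with_and(i)) return 0;
    }
    return 1;
}

static void wrap_demo(void)
{
    assert(next_with_and(76) == 77);   /* fits in 8 bits: passed through */
    assert(next_with_and(255) == 0);   /* 256 has no low bits set: wraps */
    assert(wrap_versions_agree());
}
```

One thing to bear in mind: the two only match while the counter stays inside 0 to 255 between steps. Feed both a stray 299 and the if version resets to 0, while the mask keeps the low 8 bits of 300, which is 44.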

AND's role in IP addressing

Something else you can do with AND is to clear a certain portion of a value. For example, you want to check the network that an IP address lives on. An IP address is just another 32-bit number, and the subnet mask that accompanies it is exactly that: a mask, that's applied to the IP using the AND operator, to find the subnet for that IP address.

Let's take my setup at home. I have a few computers at home, and they all have addresses in a private IP range. One of those computers is 172.16.55.37, with a subnet mask of 255.255.255.224; how does the router know which network I'm coming from?
AND: What subnet are you on?
IP address: 10101100.00010000.00110111.00100101
Snet Mask:  11111111.11111111.11111111.11100000
AND:        -----------------------------------
Subnet:     10101100.00010000.00110111.00100000 [172.16.55.32]
So when the router sees a packet destined for 172.16.55.37, it applies this AND operation, works out that the network is in its route table, and forwards the packet. In other words, the Internet wouldn't work without the AND operator.
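The same calculation can be sketched in a few lines of C. This is an illustration only, not how any particular router does it: ipv4, network_of and same_network are hypothetical helper names, and ipv4 leans on shifts and OR purely to pack the four octets into one 32-bit number.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four dotted-quad octets into one 32-bit value, first octet highest. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16)
         | ((uint32_t)c << 8)  |  (uint32_t)d;
}

/* The subnet an address lives on: the address ANDed with the subnet mask. */
static uint32_t network_of(uint32_t ip, uint32_t mask)
{
    return ip & mask;
}

/* Two addresses share a subnet when their masked values are equal. */
static int same_network(uint32_t ip1, uint32_t ip2, uint32_t mask)
{
    return network_of(ip1, mask) == network_of(ip2, mask);
}

static void subnet_demo(void)
{
    uint32_t mask = ipv4(255, 255, 255, 224);

    /* The worked example: 172.16.55.37 lives on 172.16.55.32. */
    assert(network_of(ipv4(172, 16, 55, 37), mask) == ipv4(172, 16, 55, 32));

    /* .60 masks to .32 as well; .70 masks to .64, a different subnet. */
    assert(same_network(ipv4(172, 16, 55, 37), ipv4(172, 16, 55, 60), mask));
    assert(!same_network(ipv4(172, 16, 55, 37), ipv4(172, 16, 55, 70), mask));
}
```

With the address and mask from my home setup, network_of comes out as 172.16.55.32, matching the binary working.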

AND used to clear a bit

A final example for AND: When you want to make a Windows application, you generally want to display a window, and windows can have various styles associated with them. The most commonly used style is WS_OVERLAPPEDWINDOW, which is just a number given an easier-to-read name: it combines various styles which tell Windows to provide a caption, system menu, minimise and maximise buttons.

But what if you don't want a minimise button? You can build the style you need by taking the normal one, and cutting out the value for WS_MINIMIZEBOX. The style values were set with just this idea in mind: simple styles are binary power values, and complicated styles combine them together.

So let's give that a go: a window with no minimise button.
AND: Clearing a bit
    OVERLAPPEDWINDOW:  00000000 11001111 00000000 00000000
    MINIMIZEBOX:      (00000000 00000010 00000000 00000000)
    Clear mask:        11111111 11111101 11111111 11111111
    AND:               -----------------------------------
    Overall style:     00000000 11001101 00000000 00000000
If the new value is passed to Windows, you'll get a shiny new window with no minimise button. To generate the clear mask, you can use the NOT operator, which we'll be coming to later.
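
In C, that could look something like the following sketch. The STYLE_ constants are stand-ins carrying the values from the diagram above (a real program would use the definitions from the Windows headers), and clear_style is a made-up helper name.

```c
#include <stdint.h>

/* Stand-in constants, with the values shown in the diagram above. */
#define STYLE_OVERLAPPEDWINDOW 0x00CF0000u
#define STYLE_MINIMIZEBOX      0x00020000u

/* Clear every bit of `flag` in `style`; all other bits pass through.
   The ~ builds the clear mask: all 1s except where `flag` has a 1. */
uint32_t clear_style(uint32_t style, uint32_t flag)
{
    return style & ~flag;
}
```

clear_style(STYLE_OVERLAPPEDWINDOW, STYLE_MINIMIZEBOX) yields 0x00CD0000, the "overall style" on the last line of the diagram.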

OR: Setting bits

The second operator we'll look into is called OR. And as you can guess, it's called OR for a reason: it takes two inputs, and gives an output if either one or the other, or both, is set. Here's another of those truth tables.

in1	in2	in1 OR in2
0	0	0
0	1	1
1	0	1
1	1	1

Note how it doesn't matter whether in1 is on or off; if in2 is on, the result is on. And, of course, the inverse applies. So, how can that be used in the real world?

The major application of OR is in setting bits; filling a portion of a value with 1's. Let's take the Windows styles from above: you may want not just a normal window, but a window which can handle vertical scrolling of its contents. Luckily, it's simple to define such a window: just tack the WS_OVERLAPPEDWINDOW default style and WS_VSCROLL together, thus.
OR: Filling in a value
    OVERLAPPEDWINDOW:  00000000 11001111 00000000 00000000
    VSCROLL:           00000000 00100000 00000000 00000000
    OR together:       -----------------------------------
    New window style:  00000000 11101111 00000000 00000000
By using AND and OR in this manner, it's quite easy to build up exactly the style of window you're looking for.
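
A short C sketch of setting a style bit, and of checking for it afterwards with AND. As before, the STYLE_ constants are stand-ins holding the values from the diagram, and both function names are invented for the example.

```c
#include <stdint.h>

/* Stand-in constants, with the values shown in the diagram above;
   the real definitions come from the Windows headers. */
#define STYLE_OVERLAPPEDWINDOW 0x00CF0000u
#define STYLE_VSCROLL          0x00200000u

/* Set every bit of `flag` in `style`; bits already set stay set. */
uint32_t set_style(uint32_t style, uint32_t flag)
{
    return style | flag;
}

/* Report whether every bit of `flag` is present in `style`. */
int has_style(uint32_t style, uint32_t flag)
{
    return (style & flag) == flag;
}
```

set_style(STYLE_OVERLAPPEDWINDOW, STYLE_VSCROLL) gives 0x00EF0000, matching the diagram, and has_style on that result reports the scroll bar as present.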

XOR: OR with a twist

OR is the operation that allows you to set a bit if either or both of two inputs is set. But what if you only want to check either, and not both? There is an operator for that, and it's called the Exclusive OR, XOR for short. Another truth table for you:

in1	in2	in1 XOR in2
0	0	0
0	1	1
1	0	1
1	1	0

Now, it might seem a bit esoteric and theoretical, to have an operator that only sets output if either input is set, and not both. However, the XOR operation does come in useful.

XOR to clear a value

If you have a variable or a CPU register, and you want to clear it to 0, you don't care about what's inside; those contents will be obliterated anyway. Of course, you can simply move "0" into that variable, but in some cases that might be a problem. Instead, you can use XOR.
XOR: Clearing to zero
    Random value: 11001001 01001110 11010010 11110001
    Value again:  11001001 01001110 11010010 11110001
    XOR:          -----------------------------------
    Output:       00000000 00000000 00000000 00000000
For every one of the 32 bits in that example, either the top or the bottom line of the truth table matched, which means the output was 0 in both cases. The end result is, of course, that all the bits of the value turn out as 0 after the XOR operation. Now, why would one want to perform such an operation? Take a look at this example, in Intel assembly.
XOR: Assembly usage
    mov eax, 0      ; Assembles to 5 bytes
    xor eax, eax    ; Assembles to 2 bytes
If you're at a premium for space, it's obvious which of the two you'd pick; instead of wasting 3 bytes of space, a bitwise operator can do the same job.

XOR to flip a bit

The other major application of XOR is to flip a bit, or range of bits, within a value. If you take a look at the truth table in two halves, the top half is essentially the bottom half upside-down; and that flipping is controlled by the value of in1. This comes in very useful for certain situations.

For example, let's say you've been tasked with producing a square wave: a constant series of pulses, 64 then 191, then 64, then 191. You could implement a complicated series of if statements, or you could use a simple XOR.
XOR: Making a 1kHz square wave
    unsigned char output = 64;
    while(1)
    {
        output = output ^ 255;
        usleep(500);    // sleep for half the wave period
    }
So, what's happening here? The output value starts at 64, and gets changed by the XOR every time the loop runs. How does that change manifest itself?
XOR: Sending the square wave high
    output: 01000000
    Mask:   11111111
    XOR:    --------
    Result: 10111111
So we started at 64, and the XOR has flipped all the bits, to 191. After 500 microseconds, the loop is re-entered, and the XOR gets applied again:
XOR: Sending the same wave low
    output: 10111111
    Mask:   11111111
    XOR:    --------
    Result: 01000000
And as if by magic, the XOR returns the result we entered with the first time: 191 gets flipped to 64. Because the value is 64 for 500 microseconds, and 191 for another 500 microseconds, the overall wave period is 1 millisecond, and we've produced the 1kHz square wave, with just one XOR operation.
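
The flip on its own, without the timing, can be sketched as a one-line C function; flip_level is a hypothetical name, and the value is assumed to be an unsigned 8-bit quantity so that 191 fits.

```c
#include <stdint.h>

/* Flip every bit of an 8-bit level, as the loop above does with ^ 255. */
uint8_t flip_level(uint8_t level)
{
    return (uint8_t)(level ^ 0xFFu);
}
```

flip_level(64) is 191 and flip_level(191) is 64. More generally, applying the same XOR mask twice always hands back the value you started with, which is exactly why the wave keeps alternating.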

NOT: The unary one

If all you want to do is flip the whole of a value, as in the above example, there is an alternative. It's called NOT, and it's different to the rest of the bitwise operators we've covered so far. Instead of taking two inputs, it just takes one; the output is the opposite of the input.

x	NOT x
0	1
1	0

NOT is one of those more esoteric operations; it doesn't see much use. One place where it can be used to good effect is to generate masks for use in the AND operation. Let's say you want to clear the bottom two bits of a value, and keep all the rest. Instead of hassling around with long strings of binary or hexadecimal digits, we can let bitwise operators do the grunt work for us.
NOT: Generating an AND mask
    value = value & (~3);
Alright; short, succinct, but what's it doing? Let's take a look at the binary level.
NOT: How the generation happens
    Three:  00000000 00000000 00000000 00000011
    NOT 3:  11111111 11111111 11111111 11111100

    Value:  11001001 00011110 10001101 00111011
    AND:    -----------------------------------
    Result: 11001001 00011110 10001101 00111000
In this way, you can easily generate the masks you need for your AND operations, without having to dig deep into hexadecimal strings and remembering any combination tables; NOT's doing the hard work for you.
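
Taking the idea one step further, here is a hedged sketch that builds the mask for any number of low bits. The function name clear_low_bits is made up, and it leans on the left shift operator, which is covered in the next section.

```c
#include <stdint.h>

/* Clear the lowest `count` bits of `value`; count must be 0..31. */
uint32_t clear_low_bits(uint32_t value, unsigned count)
{
    /* A run of `count` ones at the bottom: count=2 gives 3. */
    uint32_t low = ((uint32_t)1 << count) - 1u;

    /* NOT turns that into the AND mask: ones everywhere else. */
    return value & ~low;
}
```

clear_low_bits(0xC91E8D3B, 2) gives 0xC91E8D38, the same result as the binary example above. A side effect worth knowing is that clearing the bottom two bits rounds a value down to a multiple of four.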

Shifts: Do the shuffle

So far, the bitwise operators have been 1-bit affairs: take one bit from an input (or two), give one bit of output. There are operators, however, that take a whole string of binary digits, and do simple operations with that. The major example of those operators is the shift.

Shifting comes in two variants: left and right. As you can probably guess, a left shift takes a binary string of values and shifts it left, pushing it up the binary powers. Let's take a look with an example.
Left shift: Pushing a value up
    Before: 00001111
    Left 1: 00011110
    Left 2: 00111100
    Left 3: 01111000
The 1's get pushed up from the bottom of the value, towards the top; the gaps that are left are filled in by 0's. If you look carefully, you'll note that this is essentially equivalent to multiplying the value by powers of two: before shifting, we had 15. Shift left by 1 and we end up with 30; left by 2 gives us 60, and left by 3 gives 120.

However, what happens if we start with a 1 near the top, and then start shifting left? Let's take a look.
Left shift: The bit bucket
    Before: 01100101
    Left 1: 11001010
    Left 2: 10010100
    Left 3: 00101000
The value got shifted along sure enough, and 0's were inserted at the bottom, but where did the upper 1's go? The answer is, they vanished; basically, they fell off the end of the value as a result of the shifting operation. If you decide to use shifts, keep in mind that this will happen if you run out of space for the bits.

Also note how the most-significant bit changes during the successive shift operations. If you're dealing with signed values, the most significant bit is often used to denote the sign (1 meaning that this is a negative number); in this case, your number is going from positive to negative, then back to positive! That is one disadvantage of the shift operation that has to be kept in mind when you use it.

The right shift is very similar to the left shift, only it acts in the reverse direction: bits get moved to the right, and 0's are inserted from the left hand side.
Right shift: Pushing down
    Before:  00111000
    Right 1: 00011100
    Right 2: 00001110
    Right 3: 00000111
Just as left shift can be thought of as multiplication by a binary power, right shift can be used to divide by a binary power; in the case above, 56 becomes 28, then 14, then 7, simply by successive shifts to the right.

Just as with the left shift also, the right shift provides no safeguards if 1's are at the low end of the scale. In the example above, another right shift would yield a value of 3, since the lowest 1 simply drops off the end of the value.

Also just like the left shift, there's no safeguard on the highest bit; if it started out as 1 (which in a signed value denotes negative), a right shift would fill in 0's, making the value positive. For that reason, processors and programming languages often offer an "arithmetic" right shift operation, which fills with 0 if the top bit was 0, and fills with 1 if the top bit was 1; by using the arithmetic right shift, the sign of the value is preserved.
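
To make the difference concrete, here is a sketch of both kinds of right shift on an 8-bit value. In C, what >> does to a negative signed number is left up to the implementation, so this example sticks to unsigned bytes and builds the sign fill by hand; the function names are invented for illustration.

```c
#include <stdint.h>

/* Logical right shift: zeros always come in from the left. */
uint8_t shr_logical(uint8_t v, unsigned n)
{
    return (uint8_t)(v >> n);
}

/* Arithmetic right shift, spelled out by hand: the vacated top bits
   are filled with copies of the original top bit. n must be 0..7. */
uint8_t shr_arithmetic(uint8_t v, unsigned n)
{
    uint8_t shifted = (uint8_t)(v >> n);

    if (v & 0x80u) {
        /* Top bit was 1: put n ones at the top of the byte. */
        shifted |= (uint8_t)(0xFFu << (8u - n));
    }
    return shifted;
}
```

For a value with the top bit clear the two agree: both turn 00111000 into 00000111 when shifting by 3. For 11001010, shr_logical by 1 gives 01100101 (now positive), while shr_arithmetic gives 11100101 and the sign survives.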

That's all well and good, throwing values around inside a binary string. But to what end can it be put? How can shifts be used in the real world?

Shifts to combine values

Let's say you have in your possession four 8-bit values, and you wish to build a 32-bit value by tacking them all together. The left shift makes this a very easy task to accomplish.
Left shift: Building a value
    Values: 11001010, 00111010, 01001101, 00110011

    Pushing all values into 32-bit variables:
    var4:  00000000 00000000 00000000 00110011
    var3:  00000000 00000000 01001101 00000000  <-- left shift 8
    var2:  00000000 00111010 00000000 00000000  <-- left shift 16
    var1:  11001010 00000000 00000000 00000000  <-- left shift 24
    OR:    -----------------------------------
    Final: 11001010 00111010 01001101 00110011

    value = var4 | (var3 << 8) | (var2 << 16) | (var1 << 24);
Similarly, if you wanted to split that 32-bit value up into 8-bit chunks, you could simply perform a successive right-shift by 8, in conjunction with an AND operation to mask off the part of the result you're looking for.
Right shift: Splitting a value
    Initial: 11001010 00111010 01001101 00110011
    Mask:    00000000 00000000 00000000 11111111
    AND:     -----------------------------------
    var4:                               00110011

    RShift:  00000000 11001010 00111010 01001101
    Mask:    00000000 00000000 00000000 11111111
    AND:     -----------------------------------
    var3:                               01001101

    RShift:  00000000 00000000 11001010 00111010
    Mask:    00000000 00000000 00000000 11111111
    AND:     -----------------------------------
    var2:                               00111010

    RShift:  00000000 00000000 00000000 11001010
    Mask:    00000000 00000000 00000000 11111111
    AND:     -----------------------------------
    var1:                               11001010

    var4 = value & 255;
    var3 = (value >> 8) & 255;
    var2 = (value >> 16) & 255;
    var1 = (value >> 24) & 255;
A little bit long-winded, at least when written down in raw binary. Keep in mind, though, that the computer can do this at a stupidly high pace, especially when you use operators designed to work with bits.
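
Wrapped up as a pair of C functions, the build and the split might look like this sketch. The names pack4 and byte_at are hypothetical; the casts to uint32_t are there so the shifts happen in an unsigned 32-bit type and the top byte can't spill into a sign bit.

```c
#include <stdint.h>

/* Tack four 8-bit values together; var1 ends up in the top byte. */
uint32_t pack4(uint8_t var1, uint8_t var2, uint8_t var3, uint8_t var4)
{
    return ((uint32_t)var1 << 24) |
           ((uint32_t)var2 << 16) |
           ((uint32_t)var3 << 8)  |
            (uint32_t)var4;
}

/* Pull one byte back out; index 0 is the lowest byte, 3 the highest. */
uint8_t byte_at(uint32_t value, unsigned index)
{
    return (uint8_t)((value >> (8u * index)) & 255u);
}
```

In hexadecimal, the four values from the example are CA, 3A, 4D and 33: pack4(0xCA, 0x3A, 0x4D, 0x33) gives 0xCA3A4D33, and byte_at(0xCA3A4D33, 1) hands back 0x4D.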

Shifts used to multiply

Another use of shifts is to multiply values by multipliers that aren't direct binary powers. This isn't such a common operation any more, since multiplier units are quite quick nowadays, but if you ever come across a weird sequence of shift operations in some old code, you'll know what it's doing, and why it's there.

In the old VGA days, the most common graphic mode used on PCs had 320 pixels to a line. Pixels were written to the screen in a flat framebuffer: a portion of memory which translated directly to the screen, 320 bytes to a line. That meant finding out a location in memory, given an X and Y coordinate to draw to. The formula is pretty simple, but the multiplier units of the CPUs back then were quite slow, so every advantage was sought out.
Left shift: Multiplying by 320
    Memory Location = (y * 320) + x
                    = (y * 256) + (y * 64) + x
                    = (y << 8) + (y << 6) + x
Since shifts were orders of magnitude faster than multiplies, this series of two shifts and two adds was done much more quickly than one multiply and one add. Of course, you probably won't see this so much any more, since CPUs actually got good at multiplying, but it's possible you'll come across this kind of technique in another place.
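
As a small sketch, the formula drops straight into a C function. The name vga_offset is invented here, and the coordinates are assumed to be unsigned and already within the 320-pixel-wide screen.

```c
#include <stdint.h>

/* Offset of pixel (x, y) in a flat framebuffer, 320 bytes to a line,
   computed with two shifts and two adds instead of a multiply. */
uint32_t vga_offset(uint32_t x, uint32_t y)
{
    return (y << 8) + (y << 6) + x;   /* y*256 + y*64 + x */
}
```

For any coordinate, vga_offset(x, y) is the same number as (y * 320) + x; the first pixel of the second line, for instance, sits at offset 320.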

The End

So, that's all the bitwise operators you'll come across in your travels through programming. Some of them may come in very useful, and some may be used less often, but have no doubt: they'll all be used sometime.

A final note: This table shows how C-syntax languages denote the bitwise operators.

Operator	Symbol	Syntax
AND	Ampersand	x & y
OR	Vertical pipe	x | y
XOR	Caret	x ^ y
NOT	Tilde	~ x
Left shift	Two less-than signs	x << y
Right shift	Two greater-than signs	x >> y

Placed into the public domain by Imran Nazar, 2006.

Virus Detection with Message Digests

Fri, 22 Sep 2006 21:38:53 +0000

Files stored in a computer are simply streams of numbers. A text file, for instance, has a number for each letter or number in the text; a graphic file has a number for each element of colour. Computer programs are also files: each number tells the computer to perform a function like adding two numbers together, or storing a number in another file.

The problem is that, because programs are just files like any other files, they can be changed: certain numbers modified, or new streams added in the middle of the file. This is one way in which viruses are able to infect a file; they may add to a program file, and the next time that program is run, the virus is executed and performs the functions it was designed to do.

One of the best ways to alleviate this is to use a technique known as the 'message digest': a number which describes the entire contents of a file. A simple example would be an algorithm like the following:

If each letter of the alphabet is assigned a number, A being 1, B being 2 and so forth, it's possible to add all the letters in a text file together, to obtain a "check-sum"; a sum of the contents of the file, which can be used to check it. As an example, HELLO WORLD would add up to 124.

One of the problems with this simplest example of message digest algorithm is that the space could be removed entirely, and the checksum would be the same; also, the message could be changed and more content added, as long as the total of all the letters was still 124. As a result, more complex algorithms have been developed to take these issues into account.
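
The letter-adding scheme can be sketched in a few lines of C. The function name letter_sum is made up for this example, and it assumes plain ASCII text, with anything that isn't a letter counting as zero.

```c
#include <ctype.h>

/* Sum the letters of a string with A=1 ... Z=26 (case ignored);
   spaces, digits and punctuation contribute nothing. */
unsigned letter_sum(const char *text)
{
    unsigned total = 0;

    for (; *text != '\0'; ++text) {
        unsigned char c = (unsigned char)*text;
        if (isalpha(c)) {
            total += (unsigned)(toupper(c) - 'A') + 1u;
        }
    }
    return total;
}
```

The weaknesses described above show up straight away: letter_sum("HELLO WORLD"), letter_sum("HELLOWORLD") and letter_sum("WORLD HELLO") all return the same number, so this checksum can't tell the three messages apart.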

When these methods are used on a program file, they generate a number which is, for practical purposes, unique to the combination of numbers within that file. If anything changes, like an instruction being changed or instructions being added by a virus, the digest number will almost certainly change.

This can be used to detect viruses every time a program is run. The process is relatively simple: when the program is first installed, a digest is generated based on that program file. Every time the program is used, the digest is again calculated, and if the numbers differ, some outside agent has changed the file.
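
Here is a minimal sketch of that install-then-check process in C. The digest used is 32-bit FNV-1a, picked purely because it fits in a few lines: it is a stand-in, not a secure message digest, and not the algorithm any particular product uses. Both function names are invented for the example, and the "file" is assumed to be already loaded into memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in digest: 32-bit FNV-1a over a block of bytes.
   Fine for a demonstration, far too weak to stop a real attacker. */
uint32_t digest32(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */

    for (size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Compare the digest recorded at install time with a fresh one.
   Returns nonzero if the contents no longer match. */
int file_was_changed(const uint8_t *data, size_t len, uint32_t recorded)
{
    return digest32(data, len) != recorded;
}
```

At install time the program's bytes would be run through digest32 and the result stored somewhere safe; on each later run, file_was_changed with that stored number flags any file whose contents have been altered.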

One of the most popular implementations of this system is used inside Windows, known as File Protection; it's used on files which are important to the system, such as device drivers. If Windows detects that one of the files has a different digest to that which it knows about, the user will be alerted to the fact that a system file has been changed. Many other systems are also in place in other software packages to perform similar functions.

The Smallest Nintendo DS ROM

Fri, 22 Sep 2006 21:37:38 +0000

Have you ever wondered what exactly is inside a Nintendo DS ROM file, and why the simple DS demos are so much larger than their GBA equivalents? Some people have, and this page documents their exploits.

Introduction

It started innocently enough. I was looking for a small ROM which would be used to test the framebuffer display mode of DSemu, a Nintendo DS emulator. LiraNuna agreed to put a small C demo together, to fill the 'main' screen with red, demonstrating the framebuffer's use. When compiled and spliced up, the .nds ended up at around 7.5KB.

That, LiraNuna thought, was a bit large for something that did so little as his demo evidently did. Stepping through with DSemu's debugger, I noticed a whole lot of code being run which wasn't strictly required: setting up cache parameters and the stack, clearing out regions of memory, and such like. Referred to as the crt0, this code is inserted into every project, to safeguard the execution environment.

Furthermore, there was the standard ARM7 code also inserted into the .nds file, which does such things as set up the touchscreen. All this, we thought, was a bit over-the-top for a demo that was literally doing almost nothing. So, the cut-down began.

Early stages: Chopping the ARM7

First off, LiraNuna thought the functionality of the ARM7 wasn't particularly required for this demo. So, the thought process went, why not simply tell that 'sub' CPU to enter an infinite loop and not do anything? The reasoning was sound, and so the ARM7 source file was replaced with a simple assembly file, looking something like this.
ARM7 cut-down: Infinite loop
    main:
        b main
Once put together, that reduced the size of the overall .nds file by quite a way; down to approximately 5KB. However, I still thought that was a touch large. A quick peek into the .nds file showed why that was: the sub CPU, just like the main CPU, has a crt0 automatically inserted by the build process, and this made up the vast majority of the ARM7 portion of the .nds file.

Therefore, LiraNuna took the step of subverting a part of the build process, by deleting the result of the ARM7 compilation, and replacing it with a straight binary file, encoding the infinite-loop opcode.
ARM7 cut-down: Final binary
    0000  FE FF FF EA
That left the overall binary at around 4KB. Still plenty of room for improvement, I thought.

The Next Step: Assembly

The main code was still in C, and compiled to Thumb binary. Stepping through that in DSemu's debugger, I noticed a few odd things introduced by the compiler, that seemed to do very little; values being left-shifted and then right-shifted again, to no overall effect, and similar oddities. So, the next logical step was to write that portion without the intervention of the compiler, in assembly.

LiraNuna put together a first attempt at an assembly version of the program, as follows.
ARM9 cut-down: First run
    main:
        @ sets POWER_CR
        mov r0, #0x4000000
        orr r0, r0, #0x300
        orr r0, r0, #0x4
        mov r1, #0x3
        str r1, [r0]

        @ sets mode
        mov r0, #0x04000000
        mov r1, #0x00020000
        str r1, [r0]

        @ sets VRAM bank a
        mov r0, #0x04000000
        add r0, r0, #0x240
        mov r1, #0x80
        strb r1, [r0]

        @ loop
        mov r0, #0x06800000
        mov r1, #0x1F
        orr r1, r1, #0x8000
        mov r2, #0x18000

    filloop:
        strh r1, [r0], #0x1
        subs r2, r2, #0x1
        bne filloop

    lforever:
        b lforever
When compiled up, that definitely made a difference; the overall ROM size dropped to approximately 1.5KB. However, I started to have an inkling that we could do better. And that's when pepsiman piped up with a suggestion: place the code inside the .nds header.

Going deeper: Inside the .nds file

What did pepsiman mean by that? In order to understand that, it's important to know what a .nds ROM looks like, on the inside.

File offset	Component
0000	NDS ROM header (512 bytes)
0200	ARM9 binary
0200+ARM9	ARM7 binary
0200+both	Optional file table

The conventional layout dictates that the main CPU's binary be placed after the header, and the sub CPU's binary after that. However, that doesn't have to hold true all the time; the order can be swapped, blank space can be inserted between the binaries, or after them.

That's all well and good, but inside the header? In order to understand that, it's required to look inside that top chunk of the file: the ROM header.
Header structure: ndstool sample output
    0x00   Game title
    0x0C   Game code                 ####
    0x10   Maker code
    0x12   Unit code                 0x00
    0x13   Device type               0x00
    0x14   Device capacity           0x00 (1 Mbit)
    0x15   (8 bytes blank space)
    0x1E   ROM version               0x00
    0x1F   reserved                  0x04
    0x20   ARM9 ROM offset           0x200
    0x24   ARM9 entry address        0x2000000
    0x28   ARM9 RAM address          0x2000000
    0x2C   ARM9 code size            0x3A0
    0x30   ARM7 ROM offset           0x600
    0x34   ARM7 entry address        0x3800000
    0x38   ARM7 RAM address          0x3800000
    0x3C   ARM7 code size            0x8
    0x40   File name table offset    0x608
    0x44   File name table size      0x9
    0x48   FAT offset                0x614
    0x4C   FAT size                  0x0
    0x50   ARM9 overlay offset       0x0
    0x54   ARM9 overlay size         0x0
    0x58   ARM7 overlay offset       0x0
    0x5C   ARM7 overlay size         0x0
    0x60   ROM control info 1        0x00586000
    0x64   ROM control info 2        0x001808F8
    0x68   Icon/title offset         0x0
    0x6C   Secure area CRC           0x0000 (-, homebrew)
    0x6E   ROM control info 3        0x0000
    0x70   (16 bytes blank space)
    0x80   Application end offset    0x00000000
    0x84   ROM header size           0x00000200
    0x88   (36 bytes blank space)
    0xAC   PassMe autoboot detect    0x53534150 ("PASS")
    0xB0   (16 bytes blank space)
    0xC0   Nintendo Logo (156 bytes)
    0x15C  Logo CRC                  0x9E1A (OK)
    0x15E  Header CRC                0xC9D3 (OK)
    0x160  (160 bytes blank space)
The entries marked as blank space indicate regions of empty space in the header structure. These are normally left behind during the construction of the format, to allow for expansion. In this case, however, it's possible to make use of the blank regions in the header for the purposes of holding code.

From looking at the above output, it's simple to see that the structure of the .nds file as a whole is dictated by the entries in this header. The fact that the ARM9 binary follows the header is simply due to the setting of "ARM9 ROM offset" to 0x200, which is the first byte in the file after the header. Similarly, the ARM7 code following the ARM9 is a simple effect of the "ARM7 ROM offset" being set to 0x600, which corresponds to an offset in the file of 1.5KB.

Simply by changing the "ROM offset" values in this header, it's possible to change the point from which the code for the CPUs is loaded, from the default location after the header to somewhere inside the header; overwrite the zeros in that position with ARM opcodes, and load from there. It seemed a good idea by pepsiman, and viable.
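
To show roughly what "changing the ROM offset values" means in practice, here is a C sketch that patches those fields in a header held in memory. The field positions come from the ndstool listing above, and the values are stored low byte first, as on any little-endian system; the enum and function names are made up for this example, and this is not how ndstool itself works. The listing also shows a header CRC at 0x15E, which this sketch makes no attempt to recompute.

```c
#include <stdint.h>

/* Field offsets taken from the ndstool listing above. */
enum {
    NDS_ARM9_ROM_OFFSET = 0x20,
    NDS_ARM9_CODE_SIZE  = 0x2C,
    NDS_ARM7_ROM_OFFSET = 0x30,
    NDS_ARM7_CODE_SIZE  = 0x3C
};

/* Read a 32-bit little-endian field from a byte buffer. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit little-endian field into a byte buffer. */
void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 255u);
    p[1] = (uint8_t)((v >> 8) & 255u);
    p[2] = (uint8_t)((v >> 16) & 255u);
    p[3] = (uint8_t)((v >> 24) & 255u);
}

/* Hypothetical helper: point one CPU's "ROM offset" and "code size"
   fields at a new location in the file. */
void relocate_binary(uint8_t *header, unsigned offset_field,
                     unsigned size_field, uint32_t new_offset,
                     uint32_t new_size)
{
    write_le32(header + offset_field, new_offset);
    write_le32(header + size_field, new_size);
}
```

Calling relocate_binary with NDS_ARM9_ROM_OFFSET and NDS_ARM9_CODE_SIZE, plus an offset that falls inside one of the blank regions, is all it takes to tell the loader to fetch the ARM9 code from within the header; the opcodes themselves still have to be copied to that spot separately.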

LiraNuna's ARM9 code seemed quite short, but I thought I could go one better, shrinking the code down further.
ARM9 cut-down: Second run
    main:
        mov r0,#0x04000000      ; I/O space offset
        mov r1,#0x3             ; Both screens on
        mov r2,#0x00020000      ; Framebuffer mode
        mov r3,#0x80            ; VRAM bank A enabled, LCD
        str r1,[r0, #0x304]     ; Set POWERCNT
        str r2,[r0]             ; DISPCNT
        str r3,[r0, #0x240]     ; VRAMCNT_A
        mov r0,#0x06800000      ; VRAM offset
        mov r1,#31              ; Writing red pixels
        mov r2,#0xC000          ; 48K of them (96KB in all)
    lp:
        strh r1,[r0],#2         ; Write a pixel
        subs r2,r2,#1           ; Move along one
        bne lp                  ; And loop back if not done
    nf:
        b nf                    ; Sit in an infinite loop to finish
Once assembled, this code ended up looking like the following.
ARM9 cut-down: Assembled binary
    0000  01 03 A0 E3 03 10 A0 E3 02 28 A0 E3 80 30 A0 E3
    0010  04 13 80 E5 00 20 80 E5 40 32 80 E5 1A 05 A0 E3
    0020  1F 10 A0 E3 03 29 A0 E3 B2 10 C0 E0 01 20 52 E2
    0030  FC FF FF 1A FE FF FF EA
Definitely a little smaller; now the matter remained of where to put it, along with the ARM7 binary of one opcode (EAFFFFFE). The ARM7 was simple enough: the first region of blank space, 8 bytes, was ample space to place this opcode. The ARM7 offset was changed, the size changed to 4, and that part was done.

The ARM9 code was similarly simple to place in: the 160 bytes of free space at the end of the header seemed more than enough to stash the binary, and all that remained was to modify the ARM9 ROM offset and size.

And that, it seemed, was that. All the code fit comfortably into the header, and the final .nds was just 512 bytes in size. Surely that was all that could be done? Not quite.

To the core: Repositioning

As it turns out, not all 512 bytes of the header are used. The 160 bytes on the end are in the header simply by convention; one might as well say that the .nds file consists of a 352-byte header, 160 bytes of padding, and then the two CPU binaries. Was it possible to fit the 56-byte ARM9 binary somewhere else inside the header, and eliminate this padding?

I started by changing the "header size" field at 0x84 to reflect the new size of the header, which would be 0x160 bytes. Then, I started inserting the opcodes, until I had something like this.
ARM9 placement: Within the header
    0070  01 03 A0 E3 03 10 A0 E3 02 28 A0 E3 80 30 A0 E3
    0080  00 00 00 00 A0 01 00 00 04 13 80 E5 00 20 80 E5
    0090  40 32 80 E5 1A 05 A0 E3 1F 10 A0 E3 03 29 A0 E3
    00A0  B2 10 C0 E0 01 20 52 E2 FC FF FF 1A 50 41 53 53
    00B0  FE FF FF EA 00 00 00 00 00 00 00 00 00 00 00 00
The fields in the header at 0x80, 0x84 and 0xAC can be seen, nestled within the ARM9 code. Now, this is quite a problem; if those values correspond to valid opcodes, they may be executed, and that might prove disastrous for the state of the program.

A disassembly was called for. I loaded up the new binary in DSemu, and the debugger gave the following output:
ARM9 cut-down: Code after insertion
        mov r0,#0x04000000
        mov r1,#0x3
        mov r2,#0x00020000
        mov r3,#0x80
        andeq r0, r0, r0
        andeq r0, r0, r0, lsr #3
        str r1,[r0, #0x304]
        str r2,[r0]
        str r3,[r0, #0x240]
        mov r0,#0x06800000
        mov r1,#31
        mov r2,#0xC000
    lp:
        strh r1,[r0],#2
        subs r2,r2,#1
        bne lp
        cmppls r3, #0x14
    nf:
        b nf
It seems I was fortunate. The first two AND statements will never be executed, since they depend on the ZERO flag being set, and said flag is not set by the instructions above. As for the CMP, it slots into place after the VRAM-writing loop, which is indeed fortunate; if the CMP had fallen before the BNE, the loop may have executed forever, eventually running out of VRAM to write to.
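
The reason a header field can turn into a conditional instruction is that the top four bits of every ARM opcode name the condition under which it runs. A small C sketch of pulling that field out follows; the function names are invented for illustration, and the table lists the standard ARM condition mnemonics.

```c
#include <stdint.h>

/* The condition field is the top four bits of a 32-bit ARM opcode. */
unsigned arm_condition(uint32_t opcode)
{
    return (unsigned)(opcode >> 28);
}

/* Mnemonic for that condition. Entry 15 is reserved or special,
   depending on the architecture version. */
const char *arm_condition_name(uint32_t opcode)
{
    static const char *const names[16] = {
        "EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
        "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"
    };
    return names[opcode >> 28];
}
```

The two header words 0x00000000 and 0x000001A0 both carry condition 0 (EQ), which is why they disassemble as ANDEQ; the "PASS" marker, 0x53534150, carries condition 5 (PL), hence the CMPPLS; and the ordinary opcodes starting with E, such as 0xEAFFFFFE, carry condition 14 (AL, "always").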

Surprisingly fortunate, I thought; I hadn't planned for such a consequence, and it had simply come about due to the size and structure of the code. Either way, I wasn't about to complain.

Conclusion

So, there we have it. The smallest .nds file you're ever likely to see, which still does something. The ARM7 sticks itself into an infinite loop, and the ARM9 fills the main-core framebuffer with red before entering its own infinite loop. I eventually got my wish, of a small framebuffer-testing demo, but it was fun to get there.
Final binary: 352 bytes
    0000  4E 44 53 2E 54 69 6E 79 46 42 00 00 23 23 23 23  NDS.TinyFB..####
    0010  00 00 00 00 00 00 FE FF FF EA 00 00 00 00 00 04  ................
    0020  70 00 00 00 00 00 00 02 00 00 00 02 44 00 00 00  p...........D...
    0030  16 00 00 00 00 00 80 03 00 00 80 03 04 00 00 00  ................
    0040  A0 01 00 00 00 00 00 00 A0 01 00 00 00 00 00 00  ................
    0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    0060  00 60 58 00 F8 08 18 00 00 00 00 00 00 00 00 00  .`X.............
    0070  01 03 A0 E3 03 10 A0 E3 02 28 A0 E3 80 30 A0 E3  .........(...0..
    0080  00 00 00 00 A0 01 00 00 04 13 80 E5 00 20 80 E5  ............. ..
    0090  40 32 80 E5 1A 05 A0 E3 1F 10 A0 E3 03 29 A0 E3  @2...........)..
    00A0  B2 10 C0 E0 01 20 52 E2 FC FF FF 1A 50 41 53 53  ..... R.....PASS
    00B0  FE FF FF EA 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00C0  C8 60 4F E2 01 70 8F E2 17 FF 2F E1 12 4F 11 48  .`O..p..../..O.H
    00D0  12 4C 20 60 64 60 7C 62 30 1C 39 1C 10 4A 00 F0  .L `d`|b0.9..J..
    00E0  14 F8 30 6A 80 19 B1 6A F2 6A 00 F0 0B F8 30 6B  ..0j...j.j....0k
    00F0  80 19 B1 6B F2 6B 00 F0 08 F8 70 6A 77 6B 07 4C  ...k.k....pjwk.L
    0100  60 60 38 47 07 4B D2 18 9A 43 07 4B 92 08 D2 18  ``8G.K...C.K....
    0110  0C DF F7 46 04 F0 1F E5 00 FE 7F 02 F0 FF 7F 02  ...F............
    0120  F0 01 00 00 FF 01 00 00 00 00 00 04 00 00 00 00  ................
    0130  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    0140  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    0150  00 00 00 00 00 00 00 00 00 00 00 00 1A 9E 7B EB  ..............{.
http://imrannazar.com/content/files/TinyFB.nds

Two9A, with thanks to LiraNuna and pepsiman

Major	Minor	Full type	Data
Text documents
text	plain	text/plain	Plain text documents
text	html	text/html	HTML documents
text	csv	text/csv	Comma-separated data files
Images
image	jpeg	image/jpeg	JPEG-formatted images
image	png	image/png	PNG-formatted images
Application-specific types
application	pdf	application/pdf	Portable Document Format (PDF)
application	zip	application/zip	PKZIP compressed archives
application	msword	application/msword	MS Word documents
Types with multiple components
multipart	form-data	multipart/form-data	Web forms with uploaded files
multipart	mixed	multipart/mixed	Messages with many types of component

Bits 27-20	Bits 7-4
Bits 27-20	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
00	AND lli	AND llr	AND lri	AND lrr	AND ari	AND arr	AND rri	AND rrr	AND lli	MUL	AND lri	STRH ptrm	AND ari	LDRD ptrm	AND rri	STRD ptrm
01	ANDS lli	ANDS llr	ANDS lri	ANDS lrr	ANDS ari	ANDS arr	ANDS rri	ANDS rrr	ANDS lli	MULS	ANDS lri	LDRH ptrm	ANDS ari	LDRSB ptrm	ANDS rri	LDRSH ptrm
02	EOR lli	EOR llr	EOR lri	EOR lrr	EOR ari	EOR arr	EOR rri	EOR rrr	EOR lli	MLA	EOR lri	STRH ptrm	EOR ari	LDRD ptrm	EOR rri	STRD ptrm
03	EORS lli	EORS llr	EORS lri	EORS lrr	EORS ari	EORS arr	EORS rri	EORS rrr	EORS lli	MLAS	EORS lri	LDRH ptrm	EORS ari	LDRSB ptrm	EORS rri	LDRSH ptrm
04	SUB lli	SUB llr	SUB lri	SUB lrr	SUB ari	SUB arr	SUB rri	SUB rrr	SUB lli		SUB lri	STRH ptim	SUB ari	LDRD ptim	SUB rri	STRD ptim
05	SUBS lli	SUBS llr	SUBS lri	SUBS lrr	SUBS ari	SUBS arr	SUBS rri	SUBS rrr	SUBS lli		SUBS lri	LDRH ptim	SUBS ari	LDRSB ptim	SUBS rri	LDRSH ptim
06	RSB lli	RSB llr	RSB lri	RSB lrr	RSB ari	RSB arr	RSB rri	RSB rrr	RSB lli		RSB lri	STRH ptim	RSB ari	LDRD ptim	RSB rri	STRD ptim
07	RSBS lli	RSBS llr	RSBS lri	RSBS lrr	RSBS ari	RSBS arr	RSBS rri	RSBS rrr	RSBS lli		RSBS lri	LDRH ptim	RSBS ari	LDRSB ptim	RSBS rri	LDRSH ptim
08	ADD lli	ADD llr	ADD lri	ADD lrr	ADD ari	ADD arr	ADD rri	ADD rrr	ADD lli	UMULL	ADD lri	STRH ptrp	ADD ari	LDRD ptrp	ADD rri	STRD ptrp
09	ADDS lli	ADDS llr	ADDS lri	ADDS lrr	ADDS ari	ADDS arr	ADDS rri	ADDS rrr	ADDS lli	UMULLS	ADDS lri	LDRH ptrp	ADDS ari	LDRSB ptrp	ADDS rri	LDRSH ptrp
0A	ADC lli	ADC llr	ADC lri	ADC lrr	ADC ari	ADC arr	ADC rri	ADC rrr	ADC lli	UMLAL	ADC lri	STRH ptrp	ADC ari	LDRD ptrp	ADC rri	STRD ptrp
0B	ADCS lli	ADCS llr	ADCS lri	ADCS lrr	ADCS ari	ADCS arr	ADCS rri	ADCS rrr	ADCS lli	UMLALS	ADCS lri	LDRH ptrp	ADCS ari	LDRSB ptrp	ADCS rri	LDRSH ptrp
0C	SBC lli	SBC llr	SBC lri	SBC lrr	SBC ari	SBC arr	SBC rri	SBC rrr	SBC lli	SMULL	SBC lri	STRH ptip	SBC ari	LDRD ptip	SBC rri	STRD ptip
0D	SBCS lli	SBCS llr	SBCS lri	SBCS lrr	SBCS ari	SBCS arr	SBCS rri	SBCS rrr	SBCS lli	SMULLS	SBCS lri	LDRH ptip	SBCS ari	LDRSB ptip	SBCS rri	LDRSH ptip
0E	RSC lli	RSC llr	RSC lri	RSC lrr	RSC ari	RSC arr	RSC rri	RSC rrr	RSC lli	SMLAL	RSC lri	STRH ptip	RSC ari	LDRD ptip	RSC rri	STRD ptip
0F	RSCS lli	RSCS llr	RSCS lri	RSCS lrr	RSCS ari	RSCS arr	RSCS rri	RSCS rrr	RSCS lli	SMLALS	RSCS lri	LDRH ptip	RSCS ari	LDRSB ptip	RSCS rri	LDRSH ptip
10	MRS rc					QADD			SMLABB	SWP	SMLATB	STRH ofrm	SMLABT	LDRD ofrm	SMLATT	STRD ofrm
11	TSTS lli	TSTS llr	TSTS lri	TSTS lrr	TSTS ari	TSTS arr	TSTS rri	TSTS rrr	TSTS lli		TSTS lri	LDRH ofrm	TSTS ari	LDRSB ofrm	TSTS rri	LDRSH ofrm
12	MSR rc	BX		BLX reg		QSUB		BKPT	SMLAWB		SMULWB	STRH prrm	SMLAWT	LDRD prrm	SMULWT	STRD prrm
13	TEQS lli	TEQS llr	TEQS lri	TEQS lrr	TEQS ari	TEQS arr	TEQS rri	TEQS rrr	TEQS lli		TEQS lri	LDRH prrm	TEQS ari	LDRSB prrm	TEQS rri	LDRSH prrm
14	MRS rs					QDADD			SMLALBB	SWPB	SMLALTB	STRH ofim	SMLALBT	LDRD ofim	SMLALTT	STRD ofim
15	CMPS lli	CMPS llr	CMPS lri	CMPS lrr	CMPS ari	CMPS arr	CMPS rri	CMPS rrr	CMPS lli		CMPS lri	LDRH ofim	CMPS ari	LDRSB ofim	CMPS rri	LDRSH ofim
16	MSR rs	CLZ				QDSUB			SMULBB		SMULTB	STRH prim	SMULBT	LDRD prim	SMULTT	STRD prim
17	CMNS lli	CMNS llr	CMNS lri	CMNS lrr	CMNS ari	CMNS arr	CMNS rri	CMNS rrr	CMNS lli		CMNS lri	LDRH prim	CMNS ari	LDRSB prim	CMNS rri	LDRSH prim
18	ORR lli	ORR llr	ORR lri	ORR lrr	ORR ari	ORR arr	ORR rri	ORR rrr	ORR lli		ORR lri	STRH ofrp	ORR ari	LDRD ofrp	ORR rri	STRD ofrp
19	ORRS lli	ORRS llr	ORRS lri	ORRS lrr	ORRS ari	ORRS arr	ORRS rri	ORRS rrr	ORRS lli		ORRS lri	LDRH ofrp	ORRS ari	LDRSB ofrp	ORRS rri	LDRSH ofrp
1A	MOV lli	MOV llr	MOV lri	MOV lrr	MOV ari	MOV arr	MOV rri	MOV rrr	MOV lli		MOV lri	STRH prrp	MOV ari	LDRD prrp	MOV rri	STRD prrp
1B	MOVS lli	MOVS llr	MOVS lri	MOVS lrr	MOVS ari	MOVS arr	MOVS rri	MOVS rrr	MOVS lli		MOVS lri	LDRH prrp	MOVS ari	LDRSB prrp	MOVS rri	LDRSH prrp
1C	BIC lli	BIC llr	BIC lri	BIC lrr	BIC ari	BIC arr	BIC rri	BIC rrr	BIC lli		BIC lri	STRH ofip	BIC ari	LDRD ofip	BIC rri	STRD ofip
1D	BICS lli	BICS llr	BICS lri	BICS lrr	BICS ari	BICS arr	BICS rri	BICS rrr	BICS lli		BICS lri	LDRH ofip	BICS ari	LDRSB ofip	BICS rri	LDRSH ofip
1E	MVN lli	MVN llr	MVN lri	MVN lrr	MVN ari	MVN arr	MVN rri	MVN rrr	MVN lli		MVN lri	STRH prip	MVN ari	LDRD prip	MVN rri	STRD prip
1F	MVNS lli	MVNS llr	MVNS lri	MVNS lrr	MVNS ari	MVNS arr	MVNS rri	MVNS rrr	MVNS lli		MVNS lri	LDRH prip	MVNS ari	LDRSB prip	MVNS rri	LDRSH prip
20	AND imm
21	ANDS imm
22	EOR imm
23	EORS imm
24	SUB imm
25	SUBS imm
26	RSB imm
27	RSBS imm
28	ADD imm
29	ADDS imm
2A	ADC imm
2B	ADCS imm
2C	SBC imm
2D	SBCS imm
2E	RSC imm
2F	RSCS imm
30
31	TSTS imm
32	MSR ic
33	TEQS imm
34
35	CMPS imm
36	MSR is
37	CMNS imm
38	ORR imm
39	ORRS imm
3A	MOV imm
3B	MOVS imm
3C	BIC imm
3D	BICS imm
3E	MVN imm
3F	MVNS imm
40	STR ptim
41	LDR ptim
42	STRT ptim
43	LDRT ptim
44	STRB ptim
45	LDRB ptim
46	STRBT ptim
47	LDRBT ptim
48	STR ptip
49	LDR ptip
4A	STRT ptip
4B	LDRT ptip
4C	STRB ptip
4D	LDRB ptip
4E	STRBT ptip
4F	LDRBT ptip
50	STR ofim
51	LDR ofim
52	STR prim
53	LDR prim
54	STRB ofim
55	LDRB ofim
56	STRB prim
57	LDRB prim
58	STR ofip
59	LDR ofip
5A	STR prip
5B	LDR prip
5C	STRB ofip
5D	LDRB ofip
5E	STRB prip
5F	LDRB prip
60	STR ptrmll		STR ptrmlr		STR ptrmar		STR ptrmrr		STR ptrmll		STR ptrmlr		STR ptrmar		STR ptrmrr
61	LDR ptrmll		LDR ptrmlr		LDR ptrmar		LDR ptrmrr		LDR ptrmll		LDR ptrmlr		LDR ptrmar		LDR ptrmrr
62	STRT ptrmll		STRT ptrmlr		STRT ptrmar		STRT ptrmrr		STRT ptrmll		STRT ptrmlr		STRT ptrmar		STRT ptrmrr
63	LDRT ptrmll		LDRT ptrmlr		LDRT ptrmar		LDRT ptrmrr		LDRT ptrmll		LDRT ptrmlr		LDRT ptrmar		LDRT ptrmrr
64	STRB ptrmll		STRB ptrmlr		STRB ptrmar		STRB ptrmrr		STRB ptrmll		STRB ptrmlr		STRB ptrmar		STRB ptrmrr
65	LDRB ptrmll		LDRB ptrmlr		LDRB ptrmar		LDRB ptrmrr		LDRB ptrmll		LDRB ptrmlr		LDRB ptrmar		LDRB ptrmrr
66	STRBT ptrmll		STRBT ptrmlr		STRBT ptrmar		STRBT ptrmrr		STRBT ptrmll		STRBT ptrmlr		STRBT ptrmar		STRBT ptrmrr
67	LDRBT ptrmll		LDRBT ptrmlr		LDRBT ptrmar		LDRBT ptrmrr		LDRBT ptrmll		LDRBT ptrmlr		LDRBT ptrmar		LDRBT ptrmrr
68	STR ptrpll		STR ptrplr		STR ptrpar		STR ptrprr		STR ptrpll		STR ptrplr		STR ptrpar		STR ptrprr
69	LDR ptrpll		LDR ptrplr		LDR ptrpar		LDR ptrprr		LDR ptrpll		LDR ptrplr		LDR ptrpar		LDR ptrprr
6A	STRT ptrpll		STRT ptrplr		STRT ptrpar		STRT ptrprr		STRT ptrpll		STRT ptrplr		STRT ptrpar		STRT ptrprr
6B	LDRT ptrpll		LDRT ptrplr		LDRT ptrpar		LDRT ptrprr		LDRT ptrpll		LDRT ptrplr		LDRT ptrpar		LDRT ptrprr
6C	STRB ptrpll		STRB ptrplr		STRB ptrpar		STRB ptrprr		STRB ptrpll		STRB ptrplr		STRB ptrpar		STRB ptrprr
6D	LDRB ptrpll		LDRB ptrplr		LDRB ptrpar		LDRB ptrprr		LDRB ptrpll		LDRB ptrplr		LDRB ptrpar		LDRB ptrprr
6E	STRBT ptrpll		STRBT ptrplr		STRBT ptrpar		STRBT ptrprr		STRBT ptrpll		STRBT ptrplr		STRBT ptrpar		STRBT ptrprr
6F	LDRBT ptrpll		LDRBT ptrplr		LDRBT ptrpar		LDRBT ptrprr		LDRBT ptrpll		LDRBT ptrplr		LDRBT ptrpar		LDRBT ptrprr
70	STR ofrmll		STR ofrmlr		STR ofrmar		STR ofrmrr		STR ofrmll		STR ofrmlr		STR ofrmar		STR ofrmrr
71	LDR ofrmll		LDR ofrmlr		LDR ofrmar		LDR ofrmrr		LDR ofrmll		LDR ofrmlr		LDR ofrmar		LDR ofrmrr
72	STR prrmll		STR prrmlr		STR prrmar		STR prrmrr		STR prrmll		STR prrmlr		STR prrmar		STR prrmrr
73	LDR prrmll		LDR prrmlr		LDR prrmar		LDR prrmrr		LDR prrmll		LDR prrmlr		LDR prrmar		LDR prrmrr
74	STRB ofrmll		STRB ofrmlr		STRB ofrmar		STRB ofrmrr		STRB ofrmll		STRB ofrmlr		STRB ofrmar		STRB ofrmrr
75	LDRB ofrmll		LDRB ofrmlr		LDRB ofrmar		LDRB ofrmrr		LDRB ofrmll		LDRB ofrmlr		LDRB ofrmar		LDRB ofrmrr
76	STRB prrmll		STRB prrmlr		STRB prrmar		STRB prrmrr		STRB prrmll		STRB prrmlr		STRB prrmar		STRB prrmrr
77	LDRB prrmll		LDRB prrmlr		LDRB prrmar		LDRB prrmrr		LDRB prrmll		LDRB prrmlr		LDRB prrmar		LDRB prrmrr
78	STR ofrpll		STR ofrplr		STR ofrpar		STR ofrprr		STR ofrpll		STR ofrplr		STR ofrpar		STR ofrprr
79	LDR ofrpll		LDR ofrplr		LDR ofrpar		LDR ofrprr		LDR ofrpll		LDR ofrplr		LDR ofrpar		LDR ofrprr
7A	STR prrpll		STR prrplr		STR prrpar		STR prrprr		STR prrpll		STR prrplr		STR prrpar		STR prrprr
7B	LDR prrpll		LDR prrplr		LDR prrpar		LDR prrprr		LDR prrpll		LDR prrplr		LDR prrpar		LDR prrprr
7C	STRB ofrpll		STRB ofrplr		STRB ofrpar		STRB ofrprr		STRB ofrpll		STRB ofrplr		STRB ofrpar		STRB ofrprr
7D	LDRB ofrpll		LDRB ofrplr		LDRB ofrpar		LDRB ofrprr		LDRB ofrpll		LDRB ofrplr		LDRB ofrpar		LDRB ofrprr
7E	STRB prrpll		STRB prrplr		STRB prrpar		STRB prrprr		STRB prrpll		STRB prrplr		STRB prrpar		STRB prrprr
7F	LDRB prrpll		LDRB prrplr		LDRB prrpar		LDRB prrprr		LDRB prrpll		LDRB prrplr		LDRB prrpar		LDRB prrprr
80	STMDA
81	LDMDA
82	STMDA w
83	LDMDA w
84	STMDA u
85	LDMDA u
86	STMDA uw
87	LDMDA uw
88	STMIA
89	LDMIA
8A	STMIA w
8B	LDMIA w
8C	STMIA u
8D	LDMIA u
8E	STMIA uw
8F	LDMIA uw
90	STMDB
91	LDMDB
92	STMDB w
93	LDMDB w
94	STMDB u
95	LDMDB u
96	STMDB uw
97	LDMDB uw
98	STMIB
99	LDMIB
9A	STMIB w
9B	LDMIB w
9C	STMIB u
9D	LDMIB u
9E	STMIB uw
9F	LDMIB uw
A0	B
A1
A2
A3
A4
A5
A6
A7
A8
A9
AA
AB
AC
AD
AE
AF
B0	BL
B1
B2
B3
B4
B5
B6
B7
B8
B9
BA
BB
BC
BD
BE
BF
C0	STC unm
C1	LDC unm
C2	STC ptm
C3	LDC ptm
C4	STC unm
C5	LDC unm
C6	STC ptm
C7	LDC ptm
C8	STC unp
C9	LDC unp
CA	STC ptp
CB	LDC ptp
CC	STC unp
CD	LDC unp
CE	STC ptp
CF	LDC ptp
D0	STC ofm
D1	LDC ofm
D2	STC prm
D3	LDC prm
D4	STC ofm
D5	LDC ofm
D6	STC prm
D7	LDC prm
D8	STC ofp
D9	LDC ofp
DA	STC prp
DB	LDC prp
DC	STC ofp
DD	LDC ofp
DE	STC prp
DF	LDC prp
E0	CDP	MCR	CDP	MCR	CDP	MCR	CDP	MCR	CDP	MCR	CDP	MCR	CDP	MCR	CDP	MCR
E1		MRC		MRC		MRC		MRC		MRC		MRC		MRC		MRC
E2		MCR		MCR		MCR		MCR		MCR		MCR		MCR		MCR
E3		MRC		MRC		MRC		MRC		MRC		MRC		MRC		MRC
E4		MCR		MCR		MCR		MCR		MCR		MCR		MCR		MCR
E5		MRC		MRC		MRC		MRC		MRC		MRC		MRC		MRC
E6		MCR		MCR		MCR		MCR		MCR		MCR		MCR		MCR
E7		MRC		MRC		MRC		MRC		MRC		MRC		MRC		MRC
E8		MCR		MCR		MCR		MCR		MCR		MCR		MCR		MCR
E9		MRC		MRC		MRC		MRC		MRC		MRC		MRC		MRC
EA		MCR		MCR		MCR		MCR		MCR		MCR		MCR		MCR
EB		MRC		MRC		MRC		MRC		MRC		MRC		MRC		MRC
EC		MCR		MCR		MCR		MCR		MCR		MCR		MCR		MCR
ED		MRC		MRC		MRC		MRC		MRC		MRC		MRC		MRC
EE		MCR		MCR		MCR		MCR		MCR		MCR		MCR		MCR
EF		MRC		MRC		MRC		MRC		MRC		MRC		MRC		MRC
F0	SWI
F1
F2
F3
F4
F5
F6
F7
F8
F9
FA
FB
FC
FD
FE
FF
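
To find an instruction's cell in a map like this, you only need two shifts and two masks. The sketch below assumes the rows are instruction bits 27-20 (two hex digits) and the columns are bits 7-4 (one hex digit); the function name `arm_map_cell` is made up for illustration, and the condition field in bits 31-28 is simply ignored.

```python
def arm_map_cell(instr):
    """Return (row, column) in the opcode map for a 32-bit ARM instruction.

    Assumes rows are bits 27-20 and columns are bits 7-4.
    """
    row = (instr >> 20) & 0xFF  # bits 27-20: 00 to FF
    col = (instr >> 4) & 0xF    # bits 7-4: 0 to F
    return row, col


# A few well-known encodings, looked up by hand in the map
for word in (0xE12FFF1E,   # BX lr
             0xE0900001,   # ADDS r0, r0, r1
             0xEF000000):  # SWI 0
    row, col = arm_map_cell(word)
    print(f"{word:08X} -> row {row:02X}, column {col:X}")
```

With these three words, the cells come out as row 12 column 1 (BX), row 09 column 0 (ADDS lli) and row F0 column 0 (SWI).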

Bits 15-12	Bits 11-8
Bits 15-12	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0	LSL imm								LSR imm
1	ASR imm								ADD reg		SUB reg		ADD imm3		SUB imm3
2	MOV i8r0	MOV i8r1	MOV i8r2	MOV i8r3	MOV i8r4	MOV i8r5	MOV i8r6	MOV i8r7	CMP i8r0	CMP i8r1	CMP i8r2	CMP i8r3	CMP i8r4	CMP i8r5	CMP i8r6	CMP i8r7
3	ADD i8r0	ADD i8r1	ADD i8r2	ADD i8r3	ADD i8r4	ADD i8r5	ADD i8r6	ADD i8r7	SUB i8r0	SUB i8r1	SUB i8r2	SUB i8r3	SUB i8r4	SUB i8r5	SUB i8r6	SUB i8r7
4	DP g1	DP g2	DP g3	DP g4	ADDH	CMPH	MOVH	BX reg	LDRPC r0	LDRPC r1	LDRPC r2	LDRPC r3	LDRPC r4	LDRPC r5	LDRPC r6	LDRPC r7
5	STR reg		STRH reg		STRB reg		LDRSB reg		LDR reg		LDRH reg		LDRB reg		LDRSH reg
6	STR imm5								LDR imm5
7	STRB imm5								LDRB imm5
8	STRH imm5								LDRH imm5
9	STRSP r0	STRSP r1	STRSP r2	STRSP r3	STRSP r4	STRSP r5	STRSP r6	STRSP r7	LDRSP r0	LDRSP r1	LDRSP r2	LDRSP r3	LDRSP r4	LDRSP r5	LDRSP r6	LDRSP r7
A	ADDPC r0	ADDPC r1	ADDPC r2	ADDPC r3	ADDPC r4	ADDPC r5	ADDPC r6	ADDPC r7	ADDSP r0	ADDSP r1	ADDSP r2	ADDSP r3	ADDSP r4	ADDSP r5	ADDSP r6	ADDSP r7
B	ADDSP imm7				PUSH	PUSH lr							POP	POP pc	BKPT
C	STMIA r0	STMIA r1	STMIA r2	STMIA r3	STMIA r4	STMIA r5	STMIA r6	STMIA r7	LDMIA r0	LDMIA r1	LDMIA r2	LDMIA r3	LDMIA r4	LDMIA r5	LDMIA r6	LDMIA r7
D	BEQ	BNE	BCS	BCC	BMI	BPL	BVS	BVC	BHI	BLS	BGE	BLT	BGT	BLE		SWI
E	B								BLX off
F	BL setup								BL off
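
The Thumb map works the same way with a 16-bit instruction: bits 15-12 pick the row and bits 11-8 pick the column. Here's a minimal sketch; `thumb_map_cell`, `thumb_name` and the `SAMPLE_CELLS` dictionary are hypothetical names, and the dictionary only copies a handful of cells from the table rather than the whole thing.

```python
def thumb_map_cell(instr):
    """Return (row, column) in the Thumb map for a 16-bit instruction."""
    row = (instr >> 12) & 0xF  # bits 15-12
    col = (instr >> 8) & 0xF   # bits 11-8
    return row, col


# A few cells copied from the table above; not the full map
SAMPLE_CELLS = {
    (0x4, 0x7): "BX reg",
    (0xB, 0x5): "PUSH lr",
    (0xB, 0xD): "POP pc",
    (0xD, 0xF): "SWI",
}


def thumb_name(instr):
    """Look up the mnemonic for an instruction, or '?' if not in the sample."""
    return SAMPLE_CELLS.get(thumb_map_cell(instr), "?")


print(thumb_name(0x4770))  # BX lr
print(thumb_name(0xB500))  # PUSH {lr}
```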

Bits 9-8	Bits 7-6
Bits 9-8	0	1	2	3
0	AND	EOR	LSL	LSR
1	ASR	ADC	SBC	ROR
2	TST	NEG	CMP	CMN
3	ORR	MUL	BIC	MVN
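
Once an instruction has landed in one of the four "DP" cells of the Thumb map, this smaller table finishes the job: bits 9-8 pick the row, bits 7-6 the column. A minimal sketch follows, with `THUMB_DP` and `thumb_dp_name` as made-up names. The mnemonics follow the Thumb ALU encoding in the ARM architecture manual, where the four-bit opcodes 5 and 6 are ADC and SBC.

```python
# Thumb data-processing operations, indexed [bits 9-8][bits 7-6]
THUMB_DP = [
    ["AND", "EOR", "LSL", "LSR"],
    ["ASR", "ADC", "SBC", "ROR"],  # opcodes 4-7 of the Thumb ALU format
    ["TST", "NEG", "CMP", "CMN"],
    ["ORR", "MUL", "BIC", "MVN"],
]


def thumb_dp_name(instr):
    """Return the ALU mnemonic for a Thumb data-processing instruction."""
    row = (instr >> 8) & 0x3  # bits 9-8
    col = (instr >> 6) & 0x3  # bits 7-6
    return THUMB_DP[row][col]


print(thumb_dp_name(0x4008))  # AND r0, r1
print(thumb_dp_name(0x4348))  # MUL r0, r1
```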

Operator	Symbol	Syntax
AND	Ampersand	x & y
OR	Vertical pipe	x | y
XOR	Caret	x ^ y
NOT	Tilde	~ x
Left shift	Two less-than signs	x << y
Right shift	Two greater-than signs	x >> y
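
To see the operators in action, here's a quick run over two four-bit values. It's written in Python, which shares the symbols in the table; the one wrinkle is that Python integers have no fixed width, so the NOT result is masked to eight bits here to look like a byte-sized register. The variable names are just for illustration.

```python
x = 0b1100  # 12
y = 0b1010  # 10

print(bin(x & y))      # AND: bits set in both
print(bin(x | y))      # OR: bits set in either
print(bin(x ^ y))      # XOR: bits set in exactly one
print(bin(~x & 0xFF))  # NOT, masked to 8 bits
print(bin(x << 2))     # left shift by two places
print(bin(x >> 2))     # right shift by two places
```

In order, this prints 0b1000, 0b1110, 0b110, 0b11110011, 0b110000 and 0b11.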

File offset	Component
0000	NDS ROM header (512 bytes)
0200	ARM9 binary
0200+ARM9	ARM7 binary
0200+ARM9+ARM7	Optional file table
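
The offsets in this layout are just a running total of the sizes before them. The sketch below follows the simplified layout in the table, with each component packed straight after the previous one; `nds_layout` and its parameters are hypothetical names, and a real ROM may pad or align the binaries differently.

```python
HEADER_SIZE = 0x200  # 512-byte NDS ROM header


def nds_layout(arm9_size, arm7_size):
    """Return file offsets for each component, given the binary sizes in bytes.

    Assumes the components are packed back to back, as in the table above.
    """
    arm9_offset = HEADER_SIZE
    arm7_offset = arm9_offset + arm9_size
    file_table_offset = arm7_offset + arm7_size
    return {
        "header": 0,
        "arm9": arm9_offset,
        "arm7": arm7_offset,
        "file_table": file_table_offset,
    }


# Example sizes, chosen arbitrarily
for name, offset in nds_layout(0x1000, 0x800).items():
    print(f"{offset:04X}  {name}")
```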