Have you ever wondered what exactly is inside a Nintendo DS ROM file, and why even simple DS demos are so much larger than their GBA equivalents? Some people have, and this page documents their exploits.
Introduction
It started innocently enough. I was looking for a small ROM to test the framebuffer display mode of DSemu, a Nintendo DS emulator. LiraNuna agreed to put together a small C demo that filled the 'main' screen with red, demonstrating the framebuffer's use. Once compiled and spliced together into a .nds file, it came to around 7.5KB.
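LiraNuna's C source isn't reproduced here, but the heart of such a demo is a loop that writes one colour into every pixel of the 256x192, 16-bit main-screen framebuffer. The sketch below shows that loop in portable C so the logic can be checked on a PC. The function names are hypothetical; on hardware the buffer would be VRAM bank A mapped for LCD use (0x06800000, libnds's `VRAM_A`), and the display control and VRAM bank registers would also need setting up, which is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Main-screen framebuffer geometry: 256x192 pixels, 16 bits per pixel. */
enum { FB_WIDTH = 256, FB_HEIGHT = 192 };

/* Pack 5-bit red, green and blue components into a DS colour, using the
   same bit layout as libnds's RGB15 macro: red in bits 0-4, green in
   bits 5-9, blue in bits 10-14. */
static uint16_t rgb15(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)((r & 31u) | ((g & 31u) << 5) | ((b & 31u) << 10));
}

/* Write one colour to every pixel. On hardware `fb` would be VRAM bank A
   in LCD mode (0x06800000); here it is any caller-supplied buffer so the
   loop can be tested on a PC. `volatile` keeps the writes from being
   optimised away when fb really is memory-mapped VRAM. */
static void fill_framebuffer(volatile uint16_t *fb, uint16_t colour)
{
    for (size_t i = 0; i < (size_t)FB_WIDTH * FB_HEIGHT; i++)
        fb[i] = colour;
}
```

For example, `fill_framebuffer(fb, rgb15(31, 0, 0))` paints the whole screen with colour 0x001F, pure red.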
That, LiraNuna thought, was rather large for a demo that did so little. Stepping through it with DSemu's debugger, I noticed a lot of code being run that wasn't strictly required: setting up cache parameters and the stack, clearing out regions of memory, and so on. This startup code, known as crt0, is inserted into every project to ensure a safe execution environment.
Furthermore, there was the standard ARM7 code also inserted into the .nds file, which does such things as set up the touchscreen. All this, we thought, was a bit over-the-top for a demo that was literally doing almost nothing. So, the cut-down began.
Early stages: Chopping the ARM7
First off, LiraNuna thought the functionality of the ARM7 wasn't particularly required for this demo. So, the thought process went, why not simply tell that 'sub' CPU to enter an infinite loop and not do anything? The reasoning was sound, and so the ARM7 source file was replaced with a simple assembly file, looking something like this.
ARM7 cut-down: Infinite loop
Once put together, that reduced the size of the overall .nds file considerably, to approximately 5KB. However, I still thought that was a touch large. A quick peek into the .nds file showed why: the sub CPU, just like the main CPU, has a crt0 automatically inserted by the build process, and this made up the vast majority of the ARM7 portion of the .nds file.
Therefore, LiraNuna took the step of subverting a part of the build process, by deleting the result of the ARM7 compilation, and replacing it with a straight binary file, encoding the infinite-loop opcode.
ARM7 cut-down: Final binary
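The opcode in that binary is EAFFFFFE, an ARM branch to itself. As a sketch of where those four bytes come from, the hypothetical helper below encodes an unconditional B instruction from its ARM fields and stores it little-endian, which is the byte order in which it appears in the file. It is an illustration, not part of any toolchain.

```c
#include <stdint.h>

/* Encode an unconditional ARM branch ("B") from the instruction at `from`
   to address `to`. Fields: condition 0xE (always) in bits 31-28, 101 in
   bits 27-25, link bit 24 clear, and a signed 24-bit word offset measured
   from PC, which on ARM reads as the branch's own address plus 8. */
static uint32_t arm_branch(uint32_t from, uint32_t to)
{
    /* Unsigned wrap-around keeps this well defined for backward branches;
       the low 24 bits of the shifted value are the two's-complement offset. */
    uint32_t word_offset = (to - from - 8u) >> 2;
    return 0xEA000000u | (word_offset & 0x00FFFFFFu);
}

/* Store a word in little-endian byte order, as it appears in the ROM file. */
static void store_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v & 0xFF);
    out[1] = (uint8_t)((v >> 8) & 0xFF);
    out[2] = (uint8_t)((v >> 16) & 0xFF);
    out[3] = (uint8_t)(v >> 24);
}
```

Because the branch is PC-relative, `arm_branch(a, a)` gives EAFFFFFE for any address `a`, and `store_le32` turns it into the bytes FE FF FF EA: in effect, the entire ARM7 binary described above.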
That left the overall binary at around 4KB. Still plenty of room for improvement, I thought.
The Next Step: Assembly
The main code was still in C, compiled to Thumb code. Stepping through it in DSemu's debugger, I noticed a few oddities introduced by the compiler that seemed to do very little: values being shifted left and then right again to no overall effect, and similar. So the next logical step was to write that portion in assembly, without the compiler's intervention.
LiraNuna put together a first attempt at an assembly version of the program, as follows.
ARM9 cut-down: First run
Once assembled, that definitely made a difference; the overall ROM size dropped to approximately 1.5KB. However, I started to have an inkling that we could do better. That's when pepsiman piped up with a suggestion: place the code inside the .nds header.
Going deeper: Inside the .nds file
What did pepsiman mean by that? To understand it, you need to know what a .nds ROM looks like on the inside.
File offset | Component |
---|---|
0x0000 | NDS ROM header (512 bytes) |
0x0200 | ARM9 binary |
0x0200 + ARM9 size | ARM7 binary |
0x0200 + ARM9 size + ARM7 size | Optional file table |
The conventional layout places the main CPU's binary after the header, and the sub CPU's binary after that. However, that doesn't always have to hold true: the order can be swapped, and blank space can be inserted between or after the binaries.
That's all well and good, but how can code go inside the header? To understand that, we need to look inside the top chunk of the file: the ROM header.
Header structure: ndstool sample output
The entries highlighted in red indicate regions of empty space in the header structure. These are normally reserved when a format is designed, to allow for expansion. In this case, however, the blank regions of the header can be put to use for holding code.
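Spotting usable blank space can be sketched as a scan for word-aligned runs of zero bytes. This is an illustration only: `find_blank_slot` is a hypothetical helper, a zero byte is not proof that a field is unused (the field map in the ndstool output is what says so), and the 4-byte alignment is an assumption based on ARM instructions being one word long.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the lowest offset that is a multiple of `align` and starts
   `need` consecutive zero bytes within buf[0..len), or -1 if there is
   no such slot. A zero run is only a candidate: the header's field map
   decides whether those bytes are genuinely unused. */
static long find_blank_slot(const uint8_t *buf, size_t len, size_t need, size_t align)
{
    if (align == 0)
        align = 1;
    for (size_t off = 0; off + need <= len; off += align) {
        size_t i = 0;
        while (i < need && buf[off + i] == 0)
            i++;
        if (i == need)
            return (long)off;
    }
    return -1;
}
```

On a header whose only zero-filled stretch is the trailing 160 bytes, asking for a 56-byte, word-aligned slot returns 0x160.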
From the output above, it's simple to see that the structure of the .nds file as a whole is dictated by the entries in this header. The fact that the ARM9 binary follows the header is simply due to "ARM9 ROM offset" being set to 0x200, the first byte in the file after the header. Similarly, the ARM7 code following the ARM9 is simply an effect of "ARM7 ROM offset" being set to 0x600, which corresponds to an offset of 1.5KB into the file.
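Reading those placement fields back out of a header is just a matter of decoding little-endian words. In this sketch the field offsets (0x20 and 0x2C for the ARM9 ROM offset and size, 0x30 and 0x3C for the ARM7's) follow the standard header layout as documented in GBATEK; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Placement fields in the .nds header (standard layout, per GBATEK). */
enum {
    HDR_ARM9_ROM_OFFSET = 0x20,
    HDR_ARM9_SIZE       = 0x2C,
    HDR_ARM7_ROM_OFFSET = 0x30,
    HDR_ARM7_SIZE       = 0x3C
};

/* Decode a little-endian 32-bit value from four header bytes. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Where a CPU's binary lives in the file: start offset and length. */
struct cpu_image {
    uint32_t rom_offset;
    uint32_t size;
};

static struct cpu_image arm9_image(const uint8_t *hdr)
{
    struct cpu_image im = { read_le32(hdr + HDR_ARM9_ROM_OFFSET),
                            read_le32(hdr + HDR_ARM9_SIZE) };
    return im;
}

static struct cpu_image arm7_image(const uint8_t *hdr)
{
    struct cpu_image im = { read_le32(hdr + HDR_ARM7_ROM_OFFSET),
                            read_le32(hdr + HDR_ARM7_SIZE) };
    return im;
}
```

For the conventional layout described above, `arm9_image(hdr).rom_offset` reads back as 0x200 and `arm7_image(hdr).rom_offset` as 0x600.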
Simply by changing the "ROM offset" values in this header, it's possible to move the point from which each CPU's code is loaded, from the default location after the header to somewhere inside it: overwrite the zeros at that position with ARM opcodes, and load from there. It was a good idea on pepsiman's part, and a viable one.
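pepsiman's idea can be sketched as a single operation: copy the code into a zero-filled stretch of the header, then rewrite the matching ROM offset and size fields to point at it. The helper name and its bounds checks are hypothetical, and a 512-byte header buffer is assumed; the field offsets passed in would be 0x30/0x3C for the ARM7 or 0x20/0x2C for the ARM9.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { NDS_HEADER_SIZE = 512 };

/* Encode a 32-bit value little-endian into four header bytes. */
static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)(v >> 24);
}

/* Copy `len` bytes of code into the header at `new_offset`, then point the
   given ROM-offset and size fields at it. Refuses (returns false, header
   unchanged) if the code would run past the header or land on any byte
   that is not zero, so it only ever overwrites blank space. */
static bool move_code_into_header(uint8_t *hdr, unsigned offset_field,
                                  unsigned size_field, uint32_t new_offset,
                                  const uint8_t *code, uint32_t len)
{
    if (new_offset > NDS_HEADER_SIZE || len > NDS_HEADER_SIZE - new_offset)
        return false;
    for (uint32_t i = 0; i < len; i++)
        if (hdr[new_offset + i] != 0)
            return false;
    memcpy(hdr + new_offset, code, len);
    write_le32(hdr + offset_field, new_offset);
    write_le32(hdr + size_field, len);
    return true;
}
```

The loader then reads the code from its new position inside the header; nothing else about the file needs to change for that CPU.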
LiraNuna's ARM9 code seemed quite short, but I thought I could go one better, shrinking the code down further.
ARM9 cut-down: Second run
Once assembled, this code ended up looking like the following.
ARM9 cut-down: Assembled binary
Definitely a little smaller; now the question remained of where to put it, along with the ARM7 binary of one opcode (EAFFFFFE). The ARM7 part was simple enough: the first region of blank space, 8 bytes, was ample space to place this opcode. The ARM7 offset was changed, the size changed to 4, and that part was done.
The ARM9 code was just as simple to place: the 160 bytes of free space at the end of the header seemed more than enough to hold the binary, and all that remained was to modify the ARM9 ROM offset and size.
And that, it seemed, was that. All the code fit comfortably into the header, and the final .nds was just 512 bytes in size. Surely that was all that could be done? Not quite.
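Why exactly 512 bytes? Under the simple model that the file only needs to extend as far as the header and the end of each CPU binary (an assumption for illustration, not a rule quoted from any loader), the minimum length is the largest of those end points. With the ARM9 code ending at 0x160 + 56 = 0x198, inside the header, the 512-byte header itself sets the length. `min_rom_length` is a hypothetical name.

```c
#include <stdint.h>

/* Start offset and length of one CPU binary within the file. */
struct placement {
    uint32_t offset;
    uint32_t size;
};

/* Smallest file length that still contains the header and the end of both
   CPU binaries, under the simplifying assumption that nothing else (such
   as a file table) follows them. */
static uint32_t min_rom_length(uint32_t header_size,
                               struct placement arm9, struct placement arm7)
{
    uint32_t end = header_size;
    if (arm9.offset + arm9.size > end)
        end = arm9.offset + arm9.size;
    if (arm7.offset + arm7.size > end)
        end = arm7.offset + arm7.size;
    return end;
}
```

Under this model, getting below 512 bytes requires both a smaller header and ARM9 code that ends before 0x160.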
To the core: Repositioning
As it turns out, not all 512 bytes of the header are used. The 160 bytes on the end are in the header simply by convention; one might as well say that the .nds file consists of a 352-byte header, 160 bytes of padding, and then the two CPU binaries. Was it possible to fit the 56-byte ARM9 binary somewhere else inside the header, and eliminate this padding?
I started by changing the "header size" field at 0x84 to reflect the new size of the header, 0x160 (352) bytes.
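The header-size edit amounts to writing 0x160 as a little-endian word at offset 0x84. The sketch below adds a check of its own (an assumption, not a step the article describes) that both binaries, as described by their ROM offset and size fields, still end inside the shortened header; `shrink_header` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

enum { HDR_HEADER_SIZE_FIELD = 0x84 };

/* Decode / encode 32-bit little-endian header fields. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)(v >> 24);
}

/* Set the "header size" field to new_size, but only if the ARM9 and ARM7
   binaries (ROM offset/size fields at 0x20/0x2C and 0x30/0x3C) both end
   within that many bytes. Returns false and leaves the header untouched
   otherwise. */
static bool shrink_header(uint8_t *hdr, uint32_t new_size)
{
    static const unsigned fields[2][2] = { { 0x20, 0x2C }, { 0x30, 0x3C } };
    for (int cpu = 0; cpu < 2; cpu++) {
        uint32_t off = read_le32(hdr + fields[cpu][0]);
        uint32_t size = read_le32(hdr + fields[cpu][1]);
        if (off > new_size || size > new_size - off)
            return false;
    }
    write_le32(hdr + HDR_HEADER_SIZE_FIELD, new_size);
    return true;
}
```

With the ARM9 code still sitting at 0x160, `shrink_header(hdr, 0x160)` refuses; it only succeeds once the code has been moved below that point.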
Then, I started inserting the opcodes, until I had something like this.
ARM9 placement: Within the header
The fields in the header at 0x80, 0x84 and 0xAC can be seen, nestled within the ARM9 code. Now, this is quite a problem; if those values correspond to valid opcodes, they may be executed, and that might prove disastrous for the state of the program.
A disassembly was called for. I loaded up the new binary in DSemu, and the debugger gave the following output:
ARM9 cut-down: Code after insertion
It seems I was fortunate. The first two AND instructions will never be executed, since they are conditional on the Zero flag being set, and that flag is not set by the instructions above them. As for the CMP, it happens to fall after the VRAM-writing loop, which is also fortunate: had it landed before the loop's BNE, the branch would have tested the CMP's result instead of the loop's own comparison, and the loop might never have terminated, writing on past the end of VRAM.
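Whether a header value executes harmlessly comes down to its condition field (bits 31 to 28) and its instruction class. The header size written earlier, 0x00000160, has condition 0 (EQ) and decodes as a data-processing AND (specifically ANDEQ r0, r0, r0, ROR #2), which is consistent with the conditional AND instructions in the disassembly. The decoder below is a deliberately partial sketch covering just those fields, not a full disassembler; the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Names for the 4-bit condition field, bits 31-28 (14 = always). */
static const char *const COND_NAMES[16] = {
    "EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
    "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"
};

/* Data-processing opcodes, bits 24-21. */
static const char *const DP_NAMES[16] = {
    "AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
    "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"
};

static const char *arm_cond(uint32_t insn)
{
    return COND_NAMES[insn >> 28];
}

/* True if insn is an ARM data-processing instruction. Condition 1111 is
   the unconditional extension space on ARMv5; bits 27-26 must be 00;
   register-operand forms with bits 7 and 4 both set are multiplies or
   halfword loads/stores; and TST/TEQ/CMP/CMN without the S bit (bit 20)
   encode other instructions such as MRS, MSR and BX. */
static bool is_data_processing(uint32_t insn)
{
    unsigned op = (insn >> 21) & 15u;
    if ((insn >> 28) == 15u)
        return false;
    if ((insn >> 26) & 3u)
        return false;
    if (!(insn & (1u << 25)) && (insn & 0x90u) == 0x90u)
        return false;
    if (op >= 8 && op <= 11 && !(insn & (1u << 20)))
        return false;
    return true;
}

/* Opcode mnemonic; only meaningful when is_data_processing() is true. */
static const char *dp_opcode(uint32_t insn)
{
    return DP_NAMES[(insn >> 21) & 15u];
}
```

Running a header word through `arm_cond` first is the quick check: anything other than AL means the word only executes when the flags happen to match.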
Surprisingly fortunate, I thought; I hadn't planned for such a consequence, and it had simply come about due to the size and structure of the code. Either way, I wasn't about to complain.
Conclusion
So, there we have it: the smallest .nds file you're ever likely to see that still does something. The ARM7 sticks itself into an infinite loop, and the ARM9 fills the main-core framebuffer with red before entering its own infinite loop. I eventually got my wish of a small framebuffer-testing demo, and it was fun getting there.
Final binary: 352 bytes
http://imrannazar.com/content/files/TinyFB.nds
Two9A, with thanks to LiraNuna and pepsiman
Article dated: 22nd Sep 2006