One of the major historical issues with the transmission of computer programs over the Internet has been how to avoid encoding problems: with FTP, for example, a file sent in text mode may have its newline characters translated by the FTP client into different values. For a program file, this would change the meaning of the code, and may result in failure when the downloaded file is executed.
The issue of transmision encoding is obviously a solved problem: FTP has a binary mode, which avoids text-based translation, while email has MIME-formatted attachments for the inclusion of files. Another way to approach the issue, however, is to produce executables which avoid the occurrence of the problem. Since a program, like any file, is nothing more than a stream of numbers, we can alleviate the transmission issue by producing programs which only use numbers that would be found in a text file; in the ASCII encoding standard, we can define these numbers as the "printable range", between 32 and 126.
Depending on the target CPU for the executable program, the limitation of only using the printable range will have different effects: for some processors, it is impossible to write a program under these constraints. In this article, I'll be looking at the combination of the x86 PC and MS-DOS, since this allows for a wide range of instructions to be used.
Opcodes in the printable range
Since the x86 base instruction set is derived from the 8080 CPU, a number of groups of instructions fall under the printable range, including conditional branches and arithmetic manipulation. The full list is as follows.
20/ | AND r/m,r (8-bit) | 40/@ | INC AX | 60/` | PUSHA |
---|---|---|---|---|---|
21/! | AND r/m,r (16-bit) | 41/A | INC CX | 61/a | POPA |
22/" | AND r,r/m (8-bit) | 42/B | INC DX | 62/b | BOUND r,m |
23/# | AND r,r/m (16-bit) | 43/C | INC BX | 63/c | ARPL r/m,r (16-bit) |
24/$ | AND AL,imm | 44/D | INC SP | 64/d | FS: |
25/% | AND AX,imm | 45/E | INC BP | 65/e | GS: |
26/& | ES: | 46/F | INC SI | 66/f | N/A |
27/' | DAA | 47/G | INC DI | 67/g | N/A |
28/( | SUB r/m,r (8-bit) | 48/H | DEC AX | 68/h | PUSH imm16 |
29/) | SUB r/m,r (16-bit) | 49/I | DEC CX | 69/i | IMUL r,r/m,imm16 |
2A/* | SUB r,r/m (8-bit) | 4A/J | DEC DX | 6A/j | Push imm8 |
2B/+ | SUB r,r/m (16-bit) | 4B/K | DEC BX | 6B/k | IMUL r,r/m,imm8 |
2C/, | SUB AL,imm8 | 4C/L | DEC SP | 6C/l | INSB |
2D/- | SUB AX,imm16 | 4D/M | DEC BP | 6D/m | INSW |
2E/. | CS: | 4E/N | DEC SI | 6E/n | OUTSB |
2F// | DAS | 4F/O | DEC DI | 6F/o | OUTSW |
30/0 | XOR r/m,r (8-bit) | 50/P | PUSH AX | 70/p | JO rel8 |
31/1 | XOR r/m,r (16-bit) | 51/Q | PUSH CX | 71/q | JNO rel8 |
32/2 | XOR r,r/m (8-bit) | 52/R | PUSH DX | 72/r | JC rel8 |
33/3 | XOR r,r/m (16-bit) | 53/S | PUSH BX | 73/s | JNC rel8 |
34/4 | XOR AL,imm | 54/T | PUSH SP | 74/t | JZ rel8 |
35/5 | XOR AX,imm | 55/U | PUSH BP | 75/u | JNZ rel8 |
36/6 | SS: | 56/V | PUSH SI | 76/v | JBE rel8 |
37/7 | AAA | 57/W | PUSH DI | 77/w | JNBE rel8 |
38/8 | CMP r/m,r (8-bit) | 58/X | POP AX | 78/x | JS rel8 |
39/9 | CMP r/m,r (16-bit) | 59/Y | POP CX | 79/y | JNS rel8 |
3A/: | CMP r,r/m (8-bit) | 5A/Z | POP DX | 7A/z | JP rel8 |
3B/; | CMP r,r/m (16-bit) | 5B/[ | POP BX | 7B/{ | JNP rel8 |
3C/< | CMP AL,imm | 5C/\ | POP SP | 7C/| | JL rel8 |
3D/= | CMP AX,imm | 5D/] | POP BP | 7D/} | JNL rel8 |
3E/> | DS: | 5E/^ | POP SI | 7E/~ | JLE rel8 |
3F/? | AAS | 5F/_ | POP DI |
As can be seen in the above list, a good selection of opcodes is available in the x86 printable set; this allows three possible methods for producing programs using these operations:
- Programs from scratch: Using printable opcodes from the first stages of program production, to make applications that can be directly transmitted over a plain-text channel. This is obviously the most difficult option, and provides no consideration for transmission of existing programs; as such, it will not be looked at here.
- External decoding: Programs are encoded in a plain-text representation, similar to Base64, and transmitted; a decoding application at the receiving end then processes the transmission into the original program. This solution is used by the
uuencode
utility, but is not an integrated solution. - Inline decoding: The program is encoded as with external decoding, but is provided alongside a decoding routine written with printable opcodes; this eliminates the need for an external decoder, and is the simplest solution: the program can simply be run as received, and will perform as expected.
Producing the encoding
As mentioned above, the concept of Base64 can be applied in this case. Base64 encoding treats the program file as a long number, modulo 64: under this base, a number provided as a stream of bytes can be evenly divided into a stream of digits.
The canonical Base64 encoding is an adaptation of the concept behind hexadecimal: instead of using the standard denary digits in their normal position, the letters of the alphabet are used first. Values 0 to 25 are represented by the uppercase letters A-Z, 26 to 51 by the lowercase letters, and 52 to 61 by the digits 0-9. The remaining encodings 62 and 63 change between variations of the Base64 model, but are used as + and / in most variations.
This canonical encoding skips around the ASCII character set, which makes for complexity in the encoding and decoding algorithm. In the case of this article, the particular characters used for the encoding are unimportant as long as they are within the printable range: the algorithms are simplified if a contiguous range is used, so for simplicity a range from 32-95 will serve as the translation endpoint.
Encoding of the program for transmission can be performed by an external utility; the following C extract will produce a contiguous-base64 encoding of a source file.
cb64-encode.c: Encoder to contiguous base64
#include <stdio.h> int main(int argc, char **argv) { FILE *in, *out; int fsize, i, n, o[4]; if(argc != 2) { printf("Usage: encode <file>\n"); return 1; } in = fopen(argv[1], "rb"); out = fopen("encode.out", "wb");/* Find out the size of the file */fseek(in, 0, SEEK_END); fsize = ftell(in); fseek(in, 0, SEEK_SET); for(i=0; i<fsize; i+=3) {/* Retrieve 24 bits from the file */n = (fgetc(in) << 0); n |= (fgetc(in) << 8); n |= (fgetc(in) << 16);/* Break out into four 6-bit values */o[0] = ((n >> 18) & 63) + 32; o[1] = ((n >> 12) & 63) + 32; o[2] = ((n >> 6) & 63) + 32; o[3] = ((n ) & 63) + 32;/* Write encoded values */fputc(o[0], out); fputc(o[1], out); fputc(o[2], out); fputc(o[3], out); } fclose(in); fclose(out); return 0; }
The inline decoder
With the source program encoded into the printable range, the remaining part of the process is the inline decoder. In theory, this is a simple reversal of the above code; the issue is complicated by the fact that the decoder must be written using printable opcodes only. In addition to using printable opcodes, some of the opcodes in the x86 instruction set use an optional ModR/M
byte to specify their arguments: this additional byte defines the source and destination for the operation. A few examples:
ModR/M samples
XOR AX,BX; 33 C3MOV AL,BYTE [SI]; 8B 04AND CX,WORD [SI+0293]; 23 8C 93 02
As can be seen above, the ModR/M
byte can take on any value, and each of the possible values encodes to a combination of source and destination. Of course, only some of these combinations encode to printable byte values: those selections are shown below.
Destination | |||||||||
---|---|---|---|---|---|---|---|---|---|
8-bit | AL | CL | DL | BL | AH | CH | DH | BH | |
16-bit | AX | CX | DX | BX | SP | BP | SI | DI | |
Source | [BX+SI] | 20 | 28 | 30 | 38 | ||||
[BX+DI] | 21 | 29 | 31 | 39 | |||||
[BP+SI] | 22 | 2A | 32 | 3A | |||||
[BP+DI] | 23 | 2B | 33 | 3B | |||||
[SI] | 24 | 2C | 34 | 3C | |||||
[DI] | 25 | 2D | 35 | 3D | |||||
[addr] | 26 | 2E | 36 | 3E | |||||
[BX] | 27 | 2F | 37 | 3F | |||||
[BX+SI+disp] | 40 | 48 | 50 | 58 | 60 | 68 | 70 | 78 | |
[BX+DI+disp] | 41 | 49 | 51 | 59 | 61 | 69 | 71 | 79 | |
[BP+SI+disp] | 42 | 4A | 52 | 5A | 62 | 6A | 72 | 7A | |
[BP+DI+disp] | 43 | 4B | 53 | 5B | 63 | 6B | 73 | 7B | |
[SI+disp] | 44 | 4C | 54 | 5C | 64 | 6C | 74 | 7C | |
[DI+disp] | 45 | 4D | 55 | 5D | 65 | 6D | 75 | 7D | |
[BP+disp] | 46 | 4E | 56 | 5E | 66 | 6E | 76 | 7E | |
[BX+disp] | 47 | 4F | 57 | 5F | 67 | 6F | 77 |
addr is 16-bit, disp is 8-bit
Due to the limitations imposed by both the opcode range and the allowable addressing of ModR/M bytes, a few techniques have to be employed in code production:
-
Use of DOS initial values: When a program is loaded by MS-DOS or the Windows VDM, the registers have certain values at the start of execution. One of the important instances of this is that
AX
is zero. This value can be used throughout the program if it's saved to a register that's used solely for the purpose of providing zeroes. For example:; Use BX for zero register PUSH AX POP BX
-
XOR masking: Since the
MOV
opcode is disallowed from use, and most immediate values cannot be represented as printable bytes, XOR can be used to clear unused bits. For example:; MOV AX, 0013hPUSH BX POP AX XOR AX, 2020h XOR AX, 2033h; MOV AL, [SI+1Fh]DEC SI PUSH BX POP AX XOR AX, [SI+20h] - Jump padding: Since jumps in the printable range are relative, forward jumps can only be made to destinations more than 31 bytes away. To facilitate this, extra bytes can be inserted if the destination isn't far enough away. This requires that the encoded output of the operations between the jump and its destination is known, since the code length is required to calculate the amount of padding required.
-
Manual encoding: Some operations will not be intuitively selected by the assembler, and will be encoded into the non-printable range. To avoid this, the operation can be assembled by hand and inserted at the appropriate point. For the example of
XOR
where the source byte is 65 bytes from the opcode:; XOR AL, [SI+($-PRSTART)]DB 32h, 44h, 41h
Using these techniques, it's possible to write the reverse decoder for the printable-opcode encoder detailed above, and attach the encoded program to it. The resultant program is shown below, in NASM assembly format.
linenoise.com: Input program to be encoded
mov al, 0x13 int 0x10; Set 320x200x8 graphics modepush 0xA000 pop es; Set destination segmentlineloop: mov cx, 0x0140; For 200 linesin ax, 0x40; Get a "random" number from the timerand ax, 0xBF mul cx mov di, ax; Write to that line on screen (mod 191)in al, 0x40; Get a "random" numberrep stosb; Write a line of that colourin al, 0x60; Check for a keydec al jnz lineloop; Loop back if not ESCend: ret; Return to DOS
linenoise-encoded.com: Encoded into printable opcodes
pusha; Save all registers for post-decoding; Initialise registers and zero-holderpush ax push ax push ax push ax push ax pop edx pop ebx pop cx; Rewrite jpouter and jpinnerxor al, jpouter-256-96; Get a printable-range 8-bit valuexor ax, 0x235B xor ax, 0x225B; Add 256push ax pop si; Use this as the addresspush bx pop ax xor al, 0x20 sub al, 0x55; AL = 0xCBxor [si+0x60], al; Opcode = 0xEB (JMP rel)xor [si+0x62], al xor al,0x34; AL = 0xFFxor [si+0x61], al; Set jump point to known valuexor [si+0x63], al inc ax; AX = 0x0100push ax pop bp; BP = 0x0100; Set destination point (CS:1000 - 32)xor ax,0x3E30 xor ax,0x3030 sub al,32 push ax pop di; Set source point (+256+8)push bx pop ax xor ax,0x3120 xor ax,0x307E push ax pop si; Perform the decodinglpouter:; In blocks of 4push bx push bx pop eax and [di+0x20],eax inc cx inc cx inc cx inc cx; Read 6 bitslpinner: push bx pop ax db 0x32, 0x44, 0x41; xor al, [si+(source-$)]sub al,32 cmp al,94 je end; Push along by 6 bits, shift in the new bitsimul edx,[di+0x20],byte 64 and [di+0x20],ebx xor [di+0x20],edx; [Dest] = Shifted valxor [di+0x20],al; [Dest] += nextinc si dec cx jnz jpinner; Once the block of 4 is done, DI += 3inc di inc di inc di jnz jpouter; Jump indirect; Jump padding, since the above code is 25 bytes longdb '@@@@@@@' end: push bp pop ax; AX = 0x0100; Rewrite jump to CS:1000xor al,jpouter-256-64 push ax pop si push bx pop ax and [si+0x60],ax and [si+0x62],ax; Clear opcodesub al,0x37 xor al,0x20; AL = 0xE9xor [si+0x60],al; JMP (rel16)push bx pop ax xor ax,0x3E30 xor ax,0x3072; AX = 0x1000 - $xor [si+0x61],ax dec di; Clear Z flagpopa; Retrieve all registersjnz jpouter+32; Jump indirect to CS:1000; Rewritable jump pointsjpouter: db 0x20 db ($-lpouter) jpinner: db 0x20 db ($-lpinner); The encoded programsource: db 'S1.P &@0N0>@Y0% OR5 X?< Y,>)JO- _F#DZG7(___#' db 126; EOF marker
The above encoded program runs in the same manner as the original, and produces the following output:
Caveats and conclusion
There are a few issues with this encoding mechanism. Firstly, it is heavily tied to MS-DOS .com programs, since the decoder is itself a DOS program and relies on the initial conditions provided by the MS-DOS program loader. Furthermore, it has a fixed destination for the decoded program of CS:1000h, which only allows programs of a little under 4 kilobytes to be used without modifying the decoder.
These issues, however, can be overlooked if the use of the encoding mechanism is kept to the domain of graphic demos and other simple programs which are to be written using printable opcodes. That domain may, of course, not be a large one.
Imran Nazar <tf@imrannazar.com>, Jun 2011.