Printable Opcodes in x86 Real Mode

18th Jun 2011

One of the major historical issues with the transmission of computer programs over the Internet has been how to avoid encoding problems: with FTP, for example, a file sent in text mode may have its newline characters translated by the FTP client into different values. For a program file, this would change the meaning of the code, and may result in failure when the downloaded file is executed.

The issue of transmision encoding is obviously a solved problem: FTP has a binary mode, which avoids text-based translation, while email has MIME-formatted attachments for the inclusion of files. Another way to approach the issue, however, is to produce executables which avoid the occurrence of the problem. Since a program, like any file, is nothing more than a stream of numbers, we can alleviate the transmission issue by producing programs which only use numbers that would be found in a text file; in the ASCII encoding standard, we can define these numbers as the "printable range", between 32 and 126.

Depending on the target CPU for the executable program, the limitation of only using the printable range will have different effects: for some processors, it is impossible to write a program under these constraints. In this article, I'll be looking at the combination of the x86 PC and MS-DOS, since this allows for a wide range of instructions to be used.

Opcodes in the printable range

Since the x86 base instruction set is derived from the 8080 CPU, a number of groups of instructions fall under the printable range, including conditional branches and arithmetic manipulation. The full list is as follows.

20/	AND r/m,r (8-bit)	40/@	INC AX	60/`	PUSHA
21/!	AND r/m,r (16-bit)	41/A	INC CX	61/a	POPA
22/"	AND r,r/m (8-bit)	42/B	INC DX	62/b	BOUND r,m
23/#	AND r,r/m (16-bit)	43/C	INC BX	63/c	ARPL r/m,r (16-bit)
24/$	AND AL,imm	44/D	INC SP	64/d	FS:
25/%	AND AX,imm	45/E	INC BP	65/e	GS:
26/&	ES:	46/F	INC SI	66/f	N/A
27/'	DAA	47/G	INC DI	67/g	N/A
28/(	SUB r/m,r (8-bit)	48/H	DEC AX	68/h	PUSH imm16
29/)	SUB r/m,r (16-bit)	49/I	DEC CX	69/i	IMUL r,r/m,imm16
2A/*	SUB r,r/m (8-bit)	4A/J	DEC DX	6A/j	Push imm8
2B/+	SUB r,r/m (16-bit)	4B/K	DEC BX	6B/k	IMUL r,r/m,imm8
2C/,	SUB AL,imm8	4C/L	DEC SP	6C/l	INSB
2D/-	SUB AX,imm16	4D/M	DEC BP	6D/m	INSW
2E/.	CS:	4E/N	DEC SI	6E/n	OUTSB
2F//	DAS	4F/O	DEC DI	6F/o	OUTSW
30/0	XOR r/m,r (8-bit)	50/P	PUSH AX	70/p	JO rel8
31/1	XOR r/m,r (16-bit)	51/Q	PUSH CX	71/q	JNO rel8
32/2	XOR r,r/m (8-bit)	52/R	PUSH DX	72/r	JC rel8
33/3	XOR r,r/m (16-bit)	53/S	PUSH BX	73/s	JNC rel8
34/4	XOR AL,imm	54/T	PUSH SP	74/t	JZ rel8
35/5	XOR AX,imm	55/U	PUSH BP	75/u	JNZ rel8
36/6	SS:	56/V	PUSH SI	76/v	JBE rel8
37/7	AAA	57/W	PUSH DI	77/w	JNBE rel8
38/8	CMP r/m,r (8-bit)	58/X	POP AX	78/x	JS rel8
39/9	CMP r/m,r (16-bit)	59/Y	POP CX	79/y	JNS rel8
3A/:	CMP r,r/m (8-bit)	5A/Z	POP DX	7A/z	JP rel8
3B/;	CMP r,r/m (16-bit)	5B/[	POP BX	7B/{	JNP rel8
3C/<	CMP AL,imm	5C/\	POP SP	7C/\|	JL rel8
3D/=	CMP AX,imm	5D/]	POP BP	7D/}	JNL rel8
3E/>	DS:	5E/^	POP SI	7E/~	JLE rel8
3F/?	AAS	5F/_	POP DI

Table 1: Printable opcodes in the x86 instruction map

As can be seen in the above list, a good selection of opcodes is available in the x86 printable set; this allows three possible methods for producing programs using these operations:

Programs from scratch: Using printable opcodes from the first stages of program production, to make applications that can be directly transmitted over a plain-text channel. This is obviously the most difficult option, and provides no consideration for transmission of existing programs; as such, it will not be looked at here.
External decoding: Programs are encoded in a plain-text representation, similar to Base64, and transmitted; a decoding application at the receiving end then processes the transmission into the original program. This solution is used by the uuencode utility, but is not an integrated solution.
Inline decoding: The program is encoded as with external decoding, but is provided alongside a decoding routine written with printable opcodes; this eliminates the need for an external decoder, and is the simplest solution: the program can simply be run as received, and will perform as expected.

Producing the encoding

As mentioned above, the concept of Base64 can be applied in this case. Base64 encoding treats the program file as a long number, modulo 64: under this base, a number provided as a stream of bytes can be evenly divided into a stream of digits.

The canonical Base64 encoding is an adaptation of the concept behind hexadecimal: instead of using the standard denary digits in their normal position, the letters of the alphabet are used first. Values 0 to 25 are represented by the uppercase letters A-Z, 26 to 51 by the lowercase letters, and 52 to 61 by the digits 0-9. The remaining encodings 62 and 63 change between variations of the Base64 model, but are used as + and / in most variations.

This canonical encoding skips around the ASCII character set, which makes for complexity in the encoding and decoding algorithm. In the case of this article, the particular characters used for the encoding are unimportant as long as they are within the printable range: the algorithms are simplified if a contiguous range is used, so for simplicity a range from 32-95 will serve as the translation endpoint.

Encoding of the program for transmission can be performed by an external utility; the following C extract will produce a contiguous-base64 encoding of a source file.

cb64-encode.c: Encoder to contiguous base64 #include <stdio.h> int main(int argc, char **argv) { FILE *in, *out; int fsize, i, n, o[4]; if(argc != 2) { printf("Usage: encode <file>\n"); return 1; } in = fopen(argv[1], "rb"); out = fopen("encode.out", "wb"); /* Find out the size of the file */ fseek(in, 0, SEEK_END); fsize = ftell(in); fseek(in, 0, SEEK_SET); for(i=0; i<fsize; i+=3) { /* Retrieve 24 bits from the file */ n = (fgetc(in) << 0); n |= (fgetc(in) << 8); n |= (fgetc(in) << 16); /* Break out into four 6-bit values */ o[0] = ((n >> 18) & 63) + 32; o[1] = ((n >> 12) & 63) + 32; o[2] = ((n >> 6) & 63) + 32; o[3] = ((n ) & 63) + 32; /* Write encoded values */ fputc(o[0], out); fputc(o[1], out); fputc(o[2], out); fputc(o[3], out); } fclose(in); fclose(out); return 0; }

The inline decoder

With the source program encoded into the printable range, the remaining part of the process is the inline decoder. In theory, this is a simple reversal of the above code; the issue is complicated by the fact that the decoder must be written using printable opcodes only. In addition to using printable opcodes, some of the opcodes in the x86 instruction set use an optional ModR/M byte to specify their arguments: this additional byte defines the source and destination for the operation. A few examples:

ModR/M samples XOR AX,BX ; 33 C3 MOV AL,BYTE [SI] ; 8B 04 AND CX,WORD [SI+0293] ; 23 8C 93 02

As can be seen above, the ModR/M byte can take on any value, and each of the possible values encodes to a combination of source and destination. Of course, only some of these combinations encode to printable byte values: those selections are shown below.

	Destination
	8-bit	AL	CL	DL	BL	AH	CH	DH	BH
	16-bit	AX	CX	DX	BX	SP	BP	SI	DI
Source	[BX+SI]					20	28	30	38
	[BX+DI]					21	29	31	39
	[BP+SI]					22	2A	32	3A
	[BP+DI]					23	2B	33	3B
	[SI]					24	2C	34	3C
	[DI]					25	2D	35	3D
	[addr]					26	2E	36	3E
	[BX]					27	2F	37	3F
	[BX+SI+disp]	40	48	50	58	60	68	70	78
	[BX+DI+disp]	41	49	51	59	61	69	71	79
	[BP+SI+disp]	42	4A	52	5A	62	6A	72	7A
	[BP+DI+disp]	43	4B	53	5B	63	6B	73	7B
	[SI+disp]	44	4C	54	5C	64	6C	74	7C
	[DI+disp]	45	4D	55	5D	65	6D	75	7D
	[BP+disp]	46	4E	56	5E	66	6E	76	7E
	[BX+disp]	47	4F	57	5F	67	6F	77

Table 2: Printable values for the ModR/M byte
addr is 16-bit, disp is 8-bit

Due to the limitations imposed by both the opcode range and the allowable addressing of ModR/M bytes, a few techniques have to be employed in code production:

Use of DOS initial values: When a program is loaded by MS-DOS or the Windows VDM, the registers have certain values at the start of execution. One of the important instances of this is that AX is zero. This value can be used throughout the program if it's saved to a register that's used solely for the purpose of providing zeroes. For example: ; Use BX for zero register PUSH AX POP BX
XOR masking: Since the MOV opcode is disallowed from use, and most immediate values cannot be represented as printable bytes, XOR can be used to clear unused bits. For example: ; MOV AX, 0013h PUSH BX POP AX XOR AX, 2020h XOR AX, 2033h ; MOV AL, [SI+1Fh] DEC SI PUSH BX POP AX XOR AX, [SI+20h]
Jump padding: Since jumps in the printable range are relative, forward jumps can only be made to destinations more than 31 bytes away. To facilitate this, extra bytes can be inserted if the destination isn't far enough away. This requires that the encoded output of the operations between the jump and its destination is known, since the code length is required to calculate the amount of padding required.
Manual encoding: Some operations will not be intuitively selected by the assembler, and will be encoded into the non-printable range. To avoid this, the operation can be assembled by hand and inserted at the appropriate point. For the example of XOR where the source byte is 65 bytes from the opcode: ; XOR AL, [SI+($-PRSTART)] DB 32h, 44h, 41h

Using these techniques, it's possible to write the reverse decoder for the printable-opcode encoder detailed above, and attach the encoded program to it. The resultant program is shown below, in NASM assembly format.

linenoise.com: Input program to be encoded mov al, 0x13 int 0x10 ; Set 320x200x8 graphics mode push 0xA000 pop es ; Set destination segment lineloop: mov cx, 0x0140 ; For 200 lines in ax, 0x40 ; Get a "random" number from the timer and ax, 0xBF mul cx mov di, ax ; Write to that line on screen (mod 191) in al, 0x40 ; Get a "random" number rep stosb ; Write a line of that colour in al, 0x60 ; Check for a key dec al jnz lineloop ; Loop back if not ESC end: ret ; Return to DOS linenoise-encoded.com: Encoded into printable opcodes pusha ; Save all registers for post-decoding ; Initialise registers and zero-holder push ax push ax push ax push ax push ax pop edx pop ebx pop cx ; Rewrite jpouter and jpinner xor al, jpouter-256-96 ; Get a printable-range 8-bit value xor ax, 0x235B xor ax, 0x225B ; Add 256 push ax pop si ; Use this as the address push bx pop ax xor al, 0x20 sub al, 0x55 ; AL = 0xCB xor [si+0x60], al ; Opcode = 0xEB (JMP rel) xor [si+0x62], al xor al,0x34 ; AL = 0xFF xor [si+0x61], al ; Set jump point to known value xor [si+0x63], al inc ax ; AX = 0x0100 push ax pop bp ; BP = 0x0100 ; Set destination point (CS:1000 - 32) xor ax,0x3E30 xor ax,0x3030 sub al,32 push ax pop di ; Set source point (+256+8) push bx pop ax xor ax,0x3120 xor ax,0x307E push ax pop si ; Perform the decoding lpouter: ; In blocks of 4 push bx push bx pop eax and [di+0x20],eax inc cx inc cx inc cx inc cx ; Read 6 bits lpinner: push bx pop ax db 0x32, 0x44, 0x41 ; xor al, [si+(source-$)] sub al,32 cmp al,94 je end ; Push along by 6 bits, shift in the new bits imul edx,[di+0x20],byte 64 and [di+0x20],ebx xor [di+0x20],edx ; [Dest] = Shifted val xor [di+0x20],al ; [Dest] += next inc si dec cx jnz jpinner ; Once the block of 4 is done, DI += 3 inc di inc di inc di jnz jpouter ; Jump indirect ; Jump padding, since the above code is 25 bytes long db '@@@@@@@' end: push bp pop ax ; AX = 0x0100 ; Rewrite jump to CS:1000 xor al,jpouter-256-64 push ax pop si push bx pop ax and [si+0x60],ax and [si+0x62],ax ; Clear opcode sub al,0x37 xor al,0x20 ; AL = 0xE9 xor [si+0x60],al ; JMP (rel16) push bx pop ax xor ax,0x3E30 xor ax,0x3072 ; AX = 0x1000 - $ xor [si+0x61],ax dec di ; Clear Z flag popa ; Retrieve all registers jnz jpouter+32 ; Jump indirect to CS:1000 ; Rewritable jump points jpouter: db 0x20 db ($-lpouter) jpinner: db 0x20 db ($-lpinner) ; The encoded program source: db 'S1.P &@0N0>@Y0% OR5 X?< Y,>)JO- _F#DZG7(___#' db 126 ; EOF marker

The above encoded program runs in the same manner as the original, and produces the following output:

Linenoise output — Figure 1: Output of linenoise-encoded.com

Caveats and conclusion

There are a few issues with this encoding mechanism. Firstly, it is heavily tied to MS-DOS .com programs, since the decoder is itself a DOS program and relies on the initial conditions provided by the MS-DOS program loader. Furthermore, it has a fixed destination for the decoded program of CS:1000h, which only allows programs of a little under 4 kilobytes to be used without modifying the decoder.

These issues, however, can be overlooked if the use of the encoding mechanism is kept to the domain of graphic demos and other simple programs which are to be written using printable opcodes. That domain may, of course, not be a large one.

Imran Nazar <tf@imrannazar.com>, Jun 2011.