Printable Opcodes in x86 Real Mode

Display mode

Back to Articles

One of the major historical issues with the transmission of computer programs over the Internet has been how to avoid encoding problems: with FTP, for example, a file sent in text mode may have its newline characters translated by the FTP client into different values. For a program file, this would change the meaning of the code, and may result in failure when the downloaded file is executed.

The issue of transmision encoding is obviously a solved problem: FTP has a binary mode, which avoids text-based translation, while email has MIME-formatted attachments for the inclusion of files. Another way to approach the issue, however, is to produce executables which avoid the occurrence of the problem. Since a program, like any file, is nothing more than a stream of numbers, we can alleviate the transmission issue by producing programs which only use numbers that would be found in a text file; in the ASCII encoding standard, we can define these numbers as the "printable range", between 32 and 126.

Depending on the target CPU for the executable program, the limitation of only using the printable range will have different effects: for some processors, it is impossible to write a program under these constraints. In this article, I'll be looking at the combination of the x86 PC and MS-DOS, since this allows for a wide range of instructions to be used.

Opcodes in the printable range

Since the x86 base instruction set is derived from the 8080 CPU, a number of groups of instructions fall under the printable range, including conditional branches and arithmetic manipulation. The full list is as follows.

20/ AND r/m,r (8-bit)40/@INC AX60/`PUSHA
21/!AND r/m,r (16-bit)41/AINC CX61/aPOPA
22/"AND r,r/m (8-bit)42/BINC DX62/bBOUND r,m
23/#AND r,r/m (16-bit)43/CINC BX63/cARPL r/m,r (16-bit)
24/$AND AL,imm44/DINC SP64/dFS:
25/%AND AX,imm45/EINC BP65/eGS:
26/&ES:46/FINC SI66/fN/A
27/'DAA47/GINC DI67/gN/A
28/(SUB r/m,r (8-bit)48/HDEC AX68/hPUSH imm16
29/)SUB r/m,r (16-bit)49/IDEC CX69/iIMUL r,r/m,imm16
2A/*SUB r,r/m (8-bit)4A/JDEC DX6A/jPush imm8
2B/+SUB r,r/m (16-bit)4B/KDEC BX6B/kIMUL r,r/m,imm8
2C/,SUB AL,imm84C/LDEC SP6C/lINSB
2D/-SUB AX,imm164D/MDEC BP6D/mINSW
2E/.CS:4E/NDEC SI6E/nOUTSB
2F//DAS4F/ODEC DI6F/oOUTSW
30/0XOR r/m,r (8-bit)50/PPUSH AX70/pJO rel8
31/1XOR r/m,r (16-bit)51/QPUSH CX71/qJNO rel8
32/2XOR r,r/m (8-bit)52/RPUSH DX72/rJC rel8
33/3XOR r,r/m (16-bit)53/SPUSH BX73/sJNC rel8
34/4XOR AL,imm54/TPUSH SP74/tJZ rel8
35/5XOR AX,imm55/UPUSH BP75/uJNZ rel8
36/6SS:56/VPUSH SI76/vJBE rel8
37/7AAA57/WPUSH DI77/wJNBE rel8
38/8CMP r/m,r (8-bit)58/XPOP AX78/xJS rel8
39/9CMP r/m,r (16-bit)59/YPOP CX79/yJNS rel8
3A/:CMP r,r/m (8-bit)5A/ZPOP DX7A/zJP rel8
3B/;CMP r,r/m (16-bit)5B/[POP BX7B/{JNP rel8
3C/<CMP AL,imm5C/\POP SP7C/|JL rel8
3D/=CMP AX,imm5D/]POP BP7D/}JNL rel8
3E/>DS:5E/^POP SI7E/~JLE rel8
3F/?AAS5F/_POP DI
Table 1: Printable opcodes in the x86 instruction map

As can be seen in the above list, a good selection of opcodes is available in the x86 printable set; this allows three possible methods for producing programs using these operations:

Producing the encoding

As mentioned above, the concept of Base64 can be applied in this case. Base64 encoding treats the program file as a long number, modulo 64: under this base, a number provided as a stream of bytes can be evenly divided into a stream of digits.

The canonical Base64 encoding is an adaptation of the concept behind hexadecimal: instead of using the standard denary digits in their normal position, the letters of the alphabet are used first. Values 0 to 25 are represented by the uppercase letters A-Z, 26 to 51 by the lowercase letters, and 52 to 61 by the digits 0-9. The remaining encodings 62 and 63 change between variations of the Base64 model, but are used as + and / in most variations.

This canonical encoding skips around the ASCII character set, which makes for complexity in the encoding and decoding algorithm. In the case of this article, the particular characters used for the encoding are unimportant as long as they are within the printable range: the algorithms are simplified if a contiguous range is used, so for simplicity a range from 32-95 will serve as the translation endpoint.

Encoding of the program for transmission can be performed by an external utility; the following C extract will produce a contiguous-base64 encoding of a source file.

cb64-encode.c: Encoder to contiguous base64

#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *in, *out;
    int fsize, i, n, o[4];

    if(argc != 2)
    {
        printf("Usage: encode <file>\n");
        return 1;
    }

    in = fopen(argv[1], "rb");
    out = fopen("encode.out", "wb");

    /* Find out the size of the file */
    fseek(in, 0, SEEK_END); fsize = ftell(in); fseek(in, 0, SEEK_SET);

    for(i=0; i<fsize; i+=3)
    {
        /* Retrieve 24 bits from the file */
        n  = (fgetc(in) << 0);
        n |= (fgetc(in) << 8);
        n |= (fgetc(in) << 16);

        /* Break out into four 6-bit values */
        o[0] = ((n >> 18) & 63) + 32;
        o[1] = ((n >> 12) & 63) + 32;
        o[2] = ((n >>  6) & 63) + 32;
        o[3] = ((n      ) & 63) + 32;

        /* Write encoded values */
        fputc(o[0], out); fputc(o[1], out);
        fputc(o[2], out); fputc(o[3], out);
    }

    fclose(in);
    fclose(out);
    return 0;
}

The inline decoder

With the source program encoded into the printable range, the remaining part of the process is the inline decoder. In theory, this is a simple reversal of the above code; the issue is complicated by the fact that the decoder must be written using printable opcodes only. In addition to using printable opcodes, some of the opcodes in the x86 instruction set use an optional ModR/M byte to specify their arguments: this additional byte defines the source and destination for the operation. A few examples:

ModR/M samples

XOR AX,BX                     ; 33 C3
MOV AL,BYTE [SI]              ; 8B 04
AND CX,WORD [SI+0293]         ; 23 8C 93 02

As can be seen above, the ModR/M byte can take on any value, and each of the possible values encodes to a combination of source and destination. Of course, only some of these combinations encode to printable byte values: those selections are shown below.

Destination
8-bitALCLDLBLAHCHDHBH
16-bitAXCXDXBXSPBPSIDI
Source[BX+SI]20283038
[BX+DI]21293139
[BP+SI]222A323A
[BP+DI]232B333B
[SI]242C343C
[DI]252D353D
[addr]262E363E
[BX]272F373F
[BX+SI+disp]4048505860687078
[BX+DI+disp]4149515961697179
[BP+SI+disp]424A525A626A727A
[BP+DI+disp]434B535B636B737B
[SI+disp]444C545C646C747C
[DI+disp]454D555D656D757D
[BP+disp]464E565E666E767E
[BX+disp]474F575F676F77
Table 2: Printable values for the ModR/M byte
addr is 16-bit, disp is 8-bit

Due to the limitations imposed by both the opcode range and the allowable addressing of ModR/M bytes, a few techniques have to be employed in code production:

Using these techniques, it's possible to write the reverse decoder for the printable-opcode encoder detailed above, and attach the encoded program to it. The resultant program is shown below, in NASM assembly format.

linenoise.com: Input program to be encoded

    mov al, 0x13
    int 0x10               ; Set 320x200x8 graphics mode
    push 0xA000
    pop es                 ; Set destination segment

lineloop:
    mov cx, 0x0140         ; For 200 lines
    in ax, 0x40            ; Get a "random" number from the timer
    and ax, 0xBF
    mul cx
    mov di, ax             ; Write to that line on screen (mod 191)
    in al, 0x40            ; Get a "random" number
    rep stosb              ; Write a line of that colour

    in al, 0x60            ; Check for a key
    dec al
    jnz lineloop           ; Loop back if not ESC

end:
    ret                    ; Return to DOS
 

linenoise-encoded.com: Encoded into printable opcodes

    pusha               ; Save all registers for post-decoding

    ; Initialise registers and zero-holder
    push ax
    push ax
    push ax
    push ax
    push ax
    pop edx
    pop ebx
    pop cx

    ; Rewrite jpouter and jpinner
    xor al, jpouter-256-96     ; Get a printable-range 8-bit value
    xor ax, 0x235B
    xor ax, 0x225B             ; Add 256
    push ax
    pop si                     ; Use this as the address
    push bx
    pop ax
    xor al, 0x20
    sub al, 0x55               ; AL = 0xCB
    xor [si+0x60], al          ; Opcode = 0xEB (JMP rel)
    xor [si+0x62], al
    xor al,0x34                ; AL = 0xFF
    xor [si+0x61], al          ; Set jump point to known value
    xor [si+0x63], al

    inc ax                     ; AX = 0x0100
    push ax
    pop bp                     ; BP = 0x0100

    ; Set destination point (CS:1000 - 32)
    xor ax,0x3E30
    xor ax,0x3030
    sub al,32
    push ax
    pop di

    ; Set source point (+256+8)
    push bx
    pop ax
    xor ax,0x3120
    xor ax,0x307E
    push ax
    pop si

    ; Perform the decoding
lpouter:
    ; In blocks of 4
    push bx
    push bx
    pop eax
    and [di+0x20],eax
    inc cx
    inc cx
    inc cx
    inc cx

    ; Read 6 bits
lpinner:
    push bx
    pop ax

    db 0x32, 0x44, 0x41           ; xor al, [si+(source-$)]
    sub al,32
    cmp al,94
    je end

    ; Push along by 6 bits, shift in the new bits
    imul edx,[di+0x20],byte 64

    and [di+0x20],ebx
    xor [di+0x20],edx          ; [Dest] = Shifted val
    xor [di+0x20],al           ; [Dest] += next
    inc si
    dec cx
    jnz jpinner

    ; Once the block of 4 is done, DI += 3
    inc di
    inc di
    inc di
    jnz jpouter                ; Jump indirect

    ; Jump padding, since the above code is 25 bytes long
    db '@@@@@@@'

end:
    push bp
    pop ax                     ; AX = 0x0100
    
    ; Rewrite jump to CS:1000
    xor al,jpouter-256-64
    push ax
    pop si
    push bx
    pop ax
    and [si+0x60],ax
    and [si+0x62],ax           ; Clear opcode
    sub al,0x37
    xor al,0x20                ; AL = 0xE9
    xor [si+0x60],al           ; JMP (rel16)
    push bx
    pop ax
    xor ax,0x3E30
    xor ax,0x3072              ; AX = 0x1000 - $
    xor [si+0x61],ax
    dec di                     ; Clear Z flag
    popa                       ; Retrieve all registers
    jnz jpouter+32             ; Jump indirect to CS:1000

    ; Rewritable jump points
jpouter:
    db 0x20
    db ($-lpouter)
jpinner:
    db 0x20
    db ($-lpinner)

    ; The encoded program
source:
    db 'S1.P &@0N0>@Y0% OR5 X?< Y,>)JO- _F#DZG7(___#'
    db 126                     ; EOF marker

The above encoded program runs in the same manner as the original, and produces the following output:

Linenoise output
Figure 1: Output of linenoise-encoded.com

Caveats and conclusion

There are a few issues with this encoding mechanism. Firstly, it is heavily tied to MS-DOS .com programs, since the decoder is itself a DOS program and relies on the initial conditions provided by the MS-DOS program loader. Furthermore, it has a fixed destination for the decoded program of CS:1000h, which only allows programs of a little under 4 kilobytes to be used without modifying the decoder.

These issues, however, can be overlooked if the use of the encoding mechanism is kept to the domain of graphic demos and other simple programs which are to be written using printable opcodes. That domain may, of course, not be a large one.

Imran Nazar <tf@imrannazar.com>, Jun 2011.