Virus Detection with Message Digests

Files stored in a computer are simply streams of numbers. A text file, for instance, has a number for each character in the text; a graphic file has numbers describing the colour of each picture element. Computer programs are also files: each number tells the computer to perform a function like adding two numbers together, or storing a number in another file.

The problem is that, because programs are just files like any other files, they can be changed: certain numbers modified, or new streams added in the middle of the file. This is one way in which viruses are able to infect a file; they may add to a program file, and the next time that program is run, the virus is executed and performs the functions it was designed to do.

One of the best ways to alleviate this is to use a technique known as the 'message digest': a number which describes the entire contents of a file. A simple example would be an algorithm like the following:

If each letter of the alphabet is assigned a number, A being 1, B being 2 and so forth, it's possible to add all the letters in a text file together to obtain a "checksum": a sum of the contents of the file, which can be used to check it. As an example, HELLO WORLD would add up to 124 (52 for HELLO plus 72 for WORLD, with the space counting as zero).

One of the problems with this simplest example of a message digest algorithm is that the space could be removed entirely, and the checksum would be the same; the letters could also be rearranged, or the message changed and more content added, as long as the total of all the letters was still 124. As a result, more complex algorithms have been developed to take these issues into account.
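The letter-sum checksum and its weaknesses can be sketched in a few lines of Python. This is only an illustration: the names `letter_checksum` and `sha256_hex` are made up for this sketch, the sketch assumes that anything other than the letters A to Z counts as zero, and SHA-256 (from Python's standard `hashlib` module) is used here simply as one example of a "more complex algorithm", not necessarily the one any particular product uses.

```python
import hashlib


def letter_checksum(text: str) -> int:
    """Toy checksum: A=1, B=2, ... Z=26; every other character counts as 0."""
    return sum(ord(ch) - ord("A") + 1 for ch in text.upper() if "A" <= ch <= "Z")


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    original = "HELLO WORLD"
    print("letter checksum of", repr(original), "=", letter_checksum(original))

    # Space removed, words swapped, and a changed message with extra content.
    for variant in ("HELLOWORLD", "WORLD HELLO", "HELLO WORLC A"):
        same_sum = letter_checksum(variant) == letter_checksum(original)
        same_sha = sha256_hex(variant) == sha256_hex(original)
        print(f"{variant!r}: same letter checksum={same_sum}, same SHA-256={same_sha}")
```

Each variant keeps the same letter checksum as the original, while its SHA-256 digest differs, which is the property the more complex algorithms are designed to provide.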

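When these methods are used on a program file, they generate a number which is, for practical purposes, unique to the combination of numbers within that file: two different files could in principle share a digest, but with a well-designed algorithm this is so unlikely that it can be ignored. If anything changes, like an instruction being changed or instructions being added by a virus, the digest number will change.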
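As a rough illustration of a digest reacting to a change in a file, the sketch below computes a digest for a stand-in "program" file, alters a single byte, and computes it again. The helper names `file_digest` and `demo`, the file name `program.bin`, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def demo():
    """Digest a stand-in program file before and after a one-byte change."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "program.bin")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(bytes(range(256)) * 4)  # 1024 bytes of made-up "instructions"
        before = file_digest(path)

        # Overwrite the byte at offset 100 (originally 0x64) with 0xFF.
        with open(path, "r+b") as f:
            f.seek(100)
            f.write(b"\xff")
        after = file_digest(path)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
    print("changed:", before != after)
```

Reading in chunks keeps memory use small even for large program files; the two digests printed differ although only one byte out of 1024 was altered.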

This can be used to detect viruses every time a program is run. The process is relatively simple: when the program is first installed, a digest is generated based on that program file. Every time the program is used, the digest is again calculated, and if the numbers differ, some outside agent has changed the file.
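The install-then-check process described above might look something like the following sketch. It is not how any particular product works: the function names, the JSON file used to store digests, and the use of SHA-256 are assumptions chosen to keep the example short.

```python
import hashlib
import json
import os
import tempfile


def file_digest(path: str) -> str:
    """SHA-256 of the whole file (read in one go; fine for a small demo)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def record_baseline(program_path: str, store_path: str) -> None:
    """Install time: remember the program's digest in a JSON file."""
    store = {}
    if os.path.exists(store_path):
        with open(store_path, "r", encoding="utf-8") as f:
            store = json.load(f)
    store[os.path.abspath(program_path)] = file_digest(program_path)
    with open(store_path, "w", encoding="utf-8") as f:
        json.dump(store, f, indent=2)


def is_unchanged(program_path: str, store_path: str) -> bool:
    """Run time: recompute the digest and compare it with the recorded one."""
    with open(store_path, "r", encoding="utf-8") as f:
        store = json.load(f)
    return store.get(os.path.abspath(program_path)) == file_digest(program_path)


def demo():
    """Return the check result before and after a simulated infection."""
    with tempfile.TemporaryDirectory() as tmp:
        program = os.path.join(tmp, "program.bin")  # hypothetical program file
        store = os.path.join(tmp, "digests.json")   # hypothetical digest store
        with open(program, "wb") as f:
            f.write(b"original program instructions")
        record_baseline(program, store)
        clean = is_unchanged(program, store)

        with open(program, "ab") as f:  # simulate a virus appending its own code
            f.write(b"extra instructions added later")
        infected = is_unchanged(program, store)
    return clean, infected


if __name__ == "__main__":
    print(demo())
```

The first check passes and the second fails, because the appended bytes change the digest. A real checker would also need to keep the stored digests somewhere a virus cannot rewrite them, otherwise the record could be altered along with the program.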

One of the most widely used implementations of this idea is built into Windows, where it is known as Windows File Protection; it's used on files which are important to the system, such as device drivers. If Windows detects that one of these files has a different digest from the one it has on record, the user will be alerted to the fact that a system file has been changed. Many other software packages have similar systems in place to perform the same kind of check.