Winfried Gleissner

*Computers & Security, 8, 1989, pp.35-41*

*ISSN 0167-4048*

*February 1989*

[Back to index] [Comments]

Industrieanlagen-Betriebgesellschaft mbH, Ottobrunn, FRG

- 1. The Situation
- 2. The Notation
- 3. The Recursion Formulae
- 4. The Calculation of the Infection Probabilities
- 5. Conclusions
- 6. Numerical Examples
- References

A model is introduced to treat the spread of computer viruses mathematically. A recurrence formula is given which allows a closed expression to be derived for the probability that, starting from an initial state, a given viral state will be reached after executing exactly *k* programs. In some special cases this recurrence formula can be used for numeric computations, It is shown that the infection process does not stop before all programs are infected, which are visible for any infected program in the initial state.

*Keywords:* Computer viruses, Mathematical model of virus infection

0167-4048/89/$3.50 © 1989, Elsevier Science Publishers Ltd.

A computer virus is a program that can reproduce itself and modify other programs by including a possibly evolved copy of itself [1]. That means that whenever an infected program is called, the virus is implanted into another program, if there is any, of the account from where it was called. In refs 1 and 2 some experiments with this kind of program are described. The result was that within a few hours the whole computer system was infected. The aim of this paper is to develop a closed expression for the probability that any given program is infected after a given time. To achieve this aim it must be known which programs were initially infected and for each program one needs the probability that it is called. The use of a computer is regarded as a sequence of program calls.

For the purpose of the present paper there is no difference between the call of a system command, a standard program (editor, linkage editor, compiler), and a user-written program.

In the computer system there are accounts. The programs in account are denoted by . Let denote the probability that user calls the program in the account . This gives

Let denote the set of all -vectors with integer components, which is defined as follows:

For the computer system is said to be in viral state , if in account there are infected programs. One can further assume that these are the first programs according to the alphabetic order in which the programs are written down. For convenience the viral state in which all programs are infected is denoted by , i.e.:

On one defines a modulus by

and an order by

Given the viral degree , one needs the probability that an infected program will be called from account

The probability that any not infected program is executed is calculated as

For purposes which will become clear later, one defines for and for

Let denote the probability that starting from viral state viral stae will be reached after exactly the program calls

The can be calcualted recursively as follows:

For denotes the vector whose th component is reduced by unity

Using this notation one obtains

The following recursion formula holds for the .

Lemma 1:

Proof: The assertion is proven by induction on For

and as

Induction from to

This ends the proof of the lemma.

For one defines

Let denote the set of all sequences of length of vectors with

with

if and differ in the th component then

For an element one defines the Vandermond determinant by

(For more details see ref [3] p.179.)

If the sequence is obtained from the sequence by deleting the last element, the following recursion formula holds:

For one obtains the determinant deleting the last line and the th column of . The following determinant is derived from the Vandermond form substituting the exponent for exponent in the last line:

and if

The can be calculated using the by the following lemma:

Lemma 2:

Proof: Adding the first line times to the negative of the last line one obtains

Now the following theorem can be proven without undue effort:

Theorem:

Proof: The assertion is proven by induction on the distance of and . For

yields

Induction from to By Lemma 1

With Lemma 2 one infers

and with this and eqn (2)

This ends the proof of the theorem.

The conclusions will be stated in the form of three remarks.

Remark 1: Accounts that do not call any infected programs cannot be infected. If for there exists an such that then for with

Proof: As there must be an infection in the account of the th user, i.e. for every there must be an index such that . As

For one defines

Remark 2: If is the initial viral state, the infection process does not stop before the viral state is reached. One has to show that for with

Proof: As for all . If all are distinct

If some are equal, once can choose sequences with

Using de l'Hopital's rule one calculates

and hence

Remark 3: The maximum possible viral degree is actually reached in the limit

Proof: The case where some are equal can be treated as the limit of the case where all are distinct. Hence only this case will be considered. The assertion shall be proven by induction on the distance between and . For and

Hence

For the induction denotes the viral degree which is obtained from by adding 1 to the th component. The distance between and is assumed to be . Then

**Fig.1 Infection process with the probability model**

Number of infection paths: Using the notation introduced above one can calculate the number of possible infection paths leading from an initial viral state to the final state .

Theorem: The following formula holds for the number of infection paths:

Proof: Whenever a program is infected one writes down the number of the account in which it resides. Thus the infection process is represented as a finite sequence of integers with elements between 1 and . The length of the sequence is

There are exactly entries with value in this sequence for . This gives for

The rest follows by induction on .

**Fig. 2 Infection process of a real system**

This shows that it is not feasible to do calculations for more than once account and a sufficiently large number of programs.

One account with programs: For convenience the assumption is made that all programs are called with equal frequency . The computation proceeds as follows. Initially only one program is infected. After each call of a program the expectation for the number of infected programs is calculated. As soon as it exceeds the number of initially infected programs by more than one, the process is started anew but with one more program, which is initially infected.

On the axis the number of runs of a program is plotted and on the axis the percentage of infected programs. Supposing that there are 80 programs of which only one is infected in the beginning the model shows that all of them are infected after 378 calls (Fig. 1).

This is contrasted with an infection process on a real system, where the number of the program to be called was determined by a random number generator. Figure 2 shows a typical infection process on a real system. The number of program calls, after which the whole account was infected, lay in the range from 230 and 700. The nature of the curve shows an exponential growth of the infection process for the real system as well as for the probability model. Assuming an average number of ten runs per hour this shows that after about 40 hours all programs are infected. The computations show that programs will be infected after approximately runs.

Takeover of users: The model can be used to say something when all users are infected instead of the takeover of all programs. This problem can be treated with similar arguments as above. Assuming that each account has only one program or that the virus infectes all programs of the account from which it was called, this case is reduced to the situation discussed earlier.

- F. Cohen,
*Computer Viruses, Theory and Experiments, Preprint,*University of Southern California, 1984. - F. Cohen, Computer Viruses,
*PhD Thesis*, University of Southern California, 1985. - S. Lang,
*Linear Algebra*, Addison-Wesley, Reading, MA, 1970.

**Dr. W. Gleissner** received a Masters degree in numerical analysis from Oxford University in 1972 and a Diploma in mathematics from the Technical University of Munich in 1973. His main fields of interest were interval arithmetic and approximation theory but later changed to pure mathematics and representation theory of infinite groups. He received a doctoral degree in 1978 from Technical University of Munich. He then joined the Munich Reinsurance Co. as a project leader for the computer-based administration of the life reinsurance business for client insurance companies from all over the world. Later he was in charge of introducing workstations and in 1985 he became responsible for guideline development and programming methodology in an institution produsing software for the administration of the Federal Republic of Germany. His main fields of interest now are computer viruses and security-related topics in Unix. He was published more than a dozen papers about mathematical models in economics (Industrieanlager-Betriebgesellschaft mbH, Einsteunstrasse 20, D-8012 Ottobrunn, F.R.G.)

By accessing, viewing, downloading or otherwise using this content you agree to be bound by the Terms of Use! vxer.org aka vx.netlux.org