# An Abstract Theory of Computer Viruses

Advances in Cryptology - CRYPT0 '88, LNCS 403, pp. 354-374, 1990
1990

17

Research supported by NSF through grant CCR 8519296

## 1 Introduction

In recent years the detection of computer viruses has become common place. It appears that for the most part these viruses have been 'benign' or only mildly destructive. However, whether or not computer viruses have the potential to cause major and prolonged disruptions of computing environments is an open question.

Such basic questions as:

1. How hard is it to detect programs infected by computer viruses?
2. Can infected programs be 'disinfected'?
3. What forms of protection exist?
4. How destructive can computer viruses be?

have been at most partially addressed [Col][Co2]1. Indeed a generally accepted definition of computer virus has yet to emerge.

For these reasons, a rigorous study of computer viruses seems appropriate.

## 2 Basic Definitions

For the purpose of motivating the definitions which follow, consider this (fabricated) case study':

A text editor becomes infected with a computer virus. Each time the text editor is used, it performs the text editing tasks as it did prior to infection, but it also searches the files for a program and infects it. When Tun, each of these newly infected programs performs its intended' tasks as before, but also searches the files for a program and infects it. This process continues. As these infected programs pass between systems, as when they are sold,OTgiven to others, new opportunities for spreading the virus are created. Finally, after Jan. 1, 1990, the infected programs cease acting as before. Now,each time such a program is run, it deletes all files.

Such a computer virus can easily be created using a program scheme (in an ad hoc language) similar to that found in [Col]:

{main:=
call injure;
...
call submain;
...
call infect;
...
}

{injure:=
if condition then whatever damage is to be done and halt
}

{infect:=
if condition then infect files
}


where for the case study virus':

{main:=
call injure;
call submain;
call infect;
}
{injure:=
if date ≥ Jan. 1,1990 then
while files ≠ 0:
file = get-random-file;
delete file;
halt;
}

{infect:=
if true then
file = get-random-executable-file;
rename main routine submain;
prepend self to file;
}


By modifying the scheme above, a wide variety of viruses can be created. Even helpful' viruses may be created. For example the following minor variant of Cohen's [Col] compression virus which saves storage space:

{main:=
call injure;
decompress compressed part of program;
call submain;
call infect;
}

{injure:=
if false then halt
}

{infect:=
if executable-files ≠ 0 then
file = get-random-executable-file;
rename main routine submain;
compress file;
prepend self to file;
}


With the case study virus' and all of those which could be created by the scheme above, it appears that the following properties are relevant:

1. For every program, there is an infected'form of that program. That is, it is possible to think of the virus as a map from programs to (infected') programs.
2. Each infected program on each input (where here by input is meant all accessible' information: e.g. the user's input, the system's clock, files containing data or programs) makes one of three choices:
Injure:
Ignore the intended' task and compute some other function. Note that in the case study, which inputs result in injury (i.e. those where the system clock indicates that the date is Jan. 1, 1990 or later), and what kind of injury occurs (file deletion) are the same whether the infected program is a text editor or a compiler or something else. Thus which inputs result in injury and what form the injury takes is independent of which infected program is running and is actually dependent solely on the virus itself.
Infect:
Perform the intended' task and if it halts, infect programs. Notice in particular that the clock, the user/prograin communications and all other accessible' information other than programs, are handled just as they would have been had the uninfected version of the program been run. Further, notice that whether the infected program is a text editor or a compiler or something else, when it infects a program the resulting infected program is the same, Thus the infected form of a program is independent of which infected program produces the infection.
Imitate:
Neither injure nor infect. Perform the intended' task without modification. This ma3 be thought of as a special case of Infect', where the number of programs getting infected is zero. (In the case study, imitation only occurs when no programs are accessible for infection).

A formal definition of computer virus is presented next.

Notation 1

1. denotes the set of all finite sequences of natural numbers.
2. denotes a computable injective function from onto with computable inverse.
3. For all denotes .
4. For all partial , for all , denotes .
5. denotes a computable injective function from onto with computable inverse such that for all .
6. For all denotes .
7. For all partial , for all denotes .
8. For all partial , for all , write is defined.
9. For alI partial , for all , write is undefined.

Definition 1 For all partial , for all either:

Definition 2 For all , for all , for all partial functions iff:

1. there exists an , with such that and
2. for , either

Definition 3 For all partial , for all iff .

Definition 4 For all partial , for all iff .

Definition 5 For all Gödel numberings of the partial recursive functions , a total recursive function is a virus with respect to iff for all , either:

1. Injure:
2. Infect or Imitate:

Remark 1 The choice of symbols above is intended to suggest the decomposition of all accessible'information into data' (information not susceptible to infection) and programs'(infomation susceptible to infection).

## 3 Types of Viruses

In this section the set of viruses is decomposed into the disjoint union of four principal types. The nature of so called Trojan horses' is considered.

Definition 6 For all Gödel numberings of the partial recursive functions , for all viruses with respect to , for all :

• is pathogenic with respect to and iff
• is contagious with respect to and iff
• is benignant with respect to and iff
• is a Trojan horse with respect to and iff
• is a carrier with respect to and iff
• is virulent with respect to and iff

When there exists a unique such that (e.g. when is injective) then if is pathogenic (contagious, benignant, a Trojan horse, a carrier, virulent) with respect to and , the reference to will be dropped and will be said to be pathogenic (contagious, benignant, a Trojan horse, a carrier, virulent) with respect to .

Hence, if with respect to some virus an infected program is benignant, then it computes the same function as its uninfected predecessor. If it is a Trojan horse then it is incapable of infecting other programs. It can only imitate or injure, and under the right conditions it will do the latter. If it is a carrier, it is incapable of causing injury but under the right conditions it will infect other programs.

Definition 7 For all Gödel numberings of the partial recursive functions , for all viruses with respect to :

• is benign iff both:
• is Epeian2 iff both:
• is disseminating iff both:
• is malicious iff both:

The next theorem records some simple facts about types of viruses.

Theorem 1 For all Gödel numberings of the partial recursive functions for all viruses with respect to :

1. is benign iff
2. if is Epeian then
3. if is disseminating then

Proof

Part 1 follows immediately from the recursion theorem.

All other parts follow immediately from the definitions.

Thus, all programs infected by a benign virus are benignant with respect to their uninfected predecessors. They function just as if they had never been infected. Viruses in this class appear to be the least threatening. This class includes many degenerate' viruses such as the identity function and padding' functions.

Programs infected by an Epeian virus can only be benignant or Trojan horses with respect to their uninfected predecessors. Further the latter option must sometimes occur. Epeian viruses will not be able to spread themselves; however, an infected program may imitate the intended' task of its uninfected predecessor until some trigger' causes it to do damage. Among the Epeian viruses are the degenerate' class of constant functions, which never imitate-or-infect but only injure.

Programs infected by a disseminating viruses can only be benignant or carriers with respect to their uninfected predecessors. Further the latter option must sometimes occur. Thus programs infected with such viruses are never pathogenic. However, it is worth noting that disseminating viruses may modify the size of programs or their complexity characteristics, and by this means become detectable or cause harm (or benefit as in the case of the compression virus). In fact, size and complexity may be important properties when considering viruses. An extension of the current theory to account for size and complexity seems appropriate (see § further research).

Malicious viruses can both spread and produce injuries. They appear to be the most threatening kind of virus. The case study virus' in § basic definitions is malicious.

Remark 2 It may be appropriate to view contagiousness as a necessary property of computer viruses. With this perspective, it would be reasonable to define the set of viruses as the union of the set of disseminating viruses and the set malicious viruses, and to exclude benign and Epeian viruses altogether.

## 4 Detecting The Set Of Viruses

The question of detecting viruses is addressed in the next theorem:

Theorem 2 For all Gödel numberings of the partial recursive functions :

Proof

Let . It is well known (§13 and §14 [Ro]) that is .

To establish that , let (for examplelet be an index for the identity function) and consider the function such that for all :

Then is a partial recursive function. Let be an index for , and let , be such that:

where is as in the theorem [Ro].

Then is a total recursive function and:

It follows that

Thus . It follows, as in §7.2 [Ro], that as desired.

To establish that , consider the following formula for which arises directly from the definition of virus:

Where is a step counting' predicate for such that:

And where is a predicate for such that:

Since for all acceptable Gödel numberings of the partial recursive functions it is easily seen that there exist recursive predicates and as above, it follows that .

Thus detecting viruses is quite intractable, and it seems unlikely that protection systems predicated on virus detection will be successful.

## 5 Isolation As A Protection Strategy

As noted in [Co1] isolating a computing environment from its surroundings is a powerful method of protecting it from viruses. For example, if no new programs can be introduced, no old programs can be updated, and no communication can occur, then it seems viruses are no threat.

Unfortunatly, such isolation is unrealistic in many computing environments. The next theorems explore the possibility of protecting computing environments with less severe forms of isolation.

Definition 8 For all Gödel numberings of the partial recursive functions , for all viruses with respect to , let:

The infected set of

Definition 9 For all Gödel numberings of the partial recursive functions , for all viruses with respect to , is absolutely isolable iff is decidable.

Clearly if a virus is absolutely isolable, then (at least in theory) it can be neutralized. Whenever a program becomes infected, it is detected and removed. The following is a simple fact about absolutely isolable viruses:

Theorem 3 For all Gödel numberings of the partial recursive functions , for all viruses with respect to if for all then is absolutely isolable.

Proof trivial.

Thus the case study virus, as implemented using the scheme in §basic definitions would be absolutely isolable. In fact, what little experience with viruses there is to date seems to suggest that in practice people who produce viruses begin by producing ones with the increasing property necessary for theorem 3 to apply. Unfortunately, not all viruses have this property. For example, with any reasonable compression scheme, the compression virus of §basic definitions would not have this property. Nonetheless, the compression virus is absolutely isolable. Given a program with the proper syntax, it is in the infected set if and only if decompressing the compressed part results in a legitimate program.

Is every virus absolutely isolable?

Regretably, the next theorem shows that the answer is no.

Theorem 4 For all Gödel nurnberings of the partial recursive functions , there exists a total recursive function such that:

1. is a malicious virus with respect to
2. is .

Proof

Let be a total recursive function such that:

Let be a 1 - 1 total recursive function such that for all :

Such a function, known as a padding function, exists by Proposition 3.4.5 [MY].

Let be such that:

where is the least natural number such that, for all with .

Then is a monotonically increasing total recursive function and by (1), it follows that:

Let be such that for all :

Then since is monotonically increasing, it follows that is a total recursive function.

Consider the function such that for all and :

where for all , denotes the one element sequence in consisting only of .

Then by standard arguments, is a total recursive function and:

Let be such that for all :

Then is a total recursive function and it follows from (2) and (3) that:

Applying the s-m-n theorem there exists a total recursive function such that for all :

By the recursion theorem, there exists an such that for all :

Let . Then is a total recursive function since is.

Let , then using that fact that is a total recursive function and applying (4) gives:

1 of the theorem now follows directly from the definition of malicious virus.

Since, for all total recursive functions is recursively enumerable, it follows that .

Let be such that for all . Since is 1 - 1 so is . Then implies the existence of a such that . Let , then:

On the other hand, assume and . Then there exists an such that:

Since is 1 - 1, it follows that . . Hence, and 2 of the theorem holds.

Thus, for the viruses described in the previous theorem, protection cannot be based upon deciding whether a particular program is infected or not. Paradoxically, despite this, it is often possible to defend against such viruses. How such a defense could be mounted will be described below; however, a few definitions are in order first.

Definition 10 For all Gödel numberings of the partial recursive functions , for all viruses with respect to , let:

The germ set of

Thus the germs of a virus are functionally the same as infected programs, but are syntactically different. They can infect programs, but cannot result from infection. They may start epidemics', but are never propagated with them.

Definition 11 For all Gödel numberings of the partial recursive functions , for all virzlses with respect to , is isolable within its germ set iff there exists an such that:

1. .
2. is decidable.

Notice that if a virus is isolable within its germ set by a decidable set , then not allowing programs in the set to be written to storage or to be communicated will stop the virus from infecting. Further, the isolation of some uninfected germs by this process appears to be an added benefit.

Returning now to the viruses described in the previous theorem: assume that the function above had the property that for all . The proof of the previous theorem could easily have been modified to assure this. Further, in Godel numberings derived in the usually fashion from natural programming languages, a constructed in a straightforward manner would have this property. Consider the set

By the monotonically increasing property of and the property of which is being assumed, is decidable. On the other hand if then there exists such that

And it follows that . On the other hand if then there exist an such that

By (2) and (4):

And hence as desired.

Thus viruses like the ones in theorem 4 demonstrate that decidability of is sufficient but not necessary for neutralization. Apparently, more work needs to be done before a clear idea of the value of isolation will emerge. Are all viruses isolable within their germ set? The answer is no (proof omitted). Are all disseminating viruses isolable within their germ set? The answer is not known. Are there notions of isolation which provide significant protection at a reasonable cost?

## 6 Further Research

The study of computer viruses is embryonic. Since so little is known, virtually any idea seems worth exploring.

Listed below are a few avenues for further investigation.

1. Complexity theoretic and program size theoretic aspects of computer viruses.

Introduce complexity theory and program size theory into the study of computer viruses. As noted earlier, even disseminating viruses may affect the complexity characteristics and size of infected programs and as a result become detectable or harmful.

Complexity theory and program size considerations can be introduced at a abstract level (see for example [MY]) or a concrete level.

For example, viruses in the real world' would probably have the property that the running time of an infected program, at least while imitating or infecting, would be at most polynomial (linear) in the running time of its uninfected precursor. Does this class of polynomid (linear) viruses' pose a less serious threat? Do NP-completeness considerations, or cryptographic considerations come into play?

2. Protection Mechanisms

In this paper one form of protection mechanism, isolation, was briefly considered. In addition to considering isoIation in greater depth, numerous other possibilities exist. For example:

Quarinteening
Is there value in taking a new program and running it in a safe environment for a while before introducing it into an environment were it could spread or do harm? For example, putting the new program on an isolated machine with dummy infectable programs and with a variety of settings of the system clock might evoke behavior indicative of infection. In particular would this be helpful with the class of polynomial viruses or linear viruses?
Disinfecting
Under what circumstances can an infected program be disinfected? Certainly when a virus is absolutely isolable there exists a procedure which when given an infected program will return a program which infects to' the original one. How general is this phenomena?
Certificates
Can some programs be given a clean bill of health'? For example, if it is know that a certain virus is about, would it be possible for a vendor to prove' that his program was not in the germ set? Would it be possible to prove that the software was not in the germ set of a large class of viruses?
Operating System Modification
Could modifications to the operating system provide some protection. For example, assume that the (secure) operating system required that the user initiate' a3l new programs by designating the files which the program is given the privilege to read and write. Then, for example, a simple program (e.g. a game) could be given only the privilege to read and write files it creates. If the program was uninfected it might perform satisfactorily under this constraint, If however the program was infected, this constraint might severely limit the damage due to the virus. (This example arose during joint work with K. Kompella).
3. Other Models Of Computer Viruses.

The notion of computer viruses presented here is not the only one possible. It was selected because it seemed to be art adequate place to begin an investigation. More general, and more restrictive notions are possible. Indeed it seems possible that no definition will conform to everyone's intuitions about computer viruses'.

More machine dependent' approaches could be considered. Approaches which take into account the communications channels over which viruses pass seem particularly important.

One interesting generalization of the current notion is inspired by [Co1], where viruses are assumed to be capable of evolving. The Mutating Viruses' (-viruses) partially defined next are an attempt to capture this property.

Definition 12 For all , for all , for all sets of partial functions from to , iff:

1. there ezists an , with such that and
2. for , either
1. there exists an such that and .

Definition 14 For all sets of partial functions from to , for all partial , for all iff .

Definition 15 For all Gödel numberings of the partial recursive functions a set of total recursivefunctions is a mutating virus, -virus, with respect to iff both:

1. for all , for all either:
1. Injure:
2. Infect or Imitate:

Some computer viruses which have recently caused problems (e.g. the so called Scores virus' [Up] which attacked Macintosh computers) are -viruses and not just viruses. Hence this generalization of the notion of virus may be of more than theoretical interest.

This is only a partial definition because some notion of connectivity' is needed. That is, the union of two -viruses, neither of which evolves' into the other should not be a -virus. Many definitions of connectivity' can be defined, but further study will be required to choose those which are most appropriate. Once an appropriate choice is made, an important question will be whether the set of infected indices of a -virus can be harder to detect than those of a virus.

4. Computer Organisms.

This issue has evolved during joint work with K. Kompells.

There appear to be programs which can reproduce or reproduce and injure but which are not viruses (e.g. programs which just make copies of themselves but never infect'). These computer organisms' may be a serious security problem.

It may be appropriate to study computer organisms' and treat computer viruses' as special case.

## 7 Acknowledgments

I would like to thank Dean Jacobs, and Gary Miller for contributing their ideas to this paper.

I would also like to thank two of my students: Fred Cohen and Kireeti Kompella. Cohen brought the threat of computer viruses to my (and everyone's) attention. Kompella has spent many hours reviewing this work and has made numerous suggestions which have improved it.

## References

• [Co1] Cohen F. Computer Viruses. Ph.D. dissertation, University of Southern California, Jan. 1986.
• [Co2] Cohen F. Computer Viruses - Theory and Experiments. Computers and Security 6 (1987) 22-35. North-Holland.
• [MY] Machtey M, Young P. An introduction to the general theory of algorithms. North-Holland, NY 1978.
• [Ro] Rogers, H Jr. Theory of Recursive Functions and Effective Computability. McGraw-Hill Book Co., NY 1967.
• [Up] Upchurch, H. The Scores Virus, unpublished manuscript, 1988.

1 It appears that F. Cohen is the first researcher in an academic setting to consider the practical and theoretical aspects of computer viruses. The formalism presented here differs considerably from that explored by Cohen [Col][Co2].

2
Now rhift your theme, and ring that wooden horse
Epeior built, inrpired by Athena -
the ambuscade Odysseus filled with fighters
and rent to take the inner town of troy

The Odyssey of Homer, 8.492-495.
translation by Robert Fitzgerald
Doubleday & Co.,NY, 1961