Suzana Stojakovic-Celustka

*Alive Vol I, Issue 1*

*July 1994*

- 1. Good representations
- 2. The knowledge and the language
- 3. Where are we?
- 4. The end is a new beginning
- 5. References

There is a problem which has bothered me ever since the results of the Contest for the Best Virus Definition were published. It seemed that plain language was not suitable for defining a computer virus properly. Well, the problem of how to define anything well is nothing new.

Looking for a recipe for making good definitions, I found some books. The first one is "Artificial Intelligence" by Patrick Henry Winston [5]. It has a few words to say about good representations:

"...In general, a representation is a set of conventions about how to describe a class of things. A description makes use of the conventions of a representation to describe some particular thing.

Finding the appropriate representation is a major part of problem solving. Consider, for example, the following children's puzzle:

- The Farmer, Fox, Goose and Grain:
- A farmer wants to move himself, a silver fox, a fat goose, and some tasty grain across a river. Unfortunately, his boat is so tiny he can take only one of his possessions across on any trip. Worse yet, an unattended fox will eat a goose, and an unattended goose will eat grain, so the farmer must not leave the fox alone with the goose or the goose alone with the grain. What is he to do?

Described in English, the problem takes a few minutes to solve because you have to separate important constraints from irrelevant details. English is not a good representation.

Described more appropriately, however, the problem takes no time at all, for everyone can draw a line from the start to the finish in figure 1 instantly. Yet drawing that line solves the problem because each boxed picture denotes a safe arrangement of the farmer and his possessions on the banks of the river, and each connection between pictures denotes a legal crossing. The drawing is a good description because the allowed situations and legal crossings are clearly defined and there are no irrelevant details.

                               --------     --------
                              | Grain  |   | Farmer |
                              | ====== |-->| Goose  |
                              | Farmer |   | Grain  |
                              | Fox    |<--| ====== |
                              | Goose  |   | Fox    |
                              |________|   |________|
                                 ^ |          ^ |
                                 | V          | V
     --------     --------     --------     --------     --------     --------
    | Farmer |   | Fox    |   | Farmer |   | Goose  |   | Farmer |   | ====== |
    | Fox    |-->| Grain  |-->| Fox    |   | ====== |-->| Goose  |-->| Farmer |
    | Goose  |   | ====== |   | Grain  |   | Farmer |   | ====== |   | Fox    |
    | Grain  |<--| Farmer |<--| ====== |   | Fox    |<--| Fox    |<--| Goose  |
    | ====== |   | Goose  |   | Goose  |   | Grain  |   | Grain  |   | Grain  |
    |________|   |________|   |________|   |________|   |________|   |________|
                                 ^ |          ^ |
                                 | V          | V
                               --------     --------
                              | Fox    |   | Farmer |
                              | ====== |-->| Fox    |
                              | Farmer |   | Goose  |
                              | Goose  |<--| ====== |
                              | Grain  |   | Grain  |
                              |________|   |________|

Figure 1. ( ====== denotes a river)

The representation principle:

Once a problem is described using an appropriate representation, the problem is almost solved..."
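The drawing in figure 1 is a graph: boxes are safe arrangements and arrows are legal crossings. The same representation can be handed to a program. What follows is only a minimal sketch in Python, not anything taken from Winston's book. It assumes that a state is encoded as the set of those still standing on the starting bank, and the names `ITEMS`, `is_safe`, `neighbours` and `solve` are invented for this illustration. A breadth-first search then "draws the line" from start to finish.

```python
from collections import deque

# Hypothetical encoding: a state is the frozenset of everyone on the starting bank.
ITEMS = ("farmer", "fox", "goose", "grain")


def is_safe(state):
    """Return True if neither bank leaves fox+goose or goose+grain unattended."""
    for bank in (state, frozenset(ITEMS) - state):
        if "farmer" not in bank:
            if {"fox", "goose"} <= bank or {"goose", "grain"} <= bank:
                return False
    return True


def neighbours(state):
    """Yield every safe state reachable by one boat trip (a legal crossing)."""
    everyone = frozenset(ITEMS)
    # The farmer can only take something from the bank he is standing on.
    here = state if "farmer" in state else everyone - state
    for cargo in [None] + sorted(here - {"farmer"}):
        moving = {"farmer"} | ({cargo} if cargo else set())
        nxt = state - moving if "farmer" in state else state | moving
        if is_safe(nxt):
            yield nxt


def solve():
    """Breadth-first search from 'all on the start bank' to 'nobody on it'."""
    start, goal = frozenset(ITEMS), frozenset()
    parents = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parents[state]
            return path[::-1]
        for nxt in neighbours(state):
            if nxt not in parents:
                parents[nxt] = state
                queue.append(nxt)
    return None


if __name__ == "__main__":
    for step in solve():
        print(sorted(step), "| ====== |", sorted(frozenset(ITEMS) - step))
```

Once the states and crossings are written down this way, nothing about foxes or geese is left for the search to understand; it only follows arrows. The path it returns has eight states, i.e. seven crossings, which corresponds to one of the two routes through figure 1.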

Reading this, one could say: "Oh, I knew that. What is so special? If I can describe a problem properly, then the solution is not far off. But I should know something about the problem first..."

Yes, here we come. What is knowledge, after all? Another interesting book, "The Tao of Physics" by Fritjof Capra [2], says:

"...Rational knowledge is derived from the experience we have with objects and events in our everyday environment. It belongs to the realm of the intellect whose function is to discriminate, divide, compare, measure and categorize. In this way, a world of intellectual distinctions is created; of opposites which can only exist in relation to each other.

Abstraction is a crucial feature of this knowledge, because in order to compare and to classify the immense variety of shapes, structures and phenomena around us we cannot take all their features into account, but have to select a few significant ones. Thus we construct an intellectual map of reality in which things are reduced to their general outlines. Rational knowledge is thus a system of abstract concepts and symbols, characterized by the linear, sequential structure which is typical of our thinking and speaking. In most languages this linear structure is made explicit by the use of alphabets which serve to communicate experience and thought in long lines of letters..."

Here the question comes up again: how suitable is plain language for describing the natural world if it is itself an abstraction? Reading further in the same book:

"...The natural world on the other hand, is one of infinite varieties and complexities, a multidimensional world which contains no straight lines or completely regular shapes, where things do not happen in sequences, but all together...It is clear that our abstract system of conceptual thinking can never describe or understand this reality completely. In thinking about the world we are faced with the same kind of problem as the cartographer who tries to cover the curved face of the Earth with a sequence of plane maps. We can only expect an approximate representation of reality from such a procedure, and all rational knowledge is therefore necessarily limited... To quote the semanticist Alfred Korzybski: 'The map is not the territory'...

...For most of us it is very difficult to be constantly aware of the limitations and of the relativity of conceptual knowledge. Because our representation of reality is so much easier to grasp than reality itself, we tend to confuse the two and to take our concepts and symbols for reality..."

Oh well, it is clearer now (or maybe not), but what is to be done, especially in science, where we need unambiguous descriptions? Ibidem:

"...The inaccuracy and ambiguity of our language is essential for poets who work largely with its subconscious layers and associations. Science, on the other hand, aims for clear definitions and unambiguous connections, and therefore it abstracts language further by limiting the meaning of its words and by standardizing its structure, in accordance with the rules of logic. The ultimate abstraction takes place in mathematics where words are replaced by symbols and where the operations of connecting the symbols are rigorously defined. In this way, scientists can condense information into one equation, i.e. into one single line of symbols, for which they would need several pages of ordinary writing..."

So, it seems that mathematics is the proper language for science. Is it really? Continuing:

"...The view that mathematics is nothing but an extremely abstracted and compressed language does not go unchallenged. Many mathematicians, in fact, believe that mathematics is not just a language to describe nature, but is inherent in nature itself. The originator of this belief was Pythagoras who made the famous statement 'All things are numbers' and developed a very special kind of mathematical mysticism. Pythagorean philosophy thus introduced logical reasoning into the domain of religion...

...The scientific method of abstraction is very efficient and powerful, but we have to pay a price for it. As we define our system of concepts more precisely, as we streamline it and make the connections more and more rigorous, it becomes increasingly detached from the real world. Using again Korzybski's analogy of the map, we could say that ordinary language is a map which due to its intrinsic inaccuracy, has a certain flexibility so that it can follow the curved shape of the territory to some degree. As we make it more rigorous, this flexibility gradually disappears, and with the language of mathematics we have reached a point where the links with reality are so tenuous that the relation of the symbols to our sensory experience is no longer evident. This is why we have to supplement our mathematical models and theories with verbal interpretations, again using concepts which can be understood intuitively, but which are slightly ambiguous and inaccurate..."

It looks like a magic circle: real world - language - mathematics - language - real world. Where is the reality?

"...It is important to realize the difference between the mathematical models and their verbal counterparts. The former are rigorous and consistent as far as their internal structure is concerned, but their symbols are not related to our experience. The verbal models, on the other hand, use concepts which can be understood intuitively, but which are slightly ambiguous and inaccurate..."

Having taken this trip through the theory, we come back to the initial question: is natural language an appropriate tool for defining a computer virus? There is no doubt that computer viruses belong to the real world. One can try to define a computer virus using natural language only. As the results of the Contest for the Best Virus Definition and many bitter discussions on Virus-L show, such definitions are still very inaccurate. Even worse, everybody can define a computer virus in his or her own way, which leads to confusion. The few mathematical definitions, while more accurate, are not widely understood...

One of the best-known mathematical definitions of a computer virus was given by Fred Cohen. Here are a few words from him on this subject:

A: Can the use of mathematics avoid the ambiguity of plain language in the definition of a computer virus?

FC: I translate - Can the use of a precise and well defined language avoid ambiguity of plain language?... Mathematics is a subclass of the more general class of languages. All mathematics is linguistically defined, therefore language, if used precisely, can be as accurate as mathematics. The real problem is that mathematics says a lot of things more concisely than language because it is essentially a set of macros. For linguistic definitions to work for regular people, they have to be short enough to remember and accurate enough to apply. Hence my very short linguistic definition:

A life form (substitute virus if desired) is an information structure that reproduces in a particular environment.

Well, I can now summarize what I have learnt about how to make a good definition:

- The first step is to check what we know about the problem. This is also the first level of abstraction, i.e. we cannot take all features of the observed phenomenon into account, but have to select a few significant ones.
This process is common in everyday life. One evokes a "mental model" of some concept. What such a "mental model" shows depends on the information one has collected about the subject up to that moment. Such information usually differs for every individual, depending on his or her experience, education, sources of information, interests, etc. In the case of computer viruses, the knowledge will include information about computers, programming, possibly biological viruses, etc.

The problem with "mental models" is that probably no two people share exactly the same "model". Also, the direct exchange of "mental models" is not a usual way of communicating today.

- The second step is to find a representation for the "mental model", so that one can share it with other people. This is a further level of abstraction, i.e. the choice of a set of conventions about how to describe a class of things.
The most common tool one will use for description is natural language. This means one will describe the "mental model" using words, which are sequences of letters from some alphabet. In fact, one is constructing a "natural language model" of the phenomenon. To represent a computer virus in English, the words used could be "reproduction", "infection", "program", etc.

The problem with natural language is that there is no universal language which all people understand (that problem is impressively demonstrated in the story of the Tower of Babel [3]). Furthermore, even within one language, it often happens that the same words have different meanings for different people ("There are many different languages in the world, yet none of them is without meaning." - 1 Corinthians 14.10). This is what we call the ambiguity and inaccuracy of natural language.

- Science and technology need unambiguous descriptions. For that reason it is necessary to abstract the language further. Such an extremely abstracted and compressed language is mathematics. It is more accurate and precise than natural language. It is also universal for the people who understand it.
The problem with mathematics is that it is not a language commonly used for communication in everyday life. Mathematical models will be understood by particular groups of people only.

- To make mathematical models easier for a wider audience to understand, they should be accompanied by verbal interpretations which explain the symbols used. A graphic representation of mathematical models is also useful. As the example at the beginning of this text showed, drawings are quite convenient descriptions in some cases.
The problem here arises when one separates the verbal or graphic interpretation from the mathematical definition. This may cause confusion similar to that described in the second step.

The above steps show the different levels of abstraction (or modelling) one has to pass through to obtain an accurate definition. Each level has its own inherent problems. The accuracy required depends, in the end, on the environment where the definition will be applied. In the case of computer viruses, most people will be satisfied with a definition in natural language. It has to be stressed again that such a definition will be inaccurate due to the ambiguity of natural language. A good technical definition of a computer virus should be a mathematical one because of its accuracy and consistency. It should also be accompanied by verbal and graphical interpretations for better understanding.

Although the above text does not immediately give a good definition of a computer virus, it does answer some questions. Namely, it explains why the results of the Contest in the technical categories were so poor: simply because the mathematical and verbal parts were separated from each other in the guidelines of the Contest for the Best Virus Definition. It also explains the very good results in the poetical category. The ambiguity of natural language was not an obstacle there; on the contrary, it was an advantage. Greater freedom in wording gave interesting results.

Talking again about technical definitions, there are new questions which bother me now. Natural language and mathematics follow different logics in their structure. Formal mathematical logic is monotone, i.e. if a formula is provable in some theory T, it is also provable in every theory T' such that T is a subset of T'. This means that adding axioms never invalidates earlier proofs: the more initial axioms there are, the more statements it is possible to prove. This does not always work in real life. Many universal statements in real life carry numerous implicit suppositions which cannot all be included initially. For example, from the supposition that every bird flies, we can conclude that a certain bird named Quido can also fly. Later we find out that Quido is a penguin, and penguins do not fly. At that moment our system of reasoning should fall apart, because this fact obviously contradicts the earlier conclusion. Nevertheless, this type of inconsistency is not an obstacle in everyday life. Natural language copes with it better. It can be said that natural language follows a non-monotone logic. So, even with a mathematical definition accompanied by a verbal counterpart, it is still questionable how well the two will match each other.
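The difference between the two kinds of reasoning can be made concrete with the Quido example. The following Python sketch is only an illustration under simplifying assumptions: facts are plain strings, a monotone rule is a pair (premises, conclusion), and a "default" is a rule that additionally lists facts which block it. The names `forward_chain`, `default_chain`, `RULES` and `DEFAULTS` are hypothetical, and the single-pass `default_chain` is a toy, not a faithful implementation of any formal default logic.

```python
def forward_chain(facts, rules):
    """Monotone inference: apply (premises, conclusion) rules until nothing new appears."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


def default_chain(facts, defaults):
    """Toy non-monotone inference: a default fires only if none of its blockers is known."""
    derived = set(facts)
    for premises, blockers, conclusion in defaults:
        if premises <= derived and not (blockers & derived):
            derived.add(conclusion)
    return derived


# "Every bird flies" as a strict rule and as a default with an exception.
RULES = [({"bird"}, "flies")]
DEFAULTS = [({"bird"}, {"penguin"}, "flies")]

if __name__ == "__main__":
    # Monotone: learning that Quido is a penguin cannot remove "flies".
    print(forward_chain({"bird"}, RULES))
    print(forward_chain({"bird", "penguin"}, RULES))
    # Non-monotone: the new fact withdraws the earlier conclusion.
    print(default_chain({"bird"}, DEFAULTS))
    print(default_chain({"bird", "penguin"}, DEFAULTS))
```

With the strict rule, everything derived from the smaller set of facts is still derived from the larger one, so the penguin stays a flier. With the default, adding the fact "penguin" removes the conclusion "flies", which is the behaviour the text attributes to everyday reasoning.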

There is also the question of how the final model or "picture" corresponds to reality, i.e. how to prove that it is true. That problem is not new. Ludwig Wittgenstein says in his Tractatus Logico-Philosophicus [6]:

"2.223 To recognize whether a picture is true or false, we must compare it with reality. (Um zu erkennen, ob das Bild wahr oder falsch ist, muessen wir es mit der Wirklichkeit vergleichen.)

2.224 From the picture alone it is not possible to recognize whether it is true or false. (Aus dem Bild allein ist nicht zu erkennen, ob es wahr oder falsch ist.)

2.225 A picture that is true a priori does not exist. (Ein a priori wahres Bild gibt es nicht.)

3 The logical picture of a fact is the thought. (Das logische Bild der Tatsache ist der Gedanke.)"

It is not so easy to answer the question of truth. If we recall Korzybski's analogy of the map, the main question remains: how do we find the map which covers the territory in the best way?

- [1] Anzenbacher A., Introduction to Philosophy, SPNP, 1990. (in Czech)
- [2] Capra F., The Tao of Physics, Shambhala Publications Inc., 1975.
- [3] Good News Bible, The Bible Societies, 1990.
- [4] Marik V., Stepankova O., Lazansky J., et al., Artificial Intelligence I, Academia Praha, 1993. (in Czech)
- [5] Winston P.H., Artificial Intelligence, Third edition, Addison-Wesley Publishing Company, 1992.
- [6] Wittgenstein L., Tractatus Logico-Philosophicus, Oikoymenh, Prague, 1993. (in Czech with original German text)
- [7] E-mail conversation with Fred Cohen
