
Dealing with Metamorphism

Myles Jordan
Virus Bulletin, 1 Oct 2002
ISSN 0956-9979
October 2002


When the virus writer known as z0mbie released Win95.Zmist.A in early 2001, much of the attention paid to this virus by the AV community was directed at its remarkable ability to intersperse its own code with that of its infection target. However, this virus also embodied the continuation of z0mbie's work on viral evolution towards metamorphism - a form of camouflage being developed by virus writers that is so potent and radically different from common encryption that AV scanners will soon need powerful new tools to confront this threat. This article will discuss one possible method that AV scanners could use to deal with metamorphism.

The Technology Arms Race

Computer virus technology has been evolving continuously since the very first viruses were developed. At every stage, AV technology has been forced to evolve in step in order to continue to detect viruses equipped with the latest stealth and protection mechanisms. The earliest viruses had no means of camouflage, and were easily recognised by AV scanners using simple template matching. When encryption was developed for viruses in order to camouflage their bodies, AV scanners began to use algorithmic methods instead of template matching to detect these encrypted viruses. When encryption evolved through oligomorphism into polymorphism, AV scanners developed emulation to detect what would be excessively difficult for algorithmic detection methods.

It is evident that the majority of viral technology development has always been aimed at disguising the actual viral code via fixed, oligomorphic or polymorphic encryption. Each of these techniques involves encrypting the virus body, and supplying a layer of code to decrypt it when necessary. Of these techniques, polymorphism is by far the most powerful, with, theoretically, a virtually infinite number of different decryption code layers (or "wrappers") that could be created. However, it is the fundamental property of polymorphism as merely a wrapper that limits it as a form of camouflage; no matter how good the polymorphism, after sufficient emulation of the decryption code, the body of the viral code is laid bare and can be easily recognised, even by template matching. As the following figure demonstrates, once the decryption layer is penetrated, the virus body is just as open to examination as if the virus had never been encrypted.

Ultimately, despite its seemingly infinite complexity, polymorphism can only ever be a finite problem.
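The point can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the "body" is a harmless byte string, the wrapper is a single-byte XOR, and emulate_decryptor simply undoes the XOR rather than emulating real decryption code. It shows only that a fixed template fails on every encrypted generation and succeeds on every one once the wrapper has been removed.

```python
import random

# Hypothetical stand-in for a virus body; any fixed byte string would do.
BODY = b"\x90\x90TOY-VIRUS-BODY\xc3"
SIGNATURE = b"TOY-VIRUS-BODY"  # the template the toy scanner searches for


def make_sample(rng):
    """Return (key, encrypted_body) for one generation of the toy sample."""
    key = rng.randrange(1, 256)  # a fresh non-zero key per generation
    return key, bytes(b ^ key for b in BODY)


def emulate_decryptor(key, encrypted):
    """Stand-in for emulating the wrapper until the body is in plain view.

    A real emulator would execute the decryption code itself; here the
    key is simply handed over (an assumption made to keep the sketch short).
    """
    return bytes(b ^ key for b in encrypted)


def scan(data):
    """Plain template matching."""
    return SIGNATURE in data


rng = random.Random(2002)
samples = [make_sample(rng) for _ in range(5)]

# Matching against the encrypted bytes fails for every generation ...
raw_hits = [scan(encrypted) for _, encrypted in samples]
# ... but the same template matches every generation after decryption.
emulated_hits = [scan(emulate_decryptor(key, enc)) for key, enc in samples]

print(raw_hits)
print(emulated_hits)
```

However many wrappers are generated, the scanner in this sketch still needs only the one template.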

The Evolutionary Process

There has been a concurrent stream of development that does not necessarily involve encryption at all. Rather, this branch of camouflage techniques involves modifying the body of the virus itself instead of modifying some form of decryption wrapper. This is commonly known as metamorphism.

The first attempt at metamorphism in PE viruses was Win32.Apparition. This virus carried around its source code and recompiled itself with some random "junk code" that was inserted whenever it discovered an appropriate compiler. This technique seemed never to be fully realised, presumably due to the small percentage of computers with an appropriate compiler available.

Following this was a more direct, though relatively simple, attempt at metamorphism in Win95.Regswap, which swapped the registers used to perform particular tasks. This form of camouflage was carried a step further by the Win32.Evol virus, which swaps certain instructions for different ones that perform the same function. It is also able to insert junk code between the essential instructions.

In early to mid 2000, two so-called permutating viruses were released: Win95.Ghost and Win95.Smash. These permutating viruses have the ability to split their code into blocks, and then change the order in which these blocks appear in the code. This of course does not change the functionality of the virus; it merely changes the structure, making template matching difficult. This technique is demonstrated in the following figure:

This technique of permutation was also improved upon in Win95.Zperm, which is able to completely reorganise its code and insert jump instructions wherever necessary to maintain the correct flow of control through the virus's code. As mentioned above, Win95.Zmist was yet another step in the development of metamorphic techniques, and contains the ability to reorganise its code, insert junk instructions, and also perform instruction substitutions. This work has since been continued by the author of Win32.Metaphor (a.k.a. Win32.Etap), which contains even more advanced metamorphism.

An Observation

Win32.Metaphor's metamorphism works by disassembling its own code into a custom pseudo-code, which is a meta-language for describing the actions of the code of the virus without any reference to the actual code. Using this layer of abstraction, the virus dissociates function from implementation, theoretically allowing the virus to generate new copies of itself completely from scratch. This technique produces instances of the virus that appear very dissimilar, and yet function identically, which is of course the goal of metamorphic camouflage. The assembly code on the left of the figure below was generated by Win32.Metaphor, and is used to find the address of Kernel32.dll. All of those lines of code are testament to the power of Win32.Metaphor's metamorphic engine, as they could be replaced by the equivalent five lines of code on the right side.

This example illustrates an important point: no matter the form of the metamorphically altered code, there are certain "higher" actions that are always performed. A higher action is a phrase used to describe the purpose of a related group of instructions, and can, for example, be anything from locating the Interrupt Descriptor Table (a single instruction, SIDT), to hooking an API (usually a small series of instructions), or even decryption (variable, but often a large number of instructions). In the above example, two higher actions are performed: the construction of the "Kernel32" string, and the calling of the GetModuleHandle API. Depending on how much detail is required, it could be considered as a single, even higher, action: "locate the Kernel32 dll in memory".

The dissociation of the function of the virus from its actual code is commonly used by AV scanning software for heuristic analysis, but it becomes particularly useful when dealing with metamorphics. In particular, an AV scanner's heuristics must be capable of analysing the effects of multiple individual instructions and coalescing these effects into higher actions, as demonstrated here:

The scanner can then heuristically analyse these actions, completely disregarding their implementation. This effectively makes the literal instructions used irrelevant, and thus bypasses a significant portion of the power of metamorphic viruses: the junk insertion and instruction substitution techniques.
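The coalescing step can be sketched in Python. This is not how any particular scanner works; the instruction set, the register names and the "locate_module" action label are all hypothetical, and the only API modelled is the GetModuleHandle call from the Win32.Metaphor example. The sketch tracks the effects of each instruction on registers and memory, and emits a higher action only when the API call is reached.

```python
MASK = 0xFFFFFFFF


def dword(text):
    """Pack four ASCII bytes into a little-endian 32-bit value."""
    return int.from_bytes(text, "little")


def emulate(program):
    """Run a toy instruction list and return the higher actions observed."""
    regs = {"eax": 0, "ebx": 0, "ecx": 0}
    memory = bytearray(64)  # zero-filled, so strings are NUL-terminated
    actions = []
    for op, *args in program:
        if op == "mov":      # mov reg, immediate
            regs[args[0]] = args[1] & MASK
        elif op == "add":    # add reg, immediate
            regs[args[0]] = (regs[args[0]] + args[1]) & MASK
        elif op == "xor":    # xor reg, immediate
            regs[args[0]] = (regs[args[0]] ^ args[1]) & MASK
        elif op == "store":  # write a register to memory as a dword
            addr, reg = args
            memory[addr:addr + 4] = regs[reg].to_bytes(4, "little")
        elif op == "call":   # API call taking a pointer to a string
            api, addr = args
            end = memory.index(0, addr)
            name = memory[addr:end].decode("ascii")
            if api == "GetModuleHandle":
                actions.append(("locate_module", name.lower()))
        # "nop" and junk arithmetic never reach the list of actions
    return actions


plain = [
    ("mov", "eax", dword(b"Kern")), ("store", 0, "eax"),
    ("mov", "eax", dword(b"el32")), ("store", 4, "eax"),
    ("call", "GetModuleHandle", 0),
]

obfuscated = [
    ("mov", "ecx", 0x1234),                          # junk
    ("mov", "ebx", dword(b"el32") ^ 0x5A5A5A5A),
    ("nop",),
    ("mov", "eax", dword(b"Kern") - 0x1111),
    ("xor", "ebx", 0x5A5A5A5A),                      # ebx is now "el32"
    ("store", 4, "ebx"),
    ("add", "eax", 0x1111),                          # eax is now "Kern"
    ("add", "ecx", 1),                               # junk
    ("store", 0, "eax"),
    ("call", "GetModuleHandle", 0),
]

print(emulate(plain))
print(emulate(obfuscated))
```

Both listings reduce to the same single action, even though the second uses different registers, substituted instructions and junk. The heuristics that follow only ever see the list of actions.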

This method of heuristic analysis can be equally well applied to all known types of metamorphic virus, even those that recompile themselves, such as Win32.Apparition. This is because the method of metamorphism also becomes irrelevant once the core functionality of the virus (which never changes between infections) can be examined.

Another Observation

The other metamorphic technique currently in use is code reorganisation, or permutation. As mentioned previously, the first PE virus to use unrestrained code reorganisation was Win95.Zperm. It has the ability to move variable pieces of its code to anywhere within its body, and then insert jumps, invert conditional branches, or simply change the relative offsets in existing jumps. Combine these features with the ability to insert limited simple junk code, and it is evident that this virus could never be reliably detected by template; nor would an emulator seem to be of much use - there is nothing to decrypt. It appears as though a specialised algorithmic detection is required. Or is it?

As noted in the above example with Win32.Metaphor, it is possible to ignore the instructions themselves, and just analyse the higher actions of the code. The idea that the instructions themselves are irrelevant also extends to the apparent ordering of the code, which does not really matter either; the same higher actions are going to be performed in the same order no matter how much the code jumps around. So by employing an emulator with heuristics capable of discerning higher actions, it is also possible to circumvent the metamorphic technique of code reorganisation.
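A small Python sketch makes the argument concrete. The program representation is hypothetical: each slot holds either an already-coalesced higher action ("act"), an unconditional jump to another slot ("jmp"), or an end marker, and the action names are invented for the example. Following the flow of control, as an emulator would, recovers the same sequence of actions from the original and the permuted layout.

```python
def trace(code, entry, max_steps=1000):
    """Follow control flow from `entry`; return the higher actions in order."""
    actions = []
    pc = entry
    for _ in range(max_steps):  # step limit guards against endless loops
        op, arg = code[pc]
        if op == "jmp":
            pc = arg            # jumps only redirect the flow
        elif op == "act":
            actions.append(arg)
            pc += 1
        else:                   # "end"
            break
    return actions


# Straight-line layout, entered at slot 0.
original = [
    ("act", "find_file"),
    ("act", "open_file"),
    ("act", "write_body"),
    ("end", None),
]

# The same blocks shuffled and stitched together with jumps, entered at slot 4.
permuted = [
    ("act", "write_body"),  # 0
    ("end", None),          # 1
    ("act", "open_file"),   # 2
    ("jmp", 0),             # 3
    ("act", "find_file"),   # 4
    ("jmp", 2),             # 5
]

# Reading the permuted layout top to bottom gives a misleading order ...
layout_order = [arg for op, arg in permuted if op == "act"]
# ... whereas following the jumps recovers the original one.
print(layout_order)
print(trace(original, 0))
print(trace(permuted, 4))
```

A template taken from one layout would not match the other, but the traced sequences are identical.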

A Powerful Tool

Originally, emulators were designed to allow AV scanners to generically decrypt simple, complex, or polymorphic encryption. This allowed the scanner access to the decrypted code, thus relieving the burden of running many specialised algorithmic detections. However, it has been demonstrated that all known metamorphic techniques can be thwarted by the use of an emulator coupled with heuristics capable of coalescing the effects of multiple instructions into higher actions.

But what exactly should be done with the higher actions once they have been discerned? A common solution is to simply collect them and analyse them later, looking for particular static sets of actions that would indicate a virus or virus family. This type of analysis has been around for a long time, but is notoriously prone to both inaccurate virus family recognition and outright false alarms. Inaccurate family recognition arises because many viruses share similar functionality (e.g. infecting files), and false alarms arise because many legitimate programs also use similar functionality (e.g. searching for files, and also writing to them).

Fortunately, the simplistic technique described above is not the only way to analyse higher actions. In fact, that technique discards important, implicit information regarding the higher actions - namely the chronological order in which they occurred. This "ordering" information can be crucial in discriminating between a sequence of actions which is viral, and a sequence of actions which is harmless. For example, consider the following sequence of higher actions:

  1. memory map file
  2. close memory map
  3. modify memory area file was mapped to

If a heuristic analysis of these higher actions were done using a discrete set analysis technique, the actions might fall into an 'infect file' set of actions, as shown below:

However, these actions occurred as a chronological sequence of events, and should be seen as such, as shown here:

If these actions are stored as a sequence instead of as a set, then it would be apparent during the heuristic analysis that these actions are not indicative of viral activities. Thus a potential false alarm situation is avoided, solely due to the inclusion of the chronological ordering information in the heuristic analysis.
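The difference between the two analyses can be sketched in Python. The action names and the 'infect file' pattern are assumptions chosen to mirror the example above, and the ordered-subsequence test is a deliberately simple stand-in for whatever sequence analysis a real scanner would use.

```python
# Hypothetical 'infect file' heuristic, expressed two ways.
INFECT_SET = {"map_file", "modify_mapped_memory", "close_map"}
INFECT_SEQUENCE = ["map_file", "modify_mapped_memory", "close_map"]


def set_match(actions):
    """Set analysis: were all the actions seen, in any order?"""
    return INFECT_SET <= set(actions)


def sequence_match(actions, pattern=INFECT_SEQUENCE):
    """Sequence analysis: does `pattern` occur as an ordered subsequence?"""
    remaining = iter(actions)
    # `step in remaining` consumes the iterator up to the first match,
    # so each step must be found after the previous one.
    return all(step in remaining for step in pattern)


# The harmless sequence from the text: the map is closed before the
# memory area is modified, so the change never reaches the file.
harmless = ["map_file", "close_map", "modify_mapped_memory"]

# The ordering an infector would need: modify while the map is open.
viral = ["map_file", "modify_mapped_memory", "close_map"]

print(set_match(harmless), sequence_match(harmless))
print(set_match(viral), sequence_match(viral))
```

The set test flags both traces, whereas the sequence test flags only the one in which the modification happens while the file is still mapped.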


This heuristic analysis of chronologically ordered higher actions has proven useful in decreasing the susceptibility of heuristic analysis to false alarms, and it continues to demonstrate its effectiveness against all known forms of metamorphism in computer viruses. It is interesting that an answer to the seemingly infinite complexity of metamorphism is to disregard the smoke and mirrors, and simply examine the meaning.
