Maximize
Bookmark

VX Heaven

Library Collection Sources Engines Constructors Simulators Utilities Links Forum

Metamorphism in practice or "How I made MetaPHOR and what I've learnt"

The Mental Driller
29a [6]
February 2002

8
[Back to index] [Comments]

Index

  1. Basics
    1. What's metamorphism?
    2. Structure of a metamorphic code
  2. Planning
    1. Mentalization: we code with macros!
    2. What we are going to do?
      1. Simplistic: permutation/substitution
      2. "Accordion" model (shrink/expand)
      3. Pseudo-assembly language
  3. Coding
    1. Disassembler/depermutator (deobfuscator)
    2. Shrinker/emulator
    3. Permutator
    4. Expander (obfuscator)
    5. Reassembler
  4. Known problems (and solutions)
    1. Debugging your engine
    2. API calls
    3. Memory
  5. Future
    1. Plugins
    2. Multi-platform cross-infection
    3. Reassembling to different processors
  6. Conclusion

Greets

My greets go for all 29A members, for Vecna and Z0MBiE for being pioneers in the little explored field of metamorphism and for Vecna again, for advertising me in some points of the structure of the article.

0) Basics

a) What's metamorphism?

:P Obligated question and answer in an article like this :).

Metamorphism is the art of extreme mutation. This means, we mutate everything in the code, not only a possible decryptor. Metamorphism was the natural evolution from polymorphism, which appeared to evade virus scanners. With metamorphism, the difficulty to detect a virus grows exponentially.

Then, why aren't there more metamorphic viruses? Simple: they are extremely difficult to make, as I show in this article (not only for tech used, but for the many fux0ring problems we can when we code something like that). Anyway, we'll try to see here that maybe the important thing is to have the correct ideas (something that coders like Vecna, Z0MBiE and others had - hello! :).

b) Structure of a metamorphic code

A metamorphic virus is like a 49cc motorbike with a spaceshuttle fuel deposit (if you can handle such vision :). In fact, the 90%+ of the code is the metamorphism engine to mutate the little part dedicated to infection, which is somewhat paradoxical. The engine has to be able to mutate itself and the attached code that allows the engine to travel alone (oh). It's far from the little polymorphic engines that made a 8Kb virus to mutate with a single 200 bytes code length engine: this time is just the opposite! That's because we can't make a whole disassembler in 200 bytes (well, maybe Super can ;). Then, the structure is :

Engine Virus

And the structure of the engine is (thypically):

Disassembler Shrinker Permutator Expander Assembler

Now in detail:

Disassembler
The very start of the engine. The disassembler will decode every instruction to know the length of it, the registers it uses and all the information related with itself. It has also to be able to decode IP changing instructions like JMP and CALL in x86 (or BSR in others, for example).
Shrinker
Also called compressor, this part will compress the disassembled code generated in the previous generation (i.e. the code in this generation). This is done to avoid the in-every-generation growth that will render in a many-mega-bytes virus in very few generations, although you use a non expansion technique of mutation. So, this part depends on the type of your metamorphic code, and it's also the most difficult to make: in fact, very few viruses has this part. Basically, it compresses in one instruction what the expander coded in many. It can be also an emulator, eliminating do-nothing and redundant instructions and compressing operations into a single one (for example, MOV Reg,1234/ADD Reg,4321 --> MOV Reg,5555).
Permutator
A basic part of the metamorphism engine, and the one that many virus authors has coded to make metamorphic viruses, altough it's not metamorphism at all, since you leave the instructions unchanged. It's normally combined with other forms of metamorphism like instruction substitution (XOR EAX,EAX for SUB EAX,EAX, for example, and etc.). It's very simple in its concept, but very powerful since it breaks all the scan strings that can be used to detect the virus.
Expander
This part only exists when a shrinker is present (well, there are some viruses that have it, but the code in them grows uncontrollably). This does what the shrinker undoes: it recodes a single instruction to many instructions that perform the same.
Assembler
It recodes what we constructed with the expander. It fixes JMPs, CALLs and all that instructions, instruction lengths, changes the registers, etc. Or, if you are using an internal pseudo-assembler, it reassembles that code into the target processor language.

OK, these are the normal parts. Anyway, decide what you decide to make, you need a planning!! Don't commit the stupidity (sorry :) of making a hard metamorphic code (light is OK ;) without planning what you want to do. If not, the most probable thing that would happen is that you never finish the code. When coding the MetaPHOR virus I was planning the code for about two months, and I wanted to have clear in my mind what I wanted to do exactly. And, believe me, it helped me alot and I saved a lot of work. So, let's do what I did for planning (just if it helps you to plan your code).

1) Planning

a) Mentalization: we code with macros!

This part is very important: forget that you are touching instruction at machine level. What you do is "move this value to this register" or "add the content of this register to this variable". This makes easier the way you look at coding metamorphism (in fact, it's the same than when you code polymorphism). All the instructions and groups of instructions are macros of the operation you really want to do. The objective of this is learning to see the code as a bunch of instructions that doesn't rely on the final code, but in an operation that must be performed to make a bigger operation.

b) What we are going to do?

i) Simplistic: permutation/substitution

If this is the way you want to do metamorphism (or you don't want to complicate your life making a 400 Kb engine) this is your choice. Plan the way you are going to permutate the code (JMP linking, single-instruction shifting and NOP padding, etc.). This must be present always you code your virus, because the whole code must be compatible with the type. For example, if you select NOP padding you must code all the instructions with equal size, padding them with NOPs. Since it's the most simple way of doing meta, it's also the way that requires less planning: just code directly, and after you have finished the first release (and before you test), just pad with NOPs. Other way can be a macro that does:

        db 10h dup (90h) ; NOPs
        org $-10h
        <instruction>
        .align 4
 

Or something similar. Code a macro if you are a normal person, or do it directly if you are crazy ;).

Another way is to get the length of every instruction in run-time and then garble them in a buffer. This requires disassembly to fix JMPs, CALLs and Jccs (conditional jumps), so maybe it's not as simple as it can seem.

ii) "Accordion" model (shrink/expand)

What a name for a technique! XD The power of this kind of metamorphism is the fact that code maybe doesn't permutate, but it's always different (and can be combined perfectly with permutation, which generates "absolute metamorphism"). For this kind of meta you have to decide if you are going to use a mini-emulator or just a disassembly into a pseudo-assembler, which is the solution I made in MetaPHOR.

For this, you first define your own assembly language (that can be based on x86 opcodes). Just keep in mind that the more this assembly language seems the x86 opcodes, the more easy will be its handling. So, let's merge the next section...

iii) Pseudo-assembly language

...and continue. The MetaPHOR internal pseudo-assembler follows the next rules:

  1. All the instructions are 16-bytes long (but this can change in the future to handle 64-bits processors, like the Itanium).
  2. The structure of the instruction is always the same for all them:

    General structure:

    16 bytes per instruction,

      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      OP *----- instruction data ----* LM *-pointer-*
    

    OP is the opcode of the instruction. Depending on the opcode we use an instruction data structure or other.

    LM is "Label Mark". Its value is 1 when a label is pointing to this instruction, and can be used for quite things, for example to know if two instructions can be shrinked or not (they can't if the second one has a label over it). It's at +0B in the instruction.

    The dword at +0C is a pointer that means "last code reference". On disassembly this means the EIP where this instruction is pointing to its original codification, but while we are advancing in the code treatment we store here references to the last situation of the instruction. This helps to make modifications to the table of labels, to recode the displacement instructions (JMP, CALL, etc.) and more.

Now the structures that the engine uses in the instructions:

Memory_address_struct:

+01First index
+02Second index, bits 7&6 are the multiplicator (00=*1,01=*2,10=*4,11=*8)
+03DWORD addition to indexes

Depending on the opcode (the operation to perform), the following means:

From this rules, now we use the next pseudo-opcodes:

00: ADD, 08: OR, 20: AND, 28: SUB, 30: XOR, 38: CMP, 40: MOV, 48: TEST

Set rules:

+00Reg,Imm
+01Reg,Reg
+02Reg,Mem
+03Mem,Reg
+04Mem,Imm
+808 bits operation

So, opcode 83 means ADD Mem,Reg using 8-bits operands, and so on.

50PUSH Reg
51PUSH Mem
58POP Reg
59POP Mem
68PUSH Imm
70-7FConditional jumps
E0NOT Reg
E1NOT Mem
E2NOT Reg8
E3NOT Mem8
E4NEG Reg
E5NEG Mem
E6NEG Reg8
E7NEG Mem8
E8CALL label
E9JMP label
EACALL Mem (used for API calls)
EBJMP Mem (used for obfuscation in API calls)
ECCALL Reg (obfuscation of API calls)
EDJMP Reg (idem)
F0SHIFT Reg,Imm
F1SHIFT Mem,Imm
F2SHIFT Reg8,Imm
F3SHIFT Mem8,Imm

For all SHIFTs:

+07Byte with the value of rotation/shifting
+08Operation performed: 0: ROL, 8: ROR, 20: SHL, 28: SHR
F4APICALL_BEGIN

Special operation meaning PUSH EAX/PUSH ECX/PUSH EDX that avoids the recoding of these registers, always remaining the same.

F5APICALL_END

The complementary of APICALL_BEGIN, it means POP EDX/POP ECX/POP EAX

F6APICALL_STORE
+01Memory address struct

This always means: MOV [Mem],EAX <-- Avoiding the recoding of EAX

F7SET_WEIGHT
+01Memory address struct
+07Byte with the weight identificator
+08Register 1 in code structure
+09Register 2 in code structure
F8MOVZX

Memory address struct is a 8-bits operand, while +07 is a 32 bit reg.

FCLEA
FERET
FFNOP

These are the opcodes we can find when disassembling. Additionally, I have have defined some more opcodes for internal operations:

4FExists only While shrinking, and means a MOV Mem,Mem, being:
+01Source memory address struct
+07Pointer to the instruction that holds the destiny memory address struct, which has the format:
+00FF (NOP)
+01Destiny memory address struct
+07Ptr to source memory address holder (the 4F instruction)

Since 4F is only a transition opcode, it's free for use after shrinking. There are three more opcodes used only for reassembly:

4EINC/DEC Register
+01Register
+070 if INC, 8 if DEC
4FINC/DEC memory address
+01Memory address struct
+070 if INC, 8 if DEC
FDLiteral byte
+01Byte to insert directly into the code

Thats the planning I made for the beginning. As you can see, the opcodes are very similar to the x86 ones, so they are quite easy to remember (in fact, I wrote this list only remembering it :).

Now, the second part of planning: the shrinker. How are we going to shrink? We'll see them in its section, because I'm not on the mood of writing things twince :P.

2) Coding

a) Disassembler/depermutator

The entrypoint of the engine is this! Before making anything, we must disassemble, of course. The disassembler is quite easy in concept, but a real drudgery. Since we are trying to do an absolute metamorphic code, we can't use hash tables although we code a routine to mutate that hash, something that can be harder. My solution was the first case, and moreover I combined the depermutator in an implicit way. The theory of decoding is as follows:

Given ESI = Entrypoint, we have a memory buffer as big as the code we are going to disassemble. This buffer (stored in the variable PathMarks) allows us to control the already-disassembled code. We have also two more tables: the LabelTable and the FutureLabelTable, also initialized. These two tables have a counter variable for each one, giving the number of elements in every table. We have also the DisassembledCode in EDI, just where we decode the instructions.

LabelTable is a table where each element is 2 DWORDs long. The first DWORD stores the real EIP where it points, where the second DWORD stores a pointer to the disassembled code. Then, when we decode a JMP, we set a a pointer to this table as the label. In this way, we can move as we want the internal pointers of the label and all the instructions referencing to that label are automatically updated.

FutureLabelTable is a buffer table that only exists on disassembly. It's used to store destinies of JMPs, CALLs, etc. pointing to code that we haven't disassembled yet. Every time we decode an instruction, we look if that address is stored in this table, and then we can complete all the instructions that referenced to that address if it exists there.

Given all this, let's see the algorithm:

  1. Initialize the PathMarks map (i.e. zeroing it) and the number of labes and future labels.
  2. Translate the current EIP (in ESI) directly onto the PathMarks map.
    • If it exists, store at ESI a JMP to the disassembled instruction (storing also a label pointing to that instruction if it hasn't one already).
    • If it doesn't exists, mark it as already disassembled and decode the instruction. Now, depending on the instruction, we act:
      • If it's JMP:
        • If it points to an already decoded address, write a JMP instruction, insert a label to the destiny and get a new EIP at FutureLabelTable. If the label already exists, use that label.
        • If not, then write a NOP (just in case a label points directly to this JMP) and load a new EIP (in ESI) with the destiny. In this way, we have eliminated a possible permutation JMP.
      • If it's Jcc (conditional jump):
        • If it points to an already decoded address, write the Jcc and insert a label to the destiny if the label doesn't exist (if not, use the label already inserted in the table).
        • If the destiny is not disassembled yet, then store it at FutureLabelTable and continue.
      • If it's CALL, act as if it were a Jcc.
      • If it's RET, JMP Reg or JMP [Mem] (a final leaf in the code tree), store the instruction and get a new EIP from FutureLabelTable.

When getting a new EIP from the FutureLabelTable, we check if the labels stored here are already decoded. If they are, then we insert the corresponding labels at the LabelTable and eliminate the entry in FutureLabelTable. If not, we get that new EIP (i.e. we load ESI with that new entrypoint), we insert the new label at LabelTable and continue.

As you can deduct, the disassembly will end when FutureLabelTable has no more entries, since this means that we came from an end leaf of the code flux. After performing such "emulation", we have:

  1. Eliminated the permutation and the permutation jumps (since we have eliminated the JMPs by changing directly the EIP in ESI).
  2. Eliminated all the code that can't be reached in any way.
  3. Decoded the whole code in our pseudo-assembler.
  4. Substituted the labels by pointers to table entries.

We have done it! We haven't to code a depermutator or an emulator to detect dead code zones, since we eliminate them implicity. An example of a depermutation performed here would be:

       CODE         PASSES
      ------        ------
        xxx1        1) Decode xxx1
        xxx2        2) Decode xxx2
        xxx3        3) Decode xxx3
        jmp @A      4) Change EIP to @A (don't store label)
        yyy1        5) Decode xxx7
        yyy2        6) Decode xxx8
    @B: xxx4        7) Decode xxx9
        xxx5        8) Change EIP to @B (don't store label)
        xxx6        9) Decode xxx4
        jmp @C     10) Decode xxx5
        yyy3       11) Decode xxx6
        yyy4       12) Change EIP to @C (don't store label)
    @A: xxx7       13) Decode xxx10
        xxx8       14) Decode xxx11
        xxx9       15) Decode JZ and store @D in FutureLabelTable
        jmp @B     16) Decode xxx12
    @D: xxx13      17) Decode RET, get @D from FutureLabelTable and
        xxx14        complete the JZ at pass 15 (@D = current EIP)
        RET        18) Decode xxx13
        yyy5       19) Decode xxx14
    @C: xxx10      20) Decode RET and get an item from FutureLabelTable.
        xxx11        Since it's empty, we have decoded everything, so finish.
        jz @D
        xxx12
        RET

The result of the disassembly would be:

        xxx1
        xxx2
        xxx3
        xxx7
        xxx8
        xxx9
        xxx4
        xxx5
        xxx6
        xxx10
        xxx11
        jz @D
        xxx12
        RET
  @D:   xxx13
        xxx14
        RET

I think it's more clear now. All garbage and hole-filling code (the yyy? instructions) are implicitly eliminated, so we haven't to look for them later.

The problem here is that, once shrinked, the code skeleton (already depermutated here) can be use for detection. Or maybe not, but we want even the skeleton to be different from generation to generation (the spanish expression is "rizar el rizo", or "to loop the loop" in spanglish :). Then, we insert what I call the "3D instructions". What?? Ehrmm... Well, it's not taking Quake and playing, it's just that it pictures exactly what it happens: let's imagine the dimensions where you make metamorphism: the first dimension is the current code (what you are disassembling now). The second dimension is what you are going to code (the result of this metamorphosis). The third dimension is what the next generations will decode.

So, first and second dimension are clear, but... what about third? Simple: just code some JMPs as CMP EAX,EAX/JZ @xxx, for example. Do you see the point? The shrinker (as we are going to see in the next part) must be able to compress the "CMP EAX,EAX/JZ @xxx" pair to "JMP @xxx", but it will happen in the next generation, not in this one (and there are some structures that will be compressed completely at more far generations). The only thing we have to take in account is that we can't put garbage instructions after this type of jump, because they'll be decoded also. Let's see the code above (eliminating the yyy? this time) and substituting the first JMP by a CMP/JZ pair:

          xxx1       Result of the disassembly:
          xxx2          xxx1
          xxx3          xxx2
          CMP X,X       xxx3
          JZ @A         CMP X,X
      @B: xxx4          JZ @A    --> This will be compressed to JMP @A on this
          xxx5      @B: xxx4         generation and then eliminated implicitly
          xxx6          xxx5         in the disassembly of next generation.
          jmp @C        xxx6
      @A: xxx7          xxx10
          xxx8          xxx11
          xxx9          jz @D
          jmp @B        xxx12
      @D: xxx13         RET
          xxx14     @A: xxx7
          RET           xxx8
      @C: xxx10         xxx9
          xxx11         jmp @B
          jz @D     @D: xxx13
          xxx12         xxx14
          RET           RET

This means ABSOLUTE metamorphism. The skeleton changes, but not the code algorithm. It's necessary to run some generations to reach the original code, but since other JMPs are mutated in the same way, you never reach it, and you need some generations to eliminate a permutation JMP, while other new ones are inserted. The good thing is that it's not infinite, and we reach a point where this code gets stable, but not in the first generations, where we'll see that our code grows due to these jumps.

Other thing that we can do is to use this disassembler to decode some instructions to its easy form: INC Reg --> ADD Reg,1, for example. In this way, we only have to treat one type of instruction, not all the instruction variants in the processor that perform the same operation (although it can be performed by the shrinker, but in this way we save pseudo-opcodes).

b) Shrinker/emulator (deobfuscator)

At this point, are we going to emulate or compress? The emulation is more advanced and powerful, but implies an outstandingly complex coding, and it has many problems, like the control of the values after a loop. The easier way of doing things, and also the way that gives better ratios in a quality/quantity comparision is compressing by known pairs and triplets of instructions: it's just undoes what the expander part does.

The planning must have this part. You have to make a table with all the possible single instructions, pairs/triplets of them, and decide what you are going to compress/expand and what not. The shrinker can be used to eliminate "intelligent garbage": do-nothing code that is inserted as if it were part of the algorithm. Now I show the possible single/pairs/triplets MetaPHOR can compress, and they're just the set of instructions that the expander can generate from a single one.

Legend:

Reg
A register
Mem
A memory address
Imm
Immediate

When in an instruction is Reg,Reg or something like that, both are the same register. If they are different, I write it as Reg,Reg2 (for example).

Transformations over single instructions:

XOR Reg,-1 --> NOT Reg
XOR Mem,-1 --> NOT Mem
MOV Reg,Reg --> NOP
SUB Reg,Imm --> ADD Reg,-Imm
SUB Mem,Imm --> ADD Mem,-Imm
XOR Reg,0 --> MOV Reg,0
XOR Mem,0 --> MOV Mem,0
ADD Reg,0 --> NOP
ADD Mem,0 --> NOP
OR Reg,0 --> NOP
OR Mem,0 --> NOP
AND Reg,-1 --> NOP
AND Mem,-1 --> NOP
AND Reg,0 --> MOV Reg,0
AND Mem,0 --> MOV Mem,0
XOR Reg,Reg --> MOV Reg,0
SUB Reg,Reg --> MOV Reg,0
OR Reg,Reg --> CMP Reg,0
AND Reg,Reg --> CMP Reg,0
TEST Reg,Reg --> CMP Reg,0
LEA Reg,[Imm] --> MOV Reg,Imm
LEA Reg,[Reg+Imm] --> ADD Reg,Imm
LEA Reg,[Reg2] --> MOV Reg,Reg2
LEA Reg,[Reg+Reg2] --> ADD Reg,Reg2
LEA Reg,[Reg2+Reg2+xxx] --> LEA Reg,[2*Reg2+xxx]
MOV Reg,Reg --> NOP
MOV Mem,Mem --> NOP (result of a compression of PUSH Mem/POP Mem, with pseudoopcode 4F)

The instructions that are eliminated (the ones that mean NOP) are be used as garbage along the executable code. Since every NOP instruction can be expanded (for example, MOV Reg,Reg can be set as PUSH Reg/POP Reg, and every PUSH and POP also can be expanded, and so on) you can't know what's garbage and what's not until you have compressed everything.

The pairs of instructions that MetaPHOR can compress are:

PUSH Imm / POP Reg --> MOV Reg,Imm
PUSH Imm / POP Mem --> MOV Mem,Imm
PUSH Reg / POP Reg2 --> MOV Reg2,Reg
PUSH Reg / POP Mem --> MOV Mem,Reg
PUSH Mem / POP Reg --> MOV Reg,Mem
PUSH Mem / POP Mem2 --> MOV Mem2,Mem (codificated with pseudoopcode 4F)
MOV Mem,Reg/PUSH Mem --> PUSH Reg
POP Mem / MOV Reg,Mem --> POP Reg
POP Mem2 / MOV Mem,Mem2 --> POP Mem
MOV Mem,Reg / MOV Reg2,Mem --> MOV Reg2,Reg
MOV Mem,Imm / PUSH Mem --> PUSH Imm
MOV Mem,Imm / OP Reg,Mem --> OP Reg,Imm
MOV Reg,Imm / ADD Reg,Reg2 --> LEA Reg,[Reg2+Imm]
MOV Reg,Reg2 / ADD Reg,Imm --> LEA Reg,[Reg2+Imm]
MOV Reg,Reg2 / ADD Reg,Reg3 --> LEA Reg,[Reg2+Reg3]
ADD Reg,Imm / ADD Reg,Reg2 --> LEA Reg,[Reg+Reg2+Imm]
ADD Reg,Reg2 / ADD Reg,Imm --> LEA Reg,[Reg+Reg2+Imm]
OP Reg,Imm / OP Reg,Imm2 --> OP Reg,(Imm OP Imm2) (must be calculated)
OP Mem,Imm / OP Mem,Imm2 --> OP Mem,(Imm OP Imm2) (must be calculated)
LEA Reg,[Reg2+Imm] / ADD Reg,Reg3 --> LEA Reg,[Reg2+Reg3+Imm]
LEA Reg,[(RegX+)Reg2+Imm] / ADD Reg,Reg2--> LEA Reg,[(RegX+)2*Reg2+Imm]
POP Mem / PUSH Mem --> NOP
MOV Mem2,Mem / MOV Mem3,Mem2 --> MOV Mem3,Mem
MOV Mem2,Mem / OP Reg,Mem2 --> OP Reg,Mem
MOV Mem2,Mem / MOV Mem2,xxx --> MOV Mem2,xxx
MOV Mem,Reg / CALL Mem --> CALL Reg
MOV Mem,Reg / JMP Mem --> JMP Reg
MOV Mem2,Mem / CALL Mem2 --> CALL Mem
MOV Mem2,Mem / JMP Mem2 --> JMP Mem
MOV Mem,Reg / MOV Mem2,Mem --> MOV Mem2,Reg
OP Reg,xxx / MOV Reg,yyy --> MOV Reg,yyy
Jcc @xxx / !Jcc @xxx --> JMP @xxx (this applies to (Jcc & 0FEh) with (Jcc | 1)
NOT Reg / NEG Reg --> ADD Reg,1
NOT Reg / ADD Reg,1 --> NEG Reg
NOT Mem / NEG Mem --> ADD Mem,1
NOT Mem / ADD Mem,1 --> NEG Mem
NEG Reg / NOT Reg --> ADD Reg,-1
NEG Reg / ADD Reg,-1 --> NOT Reg
NEG Mem / NOT Mem --> ADD Mem,-1
NEG Mem / ADD Mem,-1 --> NOT Mem
CMP X,Y / != Jcc (CMP without Jcc) --> NOP
TEST X,Y / != Jcc --> NOP
POP Mem / JMP Mem --> RET
PUSH Reg / RET --> JMP Reg
CALL Mem / MOV Mem2,EAX --> CALL Mem / APICALL_STORE Mem2
MOV Reg,Mem / CALL Reg --> CALL Mem
XOR Reg,Reg / MOV Reg8,[Mem] --> MOVZX Reg,byte ptr [Mem]
MOV Reg,[Mem] / AND Reg,0FFh --> MOVZX Reg,byte ptr [Mem]

Maybe there are more, but this set is sufficient, at least for our proposits. What we do know is scan the code for this situations and then we substitute the first instruction by their equivalent and we overwrite with NOP the second, so the instructions are compressed.

But there are more: the triplets:

MOV Mem,Reg
OP Mem,Reg2
MOV Reg,Mem
--> OP Reg,Reg2
MOV Mem,Reg
OP Mem,Imm
MOV Reg,Mem
--> OP Reg,Imm
MOV Mem,Imm
OP Mem,Reg
MOV Reg,Mem
--> OP Reg,Imm (it can't be SUB)
MOV Mem2,Mem
OP Mem2,Reg
MOV Mem,Mem2
--> OP Mem,Reg
MOV Mem2,Mem
OP Mem2,Imm
MOV Mem,Mem2
--> OP Mem,Imm
CMP Reg,Reg
JO/JB/JNZ/JA/JS/JNP/JL/JG @xxx
!= Jcc
--> NOP
CMP Reg,Reg
JNO/JAE/JZ/JBE/JNS/JP/JGE/JLE @xxx
!= Jcc
--> JMP @xxx
MOV Mem,Imm
CMP/TEST Reg,Mem
Jcc @xxx
--> CMP/TEST Reg,Imm
Jcc @xxx
MOV Mem,Reg
SUB/CMP Mem,Reg2
Jcc @xxx
--> CMP Reg,Reg2
Jcc @xxx
MOV Mem,Reg
AND/TEST Mem,Reg2
Jcc @xxx
--> TEST Reg,Reg2
Jcc @xxx
MOV Mem,Reg
SUB/CMP Mem,Imm
Jcc @xxx
--> CMP Reg,Imm
Jcc @xxx
MOV Mem,Reg
AND/TEST Mem,Imm
Jcc @xxx
--> TEST Reg,Imm
Jcc @xxx
MOV Mem2,Mem
CMP/TEST Reg,Mem2
Jcc @xxx
--> CMP/TEST Reg,Mem
Jcc @xxx
MOV Mem2,Mem
AND/TEST Mem2,Reg
Jcc @xxx
--> TEST Mem,Reg
Jcc @xxx
MOV Mem2,Mem
SUB/CMP Mem2,Reg
Jcc @xxx
--> CMP Mem,Reg
Jcc @xxx
MOV Mem2,Mem
AND/TEST Mem2,Imm
Jcc @xxx
--> TEST Mem,Imm
Jcc @xxx
MOV Mem2,Mem
SUB/CMP Mem2,Imm
Jcc @xxx
--> CMP Mem,Imm
Jcc @xxx
PUSH EAX
PUSH ECX
PUSH EDX
--> APICALL_BEGIN
POP EDX
POP ECX
POP EAX
--> APICALL_END

And again, maybe there are more cases, but these are the ones I use. The mechanism we follow is the same as with pairs: we check if the three instructions under our pointer construct a defined triplet, and then we compress it overwriting the two last instructions with two NOPs.

Once we have defined the singles, pairs and triplets of instructions, we are going to see the algorithm of compression, since we cannot apply this directly if we expanded them recursively (this means, maybe we coded PUSH Imm/POP Reg, but PUSH Imm can be coded as MOV Mem,Imm/PUSH Mem, so the instructions would be "MOV Mem,Imm/PUSH Mem/POP Reg", and the POP Reg could be expanded also. The algorithm of compression is:

          CurrentPointer = FirstInstruction
       @@Loop:
          if ([CurrentPointer] == MATCHING_SINGLE)
          {
             Convert it
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             goto @@Loop
          }
          if ([CurrentPointer] == MATCHING_PAIR)
          {
             Convert it
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             goto @@Loop
          }
          if([CurrentPointer] == MATCHING_TRIPLET)
          {
             Convert it
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             if (CurrentPointer != FirstInstruction) call DecreasePointer
             goto @@Loop
          }
          do (CurrentPointer++) while [CurrentPointer] == NOP
          if(CurrentPointer != LastInstruction) goto @@Loop

      DecreasePointer:
          do (CurrentPointer--) while (([CurrentPointer] == NOP) &&
                                       ([CurrentPointer.Label == FALSE))
          return

We don't need to pad with NOPs the last instruction to avoid that garbage from a previous disassembly generate something undesired, since no compression is defined that takes a RET, JMP or an instruction like that as the first instruction of a pair/triplet, and we are SURE - capitalized - that the code finishes with one of this instruction. Also take in account that we ignore NOPs always!

Also take care of labelled instructions: we MUST NOT compress instructions if one of them (apart from the first one) has a label pointing to it (that's the reason of having a LABELLED flag embedded into the instruction structure that we defined while planning the engine). A label over it means that a jump, call, etc. points to the instruction, and if we merge it with the one above we are corrupting the code.

The memory addresses that we eliminate with the compression are just temporary variables that we reserved in reassembly to hold the values to perform that operation. So, we must be sure that our true variables (the memory addresses where we store important things) must not match any pair or triplet, because if it does it will be eliminated and the engine won't work anymore. With a little care, we don't worry about that memory variables and we don't have to check if they are important variables or only transition ones, which allows us to redefine freely the position of the variables along the code and mix them with the temporary ones.

An example of what the shrinker does:

  Original code:

  MOV [Var1], ESI  * PUSH ESI       * MOV EAX,ESI
  PUSH [Var1]      * nop              nop
  POP EAX            POP EAX        * nop
  PUSH EBX           PUSH EBX         PUSH EBX        =====>
  POP [Var2]         POP [Var2]       POP [Var2]
  ADD EAX,[Var2]     ADD EAX,[Var2]   ADD EAX,[Var2]




             MOV EAX,ESI        MOV EAX,ESI   * LEA EAX,[ESI+EBX]
             nop                nop             nop
             nop                nop             nop
  ====>    * MOV [Var2],EBX   * ADD EAX,EBX   * nop
           * nop                nop             nop
             ADD EAX,[Var2]   * nop             nop

After passing the algorithm above to the pseudo-assembler, we obtain a code with lots of NOPs along the code, but it doesn't matter because, on expansion, we'll ignore this NOPs, performing a true optimization (although that achieved optimization is broken by the expander, but who cares...).

c) Permutator

Using an internal assembler allows to perform this pass in an easy way, and we haven't to keep all the instructions of the same size and things like that, because the reassembler will calculate that for us.

The easiest way of doing a permutation over the code is to define "code frames": we construct a table where we define portions of code, giving an initial and a final offset, in this way:

    ESI = Initial address of instructions
    EDI = Address of last instruction

    Given ESI = 00000000h,
          EDI = 00000060h

    while(ESI < EDI)
       Store ESI
       ESI += Random(8)+8
       Store ESI
       if((ESI+0F > EDI)
          Store ESI,EDI
          break;
       end if
    end while
       
    Result (for example):
          DD    00000000h,0000000Ah
          DD    0000000Ah,00000017h
          DD    00000017h,00000023h
          DD    00000023h,00000032h
          DD    00000032h,0000003Dh
          DD    0000003Dh,00000049h
          DD    00000049h,00000052h
          DD    00000052h,00000060h

Now we make a shuffle of the elements of this array. Shuffling is easy (there are lots of algorithms for doing it). After doing it, we obtain:

          DD    00000032h,0000003Dh
          DD    00000023h,00000032h
          DD    0000000Ah,00000017h
          DD    00000000h,0000000Ah
          DD    00000017h,00000023h
          DD    00000052h,00000060h
          DD    0000003Dh,00000049h
          DD    00000049h,00000052h

While doing it, we keep the track of the first frame, which is the entrypoint of the code. If the first frame is not the entrypoint, we insert a JMP to that frame. Since the destiny of the linking JMPs is not known, we store them into a table for, after copying all the instructions, complete them. So, the first action we make is to insert a JMP for the entrypoint, and then we begin to copy the instructions. The first frame tell us that we must copy from instruction 32h to instruction 3Dh, and then insert a JMP to the next frame, and so on.

After performing this, we'll get a permutated code. At this point we only have to finish the JMPs we have stored for later fixing, and all is done.

Also try different things here: for example, if you leave the NOPs that the shrinker generated it will derive in a more random code distribution, since the NOPs will be included also in the permutation, but eliminated in the next code processings.

d) Expander (obfuscator)

The expander is the part that undoes what the shrinker did. It performs exactly the reverse operation (in a random way, of course). Just take the defined singles, pairs and triplets and code an alternative for every instruction an alternative (apart from their direct codification).

For this, we are going to code it all recursive. For example, when we find a 50h opcode (PUSH Reg) we'll call MakePUSHReg(), which will decide randomly if we code directly the PUSH or we make a workaround using a defined pair or triplet. So, the function decides to make it as MOV Mem,Reg / PUSH Mem, so it calls MakeMOVMemReg() and MakePUSHMem(). But, what a casuallity! The function MakeMOVMemReg() can be called with MakePUSHReg()+MakePOPMem(), which again calls MakePUSHReg(). This can make the code grow too much, so we put a recursivity control, i.e. a variable that increments when a recursive-procedure-using function is called and decreased when it leaves. We check that variable to see if it has increased to a certain number, and if we reach it we code the instruction directly (in this case, 50h+Reg and EIP++).

The expander is cooler if we use our internal assembly language also, so we haven't to deal with the final codification. That work will be done by the reassembler (next section). To be sure that what's resulting on the code expansion, we are going to generate the pseudo-assembler as similar to the final code as we can (that's why I have opcodes such 4E and 4F to code INC and DEC in the list of opcodes). So, if we find CMP EAX,0 on the expander, we use it to code OR EAX,EAX or TEST EAX,EAX in pseudoassembler, so the reassembler only has to code the instruction directly, without having to decide anything more than some randomness on opcode generation.

The expander should make some more operations, such as:

Register translation
We select a new register to translate the register that we were using until now. The easier way is to put the sequence 0,1,2,3,5,6,7 in a list and shuffle it, and then translate every register by that number. In this way the registers are never the same when performing operations, and even on memory access, since if we use ESI->EBP translation (for example) the way these memory references are assembled to x86 are different (or can be different). It's something that we could see in Vecna's Regswap.
Variable re-selection
To do memory variables we have to own a memory buffer (reserved with VirtualAlloc, .bss section in the host itself, etc.). If you have the variables we use to store important things at a fixed location, forget it! Put them in a buffer where all them are together. In this way, we only have to detect that a memory address is a variable (looking if it's a memory address of the form [DeltaRegister+12345678h]), store it in a buffer like we did with the code labels (but this time with memory addresses), shuffle them and reassign an address for every variable. Doing this we don't use fixed variables anymore. The only inconvenient is that we must supply some values from other places, such from the decryptor (if we use one).

Also I'll take a case for LEAs: better if we use them only for recoding MOV Reg,Value, ADD Reg,Value, MOV Reg,Reg and ADD Reg,Reg. Why? Because if you look a LEA closer, means that it arrived here because it's a shrink of some more instructions, like MOV EAX,EBX / ADD EAX,12345678h. So, if we avoid the coding of the LEA and we disgregate it in simpler operations (randomly), we have an embedded swapper module. An example:

     MOV EAX,ECX                                              
     ADD EAX,3   -> (shrink) -> LEA EAX,[ECX+3] -> (expand) -> MOV EAX,3
                                                               ADD EAX,ECX

                                                   (expand) -> MOV EAX,ECX
                                                               ADD EAX,3

The thing is that if we did the effort of coding a swapper in a part of the engine, we'll see that these cases are contemplated, so we coded things that are redundant in the final result. Anyway, a swap can be performed on expansion, but take care, because many times a swap between two elements like the next ones without controlling EVERYTHING can corrupt the code:

     MOV EAX,1234
     MOV EBX,2345  <-- check if a label is pointing to this!

     MOV EAX,1234
     MOV EBX,[EAX] <-- check if the second instruction uses the elements of
                       the first instruction

     MOV EAX,[EBX]
     MOV EBX,1234  <-- check if the first instruction uses the elements of
                       the second instruction

     MOV EAX,[EBX]
     MOV [ECX],EDX <-- EBX and ECX has the same value? We can't know this
                       without total emulation

And MANY MANY more. My experience is that there are so many factors that the code can crash even with the most little, innocent change that the swapper makes although it seems to be perfectly valid. The only ones we can mutate with security are the ones that are inserted in LEAs and unrolled from them later. Moreover, we avoid the coding of another huge routine :).

e) Reassembler

The reassembler is the end edge of the engine. This piece of code will generate the instructions the processor can understand. If we followed the philosophy of expanding the code in pseudo-assembler, this is a piece of cake, because we have done this plenty of times when we coded polymorphic engines. Moreover, the vision of seeing your engine finished animates you a lot, believe me :).

The reassembler is a kind module that likes to be helped from other ones, such from the expander. We can code it to take literally the pseudoassembler and write what it's expressed there, with no care about what it means.

But while we code it, we see that there are some things that aren't as easy as they seem, for example the EIP displacement instructions. How can we code foward JMPs, CALLs and Jccs? We must use (again) a table, and store here all the instructions we must fix when the pseudocode is all assembled.

But a real pain can be the foward short JMPs/Jccs. We can do it without it, but it's not fair ;). My solution (very neat) is to look if the instruction points to a maximum length of 11 or 12 instructions foward, so if it's below that mark we can decide if we code it short or long (randomly, ofcoz). Where we haven't any problem is with backwards jumps, because we always know the length. CALLs doesn't need to be decided, but we must resolve them also when they are fowards, because the code isn't assembled yet.

The randomness in this part of the engine is made in the jumps (when we can decide if we use short or long ones) and in the opcode reassembly when we have several possibilities for the same opcode. The list that follows shows some instructions that can be done in this way:

  B0+Reg
  C6 C0+Reg   --> MOV Reg8,Value

  B8+Reg
  C7 C0+Reg   --> MOV Reg,Value

  50+Reg
  FF F0+Reg   --> PUSH Reg

  58+Reg
  8F C0+Reg   --> POP Reg

  40+Reg
  FF C0+Reg   --> INC Reg

  48+Reg
  FF C8+Reg   --> DEC Reg

This is an example. Other ones can be the instructions that can uses EAX (both in EAX-exclusive opcode or using the generic opcode), the opcodes that takes a sign-extended byte-to-dword operand (opcode 83, and using -80 > value < 7F), and much more. All them are only random at opcode level, because all the alternate ways of doing things (the complex movements, and the like) are made with the expander.

Hey, it's done! We have a reassembled code! Now it comes the hard part: DEBUGGING.

3) Known problems (and solutions)

a) Debugging your engine

Debugging metamorphic code is a hell, and if you never have coded such an engine, you'll notice it. It's fine while you debug the code at first generation because you can see what you did, but the problem comes when you must debug the generated code. The assembler is soooo obfuscated that you maybe get crazy, so the solution is: debug from the beginning! Don't code the shrinker/expander until all the other parts work perfectly!

Other thing you must consider is that the debugger uses INT 3 to patch the return from CALLs. If you code the disassembler in a separated function, take care when tracing/stepping over, because:

  MOV EDI,[VirusEntry]
  CALL Disassembly
  MOV EAX,12345678  ;--> The disassembler will see:  
                                                    MOV EDI,[VirusEntry]
                                                    CALL Disassembly
                                                    INT 3
                                                    JS @xxx
                                                    ...
 

You can see that the debugger corrupts the code. This is good, because the code is implicitly anti-debugging :), but it's bad because if you don't notice it you believe you're wrong in some point and lose lots of time with a checking you didn't need. The solution is:

  1. Enter the call and trace all the debugging (unacceptable), or
  2. Use hardware breakpoints (OOOOOH! clap clap clap clap)

So, take care of where you press the "Run to here" key, since it will be OK if you use it after you have runned the disassembler, but it will corrupt the code if you use it before the disassembly and you point to code that will execute after it. Using hardware breakpoints on "fetch instruction" will eliminate the problem, but it's more annoying (more keys to press, etc.). Make it in a macro.

b) API calls

API calls are a real pain in the ass to detect. Since all the calls haven't the same number of parameters, we don't know at what point the parameters start (but we know that the API ends with the CALL, ah, now we can rest in peace :P). This type of calls are delicate because we can't change the registers around them as we want, since the return value is given always at EAX, and they always modify ECX and EDX. So, in a normal code, we happily set values in registers without using these three, but we can't make it when we are translating registers "a go-go" (or we can play only with registers > EDX, but then we limit the possibilities). We also have to detect the moment when EAX is used to check the return value, and lots of things that comes from the fixed use of these registers, things that collide with our engine.

The easy solution is to use a "code mark": they're a sequence of certain instructions with certain operands that doesn't repeat along the code unless the thing they are marking is present. The ones I use are the instructions called APICALL_BEGIN, APICALL_END and APICALL_STORE.

APICALL_BEGIN is, simply PUSH EAX/PUSH ECX/PUSH EDX. It's detected by the shrinker, since every instruction of the structure can be expanded as well as with other instructions. The shrinker will detect this sequence of instructions and will change them by the pseudoopcode F4, which is a mark for the expander to code PUSH EAX/PUSH ECX/PUSH EDX. The registers must be always EAX, ECX and EDX. In this way we assure the saving of these ones before an API call.

In the same way, APICALL_END is POP EDX/POP ECX/POP EAX. This is also translated by the shrinker, giving a pseudoopcode F5, which is also a mark for the expander to code POP EDX/POP ECX/POP EAX. This signalizes the end of an API call and then the restoration of what it was saved. So, for the registers values it's like if the API call was never performed.

APICALL_STORE is another pseudoinstruction. The shrinker will detect it when it gets a MOV [Mem],EAX just after the CALL of the API. This is done to avoid the translating of EAX, so this instruction will always be codificated as MOV [Mem],EAX, regardless the register EAX must be translated to. The memory address is the variable where the return value is stored, and can be retrieved after doing the APICALL_END.

The next example illustrates the use of that technique:

  PUSH EAX
  PUSH ECX
  PUSH EDX            -------------------> APICALL_BEGIN
  MOV EAX,[EBP+AddressOfNewDirectory]
  PUSH EAX
  CALL DWORD PTR [EBP+RVA_SetCurrentDirectoryA]
  MOV [EBP+ReturnValue],EAX -------------> APICALL_STORE [ReturnValue]
  POP EDX
  POP ECX
  POP EAX             -------------------> APICALL_END
  MOV EAX,[EBP+ReturnValue]
  CMP EAX,EDX          ; Get the return value and check it
  JZ @X
  ...
 

Now change the registers:

  PUSH EAX
  PUSH ECX
  PUSH EDX            -------------------> APICALL_BEGIN
  MOV ESI,[EBX+AddressOfNewDirectory]
  PUSH ESI
  CALL DWORD PTR [EBX+SetCurrentDirectoryA]
  MOV [EBX+ReturnValue],EAX   -----------> APICALL_STORE [ReturnValue]
  POP EDX
  POP ECX
  POP EAX             -------------------> APICALL_END
  MOV ESI,[EBX+ReturnValue]
  CMP ESI,ECX          ; Get the return value and check it
  JZ @X
  ...
 

For obvious reasons, you must not use EAX, ECX or EDX as the Delta register in the recodification of the virus because, if you use it, the call to the API function will overwrite it and the return value will be wrote anywhere, with a 99% of throwing an exception. We can use ths fact (Delta is not EAX, ECX or EDX) to code the expansion of CALL DWORD PTR [Mem], which we planned to be MOV Reg,Mem/CALL Reg: we can use EAX, ECX or EDX for Reg perfectly and without saving anything, because the register values are going to be destroyed by the API call.

For Linux (for the ones that want to make metamorphism under this system) the API call is more complicated, since we pass the parameters in the registers (EAX, EBX, ECX, etc.). Well, we can define a structure in this way:

  MOV [EBP+Parameter1], XXX
  MOV [EBP+Parameter2], YYY
  MOV [EBP+Parameter3], ZZZ
  PUSH EAX
  PUSH EBX
  PUSH ECX
  PUSH EDX
  PUSH EBP       --------------> LINUX_SYSCALL_BEGIN DeltaReg
  MOV EAX,[EBP+Parameter1]
  MOV EBX,[EBP+Parameter2]
  MOV ECX,[EBP+Parameter3]
  MOV EDX,[EBP+Parameter4]  ---> LINUX_SYSCALL_LOADPARAM
  INT 80h
  POP EBP
  MOV [EBP+ReturnValue],EAX  --> LINUX_SYSCALL_STORE DeltaReg,[ReturnValue]
  POP EDX
  POP ECX
  POP EBX
  POP EAX        --------------> LINUX_SYSCALL_END
 

It's larger, but I'm not God :).

There's something that also applies to this, although it's not an API call: the SET_WEIGHT pseudoinstruction. The instruction corresponds to the next code structure:

  PUSH Reg1
  MOV Reg1,WEIGHT_IDENT
  MOV Reg2,xxyyzztt
  MOV [Mem],Reg2
  POP Reg1
 

This is compressed to SET_WEIGHT [Mem],IDENT,Reg1,Reg2. This instruction is used to transmit the data here from one generation to another. We can only transmit this because we obtained the values before making the total reasm, so we can't use it for substituting parameters passed by stack.

The weights expressed here are part of a little genetic algorithm that makes the virus to be "adaptative", since the "natural selection" will make that the survivors have the best options to be in the wild (type of infection, structure of the decryptor, infection rate. etc.). The algorithm is not very advanced, but it does its work.

c) Memory

Yes, metamorphism by-the-hard-way requires lots of memory to store tables, the disassemblies, the temporal code changes, execution path marks, the local variables, and lots of etc. etc. etc. The using of ESP is a dirty trick that can be useful to store not very big things, but all changes when you realize that you need about 3 or 4 Mb to do a decent job (just see z0mbie's Mistfall: reserves 32Mb!). The only solution is to use VirtualAlloc and reserve the quantity you need.

Since we reserve memory, nothing stop us of copying the code to memory, as if we were dinamically generated code. Moreover, for an antivirus emulator it's a copy, not a decryption (if you decrypt while copying). For this, we have to code a polymorphic engine to generate at least a "copier" (just if you want to leave the code unencrypted).

The copy to memory allow us to make more nice things: we can copy our code anywhere inside that memory frame, and have our memory variables at the offset we want. Since we use a Delta register, we don't care where the code is! The problem to this is to get our internal offset (just notice that the memory address where we are is dinamically got by the operating system). We can recode the addresses of all buffers and tables while reassembling, and then even the "shape" of the final code will be different! The problem here is that we must supply the address where the code is stored, the data division inside the reserved virtual allocated memory, etc., so we must supply them from the unique thing we must vary as we want: the decryptor/copier, making, for example, pushes of the values and then poping them once in the engine (look the source code of MetaPHOR).

4) Future

This section is optional. It expresses ideas that I'm thinking to increase the functionality of the virus MetaPHOR, and that maybe are useful for you to implement.

a) Plug-ins

That's a thing one day I'll implement in my virus for sure (I hope!). I realized that with this type of metamorphism the plug-ins are very easy to implement (although "easy" doesn't mean "short and quick-coding"). The plug-ins can be implemented with "code marks", in the same manner we signalize an API call. What we do is to make an information header (variables it uses, etc.) and a routine made in pseudoassembler (our assembler). A code mark that marks a plug-in can be "PUSH Reg/MOV Reg,12345678/POP Reg" (a do-nothing code that we don't eliminate with the shrinker). The number passed in that register is, in fact, a plug-in identificator and a version number, (using hi-word and lo-word), which allows us to compare the version of the plug-in currently installed with the version of a new plug-in, and to determine if the plug-in must substitute and old one or must be added to the execution. The mark would be repeated twice (one for begin and once for end), with the value in the last one with the high-bit to 1, so we can substitute the current plug-in by the new one. The shrinker must detect this structure and set two new instructions called "PLUGIN_BEGIN Version" and "PLUGIN_END Version". When the shrinker has performed it's job, and just after the permutator, the plug-in injector comes to work: it searches for new plug-ins anywhere, decrypts them and check their signature (like Vecna does in his Hybris). The plug-ins are not needing to be encrypted, and are also nearly impossible to modify or put one that is not yours if you use a public-private key signature system.

Depending on the type of plug-in and the version, we can substitute directly the plug-in entry (if type coincides and the version is greater) overwriting the CALL to the last plug-in. This is a 3D-coding technique that I mentioned nearly at the beginning of the article: the code of the old plug-in remains and it's assembled forming part of the new virus, but when in the next generation the virus get disassembled, that part won't be reached by the disassembler part, so the olg plug-in is eliminated (and meanwhile the code gets garbled and mixed with code that does absolutely nothing :).

All the whole virus can be made of plug-ins, or at least have plug-in identificators at all the routines, even at the plug-in downloader. In that case, if one day I make support for Linux (I think it'll be before the plug-ins get finished :) I only have to code the plug-in and put it where it corresponds.

Also we should have to have a field to specify the zone where we want to put a new plug-in (just in case we want to insert a plug-in just after the permutator but before the expander, to make something new, or just if we forgot to code a function and we want to put it now). It'll be on the plug-in identificator, and it's simple: just make a unique identifier for each plug-in in the original virus that be quite separated from the one before, for example: Disassembler: 10000001, Shrinker 20000001, etc. (all them with the identificator and the version). If we want a new function between the disassembler and the shrinker, we use the identificator 18000001, and so on.

Let's see if one day I do it :).

b) Multi-platform cross-infector

That's something that I didn't do on the release 1.0 of MetaPHOR because I was short of time to finish it for 29A#6, but it's quite easy (in fact, we have seen the proof of concept in Benny's Winux). We only have to call one API function or another, depending on the system we are running, and have a function to attach our mutated code to a PE or to an ELF. The APIs can be used in the same way (for example, I code a single function called MapFile that returns me a file mapping for Windows using CreateFileMapping and MapViewOfFile, and for Linux using the pertinent int 80h function).

c) Reassembling to different processors

That's something that I don't know if I'm going to code it before plug-ins (in which case I'll program a plug-in rather than a new whole virus).

If you have realized, the pseudoassembler is quite general, so it can be adapted to any processor we want! Only if they use a 32-bits architecture, of course. The 64-bits architecture requires a change in all the management of the internal pseudoassembler, but not very complicated (only I have to extend the instruction length from 16 bytes to 32, which gives me space for the QWORDs and some new fields that won't remain in blank).

The fact is that I thought about it and can be done for sure, and the first pass would be use Alpha assembler and the WinNT under Alpha processors. Why? Because I don't have to change NOTHING, only the Disassembler and the Assembler I use, since the infection methods and the algorithm would be exactly the same! (since we use ring-3). The shrinker, permutator, expander, etc. would remain untouched.

The only thing we have to do is redefine some instructions and expansions under this assembler, since the TEST instruction, for example, doesn't exist in all the processors (I think it doesn't exist in Itaniums).

The very first thing to do is to code a new disassembler (but keeping the current). So, we only have to see if our code is Alpha or x86 to call to one routine or other. After this, since we are using only our internal assembler, we are in a common part (in algorithm, but not in codification) until the reassembler, which will be selected (reassembler for Alpha or for x86) depending on the target processor specified in the PE header. The different processors we can use are quite alot (and then also means a lot of work!): x86, Alpha, Itanium, 680x0, PowerPC, PA-RISC, etc. (all the platforms that can run WinNT or Unix/Linux, or even MacOS).

The great thing is that the code is completely changed: it's not a virus with several parts, one for every processor, as it was MrSandman's Esperanto (with respect, Sandy ;), it was another type of code). The whole virus will reassemble to the new processor's assembly language, so an antivirus that wants to intercept the virus for x86 processors have to deal with a possible conversion to Alpha assembler and then, once in an Alpha, reconvert to x86, jumping over the antivirus completely (for example, in an infected server). And since it's metamorphic, they can't apply strings to detect it. Yummy! :)

5) Conclusion

Metamorphism: the strongest viric technique ever ideated, ever created.

Any more to say?

[Back to index] [Comments]
By accessing, viewing, downloading or otherwise using this content you agree to be bound by the Terms of Use! vxer.org aka vx.netlux.org
deenesitfrplruua