VX Heaven

Library Collection Sources Engines Constructors Simulators Utilities Links Forum
[Previous] [Index] [Next]

Statistical Executable Trash Generator

Released in 29a magazine #8

Author's doc

Sometimes we need to generate some "trash" or "garbage" code. This is useful in poly/meta-morphic engines and in obfuscation techniques.

There exists ETG (executable trash generator) engine. It has important feature: you can specify characterictics of the code to be generated. In other words, it can generate x86 assembly code of some required properties, such as

  1. set of possible instructions
  2. set of possible source registers
  3. set of possible destination registers

For example, you say ETG(ADD|XOR|SUB, EAX|EBX, EAX|EBX|ECX|EDX) and in the output there will appear only these three instructions which will read only EAX..EDX registers and store results into EAX or EBX. This can be useful when you want to generate polymorphic decryptor which will use only ESI/EDI/ECX/EDX registers, or generate a small set of trash instructions which will not modify some specific registers.

  etg_engine(0x12345678,                            // user-parameter
             ETG_ALL&(~(ETG_JMPS|ETG_SEG|ETG_REP)), // set of instructions
             REG_ALL,                               // src: all registers
             REG_ALL&(~REG_EDX),                    // dst: all but edx
             &bufsize,                              // [out] ptr to output size
             1000,                                  // max # of instructions
             sizeof(buf),                           // max output buffer size
             buf,                                   // output buffer
             my_random);                            // user-specified randomer


  movzx   edi, bp
  bsr     ebp, edi
  bt      ebp, 6Bh
  sbb     al, 0BBh
  bswap   esi
  adc     eax, edx
  mov     ecx, edi
  dec     al
  mov     esi, 63F2CDDh
  test    edx, edi
  inc     ebx
  mov     al, dh
  imul    esi, ecx, 0B9AE0794h
  bsr     eax, ebx
  shld    edi, esi, 5

As you can see, this code has one property: there is no any relations between instructions, and so it is very easy to detect it.

Now, we can introduce some classification of executable code, using relations-between-instructions as main criteria:

  1. typical garbage used in polymorphic engines: no relations at all
  2. "real" or generated code: almost all instructions are interrelated

But, if we will merge these two types of code, we will get

  1. relations between neighbour instructions only.

For example, if you see PUSH EBP, it will be followed by MOV EBP,ESP with high probability. But, if you see ADD EAX,EBX it will almost never be followed by SUB EAX,EBX.

So, we can somehow calculate some big matrix (its just an example):

this  next-> push ebp   mov ebp,esp   add eax,ebx  sub eax,ebx   ...
push ebp       0            100            2            2        ...
mov ebp,esp    1             0             2            2        ...
add eax,ebx    2             2             10           0        ...
sub eax,ebx    2             2             0            10       ...
...           ...           ...           ...          ...

As such, for each known instruction we can have a set of following instructions with some "weigths" - this can be just a numbers or probabilities (normalized numbers).

From the matrix above we can see that PUSH EBP is never used twice, but can be followed by MOV EBP,ESP with probability of 100/(0+100+2+2) = 0.96

When we have a set of "weights" and "items", we may need some special algorithm of random items selection, such that items will be choosen according to their wights.

To build database of "instruction chains", i.e. a set of (count,instruction1,instruction2,...) we need to disassemble lots of executable files; this can be done using mistfall engine, see calcchains.cpp && gen_db.bat.

resulting db looks like this (n_chains.txt):

00089375 C20000
00041587 C3
00031113 C9 C20000
00030164 5D C20000
00029763 5F 5E 5B
00029095 55 8BEC 83EC00
00024393 FF7500 FF7500 FF7500
00019356 5E 5B C9
00018668 5B C9 C20000
00016724 50 8D4500 50
00014922 53 56 57
00014249 5D C3
00000655 85C0 7500 8B0D00000000
00000654 7400 F6450000 7400
00000652 C1E000 50 E800000000
00000652 8B4000 85C0 7400
00000652 894500 8B4600 894500
00000652 85C9 7400 8B5500

While disassembling, we use some form of "stemming" technique: we partialy fill instructions with zeroes: for example "mov eax, 1" and "mov eax, 2" becomes "mov eax, 0"; i.e. we set to zero "address" and "immediate" parts of the each instruction - this can be done using XDE engine.

However, there can be some higher levels of stemming, for example to reorder register numbers within instruction chains, i.e. always convert something like "mov r1,r2 | add r1,r3 | sub r1,r4" to "mov eax,ecx | add eax,edx | sub eax,ebx". Also we can use some kind of "deoptimization", i.e. to split instruction chains into independent subchains, because optimizing compilers usually mix instructions. This all can help in building better knowlegde about statistical relations between instructions.

When instruction data base is created, you can generate some code using script.

Note that generated instructions are not real ones; all immediate/address values are set to zero. To "fill" instructions, use setg2.cpp.

Resulting random code looks like this:

 mov ecx,[ecx]
 add eax,edx
 add esi,edx
 mov dx,[edi]
 add esi,byte +0x6c
 cmp eax,[ebp-0x6c]
 jz ...
 push dword [ebp+0x64]
 push dword [ebp+0x3c]
 call edi
 test eax,eax
 jz near ...
 cmp edi,eax
 jnz ...
 pop edi
 pop esi
 ret 0x46e4
 mov eax,[eax]
 call near [eax]
 pop eax
 pop ebp
 ret 0x6320
 xor ebx,ebx
 push eax
 push ebx
 push esi

In case when we need code of special properties, as in ETG engine, we can do the following: when building database, simply filter out all the instructions we dont need; so, we will get statistics about relations between only required instructions, and generated code will consist of correct instructions only.


setg.zip34261SETGDec 2004MD5 sum 357f60481810996d35981606babb610b

By accessing, viewing, downloading or otherwise using this content you agree to be bound by the Terms of Use! aka