Virtual Machine based obfuscation: An Overview
I wrote this article for Hackcyom, my current company. Feel free to read the write-up on their website: Virtual Machine based obfuscation: An Overview
An introduction to and overview of VM-based obfuscation, with a particular focus on design philosophy and future research.
Virtual Machine-based (VM-based) obfuscation is a software protection technique that alters the readability and structure of code to shield it from analysis. It translates straightforward native code into bytecode for a custom virtual instruction set, which an embedded interpreter then executes much as a CPU would, making the software difficult to reverse engineer.
The significance of VM-based obfuscation lies in its contribution to protecting intellectual property. It serves as a deterrent by complicating both static and dynamic analysis of the code, thus requiring more time and resources from potential attackers.
This document is intended as a resource documenting the concept, implementation, and implications of VM-based obfuscation. We assume the reader has a basic understanding of software security principles. Our discussion will span from the foundational aspects of this technique to its impact on the future of secure software design and potential areas for research.
Basics of VM-based Obfuscation
To build up step by step to the hardening techniques and research directions, we start with the core components. This section covers the foundational components of VM-based obfuscation: VM entry/exit, the VM Dispatcher, and the Handler Table, with a focus on their roles, functioning, and implementation.
VM entry/exit
VM entry and exit are critical for transitioning between the native execution environment and the virtual machine context. They ensure that the execution state is preserved and restored appropriately.
- VM Entry: The entry process saves the current native context (e.g., CPU registers, flags) and initializes the VM context. This can include setting up a virtual stack, registers, and other necessary state information.

```c
#include <stdint.h>
#include <string.h>

// Snapshot of the native CPU state (illustrative layout, assumed here)
typedef struct {
    uint64_t registers[16]; // Example for x86_64 architecture
    uint64_t flags;
} NativeContext;

typedef struct {
    uint64_t registers[16];       // Virtual registers
    uint64_t flags;               // Virtual flags
    NativeContext native_context; // Saved native state, restored on VM exit
} VMContext;

void vm_entry(VMContext *vm_context, NativeContext *native_context) {
    // Save native context
    memcpy(&vm_context->native_context, native_context, sizeof(NativeContext));
    // Initialize VM registers and flags
    memset(vm_context->registers, 0, sizeof(vm_context->registers));
    vm_context->flags = 0;
    // More initialization as needed
}
```

- VM Exit: This process restores the native context from the VM context once the execution within the VM is complete, ensuring that the program can continue running in its original environment.

```c
void vm_exit(VMContext *vm_context, NativeContext *native_context) {
    // Restore native context
    memcpy(native_context, &vm_context->native_context, sizeof(NativeContext));
    // ...
}
```
VM Dispatcher
The VM Dispatcher is the control unit of the VM, responsible for the fetch-decode-execute cycle. It emulates a physical CPU’s instruction cycle but in the context of the virtual machine, parsing and executing virtual instructions.

1) Fetch and Decode Instruction: Retrieve the next instruction from the virtual instruction set and decode it to understand its operation.
2) Forward Virtual Instruction Pointer: Move the instruction pointer to the next instruction in sequence.
3) Look up Handler: Utilize the opcode from the decoded instruction to look up the corresponding handler in the Handler Table.
4) Invoke Handler: Execute the handler which processes the instruction according to the virtual machine’s rules.
```c
void vm_dispatcher(VMContext *vm_context) {
    while (vm_context->running) {
        // Fetch
        Instruction instr = vm_fetch(vm_context->ip, vm_context);
        // Decode
        DecodedInstruction decoded = vm_decode(instr, vm_context);
        // Execute
        vm_execute(decoded, vm_context);
        // Update instruction pointer (IP)
        vm_context->ip += sizeof(Instruction);
    }
}
```
Handler Table
The Handler Table defines the semantics of the individual virtual machine instruction set architecture (ISA) instructions. This is essentially a table of function pointers, indexed by opcode, where each pointer references a handler dedicated to one virtual instruction. Handlers are responsible for decoding operands and updating the virtual machine’s context according to the semantics of their respective instructions. Instruction handlers can operate in stack-based architectures (e.g., JVM, CPython), register-based architectures (e.g., Dalvik, Lua), or hybrid forms, manipulating data through a virtual stack or registers accordingly.
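```c
// Every handler receives the VM context and the decoded instruction
typedef void (*Handler)(VMContext *, DecodedInstruction);

Handler handler_table[256]; // Example for one-byte opcodes

void initialize_handlers(void) {
    handler_table[OP_ADD] = &handle_add;
    handler_table[OP_SUB] = &handle_sub;
    // More handlers...
}
```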
As for the handlers, these functions are the core of the VM’s execution model. Each handler modifies the VM’s state (e.g., registers, memory) according to the semantics of its corresponding virtual instruction.
```c
void handle_add(VMContext *vm_context, DecodedInstruction instr) {
    vm_context->registers[instr.dest] =
        vm_context->registers[instr.src1] + vm_context->registers[instr.src2];
}
```
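To see how the dispatcher, the handler table, and the handlers fit together, here is a minimal, self-contained sketch. The 4-byte instruction layout, the `OP_HALT` opcode, and the names `vm_run` and `demo_add_sub` are assumptions made for illustration; they are not taken from any particular protector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical fixed-width encoding: opcode, destination, two sources
typedef struct { uint8_t opcode, dest, src1, src2; } Instruction;

typedef struct {
    uint64_t registers[16];
    size_t ip;               // Virtual instruction pointer (index into code)
    int running;
    const Instruction *code; // Bytecode being interpreted
} VMContext;

typedef void (*Handler)(VMContext *, Instruction);

enum { OP_HALT = 0, OP_ADD = 1, OP_SUB = 2, OP_COUNT };

static void handle_halt(VMContext *vm, Instruction in) {
    (void)in;
    vm->running = 0;
}

static void handle_add(VMContext *vm, Instruction in) {
    vm->registers[in.dest & 15] =
        vm->registers[in.src1 & 15] + vm->registers[in.src2 & 15];
}

static void handle_sub(VMContext *vm, Instruction in) {
    vm->registers[in.dest & 15] =
        vm->registers[in.src1 & 15] - vm->registers[in.src2 & 15];
}

// Handler table indexed by opcode
static const Handler handler_table[OP_COUNT] = {
    handle_halt, handle_add, handle_sub
};

// Central dispatcher following the four steps of the dispatch cycle
static void vm_run(VMContext *vm) {
    vm->running = 1;
    while (vm->running) {
        Instruction in = vm->code[vm->ip];    // 1) fetch and decode
        vm->ip++;                             // 2) forward the virtual IP
        assert(in.opcode < OP_COUNT);
        Handler h = handler_table[in.opcode]; // 3) look up the handler
        h(vm, in);                            // 4) invoke it
    }
}

// Usage: computes (a + b) - c inside the VM
uint64_t demo_add_sub(uint64_t a, uint64_t b, uint64_t c) {
    static const Instruction program[] = {
        { OP_ADD, 0, 1, 2 },  // r0 = r1 + r2
        { OP_SUB, 0, 0, 3 },  // r0 = r0 - r3
        { OP_HALT, 0, 0, 0 },
    };
    VMContext vm = { .code = program };
    vm.registers[1] = a;
    vm.registers[2] = b;
    vm.registers[3] = c;
    vm_run(&vm);
    return vm.registers[0];
}
```

Everything an analyst needs is laid out in the open here: one loop, one table, and one small function per opcode. The hardening techniques in the next section each remove one of these conveniences.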
Hardening Techniques of VM Systems
The security of a VM-based system can be enhanced through various hardening techniques aimed at obfuscating the internal workings and structure.
Obfuscating individual VM components
Handlers are conceptually simple, which makes them easy to recognize and analyze. To counter this, apply obfuscation techniques to the handlers, such as control flow flattening, instruction substitution, and the insertion of opaque predicates (conditions whose truth values are known to the author but difficult for an attacker to determine).
```c
void obfuscated_handler_addition(VMContext *vm_context, DecodedInstruction instr) {
    uint64_t a = vm_context->registers[instr.src1];
    uint64_t b = vm_context->registers[instr.src2];
    // Opaque predicate: a * (a + 1) is always even, so this condition is never true
    if (((a * (a + 1)) & 1) != 0) {
        // Misleading path (never taken but complicates analysis)
        perform_dummy_operations(vm_context);
    }
    // Substitution with an equivalent but less obvious operation:
    // a + b == (a ^ b) + 2 * (a & b)
    uint64_t result = (a ^ b) + 2 * (a & b);
    vm_context->registers[instr.dest] = result;
}
```
Duplicating VM handlers
A handler table typically has a fixed number of entries (e.g., 256 for one-byte indexing), which may exceed the number of distinct virtual instructions. Fill the table with multiple versions of the same handler, each differing slightly in its implementation (to disrupt code similarity analysis), and assign them randomly when populating the table.
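```c
#define NUM_ORIGINAL_HANDLERS 2

void duplicate_handlers(Handler handler_table[]) {
    Handler original_handlers[NUM_ORIGINAL_HANDLERS] = {&handle_add, &handle_sub}; // Original set
    for (int i = 0; i < 256; i++) {
        // Duplicate handlers with slight variations.
        // create_variant stands in for the obfuscator's variant generator;
        // slots are filled round-robin here for brevity, whereas a real
        // obfuscator would randomize the assignment.
        handler_table[i] = create_variant(original_handlers[i % NUM_ORIGINAL_HANDLERS]);
    }
}
```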
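As a more concrete sketch of handler duplication, the example below fills a 256-entry table with hand-written, semantically equivalent variants of addition and subtraction and lets the bytecode encoder pick any opcode that implements the operation it needs. The variant functions, the `slot_semantics` side table, and the xorshift32 generator are assumptions chosen for illustration; a real obfuscator would generate variants automatically and use a stronger source of randomness.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*BinOp)(uint64_t, uint64_t);

// Three equivalent ways to add (unsigned arithmetic wraps modulo 2^64)
static uint64_t add_v0(uint64_t a, uint64_t b) { return a + b; }
static uint64_t add_v1(uint64_t a, uint64_t b) { return (a ^ b) + 2 * (a & b); }
static uint64_t add_v2(uint64_t a, uint64_t b) { return a - ~b - 1; }

// Three equivalent ways to subtract
static uint64_t sub_v0(uint64_t a, uint64_t b) { return a - b; }
static uint64_t sub_v1(uint64_t a, uint64_t b) { return a + ~b + 1; }
static uint64_t sub_v2(uint64_t a, uint64_t b) { return (a ^ b) - 2 * (~a & b); }

enum { SEM_ADD = 0, SEM_SUB = 1, SEM_COUNT = 2, VARIANTS = 3, TABLE_SIZE = 256 };

static const BinOp variants[SEM_COUNT][VARIANTS] = {
    { add_v0, add_v1, add_v2 },
    { sub_v0, sub_v1, sub_v2 },
};

static BinOp handler_table[TABLE_SIZE];
static uint8_t slot_semantics[TABLE_SIZE]; // Known only to the obfuscator

// xorshift32: illustrative PRNG, not cryptographically secure
static uint32_t next_random(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

// Assign a random operation and a random variant to every slot
void populate_table(uint32_t seed) {
    uint32_t rng = seed ? seed : 1u; // xorshift state must be non-zero
    for (int i = 0; i < TABLE_SIZE; i++) {
        // The first SEM_COUNT slots guarantee that every operation is reachable
        uint8_t sem = (i < SEM_COUNT) ? (uint8_t)i
                                      : (uint8_t)(next_random(&rng) % SEM_COUNT);
        slot_semantics[i] = sem;
        handler_table[i] = variants[sem][next_random(&rng) % VARIANTS];
    }
}

// Encoder side: pick any opcode whose slot implements the wanted operation.
// Must be called after populate_table.
uint8_t pick_opcode(uint8_t sem, uint32_t *rng) {
    assert(sem < SEM_COUNT);
    if (*rng == 0) *rng = 1u;
    for (;;) {
        uint8_t op = (uint8_t)(next_random(rng) % TABLE_SIZE);
        if (slot_semantics[op] == sem) return op;
    }
}
```

With this layout, two additions in the same protected function will usually be encoded with different opcodes and executed by different code, so an analyst cannot label a handler once and reuse that label everywhere.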
No central VM dispatcher
Avoid using a single central dispatcher, as it offers a fixed point for attackers to target. Instead, integrate dispatching logic directly into each handler, making the control flow more complex and less predictable.
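One way to sketch this is threaded dispatch: every handler ends with its own copy of the fetch-and-jump logic, so no single loop sees every instruction. The `DISPATCH_NEXT` macro, the instruction layout, and the function names below are assumptions for illustration. Note also that standard C does not guarantee tail calls, so this version grows the native stack with each virtual instruction; real implementations typically rely on direct jumps in assembly or on compiler extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical fixed-width encoding: opcode, destination, two sources
typedef struct { uint8_t opcode, dest, src1, src2; } Instruction;

typedef struct {
    uint64_t registers[16];
    const Instruction *ip; // Virtual instruction pointer
} VMContext;

typedef void (*Handler)(VMContext *);

enum { OP_HALT, OP_ADD, OP_SUB, OP_COUNT };

static void handle_halt(VMContext *vm);
static void handle_add(VMContext *vm);
static void handle_sub(VMContext *vm);

static const Handler handlers[OP_COUNT] = { handle_halt, handle_add, handle_sub };

// Dispatch epilogue that is duplicated into every handler
#define DISPATCH_NEXT(vm)                   \
    do {                                    \
        uint8_t next_op = (vm)->ip->opcode; \
        assert(next_op < OP_COUNT);         \
        handlers[next_op](vm);              \
        return;                             \
    } while (0)

// Halting simply returns, which unwinds the chain of handlers
static void handle_halt(VMContext *vm) { (void)vm; }

static void handle_add(VMContext *vm) {
    Instruction in = *vm->ip++; // Each handler fetches its own instruction
    vm->registers[in.dest & 15] =
        vm->registers[in.src1 & 15] + vm->registers[in.src2 & 15];
    DISPATCH_NEXT(vm);
}

static void handle_sub(VMContext *vm) {
    Instruction in = *vm->ip++;
    vm->registers[in.dest & 15] =
        vm->registers[in.src1 & 15] - vm->registers[in.src2 & 15];
    DISPATCH_NEXT(vm);
}

// Usage: computes (a + b) - c without any dispatcher loop
uint64_t demo_threaded(uint64_t a, uint64_t b, uint64_t c) {
    static const Instruction program[] = {
        { OP_ADD, 0, 1, 2 },  // r0 = r1 + r2
        { OP_SUB, 0, 0, 3 },  // r0 = r0 - r3
        { OP_HALT, 0, 0, 0 },
    };
    VMContext vm = { .ip = program };
    vm.registers[1] = a;
    vm.registers[2] = b;
    vm.registers[3] = c;
    handlers[vm.ip->opcode](&vm); // Enter the first handler directly
    return vm.registers[0];
}
```

Because the dispatch code now exists once per handler, each copy can also be obfuscated differently, which removes the single breakpoint location an attacker would otherwise use to trace every virtual instruction.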
No explicit handler table
Instead of maintaining a visible, directly accessible handler table, incorporate the next handler’s address within the VM instruction itself, complicating static analysis and understanding of the control flow. Use dynamic techniques to determine the mapping between opcodes and handlers, such as computing the handler’s address at runtime or using encrypted opcodes.
```c
typedef struct {
    // Other VM context members...
    void* handler_map;
    uint64_t key; // Encryption/decryption key for opcodes
} VMContext;

typedef enum {
    OP_ADD, // Assume other opcodes follow
    // ...
} Opcode;

typedef void (*Handler)(VMContext*, DecodedInstruction);

typedef struct {
    Opcode opcode;
    Handler handler;
} OpcodeHandlerMapping;

Handler dynamic_lookup(void* handler_map, Opcode opcode) {
    // handler_map is a hashtable where keys are opcodes and values are handler functions
    HashTable* ht = (HashTable*)handler_map;
    if (hash_table_contains(ht, opcode)) {
        return (Handler)hash_table_get(ht, opcode);
    } else {
        // Fallback / error handling
        return &default_handler;
    }
}

void dynamic_handler_mapping(VMContext *vm_context, EncryptedInstruction enc_instr) {
    Opcode decrypted_opcode = decrypt_opcode(enc_instr.opcode, vm_context->key);
    Handler handler = dynamic_lookup(vm_context->handler_map, decrypted_opcode);
    DecodedInstruction decoded = decode_instruction(enc_instr);
    handler(vm_context, decoded);
}
```
In this example, dynamic_lookup uses a hash table to retrieve handlers. This setup allows handlers to be mapped in a non-linear and non-obvious way, increasing the difficulty of statically predicting the control flow.
Blinding VM bytecode
Make static analysis more challenging by using flow-sensitive instruction decoding mechanisms, such as instruction “decryption” that varies based on specific runtime conditions. This approach also complicates bytecode patching by attackers, as modifications would necessitate correct re-encryption.
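A simplified way to illustrate this is a rolling key, where the key used to decode each bytecode byte depends on everything decoded before it. The sketch below covers only straight-line bytecode; handling branches would require every path into a block to arrive with the same key, which is omitted here. The XOR scheme, the `next_key` mixing function, and all names are assumptions for illustration, not a recommended cipher.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical key schedule: fold the plaintext byte into the rolling key
static uint8_t next_key(uint8_t key, uint8_t plain) {
    return (uint8_t)((key * 31u + plain) ^ 0xA5u);
}

// Obfuscator side: blind the bytecode before embedding it in the binary
void blind_bytecode(const uint8_t *plain, uint8_t *out, size_t n, uint8_t key) {
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint8_t)(plain[i] ^ key);
        key = next_key(key, plain[i]);
    }
}

// VM side: decode byte by byte; each key depends on all earlier plaintext
void unblind_bytecode(const uint8_t *blinded, uint8_t *out, size_t n, uint8_t key) {
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint8_t)(blinded[i] ^ key);
        key = next_key(key, out[i]);
    }
}
```

In a real VM the decoding would happen lazily inside the fetch step rather than over the whole buffer at once. The effect on patching is visible even in this toy version: changing a single blinded byte changes the key for the following byte, so the rest of the stream no longer decodes to the original instructions unless the attacker re-encrypts everything after the patch.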
Research Directions
Future research in VM-based obfuscation should focus on creating designs that are resilient against sophisticated attacks such as symbolic execution, program synthesis, and various forms of static and dynamic analysis. The following directions look particularly promising:
Complex and target-specific instruction sets
Develop VM instruction sets that are specifically tailored to the application’s logic, making the obfuscated code more resilient to automated deobfuscation tools and attacks based on pattern recognition. By designing complex and unique virtual instruction sets (VIS) for each application, obfuscators can create a layer of protection that is intricately tied to the application’s specific functionality and logic. This customization increases the difficulty for attackers to generalize their analysis across different applications or even different parts of the same application.
Suppose a VM is designed to execute a specialized set of instructions that directly map to high-level business logic operations (e.g., performTransaction, validateUser) rather than generic operations like add or subtract. The mapping between these high-level operations and their underlying implementation could vary between applications, or even change dynamically at runtime.
```c
// Opcode definitions specific to our financial application
typedef enum {
    OP_VALIDATE_TRANSACTION, // Validates a transaction
    OP_UPDATE_BALANCE,       // Updates the balance of an account
    OP_CHECK_USER_CREDIT,    // Checks the credit score of a user
    // ...
} FinancialOpcode;

typedef struct {
    int transactionId;
    double amount;
    char fromAccount[10];
    char toAccount[10];
} FinancialTransaction;

typedef struct {
    FinancialOpcode opcode;
    void* args;
} FinancialInstruction;

void handle_validateTransaction(VMContext* ctx, FinancialTransaction* transaction) {
    // Implementation-specific logic to validate the transaction
    // For example, check if the 'fromAccount' has sufficient funds
    if (check_funds(ctx, transaction->fromAccount, transaction->amount)) {
        // Update transaction status in the context
        update_transaction_status(ctx, transaction->transactionId, VALID);
    } else {
        update_transaction_status(ctx, transaction->transactionId, INVALID);
    }
}

void handle_updateBalance(VMContext* ctx, FinancialTransaction* transaction) {
    // Logic to update the balance of both accounts involved in the transaction
    if (get_transaction_status(ctx, transaction->transactionId) == VALID) {
        decrement_account_balance(ctx, transaction->fromAccount, transaction->amount);
        increment_account_balance(ctx, transaction->toAccount, transaction->amount);
    }
}

void handle_checkUserCredit(VMContext* ctx, char* userId) {
    // Logic to check the credit score of the user (simplified)
    int creditScore = get_user_credit_score(ctx, userId);
    update_user_credit_status(ctx, userId, creditScore);
}
```
In this setup, the VM’s instruction set directly corresponds to high-level business operations rather than low-level computational steps. This approach makes the VM’s operation highly specific to the application’s domain, thereby increasing the difficulty for an attacker to apply generic analysis tools or techniques effectively. By embedding domain-specific logic directly into the VM’s architecture, the obfuscation becomes inherently more complex and intertwined with the application’s unique functionality.
Intertwining VM components
Traditional VM designs often separate concerns: dispatchers fetch and decode instructions, while handlers execute them. By intertwining these components, the boundaries between different stages of execution blur, making static and dynamic analysis much more challenging. One could implement a system where each handler, upon completing its operation, directly calculates and jumps to the address of the next handler based on both the current state of the VM and the result of its own execution. This approach removes the predictability of a central dispatcher and merges control flow decisions with operational logic.
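The sketch below illustrates a reduced form of this idea. Instructions carry no opcode at all; each one stores a blinded index of its successor, and every handler updates a rolling VM state before using that state to recover the next handler. Only the rolling state is folded in here; mixing in the result of the operation itself (for example, the outcome of a comparison) is left out to keep the example short. The `mix` function, the instruction layout, and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { H_HALT, H_ADD, H_SUB, H_COUNT };

// No opcode field: 'next' is the successor's handler index, blinded by VM state
typedef struct { uint8_t dest, src1, src2, next; } Instruction;

typedef struct {
    uint64_t registers[16];
    const Instruction *ip;
    uint8_t state; // Rolling state that every handler updates
} VMContext;

typedef void (*Handler)(VMContext *);

static void handle_halt(VMContext *vm);
static void handle_add(VMContext *vm);
static void handle_sub(VMContext *vm);

static const Handler handlers[H_COUNT] = { handle_halt, handle_add, handle_sub };

// Hypothetical state update, specific to the handler that just ran
static uint8_t mix(uint8_t state, uint8_t handler_id) {
    return (uint8_t)(state * 5u + handler_id + 1u);
}

static void handle_halt(VMContext *vm) { (void)vm; }

static void handle_add(VMContext *vm) {
    Instruction in = *vm->ip++;
    vm->registers[in.dest & 15] =
        vm->registers[in.src1 & 15] + vm->registers[in.src2 & 15];
    vm->state = mix(vm->state, H_ADD);
    uint8_t next = (uint8_t)(in.next ^ vm->state); // Unblind the successor
    assert(next < H_COUNT);
    handlers[next](vm);
}

static void handle_sub(VMContext *vm) {
    Instruction in = *vm->ip++;
    vm->registers[in.dest & 15] =
        vm->registers[in.src1 & 15] - vm->registers[in.src2 & 15];
    vm->state = mix(vm->state, H_SUB);
    uint8_t next = (uint8_t)(in.next ^ vm->state);
    assert(next < H_COUNT);
    handlers[next](vm);
}

// Obfuscator side: replay the state sequence to blind each 'next' field
void encode_chain(const uint8_t *ops, Instruction *code, size_t n, uint8_t initial_state) {
    uint8_t state = initial_state;
    for (size_t i = 0; i + 1 < n; i++) {
        state = mix(state, ops[i]);
        code[i].next = (uint8_t)(ops[i + 1] ^ state);
    }
}

// Usage: computes (a + b) - c
uint64_t demo_intertwined(uint64_t a, uint64_t b, uint64_t c) {
    const uint8_t ops[] = { H_ADD, H_SUB, H_HALT };
    Instruction code[3] = {
        { 0, 1, 2, 0 }, // r0 = r1 + r2
        { 0, 0, 3, 0 }, // r0 = r0 - r3
        { 0, 0, 0, 0 }, // halt (never fetched by the halt handler)
    };
    encode_chain(ops, code, 3, 0x3C);
    VMContext vm = { .ip = code, .state = 0x3C };
    vm.registers[1] = a;
    vm.registers[2] = b;
    vm.registers[3] = c;
    handlers[ops[0]](&vm);
    return vm.registers[0];
}
```

An analyst looking at the bytecode alone cannot tell which handler an instruction belongs to, because the same `next` byte resolves to different handlers depending on the state accumulated along the path that led to it.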
A research direction would be developing techniques for automatic intertwining of VM components based on program analysis and identifying optimal intertwining strategies that maximize obfuscation while maintaining performance.
Conclusion
These directions represent the frontier of VM-based obfuscation research. They aim not only to strengthen the protection of intellectual property but also to anticipate and counteract future developments in reverse engineering and software analysis.