1. Technical Field
The present disclosure relates to software protection and more specifically to obfuscating branches in software using trees representing return address and destination value pairs.
2. Introduction
Obfuscation is the process of making compiled computer code more difficult to understand while retaining the same or substantially the same functionality. Obfuscation generally frustrates reverse engineers and can push reverse engineering attempts away from static analysis and toward more costly dynamic analysis. Each additional layer of obfuscation adds difficulty and cost to reverse engineering attempts.
One obfuscation approach converts branches into calls to a branch function. When a call instruction is executed, the processor pushes the address of the next instruction (the return address) onto the stack and transfers execution to the called function. Upon completion of the called function, the return address is popped from the stack so that execution can resume at the instruction immediately following the call instruction. An unconditional jump, by contrast, simply transfers execution to the target destination of the jump instruction. Because call and unconditional jump instructions function differently, when a jump instruction is replaced with a call, the called function must know how to transfer execution to the target destination of the original jump instead of to the instruction immediately following the call instruction.
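The difference between call and jump semantics described above can be illustrated with a minimal sketch of a toy machine. The function names (do_call, do_ret, do_jmp), the 5-byte call instruction length, and the example addresses are assumptions made for illustration only and do not appear in the disclosure.

```python
# Toy model: the program counter is an integer and the stack is a list.
CALL_SIZE = 5  # assumed length of a call instruction, in bytes


def do_call(pc, target, stack):
    """Push the address of the next instruction, then go to the target."""
    stack.append(pc + CALL_SIZE)  # return address
    return target                 # new program counter


def do_ret(stack):
    """Pop the return address and resume execution there."""
    return stack.pop()


def do_jmp(target):
    """An unconditional jump only changes the program counter."""
    return target


stack = []
pc = do_call(0x1000, 0x5000, stack)  # stack now holds 0x1005
pc = do_ret(stack)                   # execution resumes at 0x1005
```

In this model a call leaves a return address on the stack while a jump leaves nothing, which is why a function that replaces a jump must redirect the return itself.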
In this obfuscation approach, the branch function identifies the target destination through a table lookup. A table maps each return address to its destination, stored either as an absolute address or as a displacement from the return address. The table is organized such that the hash of the return address yields the index into the table, i.e., destAddr = T[h(retAddr)]. When the branch function executes, it hashes the return address on the stack and uses the resulting index to look up the destination address in the table. The return address on the stack is then replaced with the destination address. When the branch function completes, it returns to the target destination instruction instead of the instruction following the call instruction. Such obfuscation techniques remove obvious control flow from the static binary, thereby forcing attackers to resort to more computationally expensive dynamic analysis.
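The lookup destAddr = T[h(retAddr)] can be sketched as follows. This is a simplified simulation under stated assumptions, not the disclosed implementation: the hash h, the table size, and the example address pairs are hypothetical, the table stores absolute addresses rather than displacements, and the example pairs were chosen so that no two return addresses collide.

```python
TABLE_SIZE = 8  # assumed table size


def h(ret_addr):
    """Hypothetical hash mapping a return address to a table index."""
    return (ret_addr >> 2) % TABLE_SIZE


def build_table(pairs):
    """Build T so that T[h(retAddr)] holds the destination address."""
    table = [None] * TABLE_SIZE
    for ret_addr, dest_addr in pairs.items():
        idx = h(ret_addr)
        if table[idx] is not None:
            raise ValueError("hash collision; this sketch needs unique indices")
        table[idx] = dest_addr
    return table


def branch_function(stack, table):
    """Replace the return address on top of the stack with the destination."""
    ret_addr = stack[-1]
    stack[-1] = table[h(ret_addr)]


# Hypothetical (return address -> destination) pairs.
PAIRS = {0x1004: 0x2000, 0x1010: 0x1400, 0x1028: 0x3000}
T = build_table(PAIRS)

stack = [0x1010]           # return address pushed by the call
branch_function(stack, T)  # overwrite it with the looked-up destination
resume_at = stack.pop()    # the "return" now lands on 0x1400
```

Note that the table holds only destinations. The return addresses themselves are never stored, which is what keeps the mapping out of plain view in the static binary.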
However, this obfuscation technique has a number of drawbacks. A traditional hash table stores each key-value pair in order to resolve collisions. In the software protection context, storing the key and value together is undesirable because an attacker can more easily discover the exact information the obfuscation attempts to hide. One method to eliminate this problem is to use a perfect hash function, which guarantees that each key hashes to a unique table index. However, a perfect hash function is not always practical due to cost, storage space, or computing power constraints. A perfect hash function is also specific to its data set, so each unique data set requires calculating a new perfect hash function. Additionally, a perfect hash function can only be generated once all of the (jump, destination) pairs have been identified. Because of this, the branch function, which is executed at run time, must be modified after the fact to use the perfect hash function to compute the destination address. Such modifications may not always be possible.
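The dependence of a perfect hash function on its data set can be illustrated with a deliberately simple construction: search for the smallest modulus under which every key maps to a distinct index. This modulus search is a toy stand-in chosen for brevity, not the method a practical perfect hash generator would use, and the function names and address sets below are hypothetical.

```python
def index(addr, modulus):
    """Toy hash: drop the two low bits, then reduce modulo the table size."""
    return (addr >> 2) % modulus


def find_modulus(keys):
    """Return the smallest table size giving every key a unique index."""
    if len({k >> 2 for k in keys}) != len(keys):
        raise ValueError("keys must differ above their two low bits")
    m = len(keys)
    # Terminates: any modulus larger than every shifted key is collision-free.
    while len({index(k, m) for k in keys}) != len(keys):
        m += 1
    return m


# Two hypothetical sets of return addresses that differ in one entry.
SET_A = [0x1004, 0x1010, 0x1028, 0x1044]
SET_B = [0x1004, 0x1018, 0x1028, 0x1044]

modulus_a = find_modulus(SET_A)  # 5
modulus_b = find_modulus(SET_B)  # 6
```

Changing a single address changes the required parameter, and the search cannot start until the complete set of return addresses is known. Both points mirror the drawbacks noted above.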