Obfuscation is a transformation that prevents or delays software tampering by complicating the reverse engineering, copying or tampering (hereinafter tampering) of the software code. In many instances, delaying the tampering is sufficient, especially if the software is an application that protects a financial transaction, which usually takes only a few seconds to complete. In the case of copyrighted material, obfuscation succeeds by making the tampering process long enough that it becomes prohibitively expensive compared with the cost of a genuine copy of the software.
Software tampering includes two major kinds of attacks: static attacks and dynamic attacks. Static attacks involve analyzing the software without running it, for instance using a partial evaluator. Dynamic attacks involve monitoring and lifting the code as it executes in memory in order to capture the lifted portions and reconstruct the code.
In dynamic attacks, function calls, call sites, and entry and exit points are strategic targets for attackers seeking to analyze the control flow of a program and retrieve its call graph.
Existing control-flow obfuscation methods are primarily applied to local control flow, including branches and jumps, and are therefore limited to the function scope. A greater threat of intrusion is an attacker's ability to discover the call structure such that the code can be lifted or re-implemented. Traditional calling conventions are well understood, making function call boundaries an easy point of attack.
Existing self-modifying code techniques are primarily applied to straight-line instruction blocks which perform data operations. While this may help conceal operations, it does little to hide the macro control level of the application.
With the broader use of higher-level languages such as C++, applications typically have more functions and deeper call trees than their equivalents written in lower-level languages. This means that the function boundaries of applications are now at greater risk.
For instance, PCT Application Publication No. 2008/074483 A1 to Eker et al., which is incorporated herein by reference in its entirety, describes obfuscating a computer program, but fails to address code lifting attacks, as well as dynamic and step attacks carried out, for example, using a debugger.
Eker et al. disclose a method which modifies the function call system by changing the way the address is calculated. The modified function call is computed by an algebraic expression at run-time. The result is a call-by-pointer function call with the function's address determined at run-time.
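By way of illustration only, the kind of transformation attributed to Eker et al. above may be sketched as follows. Python is used here as a stand-in for native code, with a table of functions playing the role of function addresses. The names `FUNCTIONS` and `obfuscated_call`, the two example functions, and the particular algebraic expression are hypothetical and are not taken from the cited publication.

```python
# Hypothetical sketch: a direct call replaced by a call-by-pointer whose
# target is computed by an algebraic expression at run time.

def add_fee(amount):
    return amount + 5


def apply_discount(amount):
    return amount - 2


# Stand-in for a table of function addresses (an assumption of this sketch).
FUNCTIONS = [add_fee, apply_discount]


def obfuscated_call(key, amount):
    # The target is not named at the call site. Its index is derived
    # from `key` by an algebraic expression evaluated at run time.
    index = ((key * key + key) // 2) % len(FUNCTIONS)
    # The result is still an ordinary call, made through a pointer.
    return FUNCTIONS[index](amount)
```

Under these assumptions, `obfuscated_call(3, 100)` reaches `add_fee` and `obfuscated_call(1, 100)` reaches `apply_discount`. In both cases the body of the called function is unmodified and the call remains an ordinary call made through a computed pointer.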
The method does not have the ability to protect a call-graph from a code lifting attack. For example, the function definition body is never modified. It can be easily statically lifted and used in another program as an exploit. Furthermore, in a dynamic attack where a debugger or monitoring program is used, the function call sequence can be followed in a step-by-step manner to find the called function of interest for code lifting and/or tampering.
Additionally, Eker et al. describe that static call-sites are replaced by call-by-pointer. Nevertheless, they are still call-sites. Any call-site can be identified by its unique instruction characteristic and used as a breakpoint by an attacker. If the attacker sets a breakpoint on every call-site and then runs the program, the call-graph information can be retrieved by dynamic means.
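The dynamic recovery of call-graph information described above may be sketched as follows. This is a simplified illustration only: Python's `sys.setprofile` hook stands in for a debugger that breaks on every call-site, and the functions `helper`, `worker`, `entry`, and `record_call_graph`, as well as the table `TABLE`, are hypothetical names introduced for the example.

```python
import sys


def helper(x):
    return x + 1


def worker(x):
    return helper(x) * 2


# Stand-in for call-by-pointer: the callee is selected at run time.
TABLE = [helper, worker]


def entry(x):
    index = (x * x + 1) % len(TABLE)
    return TABLE[index](x)


def record_call_graph(func, *args):
    """Run func and record one (caller, callee) edge per observed call."""
    edges = []

    def on_event(frame, event, arg):
        # A "call" event fires for every Python-level function call,
        # whether the callee was named directly or reached via a pointer.
        if event == "call" and frame.f_back is not None:
            edges.append((frame.f_back.f_code.co_name, frame.f_code.co_name))

    sys.setprofile(on_event)
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)
    return result, edges
```

Calling `record_call_graph(entry, 4)` in this sketch records the edges from `entry` to `worker` and from `worker` to `helper`, even though `entry` selects its callee through a computed index. Computing the target at run time does not hide the call from an observer that intercepts every call.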
A publication entitled “Application Security through Program Obfuscation” by Matias Madou, published in 2007, which is incorporated herein by reference in its entirety, describes, in chapter five thereof, a method of trace obfuscation that combines several techniques operating at the instruction level by changing data operations. These techniques include: inserting diverse code, code factoring, and inserting obfuscating predicates.
Inserting diverse code involves overwriting an instruction with one of multiple equivalent instructions, selected according to the path taken toward the basic block in which the instruction resides.
Code factoring is a technique to merge two conditional blocks of code differing by only one instruction. In the conditional paths leading to the merged block of code, the single instruction is overwritten to provide the correct behavior just before it is executed.
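Code factoring as summarized above may be illustrated with the following simplified sketch. Real code factoring overwrites machine instructions in memory. Here, as an assumption made purely for illustration, a list of (operation, operand) pairs stands in for the merged block of instructions, and the names `MERGED_BLOCK`, `PATCH_SLOT`, `factored`, and `original` are hypothetical.

```python
import operator


def original(value, take_first_path):
    # Two conditional blocks that differ by only one instruction.
    if take_first_path:
        return (value * 3 + 10) * 2
    return (value * 3 - 10) * 2


# The merged block. Slot 1 holds the single differing instruction.
MERGED_BLOCK = [
    (operator.mul, 3),
    (operator.add, 10),  # placeholder, overwritten before each execution
    (operator.mul, 2),
]
PATCH_SLOT = 1


def run_merged_block(value):
    for operation, operand in MERGED_BLOCK:
        value = operation(value, operand)
    return value


def factored(value, take_first_path):
    # Each conditional path overwrites the differing instruction just
    # before control reaches the shared block.
    if take_first_path:
        MERGED_BLOCK[PATCH_SLOT] = (operator.add, 10)
    else:
        MERGED_BLOCK[PATCH_SLOT] = (operator.sub, 10)
    return run_merged_block(value)
```

In this sketch `factored` returns the same results as `original` on both paths, while the stored form of the merged block never shows both variants at once.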
The third technique includes constructing obfuscating predicates and inserting these into the code in an effort to create diversity. The predicate has a condition which will sometimes evaluate to false and sometimes to true. The successors of the predicate have equivalent, but diverse code.
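A minimal sketch of such an obfuscating predicate is given below. The condition used in `predicate` is a hypothetical, data-dependent expression chosen only so that it is true for some inputs and false for others. The function names and the two doubling variants are likewise assumptions of the example and are not taken from the cited publication.

```python
def double_by_add(x):
    return x + x


def double_by_shift(x):
    return x << 1


def predicate(x):
    # Hypothetical condition: true for some inputs and false for others.
    return (x * 7 + 3) % 5 < 2


def obfuscated_double(x):
    # Both successors compute the same result with different code, so the
    # executed trace varies with the predicate while the behavior does not.
    if predicate(x):
        return double_by_add(x)
    return double_by_shift(x)
```

Whichever branch is taken, `obfuscated_double(x)` returns twice its argument. Only the sequence of executed instructions differs between the two paths.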
Madou combines all three of the techniques described above to perform trace obfuscation of the program. However, the diversity techniques proposed by Madou are restricted to the modification of data instructions. The insertion of obfuscating predicates involves only the insertion of branches whose behavior is pre-determined.
Furthermore, with the system of Madou, the sequence, order, time, and manner in which functions are called and executed remain the same. Therefore, dynamic attacks may still be successful on software protected by the method of Madou.
Moreover, the method of Madou does not protect the program against static attacks. Isolated functions can still be lifted in their entirety, and continue to behave in their original way after being lifted.
It is, therefore, desirable to provide a method and system for control flow obfuscation against static and dynamic attacks that performs a comprehensive transformation of the call graph of a program.