Software tampering is an attack whose purpose is to alter the way a piece of software operates so that it brings illegitimate benefits to the attacker. The objectives of tampering could be to circumvent copy protection or security mechanisms, to extract secret or copyrighted material, to introduce malicious code such as computer viruses, or the like.
In many situations, the illegitimate benefits may involve substantial financial disadvantages for software producers. Consequently, attackers may be expected to make significant efforts to break protection mechanisms against software tampering, and software vendors to improve them. In the context of mobile phones, protection of the SIM-lock and other sensitive software components, e.g. Digital Rights Management (DRM), is of particular interest. However, tamper protection of other software entities may also be beneficial.
In order to modify a software component, an attacker typically has to acquire at least a partial understanding of how the software component functions. Software tampering may thus be delayed, if not prevented, by making reverse engineering more difficult. Transformations that make the software harder to analyze are useful to this end; such transformations are generally referred to as obfuscation.
Techniques for reverse engineering software may roughly be divided into two groups: static (or “offline”) code analysis and dynamic (or “live”) code analysis. When performing dynamic analysis, the software is observed while it is executing. In contrast, static analysis is usually limited to examining some representation of the program code without actually executing it.
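The difference between the two groups can be illustrated with a small sketch in Python, used here only as a stand-in for machine code. The static view lists the global names a function's bytecode refers to via `dis.get_instructions` without running it; the dynamic view installs a profiler with `sys.setprofile` and records which functions are actually entered during one execution. The functions `square` and `compute` and the helpers `static_names` and `dynamic_calls` are hypothetical.

```python
import dis
import sys


def square(n):
    return n * n


def compute(n):
    return square(n) + 1


def static_names(func):
    """Static view: inspect the bytecode of func without executing it."""
    return {
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname in ("LOAD_GLOBAL", "LOAD_NAME")
    }


def dynamic_calls(func, *args):
    """Dynamic view: record Python-level calls while func actually runs."""
    seen = []

    def profiler(frame, event, arg):
        # "call" fires whenever a new Python frame is entered.
        if event == "call":
            seen.append(frame.f_code.co_name)

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)  # always remove the profiler again
    return seen
```

Static analysis sees every reference present in the code, including paths that a given run never takes. Dynamic analysis sees only what one particular execution does, but it is not misled by code that is never executed.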
Typically, the executable code or machine code of the computer program is the only representation available to an attacker, i.e. the attacker typically does not have access to the source code. Consequently, a typical initial step in reverse engineering a computer program is to create a higher-level representation of the executable code using static code analysis.
In this context, function calls are an interesting target for a reverse-engineering attack, since correctly identifying the relation between call sites (i.e. the program points from which function calls are made) and the entry point of each function (i.e. the program points to which the calls are made) helps an attacker to understand a computer program. This information is commonly represented in the form of a program's call graph, in which each function constitutes a node and each function call a directed edge from the calling function to the called function. Constructing the call graph is thus a frequently applied first step of a reverse-engineering attack.
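A call graph of the kind described above can be sketched as a mapping from each function (node) to the set of functions it calls (outgoing directed edges). The sketch below assumes a Python program as a stand-in for disassembled machine code and uses the standard `ast` module to find direct calls by name, without executing anything. The program in `SOURCE` and the helper `build_call_graph` are hypothetical, and calls to functions not defined in the program (such as `len` or `print`) are deliberately left out.

```python
import ast

# Hypothetical program under analysis.
SOURCE = '''
def helper(x):
    return x * 2

def check_license(key):
    return helper(len(key)) == 16

def main():
    if check_license("ABCDEFGH"):
        print("unlocked")
'''


def build_call_graph(source):
    """Map each defined function to the set of defined functions it calls."""
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in funcs}
    graph = {}
    for f in funcs:
        # Each direct call site inside f whose target is resolvable by name
        # becomes a directed edge f -> callee.
        graph[f.name] = {
            c.func.id
            for c in ast.walk(f)
            if isinstance(c, ast.Call)
            and isinstance(c.func, ast.Name)
            and c.func.id in defined
        }
    return graph


if __name__ == "__main__":
    for caller, callees in build_call_graph(SOURCE).items():
        for callee in sorted(callees):
            print(f"{caller} -> {callee}")
```

Run as a script, this prints the edges `check_license -> helper` and `main -> check_license`. Call sites whose targets are resolved at run time (for example, calls through function pointers or lookup tables) would not be matched by this simple name-based approach.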
Consequently, it is generally desirable to complicate, if not prevent, an attacker's construction of a call graph or any similar analysis of the relations between function calls and function entry points, so as to make static analysis and reverse engineering of a computer program more difficult. Even if the analysis of a computer program cannot be completely prevented, merely delaying it may significantly delay any tampering attempt, thus extending the period of time during which a computer program may be supplied and used without the risk of misuse.
Previous attempts to make reverse engineering by static analysis more difficult include encrypting the executable code, which then has to be decrypted before it can be executed. Such encryption-based attempts include software-based and hardware-based techniques.
In software-based techniques the keys and the algorithms used for decryption are typically embedded in the code. Consequently, as long as a skilled attacker can read the executable code and observe its execution, it is possible to use the program to decrypt itself. In this way, an attacker can relatively easily arrive at the original representation of the program, which then can be analyzed by existing tools. Therefore, the transparency of software-based encryption is a weakness of such approaches.
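The weakness described above can be sketched with a toy example. The "protected" program stores its sensitive code only in encrypted form, but it must also carry the key and the decryption routine, so an attacker can simply reuse both. A repeating-key XOR stands in for a real cipher purely for brevity; with a strong cipher the argument would be the same, because the key would still be embedded in the program. All names (`KEY`, `xor_bytes`, `run_protected`, `attacker_recover`) are hypothetical.

```python
# Hypothetical key, embedded in the program itself.
KEY = b"\x5a\x13\xc7"


def xor_bytes(data, key):
    """Toy repeating-key XOR: applying it twice with the same key is the identity."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# The shipped program would contain only the ciphertext; for brevity the
# ciphertext is produced here from the plaintext when the module loads.
ENCRYPTED_CODE = xor_bytes(b"def secret_check():\n    return 42\n", KEY)


def run_protected():
    """Legitimate path: decrypt in memory, then execute the recovered code."""
    namespace = {}
    exec(xor_bytes(ENCRYPTED_CODE, KEY).decode(), namespace)
    return namespace["secret_check"]()


def attacker_recover():
    """Attacker path: reuse the program's own routine and key; no cryptanalysis needed."""
    return xor_bytes(ENCRYPTED_CODE, KEY).decode()
```

Once `attacker_recover` returns the plaintext source, it can be fed to ordinary analysis tools, which is exactly the transparency problem noted above.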
Hardware-based techniques, on the other hand, perform the combined decryption and execution in dedicated hardware. Even though properly implemented hardware-based decryption techniques can offer good protection, this protection comes at the price of additional, dedicated hardware.
Generally, obfuscation is the process of transforming a computer program into a semantically equivalent one which is harder to understand (by humans and/or analysis tools) and thus impedes static analysis. Some obfuscating transformations mainly target the control flow of the program, others mainly target data.
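As a minimal sketch of a control-flow-targeting transformation, the example below adds an opaque predicate, one of the techniques discussed in the obfuscation literature, to a trivial function. The product of two consecutive integers is always even, so the condition is always true and the `else` branch is dead code. A static analyzer, however, must prove this before it can discard the spurious path. The functions `original` and `obfuscated` are hypothetical.

```python
def original(x):
    return 3 * x + 1


def obfuscated(x):
    # Opaque predicate: x * (x + 1) is a product of consecutive integers,
    # hence always even, so this condition is always true.
    if (x * (x + 1)) % 2 == 0:
        return 3 * x + 1
    else:
        # Dead branch: never executed, but present in the control flow graph.
        return x ^ 0x5F
```

The two functions are semantically equivalent for every integer input, which is the defining requirement of an obfuscating transformation.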
The article “Watermarking, Tamper-proofing, and Obfuscation—Tools for Software Protection” by Christian S. Collberg and Clark Thomborson, IEEE Transactions on Software Engineering, 28:6 (June 2002) discloses a number of obfuscation techniques.
So-called anti-disassembler transformations have the purpose of confusing disassemblers—i.e. tools that convert executable code (machine code) into a text representation (assembly language)—so as to complicate static analysis of a computer program.
Existing transformations for obfuscating the flow of control of a computer program focus on control flow that is local to functions. The task of determining the overall program structure and constructing the call graph is thus not made significantly more difficult.
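The limitation can be sketched as follows: a hypothetical function is rewritten with control-flow flattening (a dispatcher loop driven by a state variable), a transformation that is local to the function. Its internal structure changes considerably, yet the set of functions it calls, and therefore its edges in the call graph, stays exactly the same. The function bodies and the helper `called_names` are hypothetical, and the sources are only parsed, never executed.

```python
import ast

ORIGINAL = '''
def process(data):
    data = validate(data)
    if data:
        data = transform(data)
    return store(data)
'''

# Same behaviour, with the control flow flattened into a state machine.
FLATTENED = '''
def process(data):
    state = 0
    while True:
        if state == 0:
            data = validate(data)
            state = 1 if data else 2
        elif state == 1:
            data = transform(data)
            state = 2
        elif state == 2:
            return store(data)
'''


def called_names(source):
    """Collect the names of all functions called directly by name in source."""
    return {
        n.func.id
        for n in ast.walk(ast.parse(source))
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    }
```

Both versions yield the outgoing edges `validate`, `transform` and `store`, so an attacker building the call graph is not hindered by the flattening.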
Hence, it remains a general problem to provide efficient methods of obfuscating program code so as to make it more difficult to analyze the call graph of the program, e.g. in order to identify the overall program structure.