Obfuscation is a widespread practice aimed at protecting certain functionalities or properties of a program. While its legitimate purpose is linked to intellectual property protection, obfuscation is also widely used for malicious purposes. The transformations applied to a program aim at hiding its real behavior. While approaches such as virtualization or junk insertion make instructions more difficult to understand, other approaches directly hide the legitimate instructions of the program, thereby causing a reverser (or a disassembler) to miss essential parts of the code while wasting time on dead code. The latter category includes, for example, code overlapping, self-modification, opaque predicates and call stack tampering. Software deobfuscation is therefore a crucial task in reverse engineering, especially for malware analysis.
Standard disassembly approaches are essentially divided into “static methods” and “dynamic methods”. On the one hand, static (or syntactic) disassembly tools such as the well-known IDA or Objdump have the potential to cover the whole program. Nonetheless, they are easily fooled by obfuscations such as code overlapping, opaque predicates, opaque constants, call stack tampering and self-modification. On the other hand, dynamic analysis covers only a few executions of the program and might miss both significant parts of the code and crucial behaviors. While standard static and dynamic disassembly approaches suffer from these well-known shortcomings (for instance, standard program analysis techniques cannot deal with dynamic code), an interesting alternative named “Dynamic Symbolic Execution” (DSE) has recently been proposed as being more robust than static analysis and more complete than dynamic analysis, since it covers more instructions. The following references relate to DSE:
B. Yadegari and S. Debray, “Symbolic execution of obfuscated code,” in CCS 2015, ACM, 2015.
B. Yadegari, B. Johannesmeyer, B. Whitely, and S. Debray, “A generic approach to automatic deobfuscation of executable code,” in SP 2015, May 2015.
Although the authors combine dynamic and symbolic execution in order to discover more parts of the code under analysis, and can deal with dynamic code (assembly code, executable code, JavaScript, etc.), these approaches cannot prove infeasibility.
Dynamic disassembly methods only address reachability issues, namely feasibility questions, i.e. verifying that certain events or settings can occur, e.g. that an instruction in a program is indeed reachable. However, many questions arising during reversing tasks are infeasibility questions, e.g. detecting protection schemes such as opaque predicates. Infeasibility issues are currently a blind spot of both standard and advanced disassembly methods.
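By way of illustration only, the distinction between the two kinds of questions can be sketched with a toy opaque predicate. The sketch below is written in Python for readability rather than in machine code, and the names opaque_predicate, protected and observe_branches are hypothetical; none of them is taken from the cited references. The predicate relies on the fact that the product of two consecutive integers is always even.

```python
def opaque_predicate(x: int) -> bool:
    """Always true: x * (x + 1) is the product of two consecutive integers, hence even."""
    return (x * (x + 1)) % 2 == 0


def protected(x: int) -> str:
    """Toy obfuscated routine: the second branch exists only to mislead a disassembler."""
    if opaque_predicate(x):
        return "real code"
    return "dead code"  # infeasible branch, never executed


def observe_branches(inputs) -> set:
    """Dynamic-style observation: collect the branches actually taken on the given inputs."""
    return {protected(x) for x in inputs}


# Running many inputs shows that "real code" is feasible...
observed = observe_branches(range(-1000, 1000))
# ...but the absence of "dead code" in `observed` proves nothing by itself:
# the branch might simply require an input that was not tried.
```

Observing one execution suffices to establish that the first branch is feasible. Establishing that the second branch is infeasible requires an argument covering every possible input, which no finite set of executions provides.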
Dynamic analysis and DSE do not address this issue because they only consider a finite number of paths in the control-flow graph of the program to be disassembled, while infeasibility is about considering all paths. Recovering the most accurate control-flow graph of a program under analysis, i.e. recovering all instructions and branches, is the first step of deobfuscation. This step is already challenging for non-obfuscated code due to tricky low-level constructs such as indirect control flow (computed jumps, jmp eax) or the interleaving of code and data. It becomes considerably harder in the case of obfuscated code. Currently, only dynamic analysis and DSE are robust enough to address heavily obfuscated code.
Moreover, at first sight, infeasibility could be considered as a simple mirror of feasibility. However, from an algorithmic point of view, they are not the same. Indeed, since solving feasibility questions on general programs is undecidable, practical approaches have to be one-sided, favoring either feasibility (i.e., answering “feasible” or “don't know”) or infeasibility (i.e., answering “don't know” or “infeasible”). While there currently exist robust methods for answering feasibility questions on heavily obfuscated code, no such method exists for infeasibility questions.
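The one-sided nature of such answers can be sketched as follows, purely as an illustration. The sketch assumes a branch condition over a single input and, for the infeasibility side, assumes that this input is confined to a small bounded domain so that exhaustive enumeration is possible; real programs offer no such guarantee. The function names check_feasible and check_infeasible and the max_bits cut-off are hypothetical.

```python
from typing import Callable, Iterable


def check_feasible(cond: Callable[[int], bool], samples: Iterable[int]) -> str:
    """Feasibility-oriented check: a single witness is enough to answer "feasible"."""
    for x in samples:
        if cond(x):
            return "feasible"
    return "don't know"  # no witness found; this does not prove infeasibility


def check_infeasible(cond: Callable[[int], bool], bits: int, max_bits: int = 16) -> str:
    """Infeasibility-oriented check over an assumed bounded domain of `bits` bits."""
    if bits > max_bits:
        return "don't know"  # domain too large to enumerate in this sketch
    for x in range(2 ** bits):
        if cond(x):
            return "don't know"  # one-sided: this check never answers "feasible"
    return "infeasible"  # every value of the bounded domain was ruled out


def dead_branch(x: int) -> bool:
    """Condition of a branch guarded by the negation of an always-true predicate."""
    return (x * (x + 1)) % 2 == 1


verdict_by_sampling = check_feasible(dead_branch, range(100))
verdict_by_enumeration = check_infeasible(dead_branch, bits=8)
```

Under these assumptions, sampling can only report that it does not know whether the branch is feasible, whereas the exhaustive check reports the branch as infeasible on the bounded domain. Neither function ever gives the answer reserved for the other side.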
There is thus a need for a solution to address the problem of infeasibility conditions or events in programs, and particularly to address infeasibility questions encountered during reversing tasks of obfuscated code. The present invention offers a solution to this need.