1. Field of the Invention
The present invention relates to computer software analysis systems and, more specifically, to software decompilers.
2. Description of the Related Art
Generally speaking, the term “software decompiler” refers to a computer program, or set of program instructions, that parses a second, compiled computer program presented in executable code (e.g., binary) form and provides as an output a set of human-readable program instructions that represent the functions of the compiled program. Compiled software is generally presented in machine-executable (binary) form, without comments or other human-readable content. As is well known in the art, compiled or executable computer instructions comprise the microprocessor-specific codes that cause a microprocessor to execute its own built-in functions.
The general purpose of a decompiler is to take executable code and convert it back into a human-readable representation that allows a programmer to analyze the functions of the software and, in particular, its flaws and its vulnerability to exploitation and/or hacking. Decompilers can also be used to analyze software for compliance with various standards, such as those addressing the widely publicized Year 2000 (Y2K) vulnerability.
In preparing a human-readable representation of compiled software code, a decompiler must determine both the control flow and the data flow of the program. “Control flow” refers to the logical execution sequence of program instructions, starting at the entry point of the program, traversing various loops and control-transferring statements (branches), and concluding at the end or termination point of the program. “Data flow” refers to the process within the program whereby variables (or data storage elements, i.e., data that is stored either dynamically in program memory or statically on some external memory unit, such as a hard drive) are read from and/or written to memory. Data flow includes the process whereby variables or data inputs or outputs are defined by name and content and used and/or modified (i.e., redefined) during the execution of the program. Programmers of ordinary skill in the art will of course realize that many high-level languages require some sort of declaration or typing of each variable before its first use. The data flow analysis portion of the decompilation process is not, however, concerned with the initial declaration of data type, but rather with determining when and where variables are defined, how they are parsed, and whether they are local to a particular process or sub-process or globally available (“global”) for use throughout the program.
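By way of illustration only, the distinction between control flow and data flow can be sketched on a toy instruction list. The instruction format, the mnemonics ("set", "add", "jlt", "ret"), and the helper names below are hypothetical and are not drawn from any particular decompiler; the sketch merely records, for each instruction, which instructions may execute next (control flow) and where each variable is defined and used (data flow).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    op: str                        # hypothetical mnemonic
    defs: Tuple[str, ...] = ()     # variables written by the instruction
    uses: Tuple[str, ...] = ()     # variables read by the instruction
    target: Optional[int] = None   # branch target (instruction index), if any


# Toy loop: total = 0 + 1 + ... while i < n; "n" is never defined here.
PROGRAM = [
    Instr("set", defs=("i",)),                          # 0: i = 0
    Instr("set", defs=("total",)),                      # 1: total = 0
    Instr("add", defs=("total",), uses=("total", "i")), # 2: total += i
    Instr("add", defs=("i",), uses=("i",)),             # 3: i += 1
    Instr("jlt", uses=("i", "n"), target=2),            # 4: if i < n goto 2
    Instr("ret", uses=("total",)),                      # 5: return total
]


def successors(program):
    """Control flow: map each instruction index to the indices that may follow it."""
    succ = {}
    for idx, ins in enumerate(program):
        if ins.op == "ret":
            succ[idx] = []                      # termination point
        elif ins.op == "jmp":
            succ[idx] = [ins.target]            # unconditional branch
        elif ins.op == "jlt":
            succ[idx] = [ins.target, idx + 1]   # taken branch, then fall-through
        else:
            succ[idx] = [idx + 1]               # straight-line code
    return succ


def def_use(program):
    """Data flow: record where each variable is defined and where it is used."""
    defs, uses = {}, {}
    for idx, ins in enumerate(program):
        for var in ins.defs:
            defs.setdefault(var, []).append(idx)
        for var in ins.uses:
            uses.setdefault(var, []).append(idx)
    return defs, uses


if __name__ == "__main__":
    print(successors(PROGRAM))
    print(def_use(PROGRAM))
```

In this sketch, a variable such as "n" that is used but never defined within the fragment is a candidate for an input or a global, which is the kind of local-versus-global determination described above. A practical analysis would of course operate on real machine instructions rather than on this simplified format.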
One shortfall seen in prior art decompilers is that, while they seek to provide a representation of the original compiled and executable software, they often fail to provide a model of that software complete enough that the model could itself be recompiled into a functional equivalent of the original compiled and executable program. Furthermore, prior art decompilers are known to use imprecise and incomplete statement modeling tools, resulting in incompletely defined data flow and/or control flow. These shortcomings result in code models that do not sufficiently represent the complete control flow and data structures of the targeted compiled, executable code. With such incomplete models, security vulnerability analysis and forensic analysis are often infeasible or, at best, inaccurate.
What is needed is a nanocode level decompiler that provides a sufficiently accurate model of software operation for complete security vulnerability analyses and forensic study of failed, malfunctioning, or suspect code. “Nanocode” refers to individual processor instructions that have been decomposed into their semantic meaning (to the processor) at their lowest (near-electrical) level. “Nanocode level” refers to the level of coding that represents these fundamental steps and structures. What is also needed is a complete decompiling process and toolset that allows a full representation of the control and data flows of a target program such that all instructions and internal processes are fully represented at the nanocode level.
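By way of illustration only, decomposing a processor instruction into lower-level semantic steps might be sketched as follows. The micro-operation names ("sub", "add", "store", "load"), the 4-byte word size, the register names, and the functions decompose and execute are all assumptions made for this sketch; the text above does not specify a nanocode format, and this is not presented as one.

```python
WORD = 4  # assumed stack word size in bytes


def decompose(instr):
    """Split one hypothetical stack instruction into explicit micro-steps."""
    op, reg = instr
    if op == "push":
        # Decrement the stack pointer, then store the register at the new top.
        return [("sub", "sp", "sp", WORD), ("store", "sp", reg)]
    if op == "pop":
        # Load the register from the top of the stack, then release the slot.
        return [("load", reg, "sp"), ("add", "sp", "sp", WORD)]
    raise ValueError(f"unsupported instruction: {op}")


def execute(steps, regs, mem):
    """Apply micro-steps to a register file (dict) and a memory (dict)."""
    for step in steps:
        kind = step[0]
        if kind == "sub":
            _, dst, src, imm = step
            regs[dst] = regs[src] - imm
        elif kind == "add":
            _, dst, src, imm = step
            regs[dst] = regs[src] + imm
        elif kind == "store":
            _, addr_reg, src = step
            mem[regs[addr_reg]] = regs[src]
        elif kind == "load":
            _, dst, addr_reg = step
            regs[dst] = mem[regs[addr_reg]]
        else:
            raise ValueError(f"unknown micro-step: {kind}")


if __name__ == "__main__":
    regs = {"sp": 100, "a": 7, "b": 0}
    mem = {}
    for instr in [("push", "a"), ("pop", "b")]:
        execute(decompose(instr), regs, mem)
    print(regs, mem)
```

The point of the sketch is that a single instruction such as a push has side effects (here, the change to the stack pointer) that are implicit in the mnemonic; writing each effect as its own step makes every register and memory update visible to subsequent control flow and data flow analysis.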