Memory corruption, that is, unintended modification or disclosure of memory, can result in unintended consequences and/or abuse in a software program. Manipulation (exploitation) of in-memory data during execution of a software program can result in privilege abuse or escalation in the program. Since privilege abuse and escalation are fundamental steps in the hacking of computer systems, preventing the exploitation of memory corruption is an important task.
A Control Flow Graph (CFG) of a program is a construct used in many kinds of software tools, such as compilers, binary rewriters, runtime binary instrumentation engines, binary translator engines, runtime code generators, and other software tools. The CFG describes the software execution paths as intended by the programmer. It contains vertices corresponding to the source and destination instructions of control flow transfers in the program, and directed edges between each valid source and destination pair. Conventionally, the CFG can be defined by analysis, such as source-code based analysis, intermediate representation based analysis, binary analysis, or execution profiling. A CFG can comprise a direct control flow graph, that is, a construct representing the direct flow of function calls, and an indirect control flow graph (ICFG), that is, a construct representing indirect calls, indirect jumps, function returns, signal delivery, exception delivery, and other forms of indirect control flow. The indirect control flow of a program can expose vulnerabilities or efficiency-related issues in the program, which makes the ICFG relevant for computer security and/or performance optimization purposes.
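The ICFG described above can be pictured as a mapping from each indirect control transfer (source vertex) to the set of its valid destinations. The following is only a minimal sketch under that assumption; the class name `ICFG`, its methods, and the string labels used for vertices are hypothetical and chosen for illustration, not taken from any particular tool.

```python
from collections import defaultdict


class ICFG:
    """Hypothetical container: indirect transfer site -> valid destinations."""

    def __init__(self):
        # Each key is a source vertex; each value is the set of destination
        # vertices reachable from it through an indirect control transfer.
        self._edges = defaultdict(set)

    def add_edge(self, source, destination):
        """Record a directed edge for a valid source/destination pair."""
        self._edges[source].add(destination)

    def targets(self, source):
        """Return the valid destinations of a source (empty if unknown)."""
        return frozenset(self._edges.get(source, ()))

    def is_valid(self, source, destination):
        """Check whether the graph contains the edge source -> destination."""
        return destination in self._edges.get(source, ())


# Illustrative vertices: an indirect call with two targets and one return.
icfg = ICFG()
icfg.add_edge("main:indirect_call_1", "handler_a")
icfg.add_edge("main:indirect_call_1", "handler_b")
icfg.add_edge("handler_a:return", "main:after_call_1")
```

In this representation, checking whether a transfer is intended by the programmer reduces to a set membership test on the edge set.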
For instance, an exemplary use of the ICFG for computer security purposes is the tag-based fine-grained Control Flow Integrity (FG-CFI) method. FG-CFI is the state-of-the-art defense against the attacks known in the art as return-oriented programming (ROP), return-into-library attacks, jump-oriented programming (JOP), counterfeit object-oriented programming (COOP), and other variants. FG-CFI is a transformation method that protects indirect control transfers in the form of indirect calls, indirect jumps, and returns by tagging control transfers and valid destinations with identifier values, inserting tag checks before indirect control transfers, and detecting and reacting to tag mismatches at runtime. In FG-CFI, a first pair of control transfer and valid destination is assigned a unique identifier that is different from the identifier of a second pair of control transfer and valid destination if and only if the control transfer of the first pair is different from the control transfer of the second pair and the valid destination of the first pair is different from the valid destination of the second pair.
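The tag check described above can be illustrated with a simplified simulation. This sketch assumes one tag shared by an indirect call site and its valid destinations, and it models the check in ordinary Python rather than in inserted machine instructions; the names `TagMismatch`, `tag_destination`, `checked_indirect_call`, and the tag value are hypothetical.

```python
class TagMismatch(Exception):
    """Raised when an indirect transfer reaches a destination with a wrong tag."""


def tag_destination(tag):
    """Decorator that marks a function as a valid destination carrying `tag`."""
    def decorate(func):
        func.cfi_tag = tag
        return func
    return decorate


def checked_indirect_call(expected_tag, target, *args):
    """Perform the tag check, then the indirect call if the tags match."""
    actual = getattr(target, "cfi_tag", None)
    if actual != expected_tag:
        # Reaction to a mismatch; a real system might terminate the program.
        raise TagMismatch(f"expected tag {expected_tag:#x}, found {actual!r}")
    return target(*args)


TAG_SITE_A = 0x1A2B  # illustrative identifier value for one call site


@tag_destination(TAG_SITE_A)
def valid_handler(x):
    return x + 1


def untagged_gadget(x):
    # Stands in for code an attacker tries to reach; it carries no tag.
    return -x


result = checked_indirect_call(TAG_SITE_A, valid_handler, 41)
```

Calling `checked_indirect_call(TAG_SITE_A, untagged_gadget, 1)` raises `TagMismatch` in this sketch, which corresponds to detecting a tag mismatch at runtime.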
Another exemplary use of the ICFG, for performance optimization purposes, is a transformation method that converts indirect function calls into direct function calls. In particular, if it can be determined based on the ICFG that an indirect call or indirect jump has only one valid target, then such indirect call or indirect jump can be transformed into a direct call or direct jump, which improves performance by eliminating erroneous branch predictions and/or helping speculative execution in the processor at runtime.
One particular use of this transformation is known in the art as devirtualization, which converts virtual class member function calls into direct class member function calls. Another aspect of this transformation is that it further reduces the performance impact of FG-CFI. In particular, by design FG-CFI does not insert tag checks for direct control transfers, so an indirect call that can be transformed into a direct function call no longer requires a tag check.
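The single-target transformation described above can be sketched as a planning pass over the ICFG. This is an illustrative sketch only: the ICFG is assumed to be a plain dictionary from call site to a set of valid targets, and the function name `plan_call_sites`, the action labels, and the site names are hypothetical.

```python
def plan_call_sites(icfg):
    """Decide, per indirect call site, whether it can become a direct call.

    `icfg` maps a call-site label to the set of its valid targets.
    A site with exactly one valid target is planned as a direct call,
    which also means no FG-CFI tag check is needed for it.
    """
    plan = {}
    for site, targets in icfg.items():
        if len(targets) == 1:
            (only_target,) = targets
            plan[site] = ("direct_call", only_target)
        else:
            # Zero or several known targets: conservatively keep the
            # transfer indirect and protect it with a tag check.
            plan[site] = ("indirect_with_tag_check", sorted(targets))
    return plan


# Illustrative input: one virtual call with a single implementation,
# and one callback site with two possible targets.
example_icfg = {
    "Shape::area@site_1": {"Circle::area"},
    "callback@site_2": {"on_open", "on_close"},
}
plan = plan_call_sites(example_icfg)
```

Here `site_1` is planned as a direct call to `Circle::area`, while `site_2` stays indirect and keeps its tag check.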
It is important that the ICFG be as complete and correct as possible. In particular, if the ICFG is missing one or more vertices and/or edges, an indirect call may be incorrectly converted into a direct one, which in turn may result in incorrect program behavior, and an otherwise needed tag check may be omitted in an FG-CFI method. If the ICFG has false vertices and/or edges, an indirect call that could have been converted into a direct call may be left unconverted, or an otherwise unnecessary tag check may be inserted in an FG-CFI method.
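The two kinds of error described above, missing edges and false edges, can be made concrete by comparing a computed ICFG against a reference graph. This sketch assumes both graphs are dictionaries from call site to a set of targets and that a trustworthy reference is available (for example from exhaustive testing), which is itself an assumption; `edge_set`, `compare_icfgs`, and the site and target names are hypothetical.

```python
def edge_set(icfg):
    """Flatten a site -> targets mapping into a set of (source, destination)."""
    return {(src, dst) for src, dsts in icfg.items() for dst in dsts}


def compare_icfgs(computed, reference):
    """Report edges missing from, and falsely present in, the computed ICFG."""
    computed_edges = edge_set(computed)
    reference_edges = edge_set(reference)
    return {
        # Missing edges risk wrong devirtualization or an omitted tag check.
        "missing": reference_edges - computed_edges,
        # False edges block devirtualization or add unnecessary tag checks.
        "false": computed_edges - reference_edges,
    }


reference = {"site_1": {"f", "g"}, "site_2": {"h"}}
computed = {"site_1": {"f"}, "site_2": {"h", "k"}}
report = compare_icfgs(computed, reference)
```

In this example the computed graph makes `site_1` look like a single-target call although `g` is also valid, so converting it to a direct call would be incorrect, while the false edge to `k` prevents `site_2` from being converted even though it has only one real target.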
Constructing a correct and complete ICFG for a program is not a trivial task. To generate an ICFG, conventional methods such as Link Time Optimization (LTO), Link Time Code Generation (LTCG), Whole Program Analysis (WPA), or Class Hierarchy Analysis (CHA) require the analysis of all the code that constitutes the program and all its dependent libraries. However, these conventional methods do not scale to complex software such as entire operating systems or web browsers. Another limitation of conventional methods is that they are unable to generate ICFGs in a dynamically loaded code environment, and/or require the program to be statically linked in order to construct the ICFG, or involve complexity that may be impractical in a runtime environment.
Thus, what is needed are methods, systems, and techniques that can efficiently generate an ICFG in a scalable manner and that overcome the above-mentioned limitations. Furthermore, such methods, systems, and techniques should be able to operate in a dynamically loaded code environment without requiring the program to be statically linked in order to construct the ICFG.