1. Technical Field
A “Demand-Driven Pointer Analyzer” (DDPA) provides a “demand-driven” field-sensitive pointer analysis process that rapidly and accurately identifies alias sets for selected pointers in a software program or other computer source code.
2. Background Art
Type information is typically not readily available for dynamic data of system programs developed in native code. The lack of type information makes it extremely difficult to perform certain tasks on a program's memory such as checking kernel integrity and debugging crash dumps.
There are a variety of conventional techniques that attempt to locate and type dynamic data in a memory snapshot while using information relating to the dynamic data for memory analysis and debugging. Unfortunately, such conventional techniques are generally not adequate for robustly and quickly analyzing memory snapshots of large-scale programs such as modern operating system (OS) kernels with high data coverage.
For example, one well-known process, referred to as the “Kernel Object Pinpointer” (KOP), types dynamic data in a kernel memory snapshot with very high coverage, but lacks robustness and performance. On a typical computing device, KOP may take several days to identify candidate types for generic pointers in a large-scale program such as an OS, and it is also relatively slow in typing dynamic data in a memory snapshot. Further, KOP was originally designed to analyze memory crash dumps for a particular OS, and was not capable of operating on real-world crash dumps that contain information relating to third party drivers. Such issues limit the utility of processes such as KOP.
Other processes have introduced the concept of transforming program analysis problems into graph-reachability problems. One such process applied this idea to demand-driven points-to analysis for Java, presenting a refinement-based algorithm for demand-driven context-sensitive analysis. However, Java's memory model is much simpler than that of languages such as C/C++: there is no real “memory alias” (where two variables reside in the same location), and any heap access goes through a field. Consequently, such a process does not directly address the aliasing that arises in C/C++ programs.
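To make the graph-reachability formulation concrete, the following is a minimal Python sketch under stated assumptions, not the cited process itself. It assumes a toy program containing only allocation statements and copy assignments, and it ignores fields and calling contexts entirely. The class `AssignmentGraph` and all variable and allocation-site names are hypothetical and chosen for illustration only.

```python
from collections import deque


class AssignmentGraph:
    """Toy pointer-assignment graph (hypothetical; allocations and copies only)."""

    def __init__(self):
        self.flows_from = {}  # dst variable -> set of src variables ("dst = src")
        self.allocs = {}      # variable -> set of allocation sites ("var = new site")

    def add_alloc(self, var, site):
        self.allocs.setdefault(var, set()).add(site)

    def add_copy(self, dst, src):
        self.flows_from.setdefault(dst, set()).add(src)

    def points_to(self, var):
        """Answer a single query by walking backward from var only.

        Returns the allocation sites that can reach var, together with the
        variables visited, to show that unrelated statements are never explored.
        """
        seen = {var}
        work = deque([var])
        sites = set()
        while work:
            v = work.popleft()
            sites |= self.allocs.get(v, set())
            for src in self.flows_from.get(v, set()):
                if src not in seen:
                    seen.add(src)
                    work.append(src)
        return sites, seen


g = AssignmentGraph()
g.add_alloc("a", "o1")  # a = new o1
g.add_copy("b", "a")    # b = a
g.add_copy("c", "b")    # c = b
g.add_alloc("x", "o2")  # x = new o2 (unrelated to the query below)
g.add_copy("y", "x")    # y = x

sites, visited = g.points_to("c")
```

In this sketch the query for `c` reaches only `a`, `b`, and `c`; the statements involving `x` and `y` are never visited, which is the sense in which the analysis is demand-driven rather than exhaustive.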
A related process provides a demand-driven alias analysis algorithm for C. This process makes use of an exploration process so that the language of the grammar is “accepted” by a hierarchical state machine. In general, this process traverses a program expression graph (PEG) and appears to terminate as soon as the query can be answered. The query is thus of the form alias?(p,q) and returns true or false. In other words, rather than returning a complete alias set for particular pointers, this process merely answers the question of whether two particular pointers (i.e., (p,q)) are aliases of each other. Unfortunately, this process is neither field-sensitive nor context-sensitive.
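The difference between a pairwise alias query and a complete alias set can be illustrated with a short Python sketch. This is not the PEG traversal or hierarchical state machine of the process described above; it assumes that points-to facts are already available in a dictionary, and the function names `may_alias` and `alias_set` and the sample facts are hypothetical.

```python
def may_alias(points_to, p, q):
    """Pairwise query alias?(p, q): True if p and q share at least one target."""
    return not points_to.get(p, set()).isdisjoint(points_to.get(q, set()))


def alias_set(points_to, p):
    """Complete alias set of p: every other pointer that shares a target with p."""
    return {q for q in points_to if q != p and may_alias(points_to, p, q)}


# Hypothetical points-to facts for four pointers.
facts = {
    "p": {"obj1"},
    "q": {"obj1", "obj2"},
    "r": {"obj2"},
    "s": {"obj3"},
}

pq = may_alias(facts, "p", "q")       # p and q both may point to obj1
pr = may_alias(facts, "p", "r")       # p and r share no target
aliases_of_q = alias_set(facts, "q")  # all pointers aliased with q
```

The pairwise query yields a single true/false answer for one pair of pointers, whereas the alias set enumerates every pointer that may alias the selected one.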
Other conventional tools perform dynamic heap type inference by using type information embedded in debug symbols in an attempt to assign a compatible program-defined type to each heap block by checking type constraints. If a block cannot be typed, such tools use it as a hint for heap corruptions and type safety violations. Unfortunately, such tools do not scale to large programs such as typical OS kernels.
Several other schemes have attempted to solve the problem of identifying dynamic data and their types without access to source code and type definitions. Some such schemes use Bayesian unsupervised learning to infer data structures and their instances. Other such schemes operate by recognizing dynamic data and their types when they are passed as parameters to known APIs at runtime. Yet other such schemes operate by reverse engineering data type abstractions from binary programs based on type reconstruction theory, and are not limited to a single execution trace. Such reverse engineering tools are more effective for analyzing small to medium scale programs than for large-scale programs like OS kernels. Unfortunately, high data coverage cannot typically be achieved without access to source code when analyzing kernel memory snapshots.
Finally, kernel integrity checking has been studied in a large body of work. Various integrity checking schemes operate by leveraging type definitions and manual annotations to traverse memory and inspect function pointers. Unfortunately, without dealing with generic pointers, such schemes suffer from relatively sparse coverage. Related schemes operate to discover OS kernel rootkits by detecting modifications to kernel data. Instead of memory traversal, one such scheme identifies kernel data and their types by taking advantage of the slab allocation scheme used in Linux. Slab allocation provides per-type allocations, which enables direct identification of kernel data types. Unfortunately, such schemes are not applicable to the more general class of operating systems that do not use slab allocation.
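The reason per-type allocation enables direct type identification can be shown with a toy Python model. This is a simplified sketch, not the Linux slab allocator or its API: it assumes each type has a single contiguous region of fixed-size objects, and the class `SlabCache`, the function `type_of`, the addresses, and the object sizes are all hypothetical.

```python
class SlabCache:
    """Toy per-type cache: one contiguous region of fixed-size objects."""

    def __init__(self, type_name, base, obj_size, capacity):
        self.type_name = type_name
        self.base = base
        self.obj_size = obj_size
        self.capacity = capacity
        self.next_free = 0  # index of the next unused object slot

    def alloc(self):
        """Hand out the next object slot and return its address."""
        if self.next_free >= self.capacity:
            raise MemoryError(f"cache for {self.type_name} is full")
        addr = self.base + self.next_free * self.obj_size
        self.next_free += 1
        return addr

    def owns(self, addr):
        """True if addr falls inside an object already allocated from this cache."""
        end = self.base + self.next_free * self.obj_size
        return self.base <= addr < end


def type_of(caches, addr):
    """Identify the type of the object containing addr, without memory traversal."""
    for cache in caches:
        if cache.owns(addr):
            return cache.type_name
    return None


# Illustrative caches; the sizes and base addresses are made up.
caches = [
    SlabCache("task_struct", base=0x1000, obj_size=64, capacity=4),
    SlabCache("inode", base=0x2000, obj_size=32, capacity=4),
]
task_addr = caches[0].alloc()
inode_a = caches[1].alloc()
inode_b = caches[1].alloc()
```

Because every object in a cache has the same type, an address lookup such as `type_of(caches, inode_b + 8)` identifies the type from the owning cache alone. An allocator that mixes types within a region offers no such shortcut, which is why this approach does not carry over to operating systems without slab allocation.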