1. Technical Field
The present invention relates to computer system verification and, more particularly, to a system and method for scalable flow- and context-sensitive pointer alias analysis.
2. Description of the Related Art
Static analysis has been shown to be a viable technique for detecting potential bugs in large-scale real-life programs. However, the accuracy and scalability of most static error detection methods hinge strongly on the precision and efficiency of the underlying pointer analysis, especially for C programs. Successful static analysis techniques have been devised for detecting data races, deadlocks, memory leaks and buffer overflows, among others. To be effective, such static analyses must satisfy two key conflicting criteria, i.e., accuracy and scalability.
Static analysis works on heavily abstracted versions of the given program, which may lead to many bogus warnings. A key challenge in effectively applying static analysis to find bugs, therefore, is to reduce the number of bogus warnings while keeping the analysis scalable. Because both properties depend on the underlying pointer analysis, a pointer analysis that is accurate as well as scalable is desirable for such applications.
For example, without a precise context-sensitive alias analysis, it is hard to compute accurate must-aliases for lock pointers, which are required to compute locksets for static data race detection. This greatly increases the bogus warning rate and thus reduces the utility of such an analysis.
Most of the scalable pointer analyses for C programs have been context-insensitive or flow-insensitive. B. Steensgaard, in “Points-to Analysis in Almost Linear Time”, POPL, 1996 (hereinafter Steensgaard), was the first to propose a unification-based, highly scalable, flow- and context-insensitive pointer analysis. The unification-based approach was later extended to give a more accurate one-flow analysis that has one level of context sensitivity. The one-flow analysis was intended to bridge the precision gulf between Steensgaard's and Andersen's analyses. Inclusion-based algorithms have been explored to push the scalability limits of alias analysis.
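The unification-based idea can be illustrated with a minimal sketch, written here in Python for brevity. It is not Steensgaard's published algorithm in full: it assumes a toy program made only of the four statement forms shown in the comments, and the class name `Unifier` and its method names are hypothetical. Each equivalence class of variables has at most one pointee class, and assigning between pointers merges their pointee classes.

```python
class Unifier:
    """Toy unification-based, flow- and context-insensitive points-to analysis."""

    def __init__(self):
        self.parent = {}  # union-find parent links
        self.target = {}  # class representative -> class it points to

    def find(self, v):
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def pointee(self, v):
        """Return the class v points to, creating a fresh one if needed."""
        r = self.find(v)
        if r not in self.target:
            self.target[r] = self.find(("deref", r))
        return self.find(self.target[r])

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        ta, tb = self.target.pop(a, None), self.target.pop(b, None)
        self.parent[b] = a
        if ta is not None and tb is not None:
            self.target[a] = ta
            self.union(ta, tb)  # merging pointers forces merging pointees
        elif ta is not None or tb is not None:
            self.target[a] = ta if ta is not None else tb

    def address_of(self, p, x):  # p = &x
        self.union(self.pointee(p), x)

    def copy(self, p, q):        # p = q
        self.union(self.pointee(p), self.pointee(q))

    def load(self, p, q):        # p = *q
        self.union(self.pointee(p), self.pointee(self.pointee(q)))

    def store(self, p, q):       # *p = q
        self.union(self.pointee(self.pointee(p)), self.pointee(q))

    def may_alias(self, p, q):
        return self.pointee(p) == self.pointee(q)


# Toy program: p = &x; q = &y; r = p; r = q;
u = Unifier()
u.address_of("p", "x")
u.address_of("q", "y")
aliased_before = u.may_alias("p", "q")
u.copy("r", "p")
u.copy("r", "q")
aliased_after = u.may_alias("p", "q")
```

In this toy program, p and q are reported as aliases only after both have been assigned to r, because unification merges the classes of x and y. This shows the precision that is given up in exchange for near-linear running time.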
For many applications where flow sensitivity is not important, context-sensitive but flow-insensitive alias analyses have been explored. There is also substantial prior work on context-sensitive, flow-sensitive alias analysis.
Representing pointer analysis as a logic programming problem allows it to be formulated using sets of Datalog rules, which can then be used to compute BDDs for a context-sensitive alias analysis with limited flow sensitivity. This approach has been shown to be successful for Java, where the number of pointers is much smaller than in a similarly sized C program and the aliasing relations are less complex.
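A rule-based formulation of this kind can be sketched as follows. This is only an illustration under stated assumptions: the four rules are a common textbook encoding of an inclusion-based, flow- and context-insensitive analysis, they are evaluated with a naive fixpoint over Python sets rather than with BDDs, and the function name `solve` and the relation names are hypothetical.

```python
def solve(addr, copy, load, store):
    """Naive fixpoint over four Datalog-style points-to rules.

    addr:  pairs (p, x) for statements  p = &x
    copy:  pairs (p, q) for statements  p = q
    load:  pairs (p, q) for statements  p = *q
    store: pairs (p, q) for statements  *p = q
    Returns the set of derived facts pts(p, x), meaning "p may point to x".
    """
    pts = set(addr)  # pts(p, x) :- addr(p, x).
    changed = True
    while changed:
        new = set()
        # pts(p, x) :- copy(p, q), pts(q, x).
        new |= {(p, x) for (p, q) in copy
                for (q2, x) in pts if q2 == q}
        # pts(p, x) :- load(p, q), pts(q, r), pts(r, x).
        new |= {(p, x) for (p, q) in load
                for (q2, r) in pts if q2 == q
                for (r2, x) in pts if r2 == r}
        # pts(r, x) :- store(p, q), pts(p, r), pts(q, x).
        new |= {(r, x) for (p, q) in store
                for (p2, r) in pts if p2 == p
                for (q2, x) in pts if q2 == q}
        changed = not new <= pts  # stop once no rule derives a new fact
        pts |= new
    return pts


# Toy program: p = &x; q = &y; r = p; r = q;
facts = solve(addr={("p", "x"), ("q", "y")},
              copy={("r", "p"), ("r", "q")},
              load=set(), store=set())
```

Here r is derived to point to both x and y, while p and q keep their single targets. A BDD-based solver would evaluate rules of the same shape over a symbolic encoding of these relations instead of explicit sets.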
The classical approach to data race detection involves three steps. The first and most critical step is the automatic discovery of shared variables, i.e., variables which can be accessed by two or more threads. Control locations where these shared variables are read or written determine potential locations for data races to arise. In fact, data races usually arise if a common shared variable is accessed at simultaneously reachable program locations in two different threads where disjoint sets of locks are held.
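The lockset condition described above can be illustrated with a minimal sketch. It assumes that the shared variables and the lockset at each access have already been computed, it ignores whether the two locations are simultaneously reachable, and it additionally requires at least one of the two accesses to be a write, which is a common convention not stated in the text. The names `Access` and `potential_races` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Access:
    thread: str          # thread performing the access
    variable: str        # shared variable being accessed
    is_write: bool       # True for a write, False for a read
    lockset: frozenset   # locks held at this program location


def potential_races(accesses):
    """Return pairs of accesses that satisfy the lockset race condition."""
    warnings = []
    for a, b in combinations(accesses, 2):
        if (a.thread != b.thread
                and a.variable == b.variable
                and (a.is_write or b.is_write)      # assumption: one write needed
                and not (a.lockset & b.lockset)):   # disjoint locksets
            warnings.append((a, b))
    return warnings


accesses = [
    Access("T1", "counter", True, frozenset({"L1"})),
    Access("T2", "counter", False, frozenset({"L2"})),
    Access("T1", "flag", True, frozenset({"L1"})),
    Access("T2", "flag", True, frozenset({"L1"})),
]
warnings = potential_races(accesses)
```

Only the two accesses to counter are reported, since they are made under different locks. The accesses to flag share lock L1 and are therefore not flagged.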
Since locks are typically accessed via pointers, a must-pointer alias analysis is carried out in the second step to determine these locksets at program locations of interest. A main drawback of static race detection techniques is that, since such techniques work on heavily abstracted versions of the given program, the analysis is sound but not guaranteed to be complete. A consequence is that many bogus warnings may be generated, which reduces effectiveness. Key to reducing the false warning rate is the precision of the may-pointer alias analysis used for shared variable discovery and of the must-pointer alias analysis used to generate the lock aliases needed for computing locksets. Indeed, the most sensitive factor governing the accuracy of a static data race detection routine is the automatic discovery of shared variables. Wrongly labeling a variable as “shared” renders all warnings generated for the variable bogus, thereby increasing the false warning rate. On the other hand, if a shared variable is not reported as shared, then no warning is generated for a genuine data race involving this variable.
In typical Linux code, for example, data which is global to a thread is usually stored in structures with a large number of fields. Of these, only a very small number of fields are used to store data that is truly shared across different threads, with the rest of the fields being used for bookkeeping purposes. Such structures are accessed via pointers. An inaccurate may-alias analysis can produce a large number of aliases for pointers to these global structures, thereby causing the relevant fields of each of the structures pointed to by the aliases to be labeled as “global” even if the pointers are accessing a local structure. This may result in a large number of local variables being labeled as shared, thereby greatly increasing the false warning rate.