1. Field of the Invention
This invention generally relates to the field of object-oriented programming, specifically to the analysis and optimization of object-oriented programs, and in particular to the field of call graph construction algorithms such as RTA (Rapid Type Analysis).
2. Description of the Related Art
A key task required by most approaches to whole-program optimization is the construction of a call graph approximation. Through use of a call graph, methods that are not reachable from the main method can be removed, dynamically dispatched method calls can be replaced with direct method calls, method calls can be inlined when there is a unique target, and more sophisticated optimizations can be performed, such as interprocedural constant propagation, object inlining, and transformations of the class hierarchy. In the context of object-oriented languages with dynamic dispatch, the crucial step in constructing a call graph is to compute a conservative approximation of the set of methods that can be invoked by a given virtual (i.e., dynamically dispatched) method call.
Call-graph construction algorithms were studied intensively in the 1990s. While their original formulations use a variety of formalisms, most of them can be recast as set-based analyses. The common idea is to abstract an object into the name of its class, and to abstract a set of objects into the set of their classes. For any given call site e.m( ), the goal is then to compute a set of class names S_e that approximates the run-time values of the receiver expression e. Once the sets S_e are determined for all expressions e, the class hierarchy can be examined to identify the methods that can be invoked.
Most call graph construction algorithms differ primarily in the number of sets that are used to approximate run-time values of expressions. Examples:
NUMBER OF SETS USED TO APPROXIMATE RUN-TIME VALUES OF EXPRESSIONS | ALGORITHM NAME
No sets | Class Hierarchy Analysis (CHA) 9,10
One set for the whole program | Rapid Type Analysis (RTA) 5,6
One set per expression | 0-CFA (Control-Flow Analysis) 17,33
Several sets per expression | k-CFA, k > 0 17,33
Intuitively, algorithms that use more sets compute more precise call graphs, but need more time and space to do the construction. In practice, the scalability of the algorithms at either end of the spectrum is fairly clear. The CHA and RTA algorithms at the low end of the range scale well and are widely used. The k-CFA algorithms (for k>0) at the high end seem not to scale well at all.17 The scalability of 0-CFA remains doubtful, mostly due to the large amounts of space required to represent the many different sets that arise. Recent work by Fähndrich et al. gives grounds for optimism, although their results were obtained on a machine with 2,048 megabytes of memory.37 In the case of Java, another complicating factor for 0-CFA is that sets of class names need to be computed for locations on the run-time stack. Those locations are unnamed, and to facilitate 0-CFA, it seems necessary to first do a program transformation that names all the locations in some fashion, as done in various recent work.21, 38, 41 Such transformations introduce both time and space overhead. With the investigation of the scalability of 0-CFA still pending, there is a need in the prior art to address the following:
    Are there interesting design points in the space between RTA and 0-CFA?
    Can better precision be achieved than RTA without analyzing values on the run-time stack?
Prior Art Algorithms
The following prior art algorithms progressively take more information into account when resolving virtual method calls.
1. Name-Based Resolution (RA)
Reachability Analysis (RA) is a simple algorithm for constructing call graphs that takes into account only the name of a method. (A slightly more advanced version of this algorithm relies on the equality of method signatures instead of method names.) Variations of RA have been presented in many places and used in the context of tree-shakers for Lisp.16, 34
RA can be defined in terms of a set variable R (for “reachable methods”) that ranges over sets of methods, and the following constraints, derived from the program text:
    1. main ∈ R (main denotes the main method in the program)
    2. For each method M, each virtual call site e.m( . . . ) occurring in M, and each method M′ with name m: (M ∈ R) ⇒ (M′ ∈ R).
Intuitively, the first constraint reads “the main method is reachable,” and the second constraint reads “if a method is reachable, and a virtual method call e.m( . . . ) occurs in its body, then every method with name m is also reachable.” It is straightforward to show that there is a least set R that satisfies the constraints, and a solution procedure that computes that set. The reason for computing the least R that satisfies the constraints is that this maximizes the complement of R, i.e., the set of unreachable methods that can be removed safely.
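The least solution of the two RA constraints above can be computed with a simple worklist. The following Python sketch is only an illustration under stated assumptions: the function name reachability_analysis, the input tables call_sites and methods_by_name, and the tiny example program are all hypothetical and do not come from the cited work.

```python
from collections import deque

def reachability_analysis(main, call_sites, methods_by_name):
    """Return the least set R satisfying the two RA constraints.

    call_sites: maps a method to the names m of the virtual calls in its body.
    methods_by_name: maps a name m to every method in the program with that name.
    (Both tables are hypothetical inputs assumed to be extracted from the program text.)
    """
    reachable = {main}              # constraint 1: main is in R
    worklist = deque([main])
    while worklist:
        method = worklist.popleft()
        for name in call_sites.get(method, ()):
            # constraint 2: every method named m becomes reachable
            for target in methods_by_name.get(name, ()):
                if target not in reachable:
                    reachable.add(target)
                    worklist.append(target)
    return reachable

# Hypothetical program: A.f and B.f share a name; C.g is never called.
call_sites = {"Main.main": ["f"], "B.f": ["h"]}
methods_by_name = {"f": ["A.f", "B.f"], "g": ["C.g"], "h": ["B.h"]}
result = reachability_analysis("Main.main", call_sites, methods_by_name)
unreachable = {"Main.main", "A.f", "B.f", "B.h", "C.g"} - result
```

In this example, C.g ends up in the complement of R and could therefore be removed, while both methods named f are kept because RA cannot tell them apart.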
2. Class Hierarchy Analysis (CHA)
The constraint system for RA can be extended to also take class hierarchy information into account. The result is known as class hierarchy analysis (CHA).9,10 The notation StaticType(e) is used to denote the static type of the expression e, SubTypes(t) to denote the set of declared subtypes of type t, and StaticLookup(C, m) to denote the definition (if any) of a method with name m that one finds when starting a static method lookup in the class C. Like RA, CHA uses just one set variable R ranging over sets of methods. The constraints:
    1. main ∈ R (main denotes the main method in the program)
    2. For each method M, each virtual call site e.m( . . . ) occurring in M, and each class C ∈ SubTypes(StaticType(e)) where StaticLookup(C, m) = M′: (M ∈ R) ⇒ (M′ ∈ R).
Intuitively, the second constraint reads: “if a method is reachable, and a virtual method call e.m( . . . ) occurs in the body of that method, then every method with name m that is inherited by a subtype of the static type of e is also reachable.”
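A minimal Python sketch of the CHA constraints follows. It assumes a single-inheritance hierarchy given as a hypothetical parent table, treats SubTypes(t) as including t itself, and names methods by strings of the form "Class.method"; all helper names and the example program are illustrative assumptions rather than part of the cited algorithms.

```python
from collections import deque

def static_lookup(cls, name, parent, declares):
    """StaticLookup(C, m): walk up from cls to the first class declaring name."""
    while cls is not None:
        if name in declares.get(cls, ()):
            return f"{cls}.{name}"
        cls = parent.get(cls)
    return None

def subtypes(t, parent):
    """SubTypes(t): classes whose superclass chain passes through t (t included)."""
    result = set()
    for cls in parent:
        c = cls
        while c is not None:
            if c == t:
                result.add(cls)
                break
            c = parent.get(c)
    return result

def cha(main, call_sites, parent, declares):
    """Least R for the CHA constraints; call_sites maps a method to
    (static type of receiver, method name) pairs found in its body."""
    reachable = {main}
    worklist = deque([main])
    while worklist:
        method = worklist.popleft()
        for static_type, name in call_sites.get(method, ()):
            for cls in subtypes(static_type, parent):
                target = static_lookup(cls, name, parent, declares)
                if target is not None and target not in reachable:
                    reachable.add(target)
                    worklist.append(target)
    return reachable

# Hypothetical hierarchy: B and C extend A; D is unrelated but also declares f.
parent = {"Main": None, "A": None, "B": "A", "C": "A", "D": None}
declares = {"Main": {"main"}, "A": {"f"}, "B": {"f"}, "D": {"f"}}
call_sites = {"Main.main": [("A", "f")]}
result = cha("Main.main", call_sites, parent, declares)
```

Here D.f is excluded because D is not a subtype of the receiver's static type A, whereas name-based RA would have kept it.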
3. Rapid Type Analysis (RTA)
CHA can be further extended to take class-instantiation information into account. The result is known as rapid type analysis (RTA).5, 6 RTA uses both a set variable R ranging over sets of methods, and a set variable S which ranges over sets of class names. The variable S approximates the set of classes for which objects are created during a run of the program. The constraints:
    1. main ∈ R (main denotes the main method in the program)
    2. For each method M, each virtual call site e.m( . . . ) occurring in M, and each class C ∈ SubTypes(StaticType(e)) where StaticLookup(C, m) = M′: (M ∈ R) ∧ (C ∈ S) ⇒ (M′ ∈ R).
    3. For each method M, and for each “new C( )” expression occurring in M: (M ∈ R) ⇒ (C ∈ S).
Intuitively, the second constraint refines the corresponding constraint of CHA by insisting that C ∈ S, and the third constraint reads: “S contains the classes that are instantiated in a reachable method.”
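The interplay between R and S can be sketched in Python as a naive fixpoint iteration. This is an illustration only: the precomputed tables subtypes and lookup (standing in for SubTypes and StaticLookup), the allocations table, and the example program are hypothetical assumptions, and a production implementation would normally use a worklist rather than repeated full passes.

```python
def rta(main, call_sites, allocations, subtypes, lookup):
    """Return the least (R, S) for the three RTA constraints.

    call_sites:  method -> list of (static receiver type, method name)
    allocations: method -> classes C for which "new C()" occurs in its body
    subtypes:    type t -> SubTypes(t), assumed precomputed
    lookup:      (class C, name m) -> StaticLookup(C, m), assumed precomputed
    """
    reachable = {main}      # R, constraint 1
    instantiated = set()    # S
    changed = True
    while changed:
        changed = False
        for method in list(reachable):
            # Constraint 3: classes instantiated in reachable methods enter S.
            for cls in allocations.get(method, ()):
                if cls not in instantiated:
                    instantiated.add(cls)
                    changed = True
            # Constraint 2: a target is added only if its class is in S.
            for static_type, name in call_sites.get(method, ()):
                for cls in subtypes.get(static_type, ()):
                    target = lookup.get((cls, name))
                    if (cls in instantiated and target is not None
                            and target not in reachable):
                        reachable.add(target)
                        changed = True
    return reachable, instantiated

# Hypothetical program: main creates a B and calls f on a receiver of type A;
# B.f in turn creates a C. Class A itself is never instantiated.
call_sites = {"Main.main": [("A", "f")]}
allocations = {"Main.main": ["B"], "B.f": ["C"]}
subtypes = {"A": {"A", "B", "C"}}
lookup = {("A", "f"): "A.f", ("B", "f"): "B.f", ("C", "f"): "C.f"}
reachable, instantiated = rta("Main.main", call_sites, allocations, subtypes, lookup)
```

In this example C.f becomes reachable only after B.f has been found reachable and its allocation of C has been added to S, which is why the constraints must be solved together. A.f is never added, whereas CHA would have included it.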
RTA is easy to implement, scales well, and has been shown to compute call graphs that are significantly more precise than those computed by CHA.6 There are several whole-program analysis systems that rely on RTA to compute call graphs (e.g., the JAX application extractor40). In the Section entitled “Results”, RTA is used as the baseline against which the new call graph construction methods according to the present invention are compared.
Related Work
Propagation-Based Algorithms
The idea of doing a propagation-based program analysis with one set variable for each expression is well known. This so-called monovariant style of analysis can be done in O(n^3) time where n is the number of expressions. When the goal is to construct a call graph approximation in object-oriented or functional languages, that style of analysis is known as 0-CFA, and when the goal is to do points-to analysis for C programs, that style of analysis is often referred to as “Andersen's analysis”.3, 31 0-CFA has been implemented for a variety of languages, including dynamically-typed object-oriented languages, functional languages, and statically-typed object-oriented languages, including Java.1,3,11,19,21,28,29,31,33,38 The experience has been that the effectiveness of the approaches is language-dependent, and perhaps even programming-style dependent.
The idea of polyvariance is to associate more than one set variable with each expression, and thereby obtain better precision for each call site. Polyvariant analysis was pioneered by Sharir and Pnueli, and Jones and Muchnick.25, 32 In the 1990s the study of polyvariant analysis was intensive. Well known are the k-CFA algorithms of Shivers, the poly-k-CFA of Jagannathan and Weeks, and the Cartesian product algorithm of Agesen.1, 2, 24, 33 A particularly simple polyvariant analysis was presented by Schmidt.30 Frameworks for defining polyvariant analyses have been presented by Stefanescu and Zhou, Jagannathan and Weeks, and Nielson and Nielson.23,26,36 Successful applications of polyvariant analysis include the optimizing compilers of Chambers et al. and of Hendren et al., and the partial evaluator of Consel.8,12,17 The inventors are not aware of whether these polyvariant approaches have been tried on programs of 100,000+ lines of code.
Algorithms not Based on Propagation
Calder and Grunwald investigated a particularly simple approach to inlining based on the unique name measure, that is, inlining in cases where there statically is a unique target for a given call site.7, 20, 35 
A variation of 0-CFA is the unification-based approach, also known as the equality-based approach. This approach was pioneered by Steensgaard in the context of points-to analysis for C.20, 35 A comparison of Andersen's analysis and Steensgaard's analysis has been presented by Shapiro and Horwitz.31 The unification-based approach is cheaper and less precise than the 0-CFA-style approach.
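The difference in precision can be illustrated with a deliberately simplified Python sketch that tracks sets of class names for variables. It is not Steensgaard's or Andersen's actual points-to analysis: it only contrasts treating an assignment dst = src as a subset constraint (inclusion-based) with treating it as an equality solved by union-find (unification-based). All function and variable names are hypothetical.

```python
def inclusion_based(allocs, assigns):
    """Each assignment dst = src contributes S_src ⊆ S_dst; iterate to a fixpoint."""
    sets = {}
    for var, cls in allocs:
        sets.setdefault(var, set()).add(cls)
    for dst, src in assigns:
        sets.setdefault(dst, set())
        sets.setdefault(src, set())
    changed = True
    while changed:
        changed = False
        for dst, src in assigns:
            if not sets[src] <= sets[dst]:
                sets[dst] |= sets[src]
                changed = True
    return sets

def unification_based(allocs, assigns):
    """Each assignment dst = src forces S_dst = S_src, merged via union-find."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for dst, src in assigns:
        parent[find(dst)] = find(src)
    merged = {}
    for var, cls in allocs:
        merged.setdefault(find(var), set()).add(cls)
    return {v: set(merged.get(find(v), set())) for v in list(parent)}

# Hypothetical statements: a = new A(); b = new B(); x = a; x = b
allocs = [("a", "A"), ("b", "B")]
assigns = [("x", "a"), ("x", "b")]
inc = inclusion_based(allocs, assigns)
uni = unification_based(allocs, assigns)
```

In this example the inclusion-based result keeps a and b separate and gives only x both classes, while the unification-based result merges all three variables into one set. The unification pass visits each assignment once, which reflects the cost and precision trade-off described above.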
A broader comparison was given by Foster, Fähndrich, and Aiken; they compared both polymorphic versus monomorphic and equality-based versus inclusion-based points-to analysis.15 Their main conclusion is that the monomorphic inclusion-based algorithm is a good choice because 1) it usually beats the polymorphic equality-based algorithm, 2) it is not much worse than the polymorphic inclusion-based algorithm, and 3) it is simple to implement because it avoids the complications of polymorphism.
An experimental comparison of RTA and a unification-based approach to call graph construction was carried out by DeFouw, Grove, and Chambers.11 Their paper presents a family of algorithms that blend propagation and unification, thereby in effect dynamically determining which set variables to unify based on how propagation proceeds. Members of the family include RTA, 0-CFA, and a number of algorithms with cost and precision in between. The above algorithms, although useful, do not use static criteria to decide which set variables are to be merged and do not avoid analysis of the run-time stack.
Ashley also presented an algorithm that blends unification and propagation, in the setting of Scheme.4 
Accordingly, there is a need to overcome the above problems in the prior art of (i) analysis of the stack; (ii) high computational overhead; and (iii) high implementation complexity, and to provide a new process or method, computer readable medium, and system that overcome these problems.