This invention generally relates to software compilers and, in particular, to a method and apparatus for determining the relationships and useful lifetime of objects of a program using linear escape analysis.
Legacy programming languages typically require a programmer to embed instructions within their code to manage memory resources to support the proper execution of the program. This manual management of memory resources is prone to error, the result of which is the all too familiar memory xe2x80x9coverflowxe2x80x9d and xe2x80x9cout of memoryxe2x80x9d conditions leading to the premature conclusion of an executing program.
To alleviate this problem, more advanced programming languages have been developed including the Java(trademark) language by Sun Microsystems, the Standard ML (SML) language of Bell Laboratories (now Lucent Technologies), and the well known LISP programming language, each of which include automatic memory management facilities. In these advanced object-oriented programming languages, memory for objects (software xe2x80x9cbundlesxe2x80x9d of variables and related methods) is explicitly allocated by the program and implicitly reclaimed by the runtime environment (e.g., the Java(trademark) Virtual Machine) when no longer needed.
When discussing the memory resources of a system, programmers often talk in terms of static memory, the stack and the heap. It is to be appreciated that these are merely separately identifiable segments of a common memory system, delineated by what they store and how they are managed. A stack, for example, is typically created in accordance with (and has the lifetime of) execution of a thread or a method (using the well-known object-oriented programming parlance). The heap is created at runtime and is available to and shared among all threads throughout execution of the program. Accordingly, a stack is useful for storing local variables and partial results used within a thread and or method, while the heap is required for program elements that traverse threads.
In the advanced object-oriented programming languages identified above, a compiler identifies threads, objects, methods and static and dynamic variables comprising the program and allocates memory resources to each based on the functional lifetime, or xe2x80x9cboundsxe2x80x9d, of the element. However, because it is often difficult to determine the exact bounds of an object, especially if it is returned from multiple methods across threads, the convention is to treat all objects the same, i.e., allocating them on (and subsequently reclaiming from) the heap.
The process of implicitly reclaiming memory resources from the heap is colloquially referred to as xe2x80x9cgarbage collectionxe2x80x9d and is a function of the runtime environment for the language. A number of techniques for garbage collection exist, each having their own advantages and disadvantages. Common to all of the techniques, however, is that they all consume processor resourcesxe2x80x94resources that might otherwise be used in support of the executing program, rather than a function of the runtime environment. Thus, while the garbage collection feature simplifies the software development process by relieving the programmer of manual memory management, it comes at a price.
In addition to the automatic memory management features, advanced programming languages typically support the simultaneous execution and automatic synchronization of multiple threads. The synchronization feature of these languages requires the compiler to provide for allocation of memory resources for pointers, counters and the like to maintain the synchronization. To support object synchronization in Java, for example, every object is conceptually created with a lock that may be used to ensure that each method executes atomically by acquiring and releasing the lock. Accordingly, if another thread (which is to be synchronized with the present thread) has the lock, the present thread cannot acquire the lock and therefore must wait for the other thread to release it. Although the synchronization features of these advanced languages relieve the programmer from xe2x80x9cmanuallyxe2x80x9d maintaining synchronization, this, too, comes at a price. As above, without knowing the bounds of an object, it is impossible to know whether synchronization is required. Consequently, as for memory management above, prior art compilers employ the conservative approach of assuming that all objects programmed to ensure atomic access by synchronization must perform all synchronization operationsxe2x80x94an approach which is costly in terms of consumed memory and garbage collection overhead.
In an attempt to limit the overhead required to support heap allocation and object synchronization, researchers have worked to develop techniques that would enable a compiler to identify the bounds of objects of a program, thus alleviating the need for the assumptions leading to excessive heap allocation and synchronization support. One approach to identifying object bounds is commonly referred to as escape analysis. Escape analysis attempts to identify whether an object escapes from or is returned from a method or thread. A number of articles have been published describing the efficacy of escape analysis including, for example, Escape Analysis. Correctness Proof, Implementation and Experimental Results, by Bruno Blanchet and published in the Proceedings of Principles of Programming Languages (1998), the text of which is hereby incorporated by reference as background material on escape analysis. A common limitation among all of the prior art escape analyses, including the Blanchet analysis, is that the complexity of the analysis increases super-linearly with the size of the program. Specifically, the complexity bound of the Blanchet analysis is mathematically represented as O(n log2 n). The Blanchet analysis algorithm is moderately difficult to understand and implement. Other prior art escape analysis algorithms have a higher worst-case complexity. Thus, for very large programs, incorporating a prior art escape analysis technique into a compiler would increase the compile time super-linearly with the size of the program.
Accordingly, a compiler with improved relationship and boundary identification is required that does not inordinately increase compile time. Just such a solution is provided below.
This invention concerns a method and apparatus for determining the relationships and functional lifetime of objects in a program.
According to the teachings of the present invention, an innovative method for identifying the bounds of an object within a program is presented, the method comprising receiving program code including the object in a suitable language, and analyzing each statement of code according to a set of rules defining an escape analysis with a complexity linear in time and space with the size of the program plus the program""s call graph.
The set of rules employed in the analysis transform the received program code into a set of type constraints having a set of properties, the solution of which identifying the relationships and useful lifetime of an object. It will be appreciated that unlike the computationally burdensome analyses that characterize the prior art, the linear nature of the escape analysis disclosed herein facilitates implementation in, for example, a compiler for an advanced programming language. The linear escape analysis can be utilized, for example, to improve stack allocation of objects, reduce unnecessary synchronization of objects, and a host of additional features which will become apparent from the discussion to follow.