The present invention relates generally to efficient use of computer memory in carrying out program instructions, and specifically to methods and apparatus for garbage collection, i.e., for automatic reclamation of unused memory.
Programming languages such as Java relieve the programmer of the burden of explicit memory management through the use of automatic garbage collection (GC) techniques that are applied “behind the scenes.” Unused data objects, which are not reachable by the running program via any path of pointer traversals, are considered “garbage.” GC automatically reclaims computer storage assigned to such objects, in order to free the storage for reuse. This makes programming in garbage-collected languages significantly easier than in C or C++, for example, in which the programmer must include an explicit “free” statement in order to reclaim memory. It allows many run-time errors to be avoided and naturally supports modular programming. A survey of basic GC techniques is presented in an article by Wilson, entitled “Uniprocessor Garbage Collection Techniques,” in the proceedings of the 1992 International Workshop on Memory Management (Springer-Verlag Lecture Notes in Computer Science), which is incorporated herein by reference.
Run-time GC, as is performed by a Java Virtual Machine (JVM), does not (and in general cannot) collect all the garbage that a program produces. GC typically collects objects that are no longer reachable from a set of root references. However, there are some objects that the program never accesses again, even though they are reachable. Failure to reclaim memory at the proper point may lead to memory “leaks,” with unreclaimed memory accumulating until the program terminates or memory space is exhausted. Such leaks may lead to performance slowdown and to the program running out of memory space. Some Java programmers try to circumvent these memory leaks by rewriting their source code, i.e., explicitly assigning “null” to object references that are known not to be used again. Such solutions may lead to erroneous (or even slower) programs and may even be eliminated by optimizing compilers.
For example, a standard Java implementation of a stack data structure is shown in Table I below. (Certain lines in the program are marked “s”, “s′” and “s″” for later reference.) After a successful “pop,” the current value of “stack[top]” is not subsequently used. Current garbage collection techniques fail to identify memory leaks of this sort; thus, storage allocated for elements popped from the stack may not be freed in a timely manner.
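The kind of leak described here can be illustrated with a minimal array-backed stack. This is only a sketch written for illustration, not a reproduction of Table I; the class name `LeakyStack`, the fixed capacity, and the `slotHoldsReference` inspection method are assumptions introduced here.

```java
// Hypothetical array-backed stack, for illustration only.
class LeakyStack {
    private final Object[] stack;
    private int top = 0; // index of the next free slot

    LeakyStack(int capacity) {
        stack = new Object[capacity];
    }

    void push(Object o) {
        if (top == stack.length) {
            throw new IllegalStateException("stack full");
        }
        stack[top] = o;
        top++;
    }

    Object pop() {
        if (top == 0) {
            throw new IllegalStateException("stack empty");
        }
        top--;
        // The slot stack[top] keeps its reference after the pop, so the
        // popped object remains reachable through the array even though
        // the stack logic will never read this value again.
        return stack[top];
    }

    int size() {
        return top;
    }

    // Inspection hook (an assumption of this sketch): reports whether
    // slot i of the backing array still holds a reference.
    boolean slotHoldsReference(int i) {
        return stack[i] != null;
    }
}
```

After one `push` followed by one `pop`, `size()` is 0 but `slotHoldsReference(0)` is still true, which is exactly the situation in which a reachability-based collector cannot free the popped object.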
A typical solution to avoid these memory leaks is to explicitly assign “null” to array elements that are no longer needed. For example, a stack implementation that avoids these leaks is shown in Table II below, wherein “null” is explicitly assigned to “stack[top].”
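A minimal sketch of the explicit-null workaround follows. It is written for illustration and is not a reproduction of Table II; the class name `NullingStack` and the `slotHoldsReference` inspection method are assumptions introduced here.

```java
// Hypothetical array-backed stack that clears slots on pop, for illustration only.
class NullingStack {
    private final Object[] stack;
    private int top = 0; // index of the next free slot

    NullingStack(int capacity) {
        stack = new Object[capacity];
    }

    void push(Object o) {
        if (top == stack.length) {
            throw new IllegalStateException("stack full");
        }
        stack[top] = o;
        top++;
    }

    Object pop() {
        if (top == 0) {
            throw new IllegalStateException("stack empty");
        }
        top--;
        Object result = stack[top];
        // Explicit null assignment: the array no longer refers to the
        // popped object, so it can be collected once the caller drops it.
        stack[top] = null;
        return result;
    }

    int size() {
        return top;
    }

    // Inspection hook (an assumption of this sketch).
    boolean slotHoldsReference(int i) {
        return stack[i] != null;
    }
}
```

The extra statement in `pop` exists only for the benefit of the collector, which is the kind of GC-aware code the drawbacks listed next are concerned with.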
Explicit solutions of the type shown in Table II have the following drawbacks:
Explicit memory management complicates program logic and may lead to bugs; by trying to avoid memory leaks, a programmer may inadvertently free an object prematurely.
GC considerations are not part of the program logic; thus, writing them into the program is not good programming practice. In fact, the whole idea of GC-aware programs defeats some of the purposes of automatic GC.
Aiding the memory management task may require knowledge of the GC algorithm, which is implementation-dependent. This may lead to programs that depend on a particular GC algorithm.
The solution of explicitly assigning “null” may slow the program, since such “null” assignments are performed as part of the program flow. For example, consider the method “removeAllElements” of class “java.util.Vector,” taken from a Java utilities package called “java.util,” offered by Sun Microsystems and shown in Table III below. The only reason for the loop in the method is to allow GC to free the array elements.
An optimizing compiler may deduce that “null” assignment statements have no effect, thus eliminating them.
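The run-time cost of clearing elements can be seen in a simplified vector. The sketch below is a hypothetical class written for illustration; it is not the `java.util.Vector` source and not a reproduction of Table III, and the names `SimpleVector` and `retainedReferences` are assumptions.

```java
// Hypothetical growable vector, for illustration only.
class SimpleVector {
    private Object[] elementData = new Object[4];
    private int elementCount = 0;

    void addElement(Object o) {
        if (elementCount == elementData.length) {
            // Grow the backing array when it is full.
            Object[] bigger = new Object[elementData.length * 2];
            System.arraycopy(elementData, 0, bigger, 0, elementCount);
            elementData = bigger;
        }
        elementData[elementCount++] = o;
    }

    void removeAllElements() {
        // This loop does nothing for the vector's own logic; setting
        // elementCount to 0 would suffice. It runs only so that the
        // collector no longer sees the old elements through the array.
        for (int i = 0; i < elementCount; i++) {
            elementData[i] = null;
        }
        elementCount = 0;
    }

    int size() {
        return elementCount;
    }

    // Inspection hook (an assumption of this sketch): counts non-null slots
    // anywhere in the backing array.
    int retainedReferences() {
        int n = 0;
        for (Object o : elementData) {
            if (o != null) {
                n++;
            }
        }
        return n;
    }
}
```

The loop costs time proportional to the number of elements on every call, purely for memory management reasons.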
The above limitations lead to the conclusion that programmers should be freed from dealing with these memory management considerations and that the leaks should be detected by automatic means, such as by compiler analyses.
“Liveness analysis” is a method of data flow analysis that has been developed for use in software optimization and validation. It is described, for example, by Nielson et al., in Principles of Program Analysis (Schloss Dagstuhl, Germany, 1998), which is incorporated herein by reference. The analysis is typically performed by tracing the program flow backwards, from end to beginning. A variable is considered “live” before a program point if there exists an execution sequence in the program including the program point and a use of the current value of the variable, such that the program point occurs before the use of the current value of the variable, and the variable is not reassigned before the use.
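The backward character of the analysis can be shown on straight-line code, where there is a single execution sequence. The following is a minimal sketch under that simplifying assumption (no branches or loops); the names `Stmt` and `StraightLineLiveness` are introduced here for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// One straight-line statement: an optional assigned variable and the variables it reads.
class Stmt {
    final String def;       // variable assigned here, or null if none
    final Set<String> uses; // variables whose current value is read here

    Stmt(String def, String... uses) {
        this.def = def;
        this.uses = new HashSet<String>(Arrays.asList(uses));
    }
}

class StraightLineLiveness {
    // Returns, for each statement i, the set of variables live before it.
    // The scan runs from the last statement to the first.
    static List<Set<String>> liveBefore(List<Stmt> program) {
        int n = program.size();
        List<Set<String>> result = new ArrayList<Set<String>>();
        for (int i = 0; i < n; i++) {
            result.add(null);
        }
        Set<String> liveAfter = new HashSet<String>(); // nothing is live at the end
        for (int i = n - 1; i >= 0; i--) {
            Stmt s = program.get(i);
            Set<String> before = new HashSet<String>(liveAfter);
            if (s.def != null) {
                before.remove(s.def); // a reassignment kills the old value
            }
            before.addAll(s.uses);    // a use makes the current value live
            result.set(i, before);
            liveAfter = before;
        }
        return result;
    }
}
```

For the four statements `x = 1; y = x + 1; x = 7; print(y)`, the variable `x` is live before the second statement but not before the third, because its value is reassigned there and never used afterwards.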
The use of liveness analysis for garbage collection in Java is proposed by Agesen et al., in an article entitled “Garbage Collection and Local Variable Type-Precision and Liveness in Java™ Virtual Machines,” in the proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (Montreal, June, 1998), which is incorporated herein by reference. This liveness analysis is applied to references held by local variables and leads to a reduced root set, enabling more memory to be reclaimed. The main benefit found by the authors in using liveness analysis for GC was in “preventing bad surprises.” There is no suggestion in the article as to how liveness analysis might be applied to arrays of objects.
Analysis of relationships between variables has been used to analyze array accesses, generally for purposes of parallelizing software compilers and array reference analyses. A precise method for this purpose is described by Cousot et al., in an article entitled “Automatic Discovery of Linear Restraints among Variables of a Program,” in the Conference Record of the Fifth Annual ACM Symposium on Principles of Programming Languages (Tucson, Ariz., January, 1978), which is incorporated herein by reference. Cousot's method automatically identifies linear relationships between variables by scanning the data flow graph of a program in a forward direction. Analysis of relationships between variables can be extended to detect the minimal and maximal values used as array indices, thus allowing the removal of checks for array bounds violations.
For similar purposes, certain programming languages, such as CLU, provide built-in dynamic arrays that can be used to implement stacks and vectors. The number of elements such arrays contain can grow or shrink as required by the program, and bounds checking is performed on every array operation that needs it. CLU also has GC facilities, which take into account the dynamic nature of the arrays. The CLU language is described, for example, by Liskov, in “A History of CLU,” published in HOPL Preprints (1993), which is incorporated herein by reference. Most languages, among them Java, do not offer this capability.
It is an object of some aspects of the present invention to provide methods and programs for analyzing liveness of array elements.
It is a further object of some aspects of the present invention to provide improved methods and programs for garbage collection, and particularly for run-time garbage collection in Java.
In preferred embodiments of the present invention, a software processing tool analyzes software program code so as to detect dead array elements used in the code. Typically, the array elements comprise object references contained in an array of such references. The dead object references (or other elements) are detected based on a liveness analysis of the elements in the array performed at selected points in the program, combined with an analysis of relationships among program variables and the array elements to find mutual constraints among the object references at those points. Most preferably, the constraints are in the form of relations governing one or more program variables used to index the object references in the array and indicate a range of values of the variables that may occur at a given point in the program. The program variable relations and liveness analysis are used together to determine, at each of the selected points in the program, one or more ranges of array elements, typically denoted by corresponding ranges of indices of the object references in the array, that must be dead at that point.
Information regarding the liveness or deadness of the various ranges of array elements is used to improve the efficacy of memory management used in running the software program code. Preferably, the dead ranges are reported to a garbage collection function, which accordingly traces only those elements that may be alive at any given point. Thus, when a range of object references in the array is found to be dead, the corresponding objects can be removed from the garbage collector's inventory of objects reachable from the array. If the objects are not reachable via some other path of pointer traversals, then the memory assigned to them can be reclaimed.
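The effect of reporting a range to the collector can be sketched as a marking step that visits only the slots that may be alive. This is a simplified illustration, not a real JVM interface: the class `RangeAwareTracer`, the method `markFromArray`, and the half-open range convention `[liveFrom, liveTo)` are all assumptions, and a real collector would continue tracing transitively from each marked object.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical marking step, for illustration only.
class RangeAwareTracer {
    // Marks the objects referenced from slots liveFrom (inclusive) to
    // liveTo (exclusive). Slots outside that range are treated as dead
    // and are not visited, even if they still hold references.
    static Set<Object> markFromArray(Object[] array, int liveFrom, int liveTo) {
        // Identity-based set: objects are distinguished by reference, as in a collector.
        Set<Object> marked =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        for (int i = liveFrom; i < liveTo; i++) {
            if (array[i] != null) {
                marked.add(array[i]);
            }
        }
        return marked;
    }
}
```

For a stack whose backing array has four occupied slots but whose `top` is 2, calling `markFromArray(slots, 0, 2)` marks only the first two objects; the other two become reclaimable unless something else refers to them.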
Thus, unlike programming techniques and tools known in the art, preferred embodiments of the present invention enable array memory leaks to be detected and prevented automatically. Although dynamic arrays, such as those used in the CLU language, offer a partial solution to prevention of memory leaks, they cannot be used for cyclic queues and still be advantageous in this respect. In any event, dynamic arrays are not a standard part of Java or other common programming languages. The method suggested by Agesen et al. for analyzing liveness of references in local variables, as noted in the Background of the Invention, cannot usefully be applied to arrays, since an array represents a set of references, wherein every array element is a potential reference, while a reference variable represents only one potential reference.
In some preferred embodiments of the present invention, the software processing tool comprises a compiler or is used together with a compiler for code optimization purposes, particularly for preventing memory leaks. Most preferably, the compiler is a Java language compiler, operating either on Java source code or on Java byte code as a Just-In-Time (JIT) compiler (or other run-time compiler). Alternatively, the tool may be integrated as an embedded component in other programming environments, preferably as part of a garbage collection library for use with a language, such as C or C++, that does not normally offer garbage collection.
Alternatively or additionally, the tool is used in software testing, to verify that software code is free of memory leaks.
In preferred embodiments of the present invention, the software processing tool is used to analyze and optimize code at the Java class level. The tool can similarly deal with “private” arrays within any encapsulated portion of a program. Although the principles of the present invention can be applied to analyze and optimize large programs, as well, the analysis is best performed on smaller, self-contained portions, such as classes, in order to reduce the time required to analyze the code. Furthermore, due to the capability of Java to load classes at run time, not all of the code is necessarily available even when a program starts running. Proper use of encapsulation in coding a larger program, as is known in the art, ensures that the tool will have an opportunity to discover most or all of the leaks in the larger program. When a class is extended, in accordance with normal Java practice, optimization as described herein is applied only so long as the conditions that made possible the optimization of the original class still hold, i.e., as long as the liveness analysis continues to identify the same ranges of array elements as being alive or dead as in the original class.
There is therefore provided, in accordance with a preferred embodiment of the present invention, a method for memory management in execution of a program by a computer having a memory, including:
identifying in the program an array of array elements;
determining at a given point in the program a range of the elements within the array such that none of the elements in the array outside the range is alive at the point; and
passing information regarding the determined range to a memory management function, so that memory locations are associated with the array elements responsive to the determined range.
Preferably, identifying the array includes identifying an array of elements which is indexed by a program variable, and determining the range of the elements includes finding a relation with respect to values of the program variable. Most preferably, finding the relation includes finding an inequality relationship governing possible values of the program variable at the given point in the program, wherein finding the inequality relationship includes defining a constraint graph that determines a bound on permitted values of the program variable.
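One way to hold inequality relationships of this kind is a graph of difference constraints, in which a bound on a variable is derived as a shortest path. The text does not specify the representation, so the sketch below is an assumption: each edge from `u` to `v` with weight `c` records `v - u <= c`, a designated node stands for the constant zero, and the class name `ConstraintGraph` is introduced here. Unsatisfiable constraint sets (negative cycles) are not handled.

```java
// Hypothetical difference-constraint graph, for illustration only.
class ConstraintGraph {
    static final long INF = Long.MAX_VALUE / 4; // "no known bound"

    // bound[u][v] is the tightest known c such that v - u <= c.
    private final long[][] bound;

    ConstraintGraph(int nodes) {
        bound = new long[nodes][nodes];
        for (int i = 0; i < nodes; i++) {
            for (int j = 0; j < nodes; j++) {
                bound[i][j] = (i == j) ? 0 : INF;
            }
        }
    }

    // Records the constraint v - u <= c.
    void add(int u, int v, long c) {
        bound[u][v] = Math.min(bound[u][v], c);
    }

    // Derives all implied bounds by chaining constraints
    // (all-pairs shortest paths, Floyd-Warshall style).
    void close() {
        int n = bound.length;
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (bound[i][k] < INF && bound[k][j] < INF) {
                        bound[i][j] = Math.min(bound[i][j], bound[i][k] + bound[k][j]);
                    }
                }
            }
        }
    }

    // Tightest derived c such that v - u <= c, or INF if none is known.
    long upperBound(int u, int v) {
        return bound[u][v];
    }
}
```

With nodes `ZERO`, `top` and `len`, the constraints `top <= len`, `top >= 0` and `len <= 10` are added as `add(len, top, 0)`, `add(top, ZERO, 0)` and `add(ZERO, len, 10)`. After `close()`, `upperBound(ZERO, top)` yields 10, i.e., the index variable is bounded above by 10 at that point.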
Preferably, finding the relation includes performing an analysis of relations of the program variable in the program in a forward direction relative to an execution sequence of the program, thus determining one or more relationships between the program variable and a constant and/or another program variable. Further preferably, determining the range of the elements includes performing a liveness analysis of the program in a backward direction relative to an execution sequence of the program. Most preferably, performing the liveness analysis includes determining at the given point in the program that a first element in the array, indexed by a first value of the program variable, is not alive at the given point, and performing the analysis of program variable relations includes finding a relation at the given point in the program, such that if the first element is not alive at the given point, then at least one second element in the array, indexed by a second value of the program variable related by the program variable relations to the first value, is also determined not to be alive at the given point. Preferably, passing the information includes informing the memory management function that both the first and the second elements are not alive.
Preferably, performing the analysis of program variable relations includes defining a constraint graph that determines a bound on permitted values of the program variable at the given point in the program, and performing the liveness analysis includes adding a constraint to the constraint graph with respect to liveness of the elements to which the program variables correspond.
Preferably, finding the range of the elements includes performing a flow analysis of the program so as to identify one or more live ranges of the array, wherein an element within the one or more ranges is alive at the given point in the program if there is an execution sequence using the element in the program following the given point, and there is no assignment of the memory location assigned to the element intermediate the given point and the use of the element by the execution sequence. Most preferably, identifying the one or more live ranges includes performing a data flow analysis of the program in a backward direction relative to the execution sequence.
In a preferred embodiment, the array elements include references to objects used in the program, and the program includes a program module, which is called by another program module, wherein the references are indexed by program variables that are substantially encapsulated within the module.
Preferably, passing the information includes identifying elements in the array that are outside the range so that memory locations assigned thereto can be reclaimed by a garbage collection function.
In a preferred embodiment, identifying the elements outside the range includes providing an explicit assignment of one or more of the elements to null responsive to the identification, so that the memory locations assigned to the one or more elements can be reclaimed during run time of the program.
In another preferred embodiment, identifying the memory locations includes providing to the garbage collection function an identification of the range of the elements, so that the garbage collection function will not trace the elements outside the range. Preferably, the program includes a program module, which is called by another program module, and passing the information includes storing information identifying the range of the elements in a data field associated with the module. Most preferably, the program module includes a Java class, and passing the information includes setting a flag in a class data structure associated with the class to indicate that the information is available.
Preferably, determining the range of the elements includes analyzing the program module to find a range of live elements in the array while compiling the program module, wherein the program module includes a Java program module, and wherein analyzing the program is performed by a Java compiler. Preferably, the Java compiler receives and compiles Java source code or, alternatively or additionally, Java byte code.
There is further provided, in accordance with a preferred embodiment of the present invention, a method for program execution, including:
receiving code corresponding to a program module, which includes a data field containing information that identifies a range within an array of elements used in the program module such that at a specified point in execution of the program module, none of the elements in the array outside the range is alive;
running the code so as to assign memory locations to the elements in the array; and
reclaiming during run time of the code the memory locations assigned to at least some of the elements in the array that are outside the range, whereby the reclaimed locations may be assigned to other elements and a memory leak is prevented in the execution of the program.
Preferably, receiving the code includes receiving code in which a range of values of a program variable that indexes the elements in the array is identified.
Further preferably, reclaiming the memory locations includes actuating a garbage collection function.
There is also provided, in accordance with a preferred embodiment of the present invention, a method for software verification, including:
creating an array of vectors, each vector comprising an array of vector elements;
adding a new vector to the array;
adding a new vector element to the new vector, such that a memory location is assigned to the element;
removing the element without explicitly assigning the element to null;
repeating the steps of adding a new vector, adding a new vector element to the new vector, and removing the element a given number of times, whereby a memory error that occurs due to repeating the steps is detected.
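The verification steps above can be sketched as a small probe. In the procedure as described, a leak would surface as a memory error once enough unreclaimed elements accumulate (in Java, typically an `OutOfMemoryError`). So that the example runs quickly and deterministically, this sketch instead counts the stale references left behind in the backing arrays; that substitution, along with the names `LeakyVector`, `ArrayLeakProbe` and `staleReferences`, is an assumption made for illustration.

```java
// Hypothetical vector whose removal does not clear the slot, for illustration only.
class LeakyVector {
    private final Object[] data = new Object[1];
    private int count = 0;

    void add(Object o) {
        data[count++] = o;
    }

    // Removes the last element without assigning null to its slot.
    void removeLast() {
        count--;
    }

    // Counts references held in slots that are no longer part of the vector.
    int staleReferences() {
        int n = 0;
        for (int i = count; i < data.length; i++) {
            if (data[i] != null) {
                n++;
            }
        }
        return n;
    }
}

class ArrayLeakProbe {
    // Repeats the add-vector, add-element, remove-element cycle and
    // returns the number of removed elements still referenced by the vectors.
    static int run(int iterations, int payloadBytes) {
        LeakyVector[] vectors = new LeakyVector[iterations];
        for (int i = 0; i < iterations; i++) {
            vectors[i] = new LeakyVector();          // add a new vector to the array
            vectors[i].add(new byte[payloadBytes]);  // add an element (allocates memory)
            vectors[i].removeLast();                 // remove it, no null assignment
        }
        int stale = 0;
        for (LeakyVector v : vectors) {
            stale += v.staleReferences();
        }
        return stale;
    }
}
```

With a leaky vector, `ArrayLeakProbe.run(100, 16)` reports 100 stale references, one per iteration. If the removed elements were no longer held by the vectors, the count would be 0 and the corresponding payloads could be reclaimed.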
There is additionally provided, in accordance with a preferred embodiment of the present invention, programming apparatus, including:
a memory, which stores program code including an array of elements; and
a processor, coupled to read from and write to the memory, which finds at a given point in the program a range of the elements within the array such that none of the elements in the array outside the range is alive at the point, responsive to which range the processor manages the assignment of memory locations to the elements in the array during execution of the program code.
Preferably, the program includes a Java program, and the processor analyzes the program to find the range of elements while compiling the program.
There is moreover provided, in accordance with a preferred embodiment of the present invention, apparatus for program execution, including:
a memory, which stores code corresponding to a program module, which includes a data field containing information that identifies a range within an array of elements used in the program module, such that at a specified point in execution of the program module, none of the elements in the array outside the range is alive; and
a processor, which runs the code while assigning locations in the memory to the elements in the array, such that during run time, the memory locations assigned to at least some of the elements in the array that are outside the range are reclaimed, whereby the reclaimed locations may be assigned to other elements so that a memory leak is prevented in the execution of the program.
There is furthermore provided, in accordance with a preferred embodiment of the present invention, apparatus for software verification, including:
a memory, having a given size; and
a processor, coupled to read from and write to the memory, which creates an array of vectors, each vector comprising vector elements, and successively adds new vectors to the array and adds and then removes new elements to the new vectors a given number of times without explicitly assigning the removed elements to null, thereby to determine whether a memory error occurs due to an array memory leak.
There is also provided, in accordance with a preferred embodiment of the present invention, a computer software product for detection of memory leaks in computer code defining a computer program, the product including a computer-readable medium in which are embedded computer-readable instructions in an executable file, which when read by a computer, cause the computer to identify in the program an array of elements and to find at a given point in the program a range of the elements within the array such that none of the elements in the array outside the range is alive at the point, and to pass information regarding the determined range to a memory management function, so that memory locations are associated with the array elements responsive to the determined range.
Preferably, the memory locations assigned to at least some of the array elements outside the range are reclaimed by a garbage collection function.
In a preferred embodiment, the computer-readable instructions are associated with a compiler, which compiles the computer code. Preferably, the computer code includes Java source code or, alternatively or additionally, Java byte code. Optionally, the computer-readable instructions belong to an embedded memory usage reduction component in a compilation library. Preferably, responsive to the information regarding the determined range, the compiler optimizes execution of the program, which may include a Java class. Most preferably, the instructions cause the computer to provide an output indicative of possible memory leaks in the program.
In a further preferred embodiment, the instructions cause the computer to insert null assignments in the computer code so as to prevent memory leaks.
There is additionally provided, in accordance with a preferred embodiment of the present invention, a computer software product for use in execution of computer code defining a program module, which module includes a data field containing information that identifies a range within an array of elements used in the program module such that at a specified point in the execution of the program module, none of the elements in the array outside the range is alive, the product including a computer-readable medium in which are embedded computer-readable instructions in an executable file, which when read by a computer, cause the computer to reclaim during the execution of the code the memory locations assigned to at least some of the elements in the array that are outside the range, whereby the reclaimed locations may be assigned to other elements and a memory leak is prevented in the execution of the program.
Preferably, the memory locations of the at least some of the elements outside the range are reclaimed by a garbage collection function.
In a preferred embodiment, the product includes a Java run time facility, and the program module includes a Java class. Preferably, the product includes an embedded component in a Java Virtual Machine.
There is moreover provided, in accordance with a preferred embodiment of the present invention, a computer software product, including a Java language class embedded in a computer-readable medium, the class containing computer-readable data, including a data structure in which information specific to the class is stored, the data in the structure including a data field containing information that identifies a range within an array of elements used in the class, such that at a specified point in execution of the class, none of the elements in the array outside the range is alive, and such that when a computer on which a Java Virtual Machine is running reads the data in the class, the information in the data field causes the computer to reclaim memory locations assigned to at least some of the elements in the array that are outside the range, whereby the reclaimed locations may be assigned to other elements and a memory leak is prevented in execution of the program.