The present invention relates, in general, to the tracking of variables, and more particularly, to the tracking of local variables when subroutines are called. In particular, a method and apparatus are disclosed for tracking local variable types following subroutine invocation.
In most language implementations, a stack is used for tracking method calls and returns, and for holding the values of local variables and intermediate results. For languages that support multiple threads, each thread has its own stack. The stacks are divided into frames: a frame is pushed onto the stack when a method is invoked and popped from the stack when the method returns. Each frame, except the current frame, contains the location of the next instruction to be executed when normal control returns to that frame. For the current frame (the frame for the method that is currently being executed), the location of the instruction currently being executed is kept in the frame or in a machine-dependent location (e.g., a machine register). Each frame thus pinpoints the instruction currently being executed by the method represented by the frame (hereafter referred to as the execution point). The local variables of a method are assigned to different slots within the frame that is created when that method is invoked. A more detailed description of stack frames as used in a Java(trademark) Virtual Machine is given in “The Java(trademark) Virtual Machine Specification”, Lindholm, Yellin, Section 3.6, 1996, herein incorporated by reference in its entirety.
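As a simplified illustration, the per-thread stack of frames described above might be modeled as follows. This is a hypothetical Python sketch, not the JVM's actual frame layout: the names `Frame`, `ThreadStack`, `push_frame`, and `pop_frame` are invented for illustration, and the execution point is kept in the frame rather than in a machine register.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Frame:
    """Hypothetical activation record for one method invocation."""
    method: str
    max_locals: int
    pc: int = 0  # execution point within the method
    locals: List[Any] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        # Each local variable of the method gets its own slot in this frame.
        self.locals = [None] * self.max_locals

class ThreadStack:
    """Hypothetical per-thread stack: one frame per active method invocation."""

    def __init__(self) -> None:
        self.frames: List[Frame] = []

    def push_frame(self, method: str, max_locals: int) -> Frame:
        # Invoking a method pushes a fresh frame with fresh local-variable slots.
        frame = Frame(method, max_locals)
        self.frames.append(frame)
        return frame

    def pop_frame(self) -> Frame:
        # Returning pops the frame; the caller's frame becomes current again.
        return self.frames.pop()

    def current(self) -> Frame:
        return self.frames[-1]

# Usage: main invokes helper; helper's slots are separate from main's.
stack = ThreadStack()
main = stack.push_frame("main", max_locals=2)
main.pc = 7  # where main resumes after helper returns
helper = stack.push_frame("helper", max_locals=3)
helper.locals[0] = 42
stack.pop_frame()
```

Because every invocation gets its own `locals` list, a write in `helper` can never change the type of a slot in `main`; this is the property that the JSR instruction, discussed below, does not provide.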
The Java(trademark) Virtual Machine (JVM) differs from other execution models in that a local variable may hold values of different types during different portions of a method's execution. For example, during one part of the execution of a method, a local variable may hold a value of type integer (int), while in another part of the execution of that same method, the same local variable may hold a reference (pointer) to a heap-allocated object.
During the execution of a program, it may be advantageous to track and determine precisely the types of the values stored in each of the local variables. Two exemplary applications of such type tracking are dynamic storage reclamation (garbage collection) and debugging. For dynamic storage reclamation, it is desirable to determine precisely which local variables contain references to heap-allocated objects.
For debugging, it is desirable to identify precisely the types of values stored in local variables when displaying those values to the user. Given the type of the data stored in a local variable, the debugger can display that data correctly. Without the type, the debugger cannot tell whether a value stored in a local variable is a pointer to an object or is itself the value to be displayed. Further, even if the value is known to be a non-pointer, or primitive, value, the debugger must know its specific data type to display it properly. For example, integer data, floating-point data, and character data are stored using different internal representations, so given only a value in internal format, the debugger must know its type to translate and display it correctly. Displaying the values and types stored in local variables enables the user to plan his or her debugging strategy accordingly. In the context of this application, the term “primitive” refers to any non-pointer data type, including but not limited to character, integer, long, short, float, and bit or flag.
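To make the point about internal representations concrete, the following sketch shows one bit pattern rendered four ways. It assumes, for illustration only, that a slot holds a 32-bit pattern; the function `display_slot` and its type names are hypothetical and are not part of any debugger API.

```python
import struct

def display_slot(raw_bits: int, slot_type: str) -> str:
    """Render the 32-bit pattern held in a local-variable slot according to its type.

    The same bits mean different things for different types, so the slot
    cannot be rendered correctly without knowing its type.
    """
    packed = struct.pack(">I", raw_bits & 0xFFFFFFFF)  # reinterpret the bits, do not convert
    if slot_type == "int":
        return str(struct.unpack(">i", packed)[0])
    if slot_type == "float":
        return repr(struct.unpack(">f", packed)[0])
    if slot_type == "char":
        return repr(chr(raw_bits & 0xFFFF))  # 16-bit character code
    if slot_type == "reference":
        # A real debugger would follow the pointer to the heap object instead.
        return f"<reference @0x{raw_bits:08x}>"
    raise ValueError(f"unknown slot type: {slot_type}")

bits = 0x3F800000
print(display_slot(bits, "int"))    # 1065353216
print(display_slot(bits, "float"))  # 1.0
```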
The Java(trademark) Virtual Machine Specification, incorporated by reference above, specifies rules to which the bytecode of a Java(trademark) class file must conform to be considered valid. A verifier can be used before execution to ensure that a Java(trademark) class file conforms to these rules. One rule specifies that at any given point within a method, regardless of the code path leading to that point, a local variable can only be accessed if it is known to contain a value of the type expected by the instruction accessing it. For example, if a local variable is accessed as a reference at some point in a method, then all code paths to that point must assign a reference or null to that variable. Likewise, if a local variable is accessed as a float, then all code paths leading to that access must assign a float value to that local variable. Because of this rule, for many Java(trademark) class files a simple static analysis of the bytecode of each method identifies the types of all local variables at all locations of interest within the method. The analysis is said to be “static” because it does not consider the specific control-flow path taken through the method during execution.
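A minimal sketch of such a static analysis follows, written over a hypothetical toy instruction set rather than real JVM opcodes. It propagates a per-slot type map along every control-flow edge; where paths with different types for a slot join, the slot is marked as conflicting, and a load is accepted only if its slot has exactly the expected type on every path. For simplicity it ignores null and the JVM's full type hierarchy.

```python
from collections import deque
from typing import Dict, List, Tuple

UNSET = "unset"        # slot not yet assigned on some path
CONFLICT = "conflict"  # slot holds different types on different incoming paths
Instr = tuple
State = Tuple[str, ...]

def merge(a: State, b: State) -> State:
    """Join two per-slot type maps where control-flow paths meet."""
    return tuple(x if x == y else CONFLICT for x, y in zip(a, b))

def successors(code: List[Instr], pc: int) -> List[int]:
    op = code[pc]
    if op[0] == "goto":
        return [op[1]]
    if op[0] == "if":  # conditional branch: taken or fall through
        return [op[1], pc + 1]
    if op[0] == "return":
        return []
    return [pc + 1]

def transfer(op: Instr, state: State) -> State:
    """Type map after `op`, given the type map before it."""
    if op[0] == "store":
        _, slot, typ = op
        return state[:slot] + (typ,) + state[slot + 1:]
    return state

def analyze(code: List[Instr], num_locals: int) -> Dict[int, State]:
    """Type map before each reachable instruction, regardless of the path taken."""
    before: Dict[int, State] = {0: (UNSET,) * num_locals}
    work = deque([0])
    while work:
        pc = work.popleft()
        after = transfer(code[pc], before[pc])
        for succ in successors(code, pc):
            new = after if succ not in before else merge(before[succ], after)
            if before.get(succ) != new:
                before[succ] = new
                work.append(succ)
    return before

def loads_ok(code: List[Instr], before: Dict[int, State]) -> bool:
    """Each load must find exactly its expected type on every path."""
    return all(before[pc][op[1]] == op[2]
               for pc, op in enumerate(code)
               if op[0] == "load" and pc in before)

code = [
    ("if", 3),                  # 0: branch
    ("store", 0, "int"),        # 1
    ("goto", 4),                # 2
    ("store", 0, "float"),      # 3: falls through to the join point
    ("store", 1, "reference"),  # 4: join point
    ("load", 1, "reference"),   # 5
    ("return",),                # 6
]
before = analyze(code, num_locals=2)
print(before[5])               # ('conflict', 'reference')
print(loads_ok(code, before))  # True
```

Slot 0 is unusable after the join because the two paths disagree, which is harmless here since only slot 1 is read.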
A static analysis of the bytecode may fail if the bytecode contains JSR (jump subroutine) instructions. The JSR instruction supported by Java(trademark) is unlike the subroutine-call instructions of other languages in that it places no new frame on the stack. Thus, the JSR instruction makes no new “copies” of local variables. Instead, the subroutine targeted by the JSR instruction (hereinafter the “target subroutine” or the “JSR-subroutine”) accesses the same local variables, at the same memory locations, as the calling code. Further, the JSR-subroutine can store new values in those local variables, and can thereby change the type of value stored in a local variable. Accordingly, depending on which JSR instruction called the JSR-subroutine, a given local variable could hold data of different types. By contrast, subroutine calls in other languages, like Java(trademark) method invocations, create a new set of local variables in a new stack frame. In such cases, type determination is a simple matter of analyzing the copy of the local variables as it exists on the stack.
In a JSR-subroutine, the type of a local variable may depend on the flow of execution through the method. A JSR instruction is used to jump to a subroutine (a section of code within the method). Executing a JSR instruction pushes the address of the instruction immediately following the JSR instruction onto the top of the operand stack. This address is the return location for the JSR-subroutine. During execution of the JSR-subroutine, the return location is saved in a local variable. The subroutine completes by executing a “ret” instruction that specifies the local variable containing the return location. A JSR-subroutine can also terminate abruptly, as described below.
Several different points in the bytecode can have JSR instructions that specify the same target JSR-subroutine. After the subroutine executes its ret instruction, execution continues at the return location stored in the local variable, that is, at the instruction following the JSR instruction that jumped to the subroutine. The local variables of the method can be defined and accessed within the JSR-subroutine, subject to the same rules on the use of local variables as in the rest of the method.
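The jsr/ret mechanics can be sketched with a tiny interpreter. Assumptions: a hypothetical tuple-based instruction set whose program counter indexes instructions rather than bytes (a real jsr occupies 3 bytes, so its return address is its own offset plus 3), and a "retaddr" tag standing in for the JVM's returnAddress type. No frame is pushed by jsr, so the subroutine uses the same `locals_` list as the code that called it.

```python
from typing import Any, List, Tuple

def run(code: List[tuple], num_locals: int) -> Tuple[List[int], List[Any]]:
    """Interpret a toy subset of jsr/ret semantics; return (executed pcs, final locals)."""
    locals_: List[Any] = [None] * num_locals  # shared: jsr pushes no new frame
    operand_stack: List[Any] = []
    pc, trace = 0, []
    while True:
        trace.append(pc)
        op = code[pc]
        if op[0] == "jsr":
            # Push the return location (the instruction after the jsr), then jump.
            operand_stack.append(("retaddr", pc + 1))
            pc = op[1]
        elif op[0] == "astore":
            # The subroutine saves its return location in a local variable.
            locals_[op[1]] = operand_stack.pop()
            pc += 1
        elif op[0] == "store":
            locals_[op[1]] = op[2]
            pc += 1
        elif op[0] == "ret":
            tag, target = locals_[op[1]]
            if tag != "retaddr":
                raise RuntimeError("ret through a non-returnAddress local")
            pc = target
        elif op[0] == "return":
            return trace, locals_
        else:
            raise ValueError(f"unknown instruction {op!r}")

code = [
    ("jsr", 3),        # 0: first call site
    ("jsr", 3),        # 1: second call site, same subroutine
    ("return",),       # 2
    ("astore", 0),     # 3: subroutine entry, save return location in local 0
    ("store", 1, 99),  # 4: subroutine writes a local the caller also sees
    ("ret", 0),        # 5: jump to the location saved in local 0
]
trace, final_locals = run(code, num_locals=2)
print(trace)  # [0, 3, 4, 5, 1, 3, 4, 5, 2]
```

The trace shows the single subroutine body running twice, returning each time to the instruction after whichever jsr invoked it.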
The JVM rules allow a local variable to hold values of different types at different JSR instructions that jump to the same JSR-subroutine. In that case, the rules forbid accessing that local variable within the JSR-subroutine, but its existence still complicates precise determination of the types of values held in local variables at execution points within the JSR-subroutine. A simple static analysis does not consider which JSR instruction was executed to jump to the JSR-subroutine, and therefore cannot determine the type of value that such a local variable held at the time of the jump.
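The information lost by a path-insensitive analysis can be shown directly. In this hypothetical sketch, two JSR instructions (at invented pcs 10 and 20) target the same subroutine. Merging their type maps, as a simple static analysis would at the subroutine entry, loses slot 0, while knowing which JSR instruction was executed recovers it. The function names are illustrative only.

```python
from typing import Dict, Tuple

CONFLICT = "conflict"
TypeMap = Tuple[str, ...]  # type of each local-variable slot, indexed by slot number

def path_insensitive_entry(call_sites: Dict[int, TypeMap]) -> TypeMap:
    """Types at subroutine entry as a simple static analysis sees them:
    the merge over every JSR instruction that targets the subroutine."""
    columns = zip(*call_sites.values())
    return tuple(col[0] if all(t == col[0] for t in col) else CONFLICT
                 for col in columns)

def entry_for_path(call_sites: Dict[int, TypeMap], executed_jsr: int) -> TypeMap:
    """Types at subroutine entry once the executed JSR instruction is known."""
    return call_sites[executed_jsr]

call_sites = {10: ("int", "reference"), 20: ("reference", "reference")}
print(path_insensitive_entry(call_sites))  # ('conflict', 'reference')
print(entry_for_path(call_sites, 10))      # ('int', 'reference')
```

For garbage collection the conflicting slot matters: it holds a reference only when the subroutine was entered from pc 20.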
Note that a JSR-subroutine may also terminate abruptly if an exception occurs during its execution, in which case control is transferred to an appropriate exception handler, such as a Java(trademark) “catch” block. This abrupt termination is mentioned only for completeness. The invention pertains to determining the types of values held in local variables during the execution of a JSR-subroutine; whether the subroutine completes normally or abruptly is not relevant to the invention.
A JSR-subroutine can in turn contain a JSR instruction that jumps to another JSR-subroutine. Because of such nesting, the types of values held by local variables at a given execution point within a JSR-subroutine may depend on which sequence of JSR instructions was executed to reach that point. Recursive subroutine calls, whether direct or indirect, are not permitted.
One approach to determining the types of local variables at various execution points within JSR-subroutines is to rewrite the bytecode prior to execution so that no local variable holds values of different types at different JSR instructions that branch to the same JSR-subroutine. Rewriting the bytecode might involve duplicating the JSR-subroutines so that there is one version of each JSR-subroutine for each type of value stored in a given local variable at the different JSR instructions. One drawback to this approach is that it increases the size of the bytecode. Further, if JSR instructions are nested, the number of copies of individual JSR-subroutines may grow exponentially with the nesting depth, further swelling the bytecode.
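The exponential growth can be illustrated by counting calling contexts. The sketch assumes the worst case, in which every distinct chain of JSR instructions leaves some local variable with a different type, so duplication needs one copy of a subroutine per chain that reaches it. The function `calling_contexts` and the subroutine names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def calling_contexts(jsrs: Dict[str, List[str]], root: str = "method") -> Counter:
    """Count the distinct JSR call chains that reach each subroutine.

    jsrs maps a code unit (the method body or a subroutine) to the subroutines
    targeted by its JSR instructions, one entry per JSR instruction. Recursive
    subroutine calls are not permitted, so this walk always terminates.
    """
    counts: Counter = Counter()

    def walk(unit: str) -> None:
        for target in jsrs.get(unit, []):
            counts[target] += 1  # one more chain reaches `target`
            walk(target)

    walk(root)
    return counts

# Each level is called from two JSR instructions; nesting is three deep.
jsrs = {"method": ["A", "A"], "A": ["B", "B"], "B": ["C", "C"]}
print(dict(calling_contexts(jsrs)))  # {'A': 2, 'B': 4, 'C': 8}
```

With k call sites per level and nesting depth d, the innermost subroutine is reached by k to the power d chains, so in this worst case it would need that many copies.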
Another approach to rewriting the bytecode involves splitting a local variable into multiple local variables if it is found to hold different types at different JSR instructions calling the same JSR-subroutine. For garbage collection, values of all non-reference types can be grouped as one type, so a local variable need be split only two ways (reference and non-reference). This grouping reduces the splitting of local variables, but does not eliminate it. For debugging, however, the reference/non-reference distinction is not sufficient, and the local variable must be split once for each different type of value it may hold before a given point-of-interest. Agesen et al. proposed this splitting approach in 1998.
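The difference between the garbage-collection and debugging cases can be expressed as a count. In this hypothetical sketch, `splits_needed` takes the types that one local variable holds at the various JSR instructions calling the same JSR-subroutine and returns how many local variables it must be split into under each grouping.

```python
from typing import Iterable

def splits_needed(types_at_jsrs: Iterable[str], for_gc: bool) -> int:
    """Number of local variables one local must be split into (hypothetical helper)."""
    if for_gc:
        # Garbage collection only needs to know reference vs. non-reference.
        kinds = {"reference" if t == "reference" else "non-reference"
                 for t in types_at_jsrs}
    else:
        # A debugger must also distinguish every primitive type.
        kinds = set(types_at_jsrs)
    return len(kinds)

observed = ["int", "float", "reference", "int"]
print(splits_needed(observed, for_gc=True))   # 2
print(splits_needed(observed, for_gc=False))  # 3
```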
The variable-splitting approach introduces new local variables into the bytecode. In Java(trademark) bytecode, local variables are referenced by number; local variable 1 is a separate local variable from local variable 5. Bytecode is designed for compactness, so instructions that refer to local variables with small numbers take less space than instructions that refer to local variables with larger numbers. With variable splitting, bytecodes that referred to local variable X may need to refer instead to a new local variable Y, typically numbered above the method's existing local variables. Accordingly, variable splitting may increase the size of the resulting bytecode.
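The size effect follows from the encoding of JVM load and store instructions such as iload, aload, and istore: local variables 0 to 3 have one-byte short forms (e.g. iload_0 to iload_3), indices up to 255 take an opcode plus a one-byte index, and larger indices require the wide prefix and a two-byte index. The helper name `load_size` and the example counts below are hypothetical.

```python
def load_size(index: int) -> int:
    """Size in bytes of a load/store instruction (e.g. iload, aload, istore)
    that names local variable `index`."""
    if 0 <= index <= 3:
        return 1  # short form such as iload_0 .. iload_3: opcode only
    if 4 <= index <= 255:
        return 2  # opcode + 1-byte index
    if 256 <= index <= 65535:
        return 4  # wide prefix + opcode + 2-byte index
    raise ValueError(f"invalid local variable index: {index}")

# Ten loads of local 1, rewritten to refer to a split-off local 5, or to local 300.
print(10 * load_size(1), 10 * load_size(5), 10 * load_size(300))  # 10 20 40
```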
Thus, both rewriting approaches can increase the size of the resulting bytecode. Because the JVM imposes an upper limit on the size of the bytecode of a method, both rewriting approaches may fail for some large methods. In addition, because the JVM imposes an upper limit on the number of local variables a method can define, the variable-splitting approach may also fail for methods that already use many local variables.
In a computer program having a subroutine that is executed following a subroutine invocation, the types of local variables are tracked. A base map indicating the types of the local variables is generated for the subroutine invocation, outside of the subroutine. For a point-of-interest within the subroutine, a delta map indicating changes to any of the local variables is generated. An execution path toward the point-of-interest is determined in order to identify the base map corresponding to the subroutine invocation. The base map is then merged with a current delta map, obtained from the delta map in accordance with the execution path, to determine the types of the local variables.
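The merge step might be pictured as follows. This is a minimal sketch under stated assumptions, one possible reading rather than the claimed implementation: maps are dictionaries from slot number to type; a base map is recorded at each JSR instruction in the method body; each delta map records only the slots changed since entry to the innermost subroutine containing its pc; and the execution path (the JSR instructions executed, outermost first) is supplied as input, since how it is determined is not shown here. All names and pcs are hypothetical.

```python
from typing import Dict, List

TypeMap = Dict[int, str]  # local-variable slot -> type

def merge_maps(base: TypeMap, delta: TypeMap) -> TypeMap:
    """Overlay a delta map on a base map: slots changed later take precedence."""
    merged = dict(base)
    merged.update(delta)
    return merged

def types_at(point: int, jsr_path: List[int],
             base_maps: Dict[int, TypeMap],
             delta_maps: Dict[int, TypeMap]) -> TypeMap:
    """Local-variable types at `point` inside a (possibly nested) JSR-subroutine.

    jsr_path:   pcs of the JSR instructions on the execution path, outermost first.
    base_maps:  pc of a JSR in the method body -> types at that invocation.
    delta_maps: pc inside a subroutine -> slots changed since entry to the
                innermost subroutine containing that pc.
    """
    outer, *nested = jsr_path
    current_delta: TypeMap = {}
    # Build the current delta map by composing deltas along the execution path.
    for pc in nested + [point]:
        current_delta = merge_maps(current_delta, delta_maps[pc])
    return merge_maps(base_maps[outer], current_delta)

# JSRs at 10 and 20 enter subroutine S1; S1's JSR at 105 enters S2; 210 lies in S2.
base_maps = {10: {0: "int", 1: "reference"}, 20: {0: "reference", 1: "reference"}}
delta_maps = {105: {2: "retaddr"}, 210: {3: "retaddr", 1: "float"}}
print(types_at(210, [10, 105], base_maps, delta_maps))
```

Because each base map is stored once per invocation and each delta map once per point-of-interest, no bytecode is rewritten; slot 0 resolves to int or reference depending only on which outer JSR the path records.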