The present invention generally relates to computer programming, and, in particular, to methods and apparatus for a compiler (either static or dynamic), programming development environment or tool, or programmer to enable transformations of a program that involve code motion across instructions that may throw an exception, while strictly preserving the precise exception semantics of the program or parts of the program that existed before the transformation.
The present invention describes methods and apparatus for a compiler, programming development environment or tool, or programmer to enable transformation of a program or parts of a program written in some machine language so as to eliminate or reduce the impact of precise exceptions on optimizations that require instruction reordering.
Some programming languages, for example, Java™ (e.g., see J. Gosling, B. Joy, G. L. Steele, "The Java Language Specification (Java Series)," Addison-Wesley Publishing Company, Reading, Mass. 1996), Ada, Modula-3, and C++ support exceptions, which represent abnormal execution conditions arising, for instance, out of violations of the semantic constraints of the language. We shall refer to an exception as being thrown or raised at a point where it occurs, and as being caught at the exception handler, which is the point to which control is transferred when the exception is thrown. The actual mechanism for determining where the control is transferred in case of an exception being thrown is part of the specification of a programming language. For example, Java™ defines try-catch blocks that govern control flow when an exception is thrown by a statement: execution proceeds to the closest dynamically enclosing catch block that handles the exception. An exception may be thrown explicitly, such as via a throw statement in Java™, or implicitly, based on checks defined by the programming language. For instance, Java™ defines some runtime exceptions such as NullPointerException, IndexOutOfBoundsException, and ArithmeticException, and many instructions such as array-access instructions, object field-access instructions, and integer division instructions may throw one of the predefined runtime exceptions. We refer to an instruction that may throw an exception as a potentially excepting instruction (PEI in short).
In some languages, like Java™, exceptions are defined to be precise, which implies that:
1) exception(s) must be thrown in the same order as specified by the original program; and
2) when an exception is thrown, the program state observable at the entry of the corresponding exception handler must be the same as in the original program.
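The two requirements above can be illustrated with a minimal Java sketch. The class `PreciseExceptionDemo`, the field `visible`, and the method `run` are hypothetical names introduced only for this illustration. The region contains two PEIs (an integer division and an array access) interleaved with writes to a non-temporary variable, and each handler reports the state it observes on entry.

```java
final class PreciseExceptionDemo {
    // Non-temporary program state that is observable at the handlers.
    static int visible;

    // Hypothetical region with two PEIs; returns what the handler (or normal exit) observes.
    static String run(int[] a, int divisor) {
        visible = 0;
        try {
            visible = 1;              // write before the first PEI
            int q = 10 / divisor;     // PEI 1: may throw ArithmeticException
            visible = 2;              // write between the two PEIs
            int v = a[q];             // PEI 2: may throw NullPointerException or
                                      // ArrayIndexOutOfBoundsException
            visible = 3;              // write after the last PEI
            return "ok:" + v + ":" + visible;
        } catch (ArithmeticException e) {
            return "arith:" + visible;
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName() + ":" + visible;
        }
    }
}
```

With a null array and a zero divisor, both PEIs would fail, but precise semantics require the division's exception to be the one thrown, and the handler must observe `visible == 1`.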
Compilers often use a representation called the Dependence Graph to represent constraints on code motion and instruction reordering. The nodes in a dependence graph typically represent statements, and edges represent dependence constraints. Compilers for languages supporting precise exceptions satisfy the precise exception requirement by imposing the following dependence constraints, described further in J.-D. Choi, D. Grove, M. Hind, and V. Sarkar, "Efficient and precise handling of exceptions for analysis of Java programs," ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, September 1999:
1) dependences among PEIs, referred to as exception-sequence dependences, which ensure that the correct exception is thrown by the code, and
2) dependences between writes to non-temporary variables and PEIs, referred to as write-barrier dependences, which ensure that a write to a non-temporary variable is not moved before or after a PEI, in order to maintain the correct program state if an exception is thrown.
These dependences hamper a wide range of program optimizations in the presence of PEIs, such as instruction scheduling, instruction selection (across a PEI), loop transformations, and parallelization. This impedes the performance of programs written in languages like Java™, in which PEIs are quite common.
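As a rough illustration of the two kinds of dependences, the following Java sketch derives them for a straight-line region. The statement model (`Stmt`, with a flag for PEIs and a flag for writes to non-temporary variables), the edge labels `seq` and `wb`, and the simplification that a statement is either a PEI or a write are assumptions of this sketch, not part of the cited work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExceptionDependences {
    /** One statement of a straight-line region (hypothetical model). */
    static final class Stmt {
        final String name;
        final boolean pei;                 // may throw an exception
        final boolean writesNonTemporary;  // writes a non-temporary variable
        Stmt(String name, boolean pei, boolean writesNonTemporary) {
            this.name = name;
            this.pei = pei;
            this.writesNonTemporary = writesNonTemporary;
        }
    }

    /** Returns "seq X->Y" (exception-sequence) and "wb X->Y" (write-barrier) edges. */
    static List<String> edges(List<Stmt> region) {
        List<String> out = new ArrayList<>();
        Stmt lastPei = null;
        List<Stmt> writesSinceLastPei = new ArrayList<>();
        for (Stmt s : region) {
            if (s.pei) {
                // PEIs are chained so that they throw in the original order.
                if (lastPei != null) out.add("seq " + lastPei.name + "->" + s.name);
                // Earlier writes may not sink below this PEI.
                for (Stmt w : writesSinceLastPei) out.add("wb " + w.name + "->" + s.name);
                writesSinceLastPei.clear();
                lastPei = s;
            } else if (s.writesNonTemporary) {
                // This write may not be hoisted above the preceding PEI.
                if (lastPei != null) out.add("wb " + lastPei.name + "->" + s.name);
                writesSinceLastPei.add(s);
            }
        }
        return out;
    }

    /** Example region: PEI, write, temporary computation, PEI. */
    static List<Stmt> example() {
        return Arrays.asList(
            new Stmt("P1", true, false),
            new Stmt("w2", false, true),
            new Stmt("t3", false, false),
            new Stmt("P4", true, false));
    }
}
```

In the example region, the temporary computation `t3` receives no exception-related edge and remains free to move, while `w2` is pinned between `P1` and `P4`.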
A reference to a variable is said to be live at a program point if the value of the variable is used after that program point on some control flow path to the exit before it is redefined.
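For straight-line code, this definition of liveness can be computed with a single backward pass. The Java sketch below is a minimal illustration under that restriction; the `Stmt` model (one optional defined variable plus a list of used variables) and the example statements are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class StraightLineLiveness {
    /** A statement that optionally defines one variable and uses some others. */
    static final class Stmt {
        final String def;          // null if the statement defines nothing
        final List<String> uses;
        Stmt(String def, String... uses) {
            this.def = def;
            this.uses = Arrays.asList(uses);
        }
    }

    /** Element i of the result is the set of variables live just before statement i. */
    static List<Set<String>> liveBefore(List<Stmt> code, Set<String> liveAtExit) {
        LinkedList<Set<String>> result = new LinkedList<>();
        Set<String> live = new TreeSet<>(liveAtExit);
        for (int i = code.size() - 1; i >= 0; i--) {
            Stmt s = code.get(i);
            if (s.def != null) live.remove(s.def);  // a redefinition ends the old value's life
            live.addAll(s.uses);                    // a use makes the variable live before it
            result.addFirst(new TreeSet<>(live));
        }
        return result;
    }

    /** x = a + b; y = x * 2; x = 5; use(x, y) */
    static List<Stmt> example() {
        return Arrays.asList(
            new Stmt("x", "a", "b"),
            new Stmt("y", "x"),
            new Stmt("x"),
            new Stmt(null, "x", "y"));
    }
}
```

In the example, `x` is not live just before the third statement, because that statement redefines it before any further use.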
Let us use the Java code segment shown in FIG. 3 as an example. FIG. 4 shows the low-level intermediate representation (LIR) of the Java code in FIG. 3. Compilers frequently use an LIR of the input program, similar to the one in FIG. 4, during the analysis and optimization of the program. FIG. 5 shows the dependence graph of the LIR in FIG. 4. Compilers frequently use dependence graphs, similar to FIG. 5, for analysis and optimization of programs. We define the dependence locus of a node Ni in the dependence graph to be the set of nodes that are transitively reachable from Ni via dependence edges. Each node in the dependence graph corresponds to a statement (i.e., line) in the LIR. The column titled Dependence Graph Node in the LIR shows the dependence-graph node that corresponds to each statement.
In the dependence graph in FIG. 5, exception-sequence dependences are shown as dashed lines, while write-barrier dependences are shown as dotted edges, such as from n9 to P10. The longest path of the graph shown in the figure has a length of 10: P1, P5, n6, P7, n8, n9, P10, n11, P12, n13, and n14.
A program with more than one thread (locus of control) is said to be multithreaded. In a multithreaded application, the shared program state when an exception is thrown can be visible not only to the exception handler, if any exists, of the exception-throwing thread, but also potentially to any other threads that can access the program state. Furthermore, an exception not caught by a handler terminates only the exception-throwing thread, and other threads can still access the shared program state affected by the terminating thread and continue their execution.
Allowing for uncontrolled accesses (read or write) to shared program state by multiple threads usually renders a program incorrect or, at best, hard to understand and develop. Most languages, therefore, provide mechanisms for controlling accesses to shared program state, i.e., shared variables. A synchronized region, such as a synchronized block or method in Java, is a region within which a shared variable can be accessed under such control. Some languages go even further to specify that for a program to be correct, accesses to shared variables should be controlled "properly," usually implying that the program should obey the CREW protocol: Concurrent Read, Exclusive Write. In a CREW protocol, there can be any number of concurrent read accesses as long as there is no concurrent write access, but no read or write access can be concurrent with another write access to the same variable. We refer to languages that force parallel programs to obey the CREW protocol, or which do not specify any constraints on the ordering of data accesses in regions not obeying the CREW protocol, as languages supporting weak consistency. We refer to other languages, which do impose constraints on the ordering of data accesses even in regions not obeying the CREW protocol, as supporting strong consistency.
Many compilers use a representation called a call graph to analyze an entire program. A call graph has nodes representing procedures, and edges representing procedure calls. We use the term procedure to refer to subroutines, functions, and also methods in object-oriented languages. A direct procedure call, where the callee (called procedure) is known at the call site, is represented by a single edge in the call graph from the caller to the callee. A procedure call where the callee is not known, such as a virtual-method call in an object-oriented language or an indirect call through a pointer, is represented by edges from the caller to each possible callee. It is also possible that, for a particular (callee) procedure, not all of its callers are known. In that case, the call graph conservatively includes edges from all possible callers to that callee.
A topological sort order enumeration of nodes in a graph refers to an enumeration in which, if the graph contains an edge from node x to node y, then x appears before y. If a graph has cycles, then such an enumeration is not guaranteed for nodes involved in a cycle. A reverse topological sort order lists nodes in the reverse order of a topological sort.
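One common way to produce such an enumeration is Kahn's algorithm, sketched below in Java. The adjacency-map representation and the choice to append nodes that lie on (or behind) a cycle at the end of the list are assumptions of this sketch; the text only states that no ordering is guaranteed for such nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TopoOrder {
    /** Topological sort order; nodes blocked by a cycle are appended last. */
    static List<String> sort(Map<String, List<String>> successors) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : successors.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String target : e.getValue()) inDegree.merge(target, 1, Integer::sum);
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.remove();
            order.add(node);
            for (String target : successors.getOrDefault(node, Collections.<String>emptyList())) {
                if (inDegree.merge(target, -1, Integer::sum) == 0) ready.add(target);
            }
        }
        // Nodes never released belong to, or are only reachable through, a cycle.
        for (String node : inDegree.keySet()) {
            if (!order.contains(node)) order.add(node);
        }
        return order;
    }

    /** Reverse topological sort order. */
    static List<String> reverseSort(Map<String, List<String>> successors) {
        List<String> order = sort(successors);
        Collections.reverse(order);
        return order;
    }

    /** Example graph: a->b, a->c, b->d, c->d. */
    static Map<String, List<String>> example() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("a", Arrays.asList("b", "c"));
        g.put("b", Arrays.asList("d"));
        g.put("c", Arrays.asList("d"));
        g.put("d", Collections.<String>emptyList());
        return g;
    }
}
```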
Prior art for a similar goal of allowing instruction reordering in the presence of exceptions can be found in the papers: D. August, D. Connors, S. Mahlke, J. Sias, K. Crozier, B.-C. Cheng, P. Eaton, Q. Olaniran, and W.-M. Hwu, "Integrated predicated and speculative execution in the IMPACT EPIC architecture," Proceedings of 25th International Symposium on Computer Architecture, July 1998; P. Chang, S. Mahlke, W. Chen, N. Warter, and W.-M. Hwu, "IMPACT: An architectural framework for multiple-instruction-issue processors," Proceedings of 18th International Symposium on Computer Architecture, pages 266-275, 1991; K. Ebcioglu, "Some design ideas for a VLIW architecture for sequential natured software," Parallel Processing, pages 3-21, M. Cosnard et al. (editors), North Holland, 1988; K. Ebcioglu and G. Silberman, "An architectural framework for supporting heterogeneous instruction-set architectures," IEEE Computer, 26(6), pages 39-56, June 1993; K. Ebcioglu and E. R. Altman, "DAISY: Dynamic compilation for 100% architectural compatibility," Proceedings of 24th International Symposium on Computer Architecture, pages 26-37, Denver, Colo., June 1997; S. Mahlke, W. Chen, R. Bringmann, R. Hank, W.-M. Hwu, B. Rau, and M. Schlansker, "Sentinel scheduling: A model for compiler-controlled speculative execution," ACM Transactions on Computer Systems, 11(4):376-408, November 1993; M. Smith, M. Lam, and M. Horowitz, "Boosting beyond static scheduling in a superscalar processor," Proceedings of 19th International Symposium on Computer Architecture, pages 344-354, May 1990; and in U.S. Pat. No. 5,799,179 to K. Ebcioglu and G. Silberman entitled "Handling of exceptions in speculative instructions," issued on Aug. 25, 1998.
These methods differ from the method described in this invention, at least, in that they require special hardware support to ensure that the results of speculatively executed instructions that raise an exception are not committed prematurely. Furthermore, these methods do not attempt to reduce the program state that must be preserved at a possible exception point.
Prior art for aggressive code motion of instructions, including PEIs, can be found in K. Ebcioglu, "Some design ideas for a VLIW architecture for sequential natured software," Parallel Processing, pages 3-21, M. Cosnard et al. (editors), North Holland, 1988. The method presented in this Ebcioglu article has the drawback of requiring special hardware support for extra instruction opcodes to indicate a non-speculative or speculative version, and an extra bit in the registers to denote a "bottom" result from an instruction that causes an exception.
Prior art for aggressive code motion of instructions, including PEIs, can be found in M. Smith, M. Lam, and M. Horowitz, "Boosting beyond static scheduling in a superscalar processor," Proceedings of 19th International Symposium on Computer Architecture, pages 344-354, May 1990. The method presented in this Smith et al. article has the drawback of requiring expensive hardware support in the form of shadow register files and shadow store buffers to hold the result of speculative instructions.
Prior art for aggressive code motion of instructions, including PEIs, can be found in P. Chang, S. Mahlke, W. Chen, N. Warter, and W.-M. Hwu, "IMPACT: An architectural framework for multiple-instruction-issue processors," Proceedings of 18th International Symposium on Computer Architecture, pages 266-275, 1991. The method presented in this Chang et al. article has two drawbacks: (i) it requires hardware support for silent exceptions, i.e., for ignoring exceptions thrown by speculatively executed instructions, and (ii) it may fail to throw an exception that is thrown by the original, unoptimized program, which would constitute a violation of program semantics in a language like Java™.
Prior art for aggressive code motion of instructions, including PEIs, can be found in S. Mahlke, W. Chen, R. Bringmann, R. Hank, W.-M. Hwu, B. Rau, and M. Schlansker, "Sentinel scheduling: A model for compiler-controlled speculative execution," ACM Transactions on Computer Systems, 11(4):376-408, November 1993. This method also has the drawback of requiring special hardware support in the form of an extra bit on registers to record whether a speculative instruction caused an exception, as well as an extra bit in the instruction opcode to distinguish between speculative and non-speculative instructions.
Prior art for aggressive code motion of instructions, including PEIs, can be found in K. Ebcioglu and G. Silberman, "An architectural framework for supporting heterogeneous instruction-set architectures," IEEE Computer, 26(6), pages 39-56, June 1993 and K. Ebcioglu and E. R. Altman, "DAISY: Dynamic compilation for 100% architectural compatibility," Proceedings of 24th International Symposium on Computer Architecture, pages 26-37, Denver, Colo., June 1997. The methods presented in these Ebcioglu articles have the drawback of requiring special hardware support for non-architected registers to hold the results of operations executed out of order.
Prior art for aggressive code motion of instructions, including PEIs, can be found in U.S. Pat. No. 5,799,179 to K. Ebcioglu and G. Silberman entitled "Handling of exceptions in speculative instructions," issued on Aug. 25, 1998. The method presented in this Ebcioglu et al. patent has the drawback of requiring special hardware support in the form of extra bits in registers for exception tracking and recovery.
Prior art for aggressive code motion of instructions, including PEIs, without requiring any special hardware support, can be found in B. C. Le, "An out-of-order execution technique for runtime binary translators," Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 151-158, October 1998. However, this method requires the generation of check-pointing code, which contributes to the overhead of executing extra instructions even when no exception is thrown. This overhead of check-pointing can potentially be high.
Prior art for aggressive program transformations in the presence of exceptions can be found in A. Aiken, J. W. Williams, and E. L. Wimmers, "Safe: A semantic technique for transforming programs in the presence of errors," ACM Transactions on Programming Languages and Systems, 17(1):63-84, January 1995. This method defines a higher-order function called Safe, which is used to annotate parts of the program that are guaranteed not to produce errors or exceptions. This method is not applicable to many computations, where it cannot be guaranteed that exceptions will not take place, even if exceptions are rare.
Prior art for aggressive program transformations in the presence of exceptions can be found in U.S. Pat. No. 6,343,375, issued on Jan. 29, 2002 to M. Gupta, S. Midkiff, and J. Moreira, and entitled "Method for optimizing array bounds checks in programs." This method performs a program transformation to create safe regions in which no exception may take place. However, this method is only applicable to a restricted class of computations using arrays, and can only be used to handle out-of-bounds array index exceptions and null-pointer exceptions. For many computations, no such safe regions can be created using this method.
The present invention provides methods and apparatus to analyze and transform a computer program so as to enable program transformations which require code motion across potentially excepting instructions, while strictly preserving the precise exception semantics of the original program. Examples of program transformations that benefit from our method include instruction scheduling, instruction selection across a PEI, loop transformations to improve data locality and instruction-level parallelism, and program parallelization. Thus, the transformations enabled by this invention are important to improve the performance of the program being transformed, and also to decrease the execution time of other programs executing on the system.
In a first aspect of the invention, a method of optimizing a computer program written in a language that supports exceptions, comprises the steps of: (i) identifying in the computer program a statement that writes into a variable and a statement corresponding to a potentially excepting instruction; and (ii) removing a constraint on moving the write statement across the potentially excepting instruction when the variable written by the write statement is determined to be not live at an exception handler that catches the exception when thrown.
In a second aspect of the invention, a method of optimizing a computer program written in a language that supports exceptions, comprises the steps of: (i) transforming each code region to be optimized by performing optimizations without any constraints on ordering between potentially excepting instructions; and (ii) generating compensation code which executes only if an exception is thrown by the region transformed in the transforming step.
A preferred method of the present invention first performs a static analysis of the program to identify the types of exceptions possibly thrown in the program, and to determine the liveness of variables at entry to different exception handlers in the program. The method then performs two transformations on the program, which deal respectively with overcoming write-barrier dependences and exception-sequence dependences involving PEIs.
The first transformation adds a parameter to each procedure being optimized, which encodes the information about liveness of variables at each dynamically enclosing exception handler for the given procedure call. This information is used at runtime to select one of two versions of the procedure generated by the compiler as follows. The first version is applicable when no variable written in the procedure, other than the implicit exception object, is live at any exception handler (dynamically enclosing the procedure call) which could catch an exception thrown during the procedure call; the second version is selected otherwise. The first version of the procedure, which we shall refer to as the specialized version, is optimized by the compiler while completely ignoring all write-barrier dependences involving PEIs in code regions inside it without any local exception handler. The program optimizations in the second version, which is the original procedure and is referred to as the normal version, remain constrained by the write-barrier dependences due to PEIs. Thus, in cases where the runtime information obtained using our method allows the specialized version of the procedure to be selected during execution, better performance can be obtained due to more effective optimizations.
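The selection between the two versions might look like the following Java sketch. Everything here is hypothetical: the procedure `sum`, the field `total`, and in particular the use of a single boolean as the added parameter, which is a simplification of the liveness encoding described above. The sketch also assumes that no code other than the handler shown reads `total` after an exception.

```java
final class VersionDispatch {
    // Non-temporary variable written by the procedure.
    static int total;

    // Normal version: every write to total stays ordered with the PEI a[i].
    static void sumNormal(int[] a, int n) {
        total = 0;
        for (int i = 0; i < n; i++) total += a[i];
    }

    // Specialized version: write-barrier dependences are ignored, so the
    // running sum is kept in a local and total is written once at the end.
    static void sumSpecialized(int[] a, int n) {
        int t = 0;
        for (int i = 0; i < n; i++) t += a[i];
        total = t;
    }

    // The added parameter (assumed to be a boolean here) selects the version.
    static void sum(int[] a, int n, boolean writtenVarsLiveAtHandler) {
        if (writtenVarsLiveAtHandler) sumNormal(a, n);
        else sumSpecialized(a, n);
    }

    // The handler reads total, so total is live there: the normal version is required.
    static int callerThatReadsTotal(int[] a, int n) {
        try {
            sum(a, n, true);
            return total;
        } catch (ArrayIndexOutOfBoundsException e) {
            return -total;   // observes the partial sum
        }
    }

    // The handler ignores total: the specialized version may be selected.
    static int callerThatIgnoresTotal(int[] a, int n) {
        try {
            sum(a, n, false);
            return total;
        } catch (ArrayIndexOutOfBoundsException e) {
            return Integer.MIN_VALUE;
        }
    }
}
```

When no exception is thrown, both versions leave the same value in `total`; they differ only in the intermediate state that a handler could observe.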
For multithreaded programs, our preferred method performs the following additional analysis. If the programming language supports weak consistency and if the code region being optimized is not part of a synchronized region, no further write-barrier dependences are imposed. However, if either the programming language supports strong consistency or if the code region is inside a synchronized region, our method performs analysis to detect thread-local variables. In this case, write-barrier dependences are honored between PEIs and writes of variables that are not thread-local.
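The decision can be summarized as a small predicate. The Java sketch below assumes that thread-local analysis is needed exactly when the language supports strong consistency or the region is inside a synchronized region, and that no additional dependences arise otherwise; the class, enum, and method names are hypothetical.

```java
final class ThreadBarrierPolicy {
    enum Consistency { WEAK, STRONG }

    /** True if the thread-local analysis has to be run for the region. */
    static boolean needsThreadLocalAnalysis(Consistency model, boolean inSynchronizedRegion) {
        return model == Consistency.STRONG || inSynchronizedRegion;
    }

    /**
     * True if a write to the variable must stay ordered with respect to PEIs
     * because another thread might observe it.
     */
    static boolean mustHonorWriteBarrier(Consistency model,
                                         boolean inSynchronizedRegion,
                                         boolean variableIsThreadLocal) {
        return needsThreadLocalAnalysis(model, inSynchronizedRegion) && !variableIsThreadLocal;
    }
}
```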
The second transformation is performed, in a preferred method of the invention, on the specialized version of a procedure. This transformation further creates two regions out of the code region to which it is applied. The first region, called the optimized code, is obtained by allowing optimizations that completely ignore all exception-sequence dependences in the original region. The second region, called the compensation code, intercepts any exception thrown in the optimized region, and throws the correct exception that would have been thrown by the original, unoptimized code. This compensation code is executed only if and when an exception is thrown during the execution of the optimized code. Hence, in the common case where the execution of the program leads to no exception being thrown, this transformation allows the program to be optimized, completely ignoring all exception-sequence dependences, without the overhead of executing any part of the compensation code.
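The split into optimized code and compensation code can be sketched in Java as follows. This is only one possible realization under stated assumptions: the region has no side effects visible before the exception (as in the specialized version), and the compensation code simply replays the PEIs in their original order. The class and method names are hypothetical.

```java
final class CompensationDemo {
    // Original region: array load (PEI 1), then division (PEI 2).
    static int original(int[] a, int i, int d) {
        int v = a[i];       // PEI 1
        int q = 100 / d;    // PEI 2
        return v + q;
    }

    // Optimized code: exception-sequence dependences ignored, PEI 2 hoisted above PEI 1.
    static int optimized(int[] a, int i, int d) {
        try {
            int q = 100 / d;    // PEI 2
            int v = a[i];       // PEI 1
            return v + q;
        } catch (RuntimeException e) {
            // Reached only if an exception is thrown in the optimized code.
            return compensation(a, i, d, e);
        }
    }

    // Compensation code: replays the PEIs in original order so that the
    // exception the original code would have thrown is the one that escapes.
    static int compensation(int[] a, int i, int d, RuntimeException fromOptimized) {
        int v = a[i];       // throws first if the original would have failed here
        int q = 100 / d;    // otherwise this is the failing PEI
        throw fromOptimized; // defensive fallback; not expected to be reached
    }

    // Helper for observing either a result or the simple name of the exception thrown.
    static String outcome(boolean useOptimized, int[] a, int i, int d) {
        try {
            return String.valueOf(useOptimized ? optimized(a, i, d) : original(a, i, d));
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```

For an out-of-range index combined with a zero divisor, the optimized code first encounters the ArithmeticException, but the compensation code causes the ArrayIndexOutOfBoundsException of the original order to be thrown instead. On the exception-free path the handler is never entered.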
An alternative embodiment of the method uses a different analysis to overcome the write-barrier dependences due to PEIs. In this embodiment, no extra parameter is added to procedures. Instead, the analysis to record the liveness information of variables at enclosing exception handlers is done at compile time, using the call graph representation of the program. The information about exception handlers, if any, surrounding a procedure call from A to B is propagated to B and to all procedures reachable in the call graph from B (representing all procedures transitively called from B). In this embodiment, there is no need to create a specialized version of the procedure. All write-barrier dependences related to PEIs that do not have a non-trivial enclosing exception handler are ignored. In the special case where the static analysis shows that a procedure cannot have any dynamically enclosing exception handler with live variables, those regions of code in that procedure which do not have an exception handler block within the procedure can be viewed as write-barrier-free regions. These regions are optimized while ignoring all write-barrier dependences.
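The propagation over the call graph amounts to a reachability computation. In the Java sketch below, the call graph is an adjacency map and a "guarded call" is a caller/callee pair whose call site is surrounded by a handler with live variables; both representations and all names are assumptions of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class HandlerPropagation {
    /**
     * Returns the procedures that may execute under a dynamically enclosing
     * handler: the callee of every guarded call and everything reachable from it.
     */
    static Set<String> mayRunUnderHandler(Map<String, List<String>> callGraph,
                                          List<String[]> guardedCalls) {
        Set<String> marked = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        for (String[] call : guardedCalls) work.add(call[1]);   // call = {caller, callee}
        while (!work.isEmpty()) {
            String proc = work.remove();
            if (!marked.add(proc)) continue;                    // already propagated
            for (String callee : callGraph.getOrDefault(proc, Collections.<String>emptyList())) {
                work.add(callee);
            }
        }
        return marked;
    }

    /** main calls A and C; A calls B; B calls D; C calls E. */
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("main", Arrays.asList("A", "C"));
        g.put("A", Arrays.asList("B"));
        g.put("B", Arrays.asList("D"));
        g.put("C", Arrays.asList("E"));
        return g;
    }

    /** Only the call from A to B sits inside a handler with live variables. */
    static List<String[]> exampleGuardedCalls() {
        return Collections.singletonList(new String[] {"A", "B"});
    }
}
```

In the example, B and D are marked. The remaining procedures (main, A, C, and E) are candidates for write-barrier-free treatment in regions that have no local handler.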
Another alternative embodiment of the method performs the transformation to eliminate exception-sequence dependences on both the specialized version (without write-barrier dependences) and the normal version (with write-barrier dependences) of the procedure, in order to get the benefits of overcoming exception-sequence dependences regardless of the need to preserve the values of variables at potential exception points in the procedure.
Other embodiments of the method perform the analysis and transformations at run time, in a dynamic compiler. A variant of this embodiment does not create the compensation code, which ensures that the correct exception is thrown, during code generation time. It generates the compensation code only on demand, if an exception is thrown in the optimized code. Similarly, another variant of the method does not initially generate two versions of the procedure code in order to overcome the write-barrier dependences. It generates code for only the version (write-barrier-free or the original code) that will be executed, based on the expected information about liveness of variables at enclosing exception handlers for the given procedure call, and generates code for the other version on demand, if needed.
Another embodiment of the method computes and uses information about the set of live variables at entry to each exception handler at a finer granularity. This embodiment imposes write-barrier dependences for each PEI separately, based on information about the set of variables that are live at the exception handler(s) for the exception possibly thrown by that PEI.
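A per-PEI filter of this kind can be sketched in Java as follows. The inputs (a list of variables written in the region and, for each PEI, the set of variables live at entry to the handlers that may catch its exception) and the edge notation are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PerPeiBarriers {
    /**
     * Keeps a write-barrier dependence between a written variable and a PEI
     * only if the variable is live at a handler for that PEI's exception.
     */
    static List<String> barriers(List<String> writtenVars,
                                 Map<String, Set<String>> liveAtHandlerOfPei) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Set<String>> pei : liveAtHandlerOfPei.entrySet()) {
            for (String v : writtenVars) {
                if (pei.getValue().contains(v)) kept.add(v + "<->" + pei.getKey());
            }
        }
        return kept;
    }

    /** The region writes x and y. */
    static List<String> exampleWrites() {
        return Arrays.asList("x", "y");
    }

    /** x is live at the handler for P1; nothing is live at the handler for P2. */
    static Map<String, Set<String>> exampleLiveSets() {
        Map<String, Set<String>> live = new LinkedHashMap<>();
        live.put("P1", new TreeSet<>(Collections.singleton("x")));
        live.put("P2", new TreeSet<String>());
        return live;
    }
}
```

In the example, only the dependence between the write to x and P1 survives; writes to y, and all writes relative to P2, are free to move.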