1. Field of the Invention
The present invention relates to computer programming. In particular, it is a method, and variants of the method, for a compiler (either static or dynamic), a programming development environment or tool, or a programmer to transform a program or part of a program so as to reduce the overhead of lock operations, either by removing the lock operations or by replacing them with simpler operations, while strictly preserving the exact semantics that the program or part of the program had before the transformation.
2. Background Description
Multithreaded programming languages, for example Java(trademark) [5], and programs built from sequential languages but executing in a multithreaded environment, for example Posix [2], often use locking operations. These locking operations are used to enforce the constraint that only one thread of execution may have access to some resource (data, hardware, code, etc.) at a time. A thread is a locus of control in a computing environment. In an object-oriented language such as Java(trademark), the lock is typically associated with an object and is used to ensure mutual exclusion in accessing that object. In those cases, the lock is regarded as a part of that object. (Java is a trademark of Sun Microsystems, Inc.)
Each lock has associated with it some storage that is used to implement the lock. This storage provides a flag to indicate whether the lock has already been acquired by another thread. The lock can also provide a queue, which gives a thread that attempts to acquire a lock already held by another thread a place to wait for the lock to become free. A thread that is on a queue is quiescent, i.e. it is not actively executing. If a lock provides a queue, it must provide some mechanism for threads on the queue to be notified that they can exit the quiescent state and again attempt to acquire the lock. This mechanism is referred to as the notify operation. Locks may also have side effects associated with them. For example, in the Java(trademark) programming language, or in various run-time and hardware systems that implement a release consistency programming model [1], a locking operation must update the globally accessible copy of a variable if required by the semantics of the programming language or release consistency model being implemented. Other side effects could include updating tables indicating which locks are held by the program, or providing a point for a breakpoint operation in a debugger. Locking operations, and the locks needed to support them, have a cost both in increased execution time of the program and in the amount of computer storage necessary for the program to execute.
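The lock storage described above (a flag, a queue of quiescent threads, and a notify operation) can be illustrated with a minimal Python sketch. It assumes Python's standard `threading` module; the class name `QueuedLock` is hypothetical, the side effects discussed above are omitted, and an internal guard lock is used only to keep the flag and queue themselves consistent.

```python
import threading
from collections import deque


class QueuedLock:
    """Illustrative lock: an 'acquired' flag plus a FIFO queue of waiting threads."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the flag and the queue
        self._held = False              # flag: has some thread acquired the lock?
        self._queue = deque()           # one Event per quiescent (waiting) thread

    def acquire(self):
        while True:
            with self._guard:
                if not self._held:
                    self._held = True   # lock was free: take it
                    return
                wakeup = threading.Event()
                self._queue.append(wakeup)  # enqueue this thread
            wakeup.wait()  # quiescent until notified, then try again

    def release(self):
        # Sketch only: does not check that the caller is the current owner.
        with self._guard:
            self._held = False
            if self._queue:
                self._queue.popleft().set()  # notify: wake one waiting thread
```

A notified thread re-attempts the acquisition rather than receiving the lock directly, matching the description that it may "again attempt to acquire the lock"; another thread may therefore acquire the lock first, in which case the notified thread simply re-enqueues.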
Many compilers use a representation called a "call graph" to analyze an entire program. A call graph has nodes representing procedures and edges representing procedure calls. The term "procedure" is used to refer to subroutines, functions, and also "methods" in object-oriented languages. A direct procedure call, where the callee (called procedure) is known at the call site, is represented by a single edge in the call graph from the caller to the callee. A procedure call where the callee is not known, such as a "virtual method" call in an object-oriented language or an indirect call through a pointer, is represented by edges from the caller to each possible callee. It is also possible that, for a particular (callee) procedure, not all of its callers are known. In that case, the call graph conservatively places edges from all possible callers to that callee.
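The three kinds of call-graph edges described above can be sketched as follows. This is an illustrative Python sketch under assumed inputs: the function name `build_call_graph` and its argument format (direct call pairs, indirect calls with a precomputed set of possible callees, and a list of procedures whose callers are unknown) are hypothetical, and "all possible callers" is conservatively approximated as every procedure in the program.

```python
from collections import defaultdict


def build_call_graph(procedures, direct_calls, indirect_calls, unknown_callers=()):
    """
    procedures:      every procedure (node) in the program
    direct_calls:    (caller, callee) pairs where the callee is known at the call site
    indirect_calls:  (caller, possible_callees) pairs for virtual or pointer calls
    unknown_callers: procedures for which not all callers are known
    Returns a dict mapping each caller to the set of procedures it may call.
    """
    graph = defaultdict(set)
    for caller, callee in direct_calls:
        graph[caller].add(callee)          # a single edge per direct call
    for caller, candidates in indirect_calls:
        graph[caller].update(candidates)   # one edge per possible callee
    for callee in unknown_callers:
        for caller in procedures:
            graph[caller].add(callee)      # conservative: anyone may call it
    return dict(graph)
```

For a virtual call such as `shape.area()` where `shape` may be a `Circle` or a `Square`, the caller would receive edges to both `Circle.area` and `Square.area`.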
Within a procedure, many compilers use a representation called the "control flow graph" (CFG). Each node in a CFG represents a "basic block", and the edges represent the flow of control among the basic blocks. A basic block is a straight-line sequence of code that has a single entry (at the beginning) and a single exit (at the end). A statement containing a procedure call does not disrupt a straight-line sequence of code. In the context of languages that support "exceptions", such as Java(trademark), the definition of a basic block is relaxed to include statements that may throw an exception. In those cases, there is an implicit possible control flow from a statement throwing an exception to the block of code handling the exception. The basic block is not forced to end at each such statement; instead, such a basic block bb is said to have a flag bb.outEdgeInMiddle set to true.
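The relaxed basic-block rule can be illustrated with a small Python sketch. The statement encoding (`label`, `branch`, `call`, `throws`, `plain`) and the function `form_basic_blocks` are hypothetical simplifications: a branch target starts a new block and a branch ends one, while procedure calls and exception-throwing statements stay inside the block, the latter setting the `outEdgeInMiddle` flag named in the text.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    statements: list = field(default_factory=list)
    outEdgeInMiddle: bool = False  # mirrors the bb.outEdgeInMiddle flag in the text


def form_basic_blocks(statements):
    """
    statements: list of (kind, text), where kind is one of
      'label'  - a branch target (must start a new block: single entry)
      'branch' - a jump or conditional jump (must end the block: single exit)
      'call'   - a procedure call (does not end the block)
      'throws' - a statement that may throw an exception (does not end the block)
      'plain'  - any other statement
    """
    blocks = []
    current = BasicBlock()
    for kind, text in statements:
        if kind == "label" and current.statements:
            blocks.append(current)              # close the block before a branch target
            current = BasicBlock()
        current.statements.append(text)
        if kind == "throws":
            current.outEdgeInMiddle = True      # implicit edge to the exception handler
        if kind == "branch":
            blocks.append(current)              # a branch ends the block
            current = BasicBlock()
    if current.statements:
        blocks.append(current)
    return blocks
```

Because calls and throwing statements do not split blocks, the first four statements of the example in the test below form a single block whose `outEdgeInMiddle` flag is set.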
A topological sort order enumeration of nodes in a graph refers to an enumeration in which, if the graph contains an edge from node x to node y, then x appears before y. If a graph has cycles, then such an enumeration is not guaranteed for nodes involved in a cycle. A reverse topological sort order lists nodes in the reverse order of a topological sort.
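One common way to compute these orders is Kahn's algorithm, shown here as a Python sketch (the text does not prescribe any particular algorithm, and the function names are illustrative). Nodes that lie on a cycle, or are reachable only through one, never reach in-degree zero, so they are reported separately rather than placed in the order.

```python
from collections import deque


def topological_order(nodes, edges):
    """Kahn's algorithm. Returns (order, unordered), where 'unordered' holds
    nodes that could not be placed because of a cycle."""
    succ = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for x, y in edges:                 # edge x -> y: x must precede y
        succ[x].append(y)
        indeg[y] += 1
    ready = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    unordered = [n for n in nodes if indeg[n] > 0]
    return order, unordered


def reverse_topological_order(nodes, edges):
    order, unordered = topological_order(nodes, edges)
    return list(reversed(order)), unordered
```

For the acyclic graph a→b, a→c, b→d, c→d this yields the order a, b, c, d (reverse: d, c, b, a); for x→y, y→z, z→y only x can be ordered.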
Prior art for a similar goal of reducing the synchronization costs of a program by a compiler or programming tool or environment can be found in the papers [3,4,5,6,8,9,10,11,12,13]. These methods do not perform one class of optimization: removing the synchronization performed by mutual exclusion (mutex) locks on objects, based on the scope in which the lock is accessed. Furthermore, these methods do not handle programs with explicit constructs for multithreading and exceptions (e.g. "try-catch" constructs in Java(trademark)).
Prior art for reducing locking or synchronization operations by a compiler, programming tool or environment can be found in the papers [5,6,8,9,10,11,12,13]. The techniques described in these papers address advance/wait, post/wait/clear, full/empty, extended full/empty, and counter-based synchronization. All of these synchronization methods enforce ordering (i.e., producer/consumer) synchronization, and the goal of these techniques is to transform programs so as to reduce the amount of ordering synchronization. Ordering synchronization is typified by post/wait/clear synchronization. A post operation on a key K involves acquiring a lock on K, setting K to a known value (usually 1), and releasing the lock on K. A wait operation on K involves repeatedly examining the value of K until it reaches the known value. A clear operation first acquires the lock on K, and then sets the value of K to another known value (usually 0). The clear operation is used to initialize K. Thus, by using clear/post/wait, an order can be enforced among the statements that precede the post and follow the wait. In particular, all statements before the post can be made to execute before any statement after the wait. All of the techniques described above use this ordering information. In particular, they determine which orders enforced by some ordering synchronization operations are also enforced by other ordering synchronization operations, and eliminate the former synchronization operations. In some cases, a reduced number of new operations is introduced to eliminate all of the old operations [9], and in other cases the old state of the keys is known after wait operations, which reduces the number of initializing clear operations [6].
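The post/wait/clear protocol described above can be sketched in Python as follows. The class name `Key` is hypothetical, `threading.Lock` stands in for the lock on K, and the wait is a simple busy-wait that yields with `time.sleep(0)`; this illustrates the ordering guarantee only and is not drawn from any of the cited papers.

```python
import threading
import time


class Key:
    """A synchronization key K for post/wait/clear (illustrative sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def clear(self):
        with self._lock:       # acquire the lock on K
            self.value = 0     # set K to the "cleared" value (initialization)

    def post(self):
        with self._lock:       # acquire the lock on K
            self.value = 1     # set K to the "posted" value; released on exit

    def wait(self):
        while self.value != 1:  # repeatedly examine K until it is posted
            time.sleep(0)       # yield so the posting thread can run
```

A producer that writes data and then calls `post()` guarantees that a consumer returning from `wait()` observes that data, which is exactly the statement ordering the text describes.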
Prior art for reducing synchronization for mutex locks can be found in the papers [3,4]. In [3], the number of lock operations is reduced by a coarse-graining transformation, which leads to a single lock ensuring mutual exclusion for a coarser grain region, rather than multiple locks ensuring mutual exclusion for various finer-grain regions. While this reduces the number of lock operations, it leads to the problem of false exclusion, where operations that do not need mutual exclusion are also carried out in mutual exclusion. Therefore, the reduction in the number of lock operations comes at a price of potentially increased contention due to the lock. This transformation can sometimes degrade the performance of the program. In [4], the program is transformed so that multiple lock operations on the same object are replaced by a single lock operation on that object. While eliminating lock operations, this method has to retain at least one lock operation that achieves the same synchronization as the eliminated lock operations, i.e., it cannot eliminate mutex locks entirely from computation.
It is therefore an object of the invention to provide a method to reduce the number of locking operations that are performed in a computer program, by performing analysis to determine when these transformations can be applied, while preserving the strict semantics of the untransformed program.
It is another object of the invention to provide a method to simplify locking operations that cannot be completely removed by performing analysis to determine when these transformations can be applied, all while preserving the strict semantics of the untransformed program.
It is another object of the invention to provide a method for a compiler, programming development environment or tool, or programmer to transform a program or parts of a program written in some machine language so as to reduce the number of locking operations specified in the program or part of a program that must be performed when the program executes, or to reduce the cost of these operations with respect to increased execution time of the program and in the amount of computer storage necessary, or all of the above.
According to the invention, the locking operation is divided into two parts:
1. Those parts of the locking operation necessary to provide mutual exclusion, and to enqueue and dequeue threads that have attempted to acquire a lock already held by one or more other threads. These parts of the locking operation are referred to as the "synchronization" operation for mutual exclusion or, alternatively, the "acquire" and "release" parts of the locking operation.
2. Those parts of the locking operation that do something other than synchronization for mutual exclusion. These parts of the locking operation are called the "side-effects." In Java(trademark), for example, the side-effects of a locking operation are the actions that copy from the Java(trademark) working memory to the Java(trademark) main memory those values of shared variables that have changed since the last locking operation in this thread.
The present invention analyzes a program and uses the information formed by that analysis to transform the program, thereby reducing the number of locking operations that are performed in a program. If a locking operation cannot be completely removed from the program, the method substitutes a simpler operation for the locking operation. The ability to perform these transformations is important for two reasons:
1. Locking operations increase the execution time of the programs. In some programs they may take a majority of the execution time.
2. The necessity of enforcing the side effects can be significant, both in compile time and in increased memory traffic, I/O operations, and computer bus traffic. These costs can affect not only the performance of the program performing the locking operation but also other programs executing in the computer system.
Thus, the transformations described in this invention are important to decrease the execution time and storage requirements of the program being transformed, and to increase the performance of other programs executing on the system.
A locking operation with both synchronization for mutual exclusion and side effects is called a "comprehensive" locking operation, and a locking operation with only synchronization for mutual exclusion is called a "simple" locking operation.
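The division into an acquire/release part and a side-effect part, and the resulting simple and comprehensive forms, can be sketched in Python. The names `WorkingMemory` and `ObjectLock` are hypothetical, and the side effect is modeled, as a simplification of the Java(trademark) memory model, as copying changed shared variables from a per-thread working memory into a shared main-memory dictionary.

```python
import threading


class WorkingMemory:
    """Hypothetical per-thread working memory: values written locally but not
    yet copied to the shared main memory."""

    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.local = {}
        self.dirty = set()

    def write(self, name, value):
        self.local[name] = value
        self.dirty.add(name)

    def flush(self):
        # Copy every shared variable changed since the last flush to main memory.
        for name in self.dirty:
            self.main_memory[name] = self.local[name]
        self.dirty.clear()


class ObjectLock:
    """A lock whose operation is split into synchronization and side effects."""

    def __init__(self):
        self._mutex = threading.RLock()  # re-entrant, like a Java monitor

    # Synchronization part: a "simple" locking operation uses only these.
    def acquire(self):
        self._mutex.acquire()

    def release(self):
        self._mutex.release()

    # Side-effect part on its own.
    @staticmethod
    def side_effects(memory):
        memory.flush()

    # "Comprehensive" locking operation: synchronization plus side effects.
    def comprehensive_lock(self, memory):
        self.acquire()
        self.side_effects(memory)

    def comprehensive_unlock(self, memory):
        self.side_effects(memory)
        self.release()
```

Separating the two parts is what allows a transformation to keep one part while dropping the other.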
The preferred method of the present invention first analyzes all possible references to every object on which a locking operation is performed. The result of this analysis is, for each object, information about whether a locking operation can be performed on it in more than one thread. If the object is not accessed in multiple threads and its locking operations are simple, the locking operations can be removed, and the program is transformed to remove them. Next, the program is analyzed to determine, for each comprehensive locking operation, whether its side effects are required. The results of this analysis, together with the results of the earlier analysis, are then used to determine for each comprehensive locking operation whether
1. the object is not locked in multiple threads and the side effects of the locking operation are not needed, so the locking operation can be removed from the program;
2. the object is not locked in multiple threads but the side effects of the locking operation are needed, so only the side-effect part of the locking operation needs to be performed, and the locking operation is replaced with a simpler form that performs only the side effects;
3. the object is locked in multiple threads but the side effects of the comprehensive locking operation are not needed, so only the acquire and release parts of the locking operation need to be performed, and the locking operation is replaced with a simpler form that performs only the acquire and release parts; or
4. the object is locked in multiple threads and the side effects of the comprehensive locking operation are needed, so the locking operation is not optimized.
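The per-object thread analysis and the four cases above can be summarized as a small decision procedure. This Python sketch assumes the two analyses have already run: `objects_locked_in_multiple_threads` takes a hypothetical list of (object, thread) pairs for each lock site, and the side-effect analysis is reduced to a boolean per locking operation. All names are illustrative.

```python
from enum import Enum


class Action(Enum):
    REMOVE = "remove the locking operation"
    SIDE_EFFECTS_ONLY = "keep only the side effects"
    ACQUIRE_RELEASE_ONLY = "keep only acquire and release"
    KEEP = "leave the locking operation unoptimized"


def objects_locked_in_multiple_threads(lock_sites):
    """lock_sites: iterable of (object_id, thread_id) pairs from a prior analysis."""
    threads = {}
    for obj, thread in lock_sites:
        threads.setdefault(obj, set()).add(thread)
    return {obj for obj, ts in threads.items() if len(ts) > 1}


def transform_simple(locked_in_multiple_threads):
    # Simple locking operations have no side effects to preserve.
    return Action.KEEP if locked_in_multiple_threads else Action.REMOVE


def transform_comprehensive(locked_in_multiple_threads, side_effects_needed):
    if not locked_in_multiple_threads:
        # Cases 1 and 2: no other thread can contend for this lock.
        return Action.SIDE_EFFECTS_ONLY if side_effects_needed else Action.REMOVE
    # Cases 3 and 4: mutual exclusion must still be provided.
    return Action.KEEP if side_effects_needed else Action.ACQUIRE_RELEASE_ONLY
```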
An alternative embodiment of the method performs the analysis of the procedure to determine the number of threads an object can be locked in, and removes simple locking operations that are on an object accessed in only a single thread. For comprehensive locking operations, if the object being locked is accessed in only a single thread, then the locking operation is replaced by a simpler operation that only performs the locking operation side effects.
Another alternative embodiment of the method performs the analysis of the procedure to determine the number of threads an object can be locked in, and removes simple locking operations that are on an object accessed in only a single thread. For comprehensive locking operations, if the object is being accessed in only a single thread, and further analysis of the procedure determines that the side effects of the locking operation are not needed, the locking operation is removed. Other locking operations with side effects are unoptimized.
Another alternative embodiment of the method performs the analysis of the procedure to determine the number of threads an object can be locked in, and removes simple locking operations that are on an object accessed in only a single thread. Comprehensive locking operations are not optimized by this variant.
Another alternative embodiment of the method does not consider simple locking operations. For comprehensive locking operations, the analysis and transformations are performed as in the original method.
Another alternative embodiment of the method does not consider simple locking operations, and only simplifies, but does not remove locks. For comprehensive locking operations, if the object being locked is accessed in only a single thread, then the locking operation is replaced by a simpler operation that only performs the locking operation side effects.
Another alternative embodiment of the method does not consider simple locking operations, and only removes but does not simplify locks. For comprehensive locking operations, if the object is being accessed in only a single thread, and the analysis of the method determines that the side effects of the locking operation are not needed, the locking operation is removed. Other locking operations with side effects are unoptimized.
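The preferred method and the alternative embodiments differ only in whether simple locking operations are handled and in which transformations are applied to comprehensive ones. The following Python sketch encodes them as configurations; the variant labels, the mode strings, and `choose_action` are hypothetical. When a variant performs no side-effect analysis, the side effects are conservatively assumed to be needed.

```python
from enum import Enum


class Action(Enum):
    REMOVE = "remove"
    SIDE_EFFECTS_ONLY = "side effects only"
    ACQUIRE_RELEASE_ONLY = "acquire/release only"
    KEEP = "unoptimized"


# (handles simple locks?, mode for comprehensive locks); labels are hypothetical.
VARIANTS = {
    "preferred":              (True,  "full"),
    "simple + simplify":      (True,  "simplify_only"),
    "simple + remove":        (True,  "remove_only"),
    "simple only":            (True,  "none"),
    "comprehensive full":     (False, "full"),
    "comprehensive simplify": (False, "simplify_only"),
    "comprehensive remove":   (False, "remove_only"),
}


def choose_action(variant, is_simple, multi_thread, side_effects_needed=True):
    handles_simple, mode = VARIANTS[variant]
    if is_simple:
        return Action.REMOVE if handles_simple and not multi_thread else Action.KEEP
    if mode == "full":
        if not multi_thread:
            return Action.SIDE_EFFECTS_ONLY if side_effects_needed else Action.REMOVE
        return Action.KEEP if side_effects_needed else Action.ACQUIRE_RELEASE_ONLY
    if mode == "simplify_only":
        # No side-effect analysis: single-thread locks keep only their side effects.
        return Action.KEEP if multi_thread else Action.SIDE_EFFECTS_ONLY
    if mode == "remove_only":
        if not multi_thread and not side_effects_needed:
            return Action.REMOVE
        return Action.KEEP
    return Action.KEEP  # mode "none": comprehensive locks are not optimized
```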