The present invention relates to an object-based computing system including an object locking system having at least two modes for controlling access to an object, and more particularly to controlling transitions between said at least two modes.
Programs written in the Java programming language (Java is a trademark of Sun Microsystems Inc) are generally run in a virtual machine environment, rather than directly on hardware. Thus a Java program is typically compiled into byte-code form, and then interpreted by the Java virtual machine (JVM) into hardware commands for the platform on which the JVM is executing. An important advantage of this approach is that Java applications can run on a very wide range of platforms, provided of course that a JVM is available for each platform. In recent years Java has become very popular, and is described in many books, for example "Exploring Java" by Niemeyer and Peck, O'Reilly and Associates, 1996, USA, and "The Java Virtual Machine Specification" by Lindholm and Yellin, Addison-Wesley, 1997, USA.
One of the advantages of the Java language is that creating multi-threaded applications is relatively easy. This is partly because the locking concept within the Java language is simplified for the end-programmer; there is no need at the application level to specifically code lock and unlock operations. Of course locking is important for multi-threaded applications to avoid conflicts between different threads in terms of access to and manipulation of resources (which in the Java environment are referred to as objects). Thus while one thread owns a lock on an object, no other thread may perform any operation upon that object which also requires a lock on the object. This is the principle of mutual exclusion: if a thread attempts to lock an object and discovers that the object is already locked, it may not perform operations on that object until it can acquire ownership of the lock.
Controlling concurrent access to data structures is a fundamental problem in computing, for both uniprocessor and multiprocessor systems (in multiprocessor systems access may be truly concurrent; in uniprocessor systems interrupts and time slicing may occur in the midst of an operation that must be atomic to maintain correctness).
One way to implement efficient locks is to use spin locking. Typically in this approach, each lockable object contains a one-word owner field. When a thread needs to lock an object, it just goes into a loop that repeatedly tests whether the object is unlocked (owner field equal to 0), and if it is unlocked, it attempts to claim the lock by atomically setting the owner field to its own thread identifier.
Spin locking has a number of major advantages: it is simple to implement; it requires only one word of space overhead in the object; and if locks are released quickly it is very efficient. However, spin locking also suffers from some major disadvantages, particularly on a uniprocessor. If locks are not released quickly, or if contention for shared objects is high, then a large amount of computation will be wasted in "spinning". On a uniprocessor, the spin-lock loop is usually modified so that the processor is yielded every time the lock acquisition fails, in order that the thread does not waste an entire time slice in spinning while other threads are waiting to run.
With spin-locking, the queues for the objects being locked are essentially encoded in the thread scheduler. When there is not much locking, this works very well. When locking is frequent and/or contention is high, then on a uniprocessor a great deal of time is wasted in scheduling threads which immediately yield again because they still cannot acquire the desired lock. On a multiprocessor, spin-locking generates a lot of excess traffic to main memory, and this also degrades performance. A good summary and investigation of the multiprocessor performance issues is "The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors" by T. E. Anderson, IEEE Transactions on Parallel and Distributed Systems, volume 1, number 1, January 1990.
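The loop described above can be sketched in Java as a spin lock that yields on each failed attempt. This is a minimal sketch under several assumptions. `AtomicReference.compareAndSet` plays the role of the atomic test-and-set, and a reference to the owning `Thread` stands in for a numeric thread identifier. The class names `SpinLock` and `SpinLockDemo` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical spin lock: the owner field is null when unlocked, else the owning thread.
final class SpinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>(null);

    void lock() {
        Thread me = Thread.currentThread();
        while (true) {
            // Test first, and attempt the atomic claim only when the lock looks free.
            if (owner.get() == null && owner.compareAndSet(null, me)) {
                return;
            }
            // Yield on failure so a waiting thread does not burn its whole time slice.
            Thread.yield();
        }
    }

    void unlock() {
        if (!owner.compareAndSet(Thread.currentThread(), null)) {
            throw new IllegalMonitorStateException("unlock by a thread that does not own the lock");
        }
    }

    boolean isLocked() {
        return owner.get() != null;
    }
}

// Hypothetical driver: several threads increment a shared counter under the spin lock.
final class SpinLockDemo {
    private static int counter;

    static int run(int threads, int iterations) {
        SpinLock lock = new SpinLock();
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter;
    }
}
```

The `Thread.yield()` call corresponds to the uniprocessor modification. A multiprocessor implementation might instead spin for a short while before yielding.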
The primary alternative to spin-locking is queued locking. When a thread fails to obtain a lock on an object, it places itself on a queue of threads waiting for that object, and then suspends itself. When the thread that owns the lock releases the lock, it checks if any threads are enqueued on the object. If so, it removes the first thread from the queue, locks the object on behalf of the waiting thread, and resumes the waiting thread.
Unlike spin-locking, queued locking is fair. However, unless a two-level locking scheme is employed (as described below), acquiring an uncontested queued lock is more expensive than acquiring a spin lock. Performance is especially poor if objects are locked often, typically for short periods of time, and with contention, because the overhead of enqueuing and suspending then becomes significant. However, when objects are locked for longer periods of time and contention is low, queued locking is generally more efficient than spin-locking.
A problem with queued locking has to do with the management of the queues. The queues for a shared object are themselves shared objects (even while the object is locked). Therefore, some sort of mechanism is required to assure mutual exclusion on the object queues. Furthermore, there is a race condition inherent in the lock release policy: one thread may attempt to enqueue for the object at the same time that the owning thread is releasing the lock. The simplest way to solve both of these problems is to use a global spin-lock to guard the short critical sections for lock acquisition, release, and enqueuing. Every object now contains not only a lock field but also a queue field. Unfortunately, locking an unlocked object (the most common case) has now become significantly slower and more complex. There is also a global lock for which there could be significant contention as the number of threads increases (that is, the solution does not scale).
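The scheme just described can be sketched in Java. Each object has an owner field and a queue field, and one global spin lock guards both. This is an illustrative sketch, not how any particular JVM works. `QueuedLock` is a hypothetical name, and `LockSupport.park`/`unpark` stand in for suspending and resuming threads. The single static `GLOBAL` flag is the non-scalable global lock mentioned above.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical queued lock: owner and queue fields per object, guarded by one global spin lock.
final class QueuedLock {
    private static final AtomicBoolean GLOBAL = new AtomicBoolean(false);

    private Thread owner;                                          // guarded by GLOBAL
    private final ArrayDeque<Thread> queue = new ArrayDeque<>();   // guarded by GLOBAL

    private static void globalAcquire() {
        while (!GLOBAL.compareAndSet(false, true)) {
            Thread.onSpinWait();
        }
    }

    private static void globalRelease() {
        GLOBAL.set(false);
    }

    void lock() {
        Thread me = Thread.currentThread();
        globalAcquire();
        if (owner == null) {                 // uncontended: claim the lock directly
            owner = me;
            globalRelease();
            return;
        }
        queue.addLast(me);                   // contended: enqueue, then suspend
        globalRelease();
        while (true) {
            globalAcquire();
            boolean handedOver = owner == me; // the releaser sets owner on our behalf
            globalRelease();
            if (handedOver) {
                return;
            }
            // If unpark() already happened, park() returns at once; the loop also absorbs spurious wake-ups.
            LockSupport.park(this);
        }
    }

    void unlock() {
        globalAcquire();
        if (owner != Thread.currentThread()) {
            globalRelease();
            throw new IllegalMonitorStateException("unlock by non-owner");
        }
        Thread next = queue.pollFirst();
        owner = next;                        // FIFO hand-off: lock the object on the waiter's behalf
        globalRelease();
        if (next != null) {
            LockSupport.unpark(next);        // resume the waiting thread
        }
    }
}

// Hypothetical driver: several threads increment a shared counter under the queued lock.
final class QueuedLockDemo {
    private static int counter;

    static int run(int threads, int iterations) {
        QueuedLock lock = new QueuedLock();
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter;
    }
}
```

Enqueuing and releasing both run under the global lock. This closes the race between a thread enqueuing and the owner releasing, but every lock operation, even an uncontended one, now pays for the global spin lock.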
Java implementations have generally adopted a queued locking model based on the concept of monitors which can be associated with objects. A monitor can be used for example to exclusively lock a piece of code in an object associated with that monitor, so that only the thread that holds the lock for that object can run that piece of code; other threads will queue waiting for the lock to become free. The monitor can be used to control access to an object representing either a critical section of code or a resource.
Locking in Java is always at the object level and is achieved by applying a "synchronized" statement to those code segments that must run atomically. The statement can be applied either to a whole method, or to a particular block of code within a method. In the former case, when a thread in a first object invokes a synchronised method in a second object, the thread obtains a lock on that second object. This lock covers all the synchronised methods in that second object (this may or may not be desirable depending on the application). The alternative of including a synchronised block of code within the method allows the lock to be held by taking ownership of the lock of an arbitrary object, which is specified in the synchronised statement. If such an arbitrary object is used for the lock, this lock does not prevent execution of any other methods in the second object.
One consequence of the latter approach is that any Java object may be specified for locking, not just those involving synchronised code. This is important because it means that the space available in objects for implementing locking is tightly constrained; otherwise much space would be wasted supporting locking in many very small objects that are in fact never likely to be locked.
The monitor structure in Java can also be used as a communication mechanism between separate threads of execution. This is achieved by a first thread including a "wait" command within synchronised code. This suspends execution of this first thread, and effectively allows another thread to obtain the lock controlling access to this synchronised code. Corresponding to the "wait" command is a "notify" command in synchronised code controlled by the same object lock. On execution of this "notify" command by a second thread, the first thread is resumed, although it will have to wait for access to the lock until this is released by the second thread. Thus when used for this purpose a thread may wait on an object (or event) and another thread can notify the waiter.
The usage of monitors in the execution of Java code is extensive. As noted above, monitors are used frequently as programming constructs in user code, and Java class libraries included with JVMs make heavy use of monitors. In addition, the core JVM, which itself is partially written in Java, exploits the underlying locking structure to protect critical resources. Consequently, even single-threaded applications can make heavy use of monitors, since the underlying Java libraries and core virtual machine are written to be multi-threaded.
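The following short Java sketch contrasts the two forms of the `synchronized` statement. The class `Account` and its fields are hypothetical. `Thread.holdsLock` is used only to show which object's lock is held on each path.

```java
// Hypothetical class contrasting a synchronised method with a synchronised block.
final class Account {
    private final Object auditLock = new Object();  // arbitrary object used purely as a lock
    private long balance;
    private int auditEntries;

    // Synchronised method: the caller locks this Account instance, which also
    // excludes every other synchronised method on the same instance.
    synchronized void deposit(long amount) {
        balance += amount;
    }

    synchronized long balance() {
        return balance;
    }

    // Synchronised block on auditLock: holding it does not block deposit() or balance().
    void recordAudit() {
        synchronized (auditLock) {
            auditEntries++;
        }
    }

    int auditEntries() {
        synchronized (auditLock) {
            return auditEntries;
        }
    }

    synchronized boolean methodPathHoldsAccountLock() {
        return Thread.holdsLock(this);   // true: the method locked this instance
    }

    boolean blockPathHoldsAccountLock() {
        synchronized (auditLock) {
            return Thread.holdsLock(this);   // false: only auditLock is held here
        }
    }
}
```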
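The following is a minimal sketch of the wait/notify pattern, built on a hypothetical one-slot `Mailbox` class. The consumer waits inside synchronised code until the producer stores a message and notifies. The sketch calls `notifyAll()` rather than `notify()` because producers and consumers share a single wait set.

```java
// Hypothetical one-slot mailbox built on an object's monitor.
final class Mailbox {
    private String message;  // null means the slot is empty

    synchronized void put(String m) throws InterruptedException {
        while (message != null) {
            wait();            // releases this object's lock while suspended
        }
        message = m;
        notifyAll();           // woken threads must re-acquire the lock before wait() returns
    }

    synchronized String take() throws InterruptedException {
        while (message == null) {  // re-test the condition after every wake-up
            wait();
        }
        String m = message;
        message = null;
        notifyAll();
        return m;
    }
}

// Hypothetical driver: one consumer thread waits, the calling thread notifies it.
final class MailboxDemo {
    static String roundTrip(String text) {
        Mailbox box = new Mailbox();
        String[] received = new String[1];
        Thread consumer = new Thread(() -> {
            try {
                received[0] = box.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            box.put(text);
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received[0];
    }
}
```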
Given the extensive use of the monitor structure in Java, the performance of the implementation is crucial to overall Java system performance. In this case performance is measured by the amount of time an acquire and release of a Java monitor consumes. The most important situation for performance, since it is by far the most common, is when a monitor is acquired without contention. That is, a thread successfully requests the monitor without blocking or waiting for another thread to release it.
Much work has been done on the design of Java object monitors. U.S. Pat. No. 5,797,004 describes the provision of a cache (pool) of synchronisation constructs (i.e. monitors). WO 98/44401 describes a way of locating a synchronisation structure associated with an object by searching first a local table and then a global table. EP-A-783150 describes providing local and global locking structures, the former being used for frequently locked objects to improve speed. U.S. Pat. No. 5,706,515 describes the provision of wait/notify by monitors. EP-A-840215 describes a system where a stack is used for synchronisation. A thread writes an object header into a stack, and can also update the stack with a reference indicator to the relevant stack. Access via the stack can be quicker than via standard monitors. Somewhat similarly, EP-A-955584 describes a synchronisation method for Java in which shared object identifications are put on a stack by a thread to indicate ownership. Canadian patent application 2222389 describes the use of monitor transition vectors to control the locking of objects by threads. WO 98/33119 teaches the use of a register to store pairs of lock information, each pair including an identifier of a lock for an object, and also a count for that lock.
Of greater relevance to the present invention is U.S. patent application Ser. No. 08/937311 filed 17 Jul. 97 (IBM docket number YOR9-1997-0225), and in particular the related publication "Thin Locks: Featherweight Synchronisation for Java" by Bacon, Konuru, Murthy, and Serrano, SIGPLAN '98, Montreal Canada, p258-268. These documents teach the concept of a light-weight "flat" (or thin) monitor that can be incorporated into a single word within the header space available in a conventional Java object. The flat monitor can be used for the most common, simple case whereby a lock is obtained by a thread without contention from other threads, so no waiting/queuing is required. Essentially the flat monitor word includes three fields: firstly, a single mode bit to determine whether the monitor is operating as a thin monitor or as a conventional "fat" monitor (see below); secondly, a unique thread-id which identifies the owning thread (or zero if it is unowned); and finally, a field to indicate how many times the same thread has currently locked the object. Thus this design accommodates recursive locking by the same thread up to a certain limit (based on the size of the third field). The single word of the flat monitor can then be tested and updated by the use of simple instructions that guarantee consistency and atomic updates (such as Compare and Swap on S/390).
If a thread tries to obtain a lock that is already held, or if the monitor is required for a "wait/notify" call, then the flat monitor cannot be used, since it is not sophisticated enough to handle queues of waiting threads. At this point a heavyweight "fat" monitor is assigned that can handle all these tasks. The flat monitor is then converted by changing the mode bit of the first field to indicate that a fat monitor is now being used, and by combining the second and third fields to provide a pointer to the associated heavyweight monitor. This heavyweight monitor is known as an "inflated" monitor, and the switch to using the inflated monitor instead of the flat monitor is known as "inflation".
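To make this layout concrete, the flat monitor word can be modelled with an `AtomicInteger`, with `compareAndSet` playing the role of an instruction such as Compare and Swap. Several details here are illustrative assumptions:

- The field widths: one mode bit, a 15-bit thread identifier, and an 8-bit recursion count, packed into 24 bits.
- The convention that the count holds the number of nested acquisitions minus one.
- The name `ThinLock`.

Thread identifiers are assumed to lie in the range 1 to 32767, so that 0 can mean "unowned".

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical flat (thin) monitor word: [mode:1][thread id:15][count:8].
final class ThinLock {
    static final int COUNT_BITS = 8;
    static final int TID_BITS = 15;
    static final int COUNT_MASK = (1 << COUNT_BITS) - 1;          // 0xFF
    static final int TID_SHIFT = COUNT_BITS;
    static final int TID_MASK = (1 << TID_BITS) - 1;               // 0x7FFF
    static final int MODE_BIT = 1 << (COUNT_BITS + TID_BITS);      // bit 23: 0 = flat, 1 = fat

    final AtomicInteger word = new AtomicInteger(0);  // 0 = flat and unowned

    static int encode(int tid, int count) { return (tid << TID_SHIFT) | count; }
    static boolean isFat(int w) { return (w & MODE_BIT) != 0; }
    static int owner(int w) { return (w >>> TID_SHIFT) & TID_MASK; }
    static int count(int w) { return w & COUNT_MASK; }

    /** Flat-mode fast path; false means the caller must take the fat (inflation) path. */
    boolean tryLock(int tid) {
        int w = word.get();
        if (w == 0) {
            return word.compareAndSet(0, encode(tid, 0));  // uncontended acquire: a single CAS
        }
        if (!isFat(w) && owner(w) == tid && count(w) < COUNT_MASK) {
            word.set(w + 1);   // recursive acquire: only the owner writes the word, so no CAS is needed
            return true;
        }
        return false;          // owned by another thread, already fat, or count would overflow
    }

    /** Flat-mode release by the owning thread. */
    void unlock(int tid) {
        int w = word.get();
        if (isFat(w) || owner(w) != tid) {
            throw new IllegalMonitorStateException("not the flat-mode owner");
        }
        word.set(count(w) == 0 ? 0 : w - 1);
    }
}
```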
The process of inflation occurs as follows. A first thread tries to access an object, and discovers that it is currently owned by a second thread. This contention is the trigger for inflation. However, the first thread cannot initiate the inflation because it does not own the object. Rather, it has to enter into a spin-locking or spin-blocking loop until the lock is released by the second thread. The contending thread may then acquire the thin lock, and only at this point can the fat lock be created, with the mode bit and the monitor pointer being set appropriately. One consequence of this bi-modal use of the lock word is that any access to an object lock first tests the mode bit to determine whether a flat monitor or a fat monitor is being used; based on this information it can then proceed to interpret the rest of the lock word appropriately. (It will be appreciated that all this extra functionality for locking is performed by the underlying JVM transparently to the application).
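The inflation sequence can be sketched as follows, under several simplifying assumptions:

- The hypothetical `BimodalLock` omits recursive locking.
- A fat monitor is represented as a `java.util.concurrent.locks.ReentrantLock` held in a global table.
- Once the mode bit is set, the lock word stores the table index rather than a machine pointer.

A contending thread spins (yielding) until it can acquire the thin lock, and only then inflates. Every lock and unlock operation tests the mode bit first.

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical bimodal lock: flat word = owner thread id (0 = unowned),
// fat word = MODE_BIT | index into a global monitor table. Recursion is omitted.
final class BimodalLock {
    static final int MODE_BIT = 1 << 23;
    private static final CopyOnWriteArrayList<ReentrantLock> MONITORS = new CopyOnWriteArrayList<>();
    private static final AtomicInteger NEXT_TID = new AtomicInteger(1);
    private static final ThreadLocal<Integer> TID = ThreadLocal.withInitial(NEXT_TID::getAndIncrement);

    private final AtomicInteger word = new AtomicInteger(0);

    void lock() {
        int me = TID.get();
        boolean contended = false;
        while (true) {
            int w = word.get();
            if ((w & MODE_BIT) != 0) {                  // test the mode bit before interpreting the rest
                MONITORS.get(w & ~MODE_BIT).lock();     // fat path: standard monitor
                return;
            }
            if (w == 0 && word.compareAndSet(0, me)) {  // flat path: acquire the thin lock
                if (contended) {
                    inflate();                          // only the thin-lock owner may inflate
                }
                return;
            }
            contended = true;                           // owned by another thread: spin until released
            Thread.yield();
        }
    }

    private void inflate() {
        ReentrantLock monitor = new ReentrantLock();
        monitor.lock();                                 // the inflating thread keeps ownership
        int index;
        synchronized (MONITORS) {
            MONITORS.add(monitor);
            index = MONITORS.size() - 1;
        }
        word.set(MODE_BIT | index);                     // set mode bit; monitor index replaces thread id
    }

    void unlock() {
        int w = word.get();
        if ((w & MODE_BIT) != 0) {
            MONITORS.get(w & ~MODE_BIT).unlock();       // inflation persists: no deflation here
        } else {
            word.set(0);
        }
    }

    boolean isInflated() {
        return (word.get() & MODE_BIT) != 0;
    }
}

// Hypothetical driver: several threads increment a shared counter under one BimodalLock.
final class BimodalLockDemo {
    private static int counter;

    static int run(BimodalLock lock, int threads, int iterations) {
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter;
    }
}
```

Once the word has been inflated, it never returns to flat mode in this sketch. That is the persistence problem discussed next.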
Further details on the use of these bi-modal monitors in the IBM JVM implementation can also be found in the papers "Java Server Performance: A Case Study of Building Efficient, Scalable JVMs" by Dimpsey, Arora, and Kuiper, and "The evolution of a high-performing Java virtual machine" by Gu, Burns, Collins, and Wong, both in the IBM Systems Journal, Vol 39/1, January 2000.
One unfortunate aspect of the above model is that once inflation has occurred it persists for the life of the object, even after all locks on it have been removed. Consequently, the code paths for any lock/unlock are greatly increased, even for the simple case whereby a thread obtains an unowned monitor, since now the heavyweight monitor mechanism must be used. Therefore, the high performance case introduced through flat locks occurs only if the monitor is in the flat mode (i.e. has never been inflated). Once inflation has occurred, perhaps as a result of one short burst of initial activity, the lower performing monitor implementation must be employed. This is particularly disadvantageous in a server environment, where a JVM may run for a long time, and there is the possibility that more and more objects will go into the inflated state.
This problem is addressed in a paper: "A Study of Locking Objects with Bimodal Fields" by Onodera and Kawachiya, OOPSLA '99 Conference Proceedings, p223-237, Denver Colo., USA, November 1999. At a high level, this proposes "deflation", which is the transition from a fat monitor back to a flat or thin monitor, in association with so-called "Tasuki Monitors". These are based on the bi-modal monitors described above, but with certain modifications. In particular, Tasuki monitors avoid spin-waiting during inflation by means of an extra bit, termed an "flc" bit, which is stored in the object header (in a different word from that used to store the lock information). This implementation also assumes that the fat monitor for locking any given object can be directly determined from the object id by means of a hash table or similar.
In Tasuki monitors inflation is again triggered when a thread fails to get access to an object, but the process is somewhat different from that described in the above-mentioned paper by Bacon et al. Thus when a first thread fails to obtain direct access to an object, in other words if the object is either (i) already inflated, or (ii) not inflated but owned by some second thread, it first determines the (fat) monitor identity corresponding to the object, and then tries to enter that monitor. If the object is already inflated as determined from the mode bit (i.e. the former possibility above), access to the object will now be controlled in accordance with standard monitor operation. On the other hand, if the object has not been inflated, the first thread sets the flc bit and then goes into a wait state on the (fat) monitor. When the second thread subsequently finishes with the object, then as part of the standard lock release process, it tests the flc bit and finds that it has been set. This triggers the second thread to enter the fat monitor for the object and wake up the first thread, which can now inflate the object by clearing the flc bit, changing the mode bit, and writing the monitor pointer into the object header. Thus Tasuki monitors use wait/notify internally to control monitor inflation; the paper by Onodera demonstrates that this does not conflict with the (external) use of the same monitor to perform standard Java wait/notify synchronisations between threads.
Of particular relevance to an understanding of the present invention is that Tasuki monitors also support the concept of deflation. Thus when a thread exits an object that is inflated (as indicated by the mode bit), it will, subject to certain conditions described in more detail below, return the object from using a fat monitor to a thin monitor. This is accomplished by simply changing the mode bit in the object header and writing a null value into the owning thread identifier to replace the monitor pointer. Note that the fat monitor itself does not need to be altered.
As described in the Onodera et al paper, deflation first requires that the lock is being freed and that no threads are waiting for a notify from the monitor. This is effectively represented by the code of line 34 in FIG. 5 of Onodera. It is further suggested that additional criteria can be employed for determining whether to deflate: "As long as the necessary condition is satisfied, tasuki lock allows selective deflation . . . . For instances we can deflate lock words on the basis of dynamic or static profiling information" (see section 3).
In order to determine when best to deflate, Onodera divides objects that are inflated into two groups. The first (the "nowait" group) is only involved in mutual exclusion by monitor enter and exit (i.e. standard synchronisation), whereas the second (the "wait" group) includes application wait/notify operations. Onodera only performs deflation on objects in the "nowait" group. This is accomplished by simply adding a wait-counter to an object monitor, the counter being incremented whenever the wait_object function is called. Thus deflation is not performed on any object for which the wait-counter is non-zero.
The rationale for this approach is the recognition that rapidly repeated inflation/deflation cycles will have an adverse effect on machine performance ("thrashing"). Thus an object should be left inflated if deflation would be followed by imminent re-inflation, in other words if the intervals between periods of contention for the object are short (a property termed "locality of contention"). Onodera presents experimental data to indicate that this condition is met by objects in the wait group, but not by objects in the nowait group. Consequently, it is only objects in the latter group (as indicated by a zero wait-counter) that are deflated at the end of a period of contention. Further experimental data is adduced to show that thrashing is not a problem in such an implementation.
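The deflation decision described for Tasuki monitors can be modelled in a few lines. The hypothetical `TasukiDeflationModel` tracks only the bookkeeping, with no real blocking, and its field names are assumptions. In this model, deflation happens when three conditions hold: the lock is really being freed, no threads are queued or waiting, and the wait-counter is zero.

```java
// Hypothetical single-threaded model of the Tasuki deflation decision (bookkeeping only).
final class TasukiDeflationModel {
    static final int MODE_BIT = 1 << 23;

    int lockWord;      // MODE_BIT | monitor index when inflated; 0 when flat and unowned
    int recursion;     // nested entries beyond the first
    int entryQueue;    // threads blocked trying to enter the fat monitor
    int waitSet;       // threads currently suspended in wait()
    int waitCounter;   // incremented on every wait call and never cleared

    void onWait() {
        waitCounter++;
        waitSet++;
    }

    void onWaitReturn() {
        waitSet--;
    }

    /** Necessary condition (lock really freed, nobody waiting) plus the "nowait" group test. */
    boolean shouldDeflate() {
        boolean lockFreed = recursion == 0;
        boolean noWaiters = entryQueue == 0 && waitSet == 0;
        boolean nowaitGroup = waitCounter == 0;
        return lockFreed && noWaiters && nowaitGroup;
    }

    /** The owner exits the inflated monitor; deflation rewrites only the lock word. */
    void exitInflated() {
        if (recursion > 0) {
            recursion--;
            return;
        }
        if (shouldDeflate()) {
            lockWord = 0;  // clear the mode bit and leave a null owner; the fat monitor is untouched
        }
    }

    boolean isInflated() {
        return (lockWord & MODE_BIT) != 0;
    }
}
```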
Nevertheless, the results in Onodera are derived from relatively simple test cases and may not scale to more realistic and complex scenarios. Thus a more comprehensive approach to the treatment of bi-modal locks remains to be developed.
Accordingly, the present invention provides a method of operating an object-based multi-threaded computing system having a cyclic garbage collection strategy and including an object locking system having (i) a first mode in which access by a single thread without contention to an object is controlled by a monitor internal to said object, and (ii) a second mode in which access by multiple threads with contention to said object is controlled by a monitor external to said object, and wherein for any given object a transition from the first mode to the second mode is termed inflation, and a transition from the second mode to the first mode is termed deflation, said method including the steps of:
entering a period of contention for an object in said first mode;
inflating said object to said second mode;
updating an inflation rate value for said object;
ending a period of contention for said object;
comparing the inflation rate value against a first predetermined value;
deflating or not deflating said object, based on the result of the comparison; and
resetting the inflation rate value at the next garbage collection cycle of the system.
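The steps of the method can be traced with a small bookkeeping model. The class `RateControlledLock` is hypothetical, and a boolean stands in for the mode bit. The sketch assumes that deflation proceeds when the inflation rate value does not exceed the first predetermined value. The exact form of the comparison is an assumption.

```java
// Hypothetical model of the claimed steps for one object (bookkeeping only, no real blocking).
final class RateControlledLock {
    private final int deflateThreshold;   // the "first predetermined value"
    private boolean inflated;             // stands in for the mode bit of the lock word
    private int inflationRate;            // kept in the external (fat) monitor

    RateControlledLock(int deflateThreshold) {
        this.deflateThreshold = deflateThreshold;
    }

    /** A period of contention begins: inflate if still flat, and update the inflation rate value. */
    void beginContention() {
        if (!inflated) {
            inflated = true;
            inflationRate++;
        }
    }

    /** The period of contention ends: compare the rate against the threshold, then deflate or not. */
    void endContention() {
        if (inflationRate <= deflateThreshold) {
            inflated = false;             // few recent inflations: deflate
        }
        // Otherwise stay inflated, avoiding an imminent and expensive re-inflation.
    }

    /** Reset at the next garbage collection cycle. */
    void onGarbageCollection() {
        inflationRate = 0;
    }

    boolean isInflated() {
        return inflated;
    }

    int inflationRate() {
        return inflationRate;
    }
}
```

With a threshold of 2, the first two contention periods end in deflation. The third leaves the object inflated. After the next garbage collection the rate is zero, so the following contention period ends in deflation again.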
In a preferred embodiment the inflation rate value, which is stored in the monitor external to said object, represents the number of inflation/deflation cycles between two successive garbage collection cycles. The updating step increments the inflation rate value for each inflation, except where there is no internal use of the wait/notify construct, since inflation will then occur quickly with relatively minor performance degradation. This exception can occur for two reasons: (i) because inflation results from an application wait/notify operation being performed on the object, and (ii) (more unusually) because the thread that had originally locked the object to cause contention releases it very quickly (before the internal wait/notify is set up). It is preferred that the inflation rate value is not incremented in either case (although the existence of the second case may be implementation dependent).
Also in the preferred embodiment the step of resetting the inflation rate counter is not performed if the inflation rate value is greater than a second predetermined value, except that on a periodic basis (for example, every K garbage collection cycles), the inflation rate value is reset irrespective of whether or not the inflation rate value is greater than the second predetermined value. The second predetermined value is typically set equal to said first predetermined value.
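The refinements of the preferred embodiment can be sketched as follows. Only expensive inflations are counted. The garbage-collection reset is skipped while the rate exceeds a second predetermined value, except on every K-th cycle. `InflationPolicy` and its `Cause` values are hypothetical names for the cases described above.

```java
// Hypothetical per-monitor policy implementing the preferred-embodiment refinements.
final class InflationPolicy {
    enum Cause {
        CONTENTION_WITH_INTERNAL_WAIT,   // the expensive path: internal wait/notify was set up
        APPLICATION_WAIT_NOTIFY,         // inflated because the application used wait/notify
        FAST_RELEASE                     // owner released before the internal wait was set up
    }

    private final int deflateThreshold;  // first predetermined value
    private final int resetThreshold;    // second predetermined value, typically equal to the first
    private final int forcedResetPeriod; // K: reset unconditionally every K collections
    private int inflationRate;
    private long gcCycles;

    InflationPolicy(int deflateThreshold, int resetThreshold, int forcedResetPeriod) {
        this.deflateThreshold = deflateThreshold;
        this.resetThreshold = resetThreshold;
        this.forcedResetPeriod = forcedResetPeriod;
    }

    void onInflation(Cause cause) {
        if (cause == Cause.CONTENTION_WITH_INTERNAL_WAIT) {
            inflationRate++;             // only expensive inflations are counted
        }
    }

    boolean shouldDeflate() {
        return inflationRate <= deflateThreshold;
    }

    void onGarbageCollection() {
        gcCycles++;
        boolean periodicReset = gcCycles % forcedResetPeriod == 0;
        if (inflationRate <= resetThreshold || periodicReset) {
            inflationRate = 0;
        }
        // Otherwise keep the count, so the monitor stays flagged as expensive to deflate.
    }

    int inflationRate() {
        return inflationRate;
    }
}
```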
The invention further provides a computer program product, comprising computer program instructions typically recorded onto a storage medium or transmitted over a network, for implementing the above methods.
The invention further provides an object-based multi-threaded computing system having a cyclic garbage collection strategy and including an object locking system having (i) a first mode in which access by a single thread without contention to an object is controlled by a monitor internal to said object, and (ii) a second mode in which access by multiple threads with contention to said object is controlled by a monitor external to said object, and wherein for any given object a transition from the first mode to the second mode is termed inflation, and a transition from the second mode to the first mode is termed deflation, said computer system including:
means for entering a period of contention for an object in said first mode;
means for inflating said object to said second mode;
means for updating an inflation rate value for said object;
means for ending a period of contention for said object;
means for comparing the inflation rate value against a first predetermined value;
means for deflating or not deflating said object, based on the result of the comparison; and
means for resetting the inflation rate value at the next garbage collection cycle of the system.
The invention further provides a computing system having:
a plurality of objects, each object including an internal monitor to control access to said object in a first mode of object locking;
a plurality of monitors, each monitor controlling access to a corresponding one of said plurality of objects in a second mode of object locking, and including an inflation counter which is incremented by a transition from said first mode of object locking to said second mode of object locking; and
a garbage collector including means for resetting said inflation counter.
The invention addresses the problem that deflation in the prior art is controlled very crudely. Thus in the Onodera et al paper, all objects in the "nowait" group are deflated at the end of contention, but none of the other objects are. For the best results, object monitors should not be left inflated unnecessarily, yet neither should they be deflated too eagerly, only to be inflated again a short time later.
As taught herein, the past history of the monitor can be leveraged in the decision as to whether the state of the monitor should be changed. In this way, the transition is self-tuning for optimal performance. In a preferred embodiment a counter is kept in the inflated monitor structure, and this counter is incremented on each of the slow inflation transitions. At the point of deflation the total number of expensive inflations is then known, and this information is used to decide whether or not the deflation should be performed. Note, however, that by itself this does not lead to an ideal solution, because given a long enough running time most monitors will accrue a large number of the expensive inflations. Therefore the preferred embodiment utilises a rate instead of a raw number. This rate is construed not in the normal sense of operations per time but rather as operations per memory usage. The memory usage rate is easier to implement than a time rate, since garbage collections are generally required after a given amount of memory usage, and at this point the monitor structures are traversed. It is simple to clear the transition counters at this point. If a counter is above a given threshold then the counter is not cleared. Consequently, monitors that have encountered more than the threshold number of expensive inflations since the last garbage collection continue to be flagged with their expensive inflation count.
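Because the collector already traverses the monitor structures, the clearing step can piggy-back on that traversal. The following minimal sketch assumes a hypothetical `MonitorTable` that lists all inflated monitors, with a retain threshold supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical monitor table swept during garbage collection, making each count
// a rate per unit of memory usage (inflations per GC interval).
final class MonitorTable {
    static final class FatMonitor {
        int expensiveInflations;   // incremented on each slow inflation transition
    }

    private final List<FatMonitor> monitors = new ArrayList<>();
    private final int retainThreshold;

    MonitorTable(int retainThreshold) {
        this.retainThreshold = retainThreshold;
    }

    FatMonitor allocate() {
        FatMonitor m = new FatMonitor();
        monitors.add(m);
        return m;
    }

    /** Run during garbage collection; returns how many monitors remain flagged. */
    int sweepAtGarbageCollection() {
        int flagged = 0;
        for (FatMonitor m : monitors) {
            if (m.expensiveInflations > retainThreshold) {
                flagged++;                  // above threshold: keep the count
            } else {
                m.expensiveInflations = 0;  // otherwise start a fresh interval
            }
        }
        return flagged;
    }
}
```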
When the decision point for deflation is approached, the count in the inflated monitor structure is consulted. If the count is too large then deflation is not performed and the expensive operation (inflation) is avoided in the future. A simplified view would be to allow a monitor to exist in a state which allows transitions and another state which does not allow transitions. Once a monitor was moved to the state which did not allow transitions then it would stay in that state. However, this is too rigid a solution for most situations. In the preferred embodiment, the counters kept in the inflated monitor are cleared after a certain number of garbage collections even if the rate of inflation/deflation is above the threshold.
We can generalise the above to consider a mechanism to identify the ideal transition point between multi-state monitors, or synchronization devices, that may exist in multiple states, where each of the states has different performance considerations. Denoting the states S1, S2, S3, ... SN, with state S1 having the best performance and state SN having the worst performance, transitions are made between adjacent states, i.e. from S(n) to S(n+1) and back. In the preferred embodiment, there is the flat lock (S1) and the inflated lock (S2). The general solution involves keeping an array of counters in the synchronization structure to count the number of transitions between adjacent states. When a threshold is surpassed the transition is no longer made. The rate stored in the counters may be based on memory usage or some other easily acquired metric.
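The generalised scheme can be sketched under the following assumptions:

- States are numbered from 0 (S1, fastest) to N-1 (SN, slowest).
- There is one counter per pair of adjacent states.
- "The transition is no longer made" is read as refusing the step back down to the lighter state once that edge's counter exceeds the threshold, so that the expensive step up is not repeated.

The class `MultiStateMonitor` is hypothetical. With two states it reduces to the flat/inflated case.

```java
import java.util.Arrays;

// Hypothetical N-state synchronisation structure with one transition counter per adjacent pair.
final class MultiStateMonitor {
    private final int[] transitionCounts;  // transitionCounts[i] counts escalations from state i to i + 1
    private final int threshold;
    private int state;                     // 0 = S1 (best performance) ... N-1 = SN (worst)

    MultiStateMonitor(int states, int threshold) {
        if (states < 2) {
            throw new IllegalArgumentException("need at least two states");
        }
        this.transitionCounts = new int[states - 1];
        this.threshold = threshold;
    }

    /** Move to the next heavier state, counting the transition. */
    void escalate() {
        if (state < transitionCounts.length) {
            transitionCounts[state]++;
            state++;
        }
    }

    /** Try to move back to the next lighter state; refused once that edge's counter exceeds the threshold. */
    boolean tryDeescalate() {
        if (state == 0 || transitionCounts[state - 1] > threshold) {
            return false;
        }
        state--;
        return true;
    }

    /** Periodic reset, for example at garbage collection. */
    void resetCounters() {
        Arrays.fill(transitionCounts, 0);
    }

    int state() {
        return state;
    }
}
```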