In the Java programming environment (Java is a trademark of Sun Microsystems Inc.), programs are generally run on a virtual machine, rather than directly on hardware. Thus a Java program is typically compiled into byte-code form, and then interpreted by the Java virtual machine (VM) into hardware commands for the platform on which the Java VM is executing. The Java environment is further described in many books, for example “Exploring Java” by Niemeyer and Peck, O'Reilly & Associates, 1996, USA, “Java Virtual Machine” by Meyer and Downing, O'Reilly & Associates, 1997, USA, and “The Java Virtual Machine Specification” by Lindholm and Yellin, Addison-Wesley, 1997, USA.
Java is an object-oriented language. Thus a Java program is formed from a set of class files having methods that represent sequences of instructions. One Java object can call a method in another Java object. A hierarchy of classes can be defined, with each class inheriting properties (including methods) from those classes which are above it in the hierarchy. For any given class in the hierarchy, its descendants (i.e. below it) are called subclasses, whilst its ancestors (i.e. above it) are called superclasses. At run-time classes are loaded into the Java VM by one or more class loaders, which are themselves organised into a hierarchy. Objects can then be created as instantiations of these class files, and indeed the class files themselves are effectively loaded as objects.
The Java VM includes a heap, which is a memory structure used to store these objects. Once a program has finished with an object stored on the heap, the object can be deleted to free up space for other objects. In the Java environment, this deletion is performed automatically by a system garbage collector (GC). This scans the heap for objects which are no longer referenced, and hence are available for deletion. Note that the precise form of GC is not prescribed by the Java VM specification, and many different implementations are possible.
One limitation of the standard Java VM architecture is that it is generally designed to run only a single application (although this may be multithreaded). In a server environment used for database transactions and such-like, each transaction is typically performed as a separate application, rather than as different threads within an application. This is to ensure that every transaction starts with the Java VM in a clean state. In other words, a new Java VM is started for each transaction. Unfortunately, however, this introduces an initial delay in running each application, due to the overhead of having to start up (and then stop) a fresh Java VM for each new application. This can seriously degrade the scalability of Java server solutions.
Various attempts have been made to mitigate this problem. EP-962860-A describes a process whereby one Java VM can fork into a parent and a child process, this being quicker than setting up a fresh Java VM. The ability to run multiple processes in a Java-like system, thereby reducing overhead per application, is described in “Processes in KaffeOS: Isolation, Resource Management, and Sharing in Java” by G Back, W Hsieh, and J Lepreau (see: /flux/papers/kaffeos-osdi00/main.html at http://www.cs.utah.edu). Another approach is described in “Oracle JServer Scalability and Performance” by Jeremy Litzt, July 1999 (see: /database/documents/jserver_scalability_and_performance_twp.pdf at http://www.oracle.com). The JServer product available from Oracle Corporation, USA, supports the concept of multiple sessions (a session effectively representing a transaction or application). Resources such as read-only bytecode information are shared between the various sessions, but each individual session appears to its client to be a dedicated conventional Java VM. Somewhat similarly, WO 00/52572 describes a mechanism for allowing Java classes to be shared between many Java VMs by using a shared memory pool for storing classes, and an associated Java layer class manager.
U.S. patent application Ser. No. 09/304160, filed 30 Apr. 1999 (“A long Running Reusable Extendible Virtual Machine”), assigned to IBM Corporation (IBM docket YOR9-1999-0170), discloses a virtual machine having two types of heap, a private heap and a shared heap. The former is intended primarily for storing application classes, whilst the latter is intended primarily for storing system classes and, as its name implies, is accessible to multiple VMs. The idea is that as each new VM is launched, it can access system classes already in the shared heap, without having to reload them, relink them, and so on, thereby saving significantly on start-up time. The shared memory can also be used for storing application classes that will be used by multiple VMs, with the private heap then being used for object instances specific to a particular application running on a VM.
A related idea is described in “Building a Java virtual machine for server applications: the JVM on OS/390” by Dillenberger et al., IBM Systems Journal, Vol. 39/1, January 2000. This describes two types of Java VM, a resource-owning Java VM which loads and resolves necessary system classes, and subsequent “worker” Java VMs which can reuse the resolved classes. Again this implementation uses a shared heap to share system and potentially application classes for reuse by multiple workers, with each worker Java VM also maintaining a private or local heap to store data private to that particular Java VM process. A similar approach is described in U.S. patent application Ser. No. 09/584151, filed 31 May 2000, entitled “CLASS SHARING BETWEEN MULTIPLE VIRTUAL MACHINES” and assigned to IBM Corporation.
The idea behind such systems, which run multiple Java VMs in parallel, is that a class can be loaded into a single VM, and then accessed and utilised by multiple other VMs, thereby saving memory and start-up time.
The adoption of multiple, parallel Java VMs with shared objects does introduce the problem of how to control potentially conflicting access by different VMs to the same object. This is an extension of a problem that already exists on a single VM, since as previously mentioned, the Java language supports multiple threads which can run concurrently. It is important to be able to control access to resources shared by different threads, in order to avoid potential conflict as regards the usage of a particular resource by the various threads.
In conventional (single system) Java VMs, locking is generally implemented by monitors which can be associated with objects (i.e. the locking is performed at the object level). A monitor can be used for example to exclusively lock a piece of code in an object associated with that monitor, so that only the thread that holds the lock for that object can run that piece of code—other threads will queue waiting for the lock to become free. The monitor can be used to control access to an object representing either a critical section of code or a resource.
Note that creating multithreaded applications in Java is relatively easy, partly because there is no need at the application level to specifically code lock and unlock operations. Rather, resource control is achieved by applying the “synchronized” keyword to those code segments that must run atomically. The keyword can be applied either to a whole method, or to a particular block of code within a method. In the former case, when a thread in a first object invokes a synchronised method in a second object, the thread obtains a lock on that second object. This lock covers all the synchronised methods in that second object (this may or may not be desirable depending on the application). The alternative approach specifies a synchronised block of code within a method. This allows the lock to be held via an arbitrary object, as identified in the synchronized statement. If the synchronised block of code is contained in a first object, and a second (arbitrary) object is used for the lock, then this lock does not prevent execution of other methods in the first object, or indeed other code in the method containing the synchronised block (outside the synchronised block itself).
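The two forms of synchronisation can be contrasted in a short Java sketch. The class names `Inventory` and `SyncDemo` and the separate `auditLock` object are hypothetical, chosen only for illustration. `add` and `stock` are synchronised methods that lock the `Inventory` instance itself, whereas `record` uses a synchronised block that locks an arbitrary object, so a thread inside that block does not exclude callers of `add`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, for illustration only.
class Inventory {
    private int stock;                                     // guarded by 'this'
    private final Object auditLock = new Object();         // arbitrary lock object
    private final List<String> audit = new ArrayList<>();  // guarded by auditLock

    // Synchronized methods: the calling thread locks this Inventory object,
    // so while one thread is in add(), no other thread can enter add() or stock().
    synchronized void add(int n) {
        stock += n;
    }

    synchronized int stock() {
        return stock;
    }

    // Synchronized block: only the block is guarded, and the lock is taken on
    // auditLock rather than on 'this', so it does not exclude callers of add().
    void record(String event) {
        String line = Thread.currentThread().getName() + ": " + event; // runs unlocked
        synchronized (auditLock) {
            audit.add(line);
        }
    }

    int auditSize() {
        synchronized (auditLock) {
            return audit.size();
        }
    }
}

// Hypothetical driver: several threads update one Inventory concurrently.
class SyncDemo {
    static Inventory run(int threads, int perThread) {
        Inventory inv = new Inventory();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    inv.add(1);
                    inv.record("add");
                }
            });
            workers.add(t);
            t.start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return inv;
    }
}
```

With four threads each calling `add(1)` a thousand times, `SyncDemo.run(4, 1000).stock()` is 4000, because every increment is performed while holding the lock on the `Inventory` object.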
One consequence of permitting synchronised blocks of code is that any Java object may be specified for locking, not just objects that have synchronised methods. This is important because it means that every object must include support for being locked. However, it is desirable to use as small a proportion of an object for locking as possible. This reflects the fact that, in practice, many very small objects are unlikely ever to be locked, and so it is desirable to minimise the amount of space which would otherwise effectively be wasted for supporting locking in such objects.
Synchronised code in Java can also be used as a communication mechanism between separate threads of execution. This is achieved by a first thread including a “wait” command within synchronised code. This suspends execution of the first thread and releases its lock, thereby allowing another thread to obtain the lock controlling access to this synchronised code. Corresponding to the “wait” command is a “notify” command in synchronised code controlled by the same object lock. On execution of this “notify” command by a second thread, the first thread is resumed, although it cannot reacquire the lock until this is released by the second thread.
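A minimal sketch of this wait/notify handshake is a single-slot mailbox. The names `Mailbox` and `MailboxDemo` are hypothetical. Each `wait()` sits in a loop because the Java specification permits spurious wake-ups. The sketch uses `notifyAll()` rather than `notify()` as a conservative choice, since senders and receivers both wait on the same monitor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-slot mailbox illustrating wait/notify between threads.
class Mailbox {
    private String message; // guarded by 'this'; null means the slot is empty

    // Called by the receiving thread.
    synchronized String take() throws InterruptedException {
        while (message == null) {
            wait();          // suspends this thread and releases the lock on 'this'
        }
        String m = message;
        message = null;
        notifyAll();         // wake a sender that may be waiting for an empty slot
        return m;
    }

    // Called by the sending thread.
    synchronized void put(String m) throws InterruptedException {
        while (message != null) {
            wait();          // slot still full: wait for the receiver to empty it
        }
        message = m;
        notifyAll();         // the woken receiver re-acquires the lock only after put() returns
    }
}

// Hypothetical driver: the caller sends, a second thread receives.
class MailboxDemo {
    static List<String> exchange(List<String> items) {
        Mailbox box = new Mailbox();
        List<String> received = new ArrayList<>();
        Thread receiver = new Thread(() -> {
            try {
                for (int i = 0; i < items.size(); i++) {
                    received.add(box.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();
        try {
            for (String s : items) {
                box.put(s);
            }
            receiver.join(); // join() makes the receiver's writes to 'received' visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }
}
```

Because the slot holds one message at a time and there is a single sender and a single receiver, messages arrive in the order they were sent.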
The usage of monitors in the execution of Java code is extensive. As noted above, monitors are used frequently as programming constructs in user code, and the Java class libraries included with the Java VM make heavy use of monitors. In addition, the core Java VM, which itself is partially written in Java, exploits the underlying locking structure to protect critical resources. Consequently, even single-threaded applications can make heavy use of monitors, since the underlying Java libraries and core virtual machine are written to support multithreading.
Given the extensive use of the monitor structure in Java, the performance of the implementation is crucial to overall Java system performance. In this case performance is measured by the amount of time an acquire and release of a Java monitor consumes. The most important situation for performance, since it is by far the most common, is when a monitor is acquired without contention. That is, a thread successfully requests the monitor without blocking or waiting for another thread to release it.
Much work has been done on the design of Java object monitors. Of particular relevance to the present invention are U.S. Pat. No. 6,247,025 and the publication “Thin Locks: Featherweight Synchronisation for Java” by Bacon, Konuru, Murthy, and Serrano, SIGPLAN '98, Montreal Canada, p258–268. These documents teach the concept of a lightweight “flat” (or thin) monitor that can be incorporated into a single word within the header space available in a conventional Java object. The flat monitor can be used for the most common, simple case whereby a lock is obtained by a thread without contention from other threads, so no waiting/queuing is required. Essentially the flat monitor word includes three fields: firstly, a single mode bit to determine whether the monitor is operating as a thin monitor, or as a conventional “fat” monitor (see below); secondly, a unique thread-id which identifies the owning thread (or zero if it is unowned); and finally, a field to indicate how many times the same thread has currently locked the object. Thus this design accommodates recursive locking by the same thread up to a certain limit (based on the size of the third field). The single word of the flat monitor can then be tested and updated by the use of simple instructions that guarantee consistency and atomic updates (such as Compare and Swap on S/390).
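The flat-monitor fast path can be sketched with `java.util.concurrent.atomic.AtomicInteger.compareAndSet` standing in for an instruction such as Compare and Swap. This is an illustration under assumptions, not the cited implementation. The class name `ThinLock`, the bit positions and the 15-bit and 8-bit field widths are all hypothetical, and inflation is deliberately not modelled: where a real VM would inflate, `tryLock` simply returns `false`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified flat (thin) lock word, for illustration only. Assumed layout:
//   bit 31      mode bit: 0 = thin, 1 = fat (inflation itself is not modelled)
//   bits 30-16  owning thread id, 15 bits (0 = unowned)
//   bits 15-8   recursion count: re-entries beyond the first acquisition
//   bits 7-0    unused in this sketch
final class ThinLock {
    static final int MODE_BIT = 1 << 31;
    static final int TID_SHIFT = 16, TID_MASK = 0x7FFF;
    static final int COUNT_SHIFT = 8, COUNT_MASK = 0xFF;

    private final AtomicInteger word = new AtomicInteger(0);

    // Fast-path acquire. Returns false where a real VM would inflate instead:
    // the lock is held by another thread, the count would overflow, or the
    // monitor is already fat.
    boolean tryLock(int tid) {
        if (tid <= 0 || tid > TID_MASK) {
            throw new IllegalArgumentException("thread id out of range: " + tid);
        }
        while (true) {
            int w = word.get();
            if ((w & MODE_BIT) != 0) {
                return false;
            }
            int owner = (w >>> TID_SHIFT) & TID_MASK;
            if (owner == 0) {
                // Uncontended case: one atomic compare-and-swap installs our id.
                if (word.compareAndSet(w, tid << TID_SHIFT)) {
                    return true;
                }
            } else if (owner == tid) {
                int count = (w >>> COUNT_SHIFT) & COUNT_MASK;
                if (count == COUNT_MASK) {
                    return false;
                }
                if (word.compareAndSet(w, w + (1 << COUNT_SHIFT))) {
                    return true;
                }
            } else {
                return false;
            }
            // CAS lost a race with another thread: re-read the word and retry.
        }
    }

    // Release by the owner. In this sketch no other thread writes the word
    // while it is owned, so a plain volatile set is sufficient.
    void unlock(int tid) {
        int w = word.get();
        if ((w & MODE_BIT) != 0 || ((w >>> TID_SHIFT) & TID_MASK) != tid) {
            throw new IllegalMonitorStateException("not owned by thread " + tid);
        }
        int count = (w >>> COUNT_SHIFT) & COUNT_MASK;
        word.set(count == 0 ? 0 : w - (1 << COUNT_SHIFT));
    }

    int owner() {
        return (word.get() >>> TID_SHIFT) & TID_MASK;
    }

    int recursionCount() {
        return (word.get() >>> COUNT_SHIFT) & COUNT_MASK;
    }
}
```

In the uncontended case an acquire costs one read and one successful compare-and-swap, which is the property that makes the flat monitor attractive for the common case.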
If a thread tries to obtain a lock that is already held by another thread, or if the monitor is required for a “wait/notify” call, then the flat monitor cannot be used, since it lacks the structures needed to handle queues of waiting threads. At this point a heavyweight “fat” monitor is assigned that can handle all these tasks. The flat monitor is then converted by changing the mode bit of the first field to indicate that a fat monitor is now being used, and by combining the second and third fields to provide a pointer to the associated heavyweight monitor. This heavyweight monitor is known as an “inflated” monitor, and the switch to using the inflated monitor instead of the flat monitor is known as “inflation”. (It will be appreciated that all this extra functionality for locking is performed by the underlying Java VM transparently to the application.)
The use of fat and thin monitors is illustrated in FIGS. 4A and 4B. These Figures depict an object 400 including a 32-bit word 410 used to store locking information. The last eight bits 414 of word 410 are used to store type information and flags containing information about the object, which are not of direct interest at present. The first bit 411 of word 410 is used to indicate whether or not the monitor is inflated, in other words whether the monitor is in its thin state (shape bit=0) or its fat state (shape bit=1). The former case is illustrated in FIG. 4A, in which the remaining bits in word 410 are divided into two fields. The first of these, field 412 (15 bits), is used to store the identity of the thread that currently owns the monitor (this is set to zero if there is no current owner of the lock), whilst the second, field 413 (8 bits), is used to store a recursion count, which permits nested locking of an object by the same thread.
FIG. 4B illustrates the situation after the monitor has been inflated. Fields 412 and 413 (thread id and recursion count) from the flat state of FIG. 4A have now been combined into a single field 422 (23 bits), which points into a monitor identity (MI) table 430. Also shown in FIG. 4B is a pool 460 of fat or heavyweight monitor structures, each of which is capable of providing full locking support, i.e. maintaining a queue of threads waiting to own its corresponding object, and so on. Index 422 points to a particular entry 431 in the MI table, which in turn references the particular system monitor 461 that is being used to regulate access to object 400.
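The word layout of FIGS. 4A and 4B can be expressed as bit-field encoders and decoders, together with a minimal MI table. This sketch rests on stated assumptions. The "first bit" is taken to be the most significant bit and the "last eight bits" the least significant. The names `LockWord`, `FatMonitor` and `MonitorTable` are hypothetical. The MI table is modelled as a list whose entries refer directly to heavyweight monitor objects, standing in for MI table 430 and pool 460.

```java
import java.util.ArrayList;
import java.util.List;

// Encoders/decoders for the 32-bit word 410 of FIGS. 4A and 4B.
// Assumption: bit 31 is the "first" bit and bits 7-0 are the "last eight bits".
final class LockWord {
    static final int SHAPE_BIT = 1 << 31;               // bit 411: 0 = thin, 1 = fat
    static final int TID_SHIFT = 16, TID_BITS = 15;     // field 412: owning thread id
    static final int COUNT_SHIFT = 8, COUNT_BITS = 8;   // field 413: recursion count
    static final int INDEX_SHIFT = 8, INDEX_BITS = 23;  // field 422: 412 and 413 combined
    static final int FLAGS_MASK = 0xFF;                 // field 414: type info and flags

    static int mask(int bits) {
        return (1 << bits) - 1;
    }

    static int thin(int tid, int count, int flags) {
        return (tid & mask(TID_BITS)) << TID_SHIFT
                | (count & mask(COUNT_BITS)) << COUNT_SHIFT
                | (flags & FLAGS_MASK);
    }

    static int fat(int miIndex, int flags) {
        return SHAPE_BIT
                | (miIndex & mask(INDEX_BITS)) << INDEX_SHIFT
                | (flags & FLAGS_MASK);
    }

    static boolean isFat(int w) { return (w & SHAPE_BIT) != 0; }
    static int threadId(int w)  { return (w >>> TID_SHIFT) & mask(TID_BITS); }
    static int count(int w)     { return (w >>> COUNT_SHIFT) & mask(COUNT_BITS); }
    static int miIndex(int w)   { return (w >>> INDEX_SHIFT) & mask(INDEX_BITS); }
    static int flags(int w)     { return w & FLAGS_MASK; }
}

// Hypothetical stand-in for one heavyweight monitor from pool 460
// (the queue of waiting threads is omitted).
final class FatMonitor {
    int owner;      // owning thread id, 0 if none
    int entryCount; // total nested acquisitions by the owner
}

// Hypothetical MI table 430: entry i refers to the fat monitor for index i.
final class MonitorTable {
    private final List<FatMonitor> entries = new ArrayList<>();

    // Inflation: move the thin-lock state into a new fat monitor and return
    // the replacement header word, preserving the flag bits 414.
    int inflate(int thinWord) {
        if (LockWord.isFat(thinWord)) {
            throw new IllegalStateException("already inflated");
        }
        FatMonitor m = new FatMonitor();
        m.owner = LockWord.threadId(thinWord);
        // Assumes the thin count records re-entries beyond the first acquisition.
        m.entryCount = m.owner == 0 ? 0 : LockWord.count(thinWord) + 1;
        entries.add(m);
        return LockWord.fat(entries.size() - 1, LockWord.flags(thinWord));
    }

    FatMonitor lookup(int fatWord) {
        return entries.get(LockWord.miIndex(fatWord));
    }
}
```

Since 1 + 15 + 8 + 8 = 32, the thin layout fills the word exactly. On inflation the 23-bit index reuses precisely the bits previously occupied by fields 412 and 413.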
Further details on the use of these bi-modal monitors in the IBM Java VM implementation can also be found in the papers: “Java Server Performance: A Case Study of Building Efficient, Scalable JVMs”, by Dimpsey, Arora, and Kuiper, and “The evolution of a high-performing Java virtual machine”, by Gu, Burns, Collins, and Wong, both in the IBM Systems Journal, Vol. 39/1, January 2000.
In the above implementations of bi-modal locking, once inflation has occurred it persists for the life of the object, even after all locks on it have been removed. Consequently, the code paths for any lock/unlock are greatly increased, even for the simple case in which a thread obtains an unowned monitor, since now the heavyweight monitor mechanism must be used. Therefore, the high performance case introduced through flat locks occurs only if the monitor is in the flat mode (i.e. has never been inflated). Once inflation has occurred, perhaps as a result of one short burst of initial activity, the lower performing monitor implementation must be employed. This is particularly disadvantageous in a server environment, where a Java VM may run for a long time, and there is the possibility that more and more objects will go into the inflated state.
This problem is addressed in a paper: “A Study of Locking Objects with Bimodal Fields” by Onodera and Kawachiya, OOPSLA '99 Conference Proceedings, p223–237, Denver Colo., USA, November 1999. At a high-level, this proposes “deflation”, which is the transition from a fat monitor back to a flat or thin monitor, in association with so-called “Tasuki Monitors” (these are based on the bi-modal monitors described above). Thus when a thread exits an object that is inflated (as indicated by the mode bit), it will, subject to certain conditions, return the object from using a fat monitor to a thin monitor. This is accomplished by changing the mode bit in the object header and writing a null value into the owning thread identifier to replace the monitor pointer. Note that the fat monitor itself does not need to be altered.
In order to determine when best to deflate, Onodera divides objects that are inflated into two groups. The first (the “nowait” group) is only involved in mutual exclusion by monitor enter and exit (i.e. standard synchronisation), whereas the second (the “wait” group) includes application wait/notify operations. Onodera only performs deflation on objects in the “nowait” group. This is accomplished by simply adding a wait-counter to an object monitor, the counter being incremented whenever the wait_object function is called. Thus deflation is not performed on any object for which the wait-counter is non-zero.
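The deflation rule attributed above to Onodera can be sketched as follows. The class and field names are hypothetical. The header word is assumed to follow the FIG. 4A/4B layout, with type information and flags in its low eight bits. Details of the Tasuki lock algorithm itself, such as how contending threads are made to notice a deflation, are omitted.

```java
// Hypothetical inflated-monitor state used to decide on deflation.
final class InflatedMonitor {
    static final int FLAGS_MASK = 0xFF; // assumed: type info and flags in the low 8 bits

    int owner;           // owning thread id, 0 if none
    int entryCount;      // nested acquisitions held by the owner
    int waitingEntrants; // threads queued to enter (contention)
    int waitCounter;     // number of wait() calls ever made on this monitor

    // Called from the wait operation; a non-zero count places the object
    // in the "wait" group, which is never deflated.
    void onWait() {
        waitCounter++;
    }

    // True if the object is in the "nowait" group and is currently uncontended.
    boolean shouldDeflate() {
        return owner == 0 && waitingEntrants == 0 && waitCounter == 0;
    }

    // Owner exits one level. On the final exit, either return a thin, unowned
    // header (shape bit 0, thread id 0, count 0, flags preserved) or leave the
    // header word pointing at the fat monitor.
    int exit(int headerWord) {
        if (entryCount == 0) {
            throw new IllegalMonitorStateException("monitor not held");
        }
        entryCount--;
        if (entryCount == 0) {
            owner = 0;
            if (shouldDeflate()) {
                return headerWord & FLAGS_MASK;
            }
        }
        return headerWord;
    }
}
```

The fat monitor object is left untouched when the header is rewritten, matching the observation that deflation only changes the object header.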
The rationale for this approach is the recognition that rapidly repeated inflation/deflation cycles will have an adverse effect on machine performance (“thrashing”). Thus an object should be left inflated if deflation would be followed by imminent re-inflation, in other words if the intervals between periods of contention for the object are short (a property termed “locality of contention”). Onodera presents experimental data to indicate that this condition is met by objects in the wait group, but not by objects in the nowait group. Consequently, it is only objects in the latter group (as indicated by a zero wait-counter) that are deflated at the end of a period of contention. Further experimental data is adduced to show that thrashing is not a problem in such an implementation.
An enhancement to the Onodera approach is described in U.S. patent application Ser. No. 09/574,137 entitled “MULTIPLE MODE OBJECT LOCKING METHOD AND SYSTEM”, filed 18 May 2000 (IBM docket number GB9-2000-0016). This maintains a counter to indicate the number of times a monitor has been inflated, and monitors which have a high counter value are never deflated, even when there is no longer any contention for them. This is because if they were deflated, the likelihood is that they would have to be reinflated shortly afterwards, and it is more efficient to simply leave them in the inflated state throughout. Note that the counter is reset at each garbage collection, to distinguish objects for which there genuinely is frequent contention from those objects which are simply very long-lived.
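The inflation-count policy of application Ser. No. 09/574,137 can likewise be sketched. All names are hypothetical, and so is the threshold value of 4, since the text does not state what threshold is actually used.

```java
// Hypothetical per-monitor inflation counter; THRESHOLD is an assumed value.
final class InflationPolicy {
    static final int THRESHOLD = 4;

    private int inflationCount;

    // Called each time the object's monitor is inflated.
    void onInflate() {
        inflationCount++;
    }

    // Frequently inflated monitors are left inflated even when uncontended,
    // to avoid repeated inflate/deflate cycles.
    boolean mayDeflate() {
        return inflationCount < THRESHOLD;
    }

    // Reset at each garbage collection, so that an object is kept inflated only
    // if it was inflated often since the last collection, not merely because
    // it has lived a long time.
    void onGarbageCollection() {
        inflationCount = 0;
    }
}
```

Under these assumptions a monitor inflated three times since the last collection may still be deflated, a fourth inflation pins it in the inflated state, and the next garbage collection makes it eligible for deflation again.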
This earlier work on locking has been concerned with a single Java VM, but as previously discussed, some form of locking must also be provided in a shared Java VM environment. In the aforementioned WO 00/52572, objects in the shared memory pool are locked so that they can only be accessed by a single Java VM at a time. Note that the implementation therein is focussed on reducing memory requirements. However, this approach does not map well to a server environment, since the effect is for one Java VM to be able to suspend the other Java VMs. This leads to serious scalability problems in terms of performance, as one Java VM holds up the others, and also in terms of reliability, for example if the Java VM holding a lock crashes, or if there is a deadlock between different Java VMs.
The desired behaviour is for synchronization on a shared object from a particular Java VM to only impact threads running on that particular Java VM. In addition, it is desirable for such synchronization to be achieved without sacrificing the very significant performance benefits obtained through advanced bimodal locking strategies on a single Java VM (since otherwise the whole rationale for sharing objects across multiple Java VMs could be undermined).