1. Field of the Invention
This invention relates to improvement in the efficiency of distributed computing systems. More particularly, this invention relates to an improved caching technique that reduces the overhead of remote access of distributed objects while preserving coherency, and which permits the local execution of operations on distributed objects.
2. Description of the Related Art
In a distributed computing system, caching is a common technique for avoiding overhead for remote accesses. In general, when an object is cached, some sort of coherency protocol is required to ensure that an update on one copy is correctly reflected on all other copies. Often this coherency protocol takes the form of allowing only one node at a time to update the object. If objects which are read-only are cached, there are no updates, and thus no need for a coherency protocol. Once an object is cached in a node, operations (method invocations, read and write) on the object can be executed locally. Caching is most effective when used for data that is rarely or never modified.
In many cases objects are mostly or partially read-only in the sense that some subset of the object's fields are read-only. In other cases, objects cannot be proven to be read-only statically, either because there is code that modifies the fields of the object, or because not all of the program code is available for static analysis. Sometimes, the execution of code that modifies the fields at run time depends on input data. For some input it may never be executed. According to known prior art, for these objects to be cached, and for operations against these objects to be locally executed, one of the following would have been required: (1) the use of a coherency protocol; or (2) the use of explicit hints provided by the programmer as to whether or not it was safe to cache the object.
A Java virtual machine (JVM) is platform-specific, as regards the operating system and the hardware. It is both an operating system and a program that implements a well-defined, platform independent virtual machine. It is an example of an environment suitable for the practice of the present invention. There are currently implementations of Java virtual machines for a range of platforms from embedded systems up to mainframes.
The Java virtual machine is a stack machine whose semantics are given by a set of bytecodes. Code belongs to methods which, in turn, belong to classes. Java and the Java virtual machine are very flexible, allowing classes to be dynamically created by an application, loaded, and then executed in the same application. When executed, the bytecodes change the state of the stack and can mutate objects allocated in a heap. The Java virtual machine supports multiple concurrent threads.
The basic memory model for the data manipulated by a Java virtual machine consists of stacks and a heap. There is a stack for each thread. Each stack consists of a collection of stack frames, one for each method that was invoked and which has not yet returned, where the frame is divided into three areas: parameters, variables and a conventional push-down operand stack.
Objects are allocated on a garbage collected heap via explicit program requests to create a new object. The request to create a new object places a reference to the object on the top of the stack, enabling the object to be further manipulated.
In addition to the heap and the stack, the Java virtual machine internally uses system memory for various resources, including metadata related to the program's classes, the program's instructions, etc. The metadata associated with a class includes information such as an object representing the class, and information on the class methods, which is maintained in an array of method block structure entries (one for each method), and more. The program's instructions are the bytecodes that make up its methods.
The Java virtual machine bytecodes are conveniently divided into different groups based upon the type of memory they access. Based upon this division, it is possible to gain an understanding of what is required to ensure the correct semantics of the bytecode in a cluster of Java virtual machines.
A large set of bytecodes only accesses the Java stack frame of a currently executing method. Examples are the bytecodes corresponding to load and store instructions to and from a stack frame, control flow, and arithmetic operations. It is relatively easy to guarantee a single system image for these bytecodes since the code can be replicated and since a stack frame is accessed by only a single thread.
Another group of bytecodes accesses objects in the heap. For example, the bytecodes getfield and putfield access a specific object's fields. It is this group that is particularly relevant to the present invention when applied to a distributed object system. If two different nodes access the same object, it is essential that they each see the same values, within the constraints of Java's memory consistency.
The Java virtual machine as a virtual stack machine is powered by an interpreter loop. On each iteration of the loop the next bytecode is executed. The stack is modified as specified by the bytecode, the heap is accessed as appropriate and the program counter is updated. The interpreter loop can be viewed as a giant switch statement, specifying a distinct action for each of the bytecodes.
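By way of illustration only, the interpreter loop described above may be sketched as a switch statement over a toy instruction set. The class name TinyInterpreter and the opcodes PUSH, ADD, MUL and HALT are hypothetical and are not actual Java bytecodes; the sketch shows only the pattern of fetching a bytecode, modifying the operand stack, and updating the program counter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy stack-machine interpreter; opcode names and values are illustrative assumptions. */
final class TinyInterpreter {
    static final int PUSH = 0; // followed by one operand: the value to push
    static final int ADD = 1;  // pops two values, pushes their sum
    static final int MUL = 2;  // pops two values, pushes their product
    static final int HALT = 3; // stops and yields the top of the stack

    /** Runs the program and returns the value on top of the operand stack at HALT. */
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter
        while (true) {
            int opcode = code[pc];
            // One distinct action per bytecode, as in the "giant switch" view.
            switch (opcode) {
                case PUSH:
                    stack.push(code[pc + 1]);
                    pc += 2;
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    pc += 1;
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    pc += 1;
                    break;
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalStateException("unknown opcode " + opcode);
            }
        }
    }
}
```

For example, the program PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT computes (2 + 3) * 4.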
To enable correct multithreaded operations, Java provides a synchronization mechanism implemented in the Java virtual machine, which allows threads to share and manipulate data correctly. The semantics of the Java memory model are well known, and only a brief description is presented herein.
When a thread executes a synchronized operation it tries to acquire a lock on the specified object. If the lock has already been acquired by another thread, the current thread waits. When the lock is released, one of the waiting threads acquires it and the others remain in a wait state.
A thread may acquire the same lock several times in a row. A thread releases a lock L when the number of unlock operations it performs on the lock L equals the number of lock operations.
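The counting rule above may be illustrated with the following sketch. It assumes that java.util.concurrent.locks.ReentrantLock is an acceptable stand-in for a built-in object monitor; it is used here only because it exposes its hold count, which built-in monitors do not. The class name ReentrantDemo is hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Illustrates that a lock is released only when unlocks equal locks. */
final class ReentrantDemo {
    /** Returns the hold counts observed after: lock, lock, unlock, unlock. */
    static int[] holdCounts() {
        ReentrantLock lock = new ReentrantLock();
        int[] counts = new int[4];
        lock.lock();                      // first acquisition
        counts[0] = lock.getHoldCount();
        lock.lock();                      // same thread acquires again
        counts[1] = lock.getHoldCount();
        lock.unlock();                    // one unlock: still held
        counts[2] = lock.getHoldCount();
        lock.unlock();                    // unlocks now equal locks: released
        counts[3] = lock.getHoldCount();
        return counts;
    }
}
```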
The cluster virtual machine for Java is a known implementation of the Java virtual machine, which provides a single system image of a traditional Java virtual machine, while executing in a distributed fashion on the nodes of a cluster. The cluster virtual machine for Java virtualizes the cluster, transparently distributing the objects and threads of any pure Java application. The aim of the cluster virtual machine for Java is to obtain improved scalability for Java server applications by distributing the application's work among the cluster's computing resources. While the existence of the cluster is not visible to a Java application running on top of a cluster virtual machine for Java, the cluster virtual machine for Java is cluster-aware. The implementation distributes the objects and threads created by the application among the nodes of the cluster. In addition, when a thread that is placed on one node wishes to use an object that has been placed upon another node, it is the cluster virtual machine for Java implementation that supports this remote access in a manner that is 100% transparent to the application.
The optimizations incorporated in the cluster virtual machine for Java adhere to Java memory semantics. Relevant components of the architecture of the cluster virtual machine for Java are now described. A full description can be found in the document, cJVM: a Single System Image of a JVM on a Cluster, Y. Aridor, M. Factor and A. Teperman. International Conference on Parallel Processing, Sep. 21-24, 1999.
FIG. 1 shows how a cluster virtual machine for Java 10 executes a Java application 12 on a cluster 14. The upper half shows the threads 16 and objects 18 of the application 12 as seen by the program. This is the view presented by a traditional Java virtual machine. The lower half shows the distributed objects 20 and distributed threads 22 of the application 12, distributed transparently to the application 12 across the nodes 24 of the cluster 14 by the operation of the cluster virtual machine for Java 10.
There is a cluster virtual machine for Java process on each cluster node 24, where the collection of processes as a whole constitutes the cluster virtual machine for Java 10. Each of the processes implements a Java interpreter loop while executing part of the distributed threads 22 and containing a portion of the distributed objects 20 that were created by the application 12. More specifically, on each of the nodes 24 the cluster virtual machine for Java 10 has a pool of server threads waiting for requests from the other nodes of the cluster 14.
The cluster virtual machine for Java distributes the application's threads using a pluggable load balancing algorithm to determine where to place the newly created thread. The main method is started on an arbitrary node. When the application creates a new thread, the cluster virtual machine for Java determines the best location for it, and sends a request to the selected node to create the thread object. The request is executed by one of the available server threads.
The object model of the cluster virtual machine for Java is composed of master objects and proxies. A master object is the object, as defined by the programmer. The master node for an object is the node where the object's master copy is located. A proxy is a surrogate for a remote object through which that remote object can be accessed. While a proxy is a fundamental concept used in systems supporting location-transparent access to remote objects, the cluster virtual machine for Java pushes the idea one step further. Smart proxies are a novel mechanism that allows multiple proxy implementations for a given class, with the most efficient implementation determined on a per-object-instance basis. Smart proxies are disclosed more fully in U.S. Pat. No. 6,487,714, entitled "Mechanism for Dynamic Selection of an Object Method", filed May 24, 1999.
Smart proxies were motivated by the fact that different proxy implementations for different instances of the same class can improve performance. For example, consider two array objects with different run-time behavior. The first is a final static array whose elements, once initialized, are only ever read. The second array is public and relatively large, and accesses to it are sparse and involve a mixture of read and write operations. It is clear that for the first array a caching proxy, i.e., a proxy where all the elements of the master array are cached, will boost performance, while the elements of the second array should be accessed remotely.
To maintain the single system image the cluster virtual machine for Java must give the application the illusion that it is executing on a traditional Java virtual machine, hiding any distinction between master and proxy from the application.
This challenge has been met by: 1) implementing proxy objects with the same internal representation, e.g. object header, and method tables, as their master objects and 2) having all the proxy implementations coexist within a single class object.
Specifically, the virtual method table of a class is logically extended into an array of virtual method tables 26, as seen in FIG. 2. In addition to the original table of method code, each of the other tables refers to the code for a particular proxy implementation. All the virtual tables and the code for the proxy implementations are created during class loading. In the base implementation of cluster virtual machine for Java, every class has two virtual tables: one for the original method code and one for a simple proxy implementation. The simple proxy is one where all invocations are transferred to the master copy of the object.
Upon creation of a master object 28 or a proxy 30, its method table pointer points to the correct virtual table of its implementation in the array of virtual method tables 26, which distinguishes it from other proxies as well as from the master object of a proxy. This distinction is only visible from within the implementation of the cluster virtual machine for Java; the application cannot distinguish between the master and the proxies. It should be noted that it is possible to change proxy implementations during run-time. A particular set of implementations may allow representation changes during run-time when certain conditions are met, and disallow them if, in the course of execution, these conditions are no longer true. However, at the level of a mechanism, the cluster virtual machine for Java is designed without any such constraints.
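The arrangement of FIG. 2 may be pictured with the following simplified sketch, which is not the cluster virtual machine for Java's actual data layout. It assumes a class with a single method slot and two tables, one for the original code and one for a simple proxy; the names SmartProxySketch, MethodTable, Instance and describe are hypothetical, and the "remote" call is simulated by a direct reference to the master.

```java
/** Simplified model of a per-class array of method tables (all names are assumptions). */
final class SmartProxySketch {
    static final int MASTER = 0;       // table holding the original method code
    static final int SIMPLE_PROXY = 1; // table that forwards every call to the master

    /** One "virtual table", reduced here to a single method slot. */
    interface MethodTable {
        String describe(Instance self);
    }

    /** Array of tables for the class, built once, analogous to class loading. */
    static final MethodTable[] TABLES = new MethodTable[] {
        self -> "master computed " + self.value,
        self -> "forwarded: " + self.master.describe()
    };

    /** Masters and proxies share one representation; only the table index differs. */
    static final class Instance {
        private final int tableIndex;  // plays the role of the method table pointer
        private final Instance master; // null when this instance is the master
        private final int value;

        private Instance(int tableIndex, Instance master, int value) {
            this.tableIndex = tableIndex;
            this.master = master;
            this.value = value;
        }

        static Instance newMaster(int value) {
            return new Instance(MASTER, null, value);
        }

        static Instance newProxy(Instance master) {
            return new Instance(SIMPLE_PROXY, master, 0);
        }

        /** The caller cannot tell from this call whether it holds a master or a proxy. */
        String describe() {
            return TABLES[tableIndex].describe(this);
        }
    }
}
```

A further proxy implementation would occupy an additional entry in TABLES, selected per instance through its table index.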
With the simple proxy implementation, when a method is invoked on a proxy, the method is shipped to the node holding the object's master. This method shipping results in a distributed spaghetti stack 32 as shown in FIG. 3. As part of this remote invocation, the cluster virtual machine for Java is responsible for transferring any parameters and return values. The data transferred may include objects which are passed using a global address, a preferred format for uniquely identifying objects among nodes. When a node receives a global address it has not previously seen, a proxy for the object is created on the fly.
As described in the previous section, there is a set of bytecodes which accesses the heap. Since a distributed heap is provided, these bytecodes must be modified to work correctly. The cluster virtual machine for Java modifies the implementation of the relevant bytecodes (getfield, putfield, etc.) to be cluster aware. For example the base implementation for getfield checks if the target object is a proxy; if true, it retrieves the data from the remote master.
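The base behavior of a cluster-aware getfield may be sketched as follows. This is an illustration under stated assumptions rather than the actual implementation: fields are modeled as a map keyed by name, the remote fetch is simulated by reading the master's map directly, and the names ClusterAwareGetfield, Obj and remoteFetches are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a getfield that checks for a proxy before reading (names are assumptions). */
final class ClusterAwareGetfield {
    /** Counts simulated remote round trips, for illustration only. */
    static int remoteFetches = 0;

    static final class Obj {
        final Obj master; // null when this object is itself the master
        final Map<String, Integer> fields = new HashMap<>();

        Obj(Obj master) {
            this.master = master;
        }

        boolean isProxy() {
            return master != null;
        }
    }

    /** If the target is a proxy, retrieve the data from the master; otherwise read locally. */
    static int getfield(Obj target, String name) {
        if (target.isProxy()) {
            remoteFetches++; // stands in for a network request to the master node
            return target.master.fields.get(name);
        }
        return target.fields.get(name);
    }
}
```

In this base form every read through a proxy costs one remote access.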
Just as instance objects have masters and proxies, class objects also have masters and proxies. When the cluster virtual machine for Java loads a class, the code and internal data structures are created on all nodes that use the class. However, the application visible data, i.e., static fields, are used on one node only, which is designated as the master for the class. All accesses to static fields of this class are directed to the master class object.
In cluster-enabling the Java virtual machine, the issue of locking has been addressed. The cluster virtual machine for Java requires that all locks be obtained and released on the master copy of the object being locked.
Since the bytecodes that access the heap are cluster aware as described above, it is not necessary to ship a method invoked on a proxy to the master. The code can be executed locally and each access to the fields of the proxy will be executed remotely. Thus, remote method shipping in the cluster virtual machine for Java can be viewed as an optimization, possibly replacing many remote accesses with one remote invocation and many local ones. However, there are two kinds of methods that must always be executed at the master: synchronized methods and native methods. As mentioned above, locks are always obtained at the master. Thus synchronized methods are always executed at the master. Native methods must always be executed at the master since they may use native state which is not visible to the cluster virtual machine for Java and which cannot be made available at the proxy's node.
It is clearly desirable to design an efficient proxy implementation while maintaining Java's semantics.
In some aspects of the present invention a proxy implementation is provided in a distributed computing system, which, with respect to the application, transparently caches individual fields of objects. When applied to Java virtual machines there is a substantial improvement in performance.
The application of some aspects of the invention advantageously provides a proxy implementation in a distributed computing system in which object fields are speculatively identified as candidates for caching.
Furthermore, some aspects of the present invention provide for optimal local execution of methods that access cached fields in a distributed computing system.
Some aspects of the present invention provide for optimal local caching of object fields in a distributed computing system by appropriate invalidation of cached fields throughout nodes of the system.
This disclosure introduces the concept of field-level caching in distributed object-oriented systems, in which a speculative approach is taken to identify opportunities for caching. Speculative approaches have been discovered to be particularly suitable for exploitation of opportunities for caching. Invalidation protocols, which are fully compliant with the Java memory model, are provided to recover from incorrect speculation, while incurring only a low overhead. In some embodiments update protocols may also be used, alone, or in combination with invalidation protocols. The technique has been implemented on a cluster of machines, and has been found to be readily scalable with multithreaded applications. Field caching, optionally combined with other optimizations, produces a practically important performance step up in distributed environments, such as the cluster virtual machine for Java, which transparently distributes an application's threads and objects among the nodes of a cluster.
According to some aspects of the invention speculation is used to cache only those fields which are "read-only in practice" or "mostly-read-only in practice", as these terms are defined hereinbelow. An invalidation protocol is used at the level of the class in the event of an incorrect speculation. The mechanism has been realized in the cluster virtual machine for Java. The caching technique is an essential component in obtaining scalability, and in the context of the cluster virtual machine for Java, efficiency levels in excess of 85% have been obtained for applications which are cluster-unaware using the caching technique in conjunction with other optimizations.
The invention provides a method of distributed computing, comprising the steps of executing threads of an application in a plurality of interconnected nodes in a network, allocating memory of the nodes to data objects, responsive to the memory allocation for one of the data objects, applying a predefined set of criteria to individual fields of the one data object, selecting read-locally fields from the individual fields according to the predefined set of criteria, and caching the read-locally fields in a cache of at least one of the nodes. Performance of the caching is transparent to the application. The method further includes fetching at least one of the cached instances of the read-locally fields from the cache during execution of one of the threads by a proxy that is associated with the cache.
According to an aspect of the invention, the step of selecting is performed by initializing the individual fields, and speculatively applying the predefined set of criteria prior to the caching and fetching.
According to a further aspect of the invention, the predefined set of criteria includes field encapsulation in a code of the application or a library code used by the application.
According to a further aspect of the invention, the predefined set of criteria includes a programmer-provided indication.
According to yet another aspect of the invention, a candidate is selected from the individual fields according to a subset of the predefined set of criteria.
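The application of criteria to individual fields may be pictured with the following sketch, offered as an illustration only. It makes two loud assumptions: a private modifier is used as a crude stand-in for field encapsulation, and the programmer-provided indication is modeled as a hypothetical annotation named ReadLocally. Neither is asserted to be the criterion actually applied, and the names ReadLocallySelection, Account and select are likewise hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of selecting candidate read-locally fields (criteria are assumptions). */
final class ReadLocallySelection {
    /** Hypothetical programmer-provided hint; not part of any real API. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ReadLocally {}

    /** Example application class whose instance fields are examined. */
    static final class Account {
        private int id;                    // encapsulated: selected
        @ReadLocally public String region; // hinted by the programmer: selected
        public int balance;                // neither: left for remote access
    }

    /** Returns the names of instance fields that satisfy at least one criterion. */
    static Set<String> select(Class<?> type) {
        Set<String> selected = new TreeSet<>();
        for (Field f : type.getDeclaredFields()) {
            // Skip compiler-generated and static fields; only instance fields are considered.
            if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) {
                continue;
            }
            boolean encapsulated = Modifier.isPrivate(f.getModifiers());
            boolean hinted = f.isAnnotationPresent(ReadLocally.class);
            if (encapsulated || hinted) {
                selected.add(f.getName());
            }
        }
        return selected;
    }
}
```

Because the selection is speculative, a field chosen in this way may later prove to be mutated, which is why an invalidation step is needed.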
An aspect of the invention includes mutating one of the cached instances in one of the nodes, and responsive to the mutation, invalidating all of the cached instances of the one cached field.
In an additional aspect of the invention, the method includes, following the step of invalidating, modifying one of the individual fields, the individual field corresponding to a cached field in a master node, notifying the nodes of the modification, referencing the invalidated cache field in a referencing node, and thereafter transmitting the modified individual field from the master node to the referencing node.
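The sequence of invalidating, modifying, and subsequently transmitting the modified field may be sketched as follows, with nodes simulated as objects in a single process. The sketch assumes, for simplicity, that a field which has been invalidated is thereafter read from the master on every reference; that policy, and the names FieldInvalidationSketch, Master, Node and fetches, are assumptions of the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Single-process simulation of field invalidation across nodes (names are assumptions). */
final class FieldInvalidationSketch {
    /** Holds the master copy of one object's fields. */
    static final class Master {
        private final Map<String, Integer> fields = new HashMap<>();
        private final List<Node> nodes = new ArrayList<>();
        int fetches = 0; // simulated transmissions from the master to a node

        void register(Node node) {
            nodes.add(node);
        }

        void init(String field, int value) {
            fields.put(field, value);
        }

        /** Invalidate every cached instance first, then apply the modification. */
        void write(String field, int value) {
            for (Node node : nodes) {
                node.invalidate(field);
            }
            fields.put(field, value);
        }

        int fetch(String field) {
            fetches++;
            return fields.get(field);
        }
    }

    /** A node holding a proxy with a per-field cache. */
    static final class Node {
        private final Master master;
        private final Map<String, Integer> cache = new HashMap<>();
        private final Set<String> invalidated = new HashSet<>();

        Node(Master master) {
            this.master = master;
            master.register(this);
        }

        int read(String field) {
            if (invalidated.contains(field)) {
                return master.fetch(field); // speculation failed: always ask the master
            }
            Integer cached = cache.get(field);
            if (cached == null) {
                cached = master.fetch(field); // first reference: fetch and cache
                cache.put(field, cached);
            }
            return cached;
        }

        void invalidate(String field) {
            cache.remove(field);
            invalidated.add(field);
        }
    }
}
```

Reads of a field that is never written cost one transmission per node, while a field whose speculation failed costs one transmission per read.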
Still another aspect of the invention includes identifying a method of the application that accesses read-locally fields of the data objects to define a locally executable method, executing the locally executable method on one of the nodes, wherein the read-locally fields that are accessed by the locally executable method are fetched from the cache of the individual node.
An additional aspect of the invention includes mutating one of the read-locally fields that is accessed by the locally executable method, and responsive to the step of mutating, invalidating all the cached instances of the one read-locally field, and invalidating the locally executable method, wherein the invalidated method subsequently executes on the master node of the object involved.
According to another aspect of the invention, the data objects comprise a class that has objects allocated in one of the nodes, and the method further includes mutating one of the read-locally fields in one of the objects of the class, and, responsive to the step of mutating, invalidating all of the read-locally fields of all of the objects of the class in the individual node.
According to a further aspect of the invention, the data objects comprise a class that has objects allocated in one of the nodes, and the method includes the steps of mutating one of the read-locally fields in one of the objects of the class, and, responsive to the step of mutating, invalidating the one read-locally field in all of the objects of the class in the one node.
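The two class-level granularities described in the preceding aspects may be contrasted with the following sketch. It assumes that each node keeps, per class, a set of field names still regarded as read-locally, so that a change to the set affects every object of the class on that node; the names ClassLevelInvalidation, ClassInfo, invalidateField and invalidateAll are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch of class-level invalidation at two granularities (names are assumptions). */
final class ClassLevelInvalidation {
    /** Per-node metadata for one class: fields still speculated to be read-locally. */
    static final class ClassInfo {
        private final Set<String> readLocally = new HashSet<>();

        ClassInfo(String... fields) {
            for (String field : fields) {
                readLocally.add(field);
            }
        }

        /** Consulted on every field read of any object of the class on this node. */
        boolean isReadLocally(String field) {
            return readLocally.contains(field);
        }

        /** Finer variant: only the mutated field loses its status, in all objects. */
        void invalidateField(String field) {
            readLocally.remove(field);
        }

        /** Coarser variant: every read-locally field of the class loses its status. */
        void invalidateAll() {
            readLocally.clear();
        }
    }
}
```

The finer variant preserves caching for the class's unaffected fields at the price of per-field bookkeeping, whereas the coarser variant needs only a single flag per class.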
According to yet another aspect of the invention, execution of the threads of the application is performed using a Java virtual machine. The Java virtual machine may be a cluster virtual machine for Java.
The invention provides a computer software product, comprising a computer-readable medium in which computer program instructions are stored, which instructions, when read by a computer, cause the computer to perform the steps of executing threads of an application on a plurality of interconnected nodes in a network, allocating memory of the nodes to data objects, responsive to the step of allocating memory for one of the data objects, applying a predefined set of criteria to individual fields of the one data object, selecting read-locally fields from the individual fields according to the predefined set of criteria, caching the read-locally fields in a cache of at least one of the nodes to define cached instances of the read-locally fields, wherein performance of the step of caching is transparent to the application, and fetching at least one of the cached instances of the read-locally fields from the cache during execution of one of the threads by a proxy that is associated with the cache. 
The invention provides a distributed computing system, comprising a plurality of processing units interconnected in a network, a runtime support program installed in at least one of the processing units and directing the processing units, wherein the processing units execute threads of an application, and responsive to program instructions of the application, the runtime support program transparently causes the processing units to execute the steps of allocating a portion of a memory to a data object, responsive to the step of allocating, applying a predefined set of criteria to individual fields of the data object, selecting read-locally fields from the individual fields according to the predefined set of criteria, caching the read-locally fields in a cache of at least one of the processing units to define cached instances of the read-locally fields, and fetching at least one of the cached instances of the read-locally fields from the cache during execution of one of the threads by the one processing unit.