1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a system and method for relocating running applications to topologically remotely located computing systems.
2. Description of Related Art
High availability and disaster recovery are increasingly important in the information technology industry as today's society relies more heavily on electronic systems to perform daily activities. In this vein, it is becoming more important to be able to transfer a running application from one server computing device to another so as to ensure that the running application remains available if a server computing system fails. Moreover, it is important to be able to relocate running applications in the event of a failure of a server computing system so that the running application may be recovered on a different computing system.
High availability of applications may be achieved by providing clustered server/storage environments that protect against server and storage failures. In such environments, when a failure occurs, the application is restarted on a redundant server and/or storage system in the clustered server/storage environment, with only a brief period during which the application is unavailable.
In some clustered server/storage environments, a hot standby server/storage system is provided and a log shipping technique is used. With the log shipping technique, a log of the application state is maintained by a production server and is shipped to the hot standby server in order to keep the application state of the standby server close to the current state of the application on the production server. If a failover to the standby server is required, only updates since the last log update was shipped to the standby server will be lost.
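The log shipping technique described above can be illustrated with a minimal sketch. It assumes the application state is a simple key/value dictionary and that shipping is a direct method call. The names `Server`, `Production`, `update`, and `ship_log` are hypothetical and are not taken from any particular clustering product.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server holding application state as a key/value dict."""
    state: dict = field(default_factory=dict)

    def apply(self, entries):
        # Apply shipped log entries in order.
        for key, value in entries:
            self.state[key] = value


@dataclass
class Production(Server):
    """Hypothetical production server that logs updates until they are shipped."""
    pending: list = field(default_factory=list)

    def update(self, key, value):
        # Apply the update locally and record it in the unshipped log.
        self.state[key] = value
        self.pending.append((key, value))

    def ship_log(self, standby):
        # Ship the pending log entries to the standby and start a new log.
        standby.apply(self.pending)
        self.pending = []


production = Production()
standby = Server()

production.update("balance", 100)
production.update("owner", "alice")
production.ship_log(standby)       # standby now matches production
production.update("balance", 75)   # applied on production, not yet shipped

# On failover at this point, the standby has only the shipped updates.
print(standby.state)        # {'balance': 100, 'owner': 'alice'}
print(production.pending)   # [('balance', 75)] would be lost
```

In this sketch, the entries left in `pending` at the moment of failure correspond to the updates that would be lost on failover.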
It should be noted that such server clusters or storage system clusters are topologically and geographically limited such that the devices that make up the cluster must be in relatively close proximity to one another. The clustered server/storage environments do not provide any application independent mechanism for providing availability and disaster recovery at remote network topological and/or geographic distances. Moreover, the clustered server/storage environments do not provide any such availability and recovery mechanism that has zero data loss, including no loss of in-flight transactions.
One known solution for relocating running applications in a storage area network (SAN) is provided by the VMotion™ software available from VMware (an evaluation copy of VMotion™ is available from www.vmware.com/products/vc/vmotion.html). The VMotion™ software allows users to move live, running virtual machines from one physical server computing system to another physical server computing system connected to the same SAN while maintaining continuous service availability. The VMotion™ software is able to perform such relocation because of the virtualization of the disks in the SAN.
However, VMotion™ is limited in that it requires that the entire virtual machine, which may comprise the operating system and a plurality of running applications, be moved to the new physical server computing device. The VMotion™ software provides no ability to move individual applications from one physical server computing device to another.
Moreover, VMotion™ is limited in that the movement of virtual machines can only be performed from one server computing device to another in the same SAN. Thus, VMotion™ cannot be used to move virtual machines to other server computing devices that are outside the SAN. This, in essence, places a network topology and geographical limitation on the server computing devices to which virtual machines may be moved using the VMotion™ software product.
Another known solution for providing high availability and disaster recovery of running applications is the MetaCluster™ UC 3.0 software product available from Meiosys, Inc., which has recently been acquired by International Business Machines Corporation. As described in the article “Meiosys Releases MetaCluster UC Version 3.0,” available from PR Newswire at www.prnewswire.com, the MetaCluster™ software product is built upon a Service Oriented Architecture and embodies the latest generation of fine-grained virtualization technologies to enable dynamic data centers to provide preservation of service levels and infrastructure optimization on an application-agnostic basis under all load conditions.
Unlike coarse-grained virtual machine technologies and virtual machine mobility technologies, such as VMotion™ described above, which run at the operating system level and can only move an entire virtual machine at one time, the MetaCluster™ software product runs in a middleware layer between the operating system and the applications. MetaCluster™ provides a container technology which surrounds each application, delivering both resource isolation and machine-to-machine mobility for applications and application processes.
The MetaCluster™ software product's application virtualization and container technology enables relocation of applications across both physical and virtual machines. MetaCluster™ also provides substantial business intelligence which enables enterprises to set thresholds and define rules for managing the relocation of applications and application processes from machine to machine, to address both high availability and utilization business cases.
Deploying MetaCluster™ UC 3.0 for business critical applications allows applications to be virtualized very efficiently so that the performance impact is unnoticeable (typically under 1%). Virtualized applications may then be moved to the infrastructure best suited from a resource optimization and quality of service standpoint. Server capacity can be reassigned dynamically to achieve high levels of utilization without compromising performance. Since MetaCluster™ UC 3.0 enables the state and context of the application to be preserved during relocation, the relocation is both fast and transparent to the users of the applications.
MetaCluster™ UC 3.0 uses a transparent “checkpoint and restart” functionality for performing such relocation of applications within server clusters. When generating a checkpoint, the necessary stateful data and metadata for recreating the full state, connections and context of the running application are preserved for a particular point in time. This checkpoint may then be provided to another server computing device in the same cluster as the original server computing device. The server computing device to which the checkpoint is provided may then use the checkpoint information to restart the application, using application data available from a shared storage system of the cluster, and recreate the state, connections, and context of the application on the new server computing device.
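The general idea of “checkpoint and restart” can be sketched as follows. This is only an illustration under simplifying assumptions, not the MetaCluster™ implementation: the application is a hypothetical `CounterApp` whose full state is a request count and a list of client names standing in for connections, and the checkpoint is a JSON string rather than a captured process image.

```python
import json


class CounterApp:
    """Hypothetical stateful application used only for illustration."""

    def __init__(self, count=0, connections=None):
        self.count = count
        self.connections = list(connections or [])

    def handle_request(self, client):
        # Track the client as a stand-in for an open connection.
        if client not in self.connections:
            self.connections.append(client)
        self.count += 1

    def checkpoint(self):
        # Preserve the state and connection list for this point in time.
        return json.dumps({"count": self.count, "connections": self.connections})

    @classmethod
    def restart(cls, checkpoint_data):
        # Recreate the application from the checkpoint on another "server".
        data = json.loads(checkpoint_data)
        return cls(count=data["count"], connections=data["connections"])


original = CounterApp()
original.handle_request("client-a")
original.handle_request("client-b")
snapshot = original.checkpoint()

# The snapshot is handed to a second server, which resumes from it.
relocated = CounterApp.restart(snapshot)
relocated.handle_request("client-a")

print(relocated.count)        # 3
print(relocated.connections)  # ['client-a', 'client-b']
```

The relocated instance continues from the checkpointed count and recognizes the existing clients, which mirrors the recreation of state, connections, and context described above.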
In a further product from Meiosys, i.e. MetaCluster™ FT, a “Record and Replay” technology is provided in which events that continuously impact the behavior of an application at runtime are recorded to disk in the form of log files, and those events may then be replayed in the event of a failure. Thus, the “Record and Replay” technology of MetaCluster™ FT allows recorded events to be replayed on a redundant application instance in the same server cluster in the event of a failure in order to provide failover fault tolerance. Information about the “Record and Replay” aspects of the MetaCluster™ FT (formerly referred to as “Meiosys FT”) software product may be found, for example, in “Meiosys breaks technology barrier for cost-effective fault tolerance technology designed to be embedded in OEM platform solutions,” available from Primeur Monthly at www.hoise.com/primeur/05/articles/monthly/AE-PR-05-05-46.html, and in the presentation materials for the 47th Meeting of IFIP WG10.4 in Puerto Rico, available at www2.laas.fr/IFIPWG/Workshops&Meetings/47/WS/04-Rougier.pdf.
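A record and replay scheme of this general kind can be sketched in a few lines. The sketch assumes a deterministic application whose behavior depends only on its input events, and it uses an in-memory list in place of the on-disk log files mentioned above. The `Account` class and the `record` and `replay` functions are hypothetical names chosen for illustration.

```python
class Account:
    """Hypothetical deterministic application driven entirely by events."""

    def __init__(self):
        self.balance = 0

    def apply(self, event):
        kind, amount = event
        if kind == "deposit":
            self.balance += amount
        elif kind == "withdraw":
            self.balance -= amount
        else:
            raise ValueError(f"unknown event kind: {kind}")


def record(app, event, log):
    # Record the event before applying it so the log never lags the state.
    log.append(event)
    app.apply(event)


def replay(log):
    # Rebuild the application state on a fresh, redundant instance.
    replica = Account()
    for event in log:
        replica.apply(event)
    return replica


event_log = []  # stands in for the log files written to disk
primary = Account()
record(primary, ("deposit", 100), event_log)
record(primary, ("withdraw", 30), event_log)

# Simulated failure of the primary: replay the log on a redundant instance.
replica = replay(event_log)
print(replica.balance)  # 70
```

Because every event is logged before it is applied, the replayed instance reaches the same balance as the failed primary.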
While MetaCluster™ UC 3.0 and MetaCluster™ FT allow relocation and failover of individual applications within the same cluster, as opposed to requiring entire virtual machines to be relocated, MetaCluster™ UC and FT are still limited to a localized cluster of server computing devices. That is, MetaCluster™ relies on all of the server computing devices having access to a shared storage system for accessing application data. Thus, MetaCluster™ UC and FT do not allow movement or relocation of running applications outside of the server cluster or failover of application instances outside of the server cluster. Again, this limits the network topology and geographical locations of computing devices to which running applications may be relocated and to which failover may be performed.