1. Field of the Invention
This invention pertains generally to enterprise computer systems, computer networks, embedded computer systems, and wireless devices such as cell phones, and more particularly to methods, systems and procedures (i.e., programming) for providing high-availability, virtualization and checkpointing services for a group of computer applications.
2. Description of Related Art
Enterprise and wireless systems operating today are expected to execute programs continuously, 24 hours a day and 7 days a week. There is no longer the concept of “overnight” or “planned downtime”. All programs and data must be available at any point during the day and night. Any outage or deteriorated service can result in loss of revenue as customers simply take their business elsewhere, and the enterprise ceases to function on a global scale. Traditionally, extremely high degrees of availability have been achieved with customized applications running on custom hardware, all of which is expensive and proprietary. Furthermore, application services in use today are no longer run as single applications or processes; instead, they are built from a collection of individual programs that jointly provide the service. Traditionally, no mechanisms have existed for protecting such multi-application services. This problem is compounded by the fact that the individual applications comprising the service are typically provided by different vendors and may be loaded at different times. Furthermore, distributed storage systems contain much of the application data and may need to be included.
Storage checkpointing operating at the block level of the storage subsystem is well known in the art and widely deployed. Commercial products are available from Symantec/Veritas in the form of “Veritas Storage Foundation”. Similar technologies are available from StorageTek under the Sun Microsystems brand. All of those technologies operate at the level of the storage device. If the storage device is restored to an earlier checkpoint, all applications on that disk are affected, including applications unrelated to the restore event. The present invention breaks this fundamental constraint and checkpoints only the storage related to individual applications. This means that one application can perform a storage checkpoint restore without affecting any other application on the server.
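By way of illustration only, the difference between device-level and per-application storage checkpointing can be sketched as follows. The sketch is not the mechanism of the present invention. It assumes, hypothetically, that each application keeps its files in its own directory, and the names `PerAppCheckpointer`, `demo`, `app_a` and `app_b` are invented for the example.

```python
import shutil
import tempfile
from pathlib import Path


class PerAppCheckpointer:
    """Toy per-application storage checkpointing (hypothetical, illustrative only)."""

    def __init__(self, data_root: Path, checkpoint_root: Path) -> None:
        self.data_root = data_root              # live files, one directory per application
        self.checkpoint_root = checkpoint_root  # saved copies, one directory per application

    def checkpoint(self, app: str) -> None:
        # Copy only this application's directory, not the whole storage device.
        target = self.checkpoint_root / app
        if target.exists():
            shutil.rmtree(target)
        shutil.copytree(self.data_root / app, target)

    def restore(self, app: str) -> None:
        # Replace only this application's live files with its checkpointed copy.
        live = self.data_root / app
        shutil.rmtree(live)
        shutil.copytree(self.checkpoint_root / app, live)


def demo() -> dict:
    """Checkpoint two applications, modify both, then restore only one."""
    apps = ("app_a", "app_b")
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        ckpt = Path(tmp) / "ckpt"
        for app in apps:
            (data / app).mkdir(parents=True)
            (data / app / "state.txt").write_text("v1")

        checkpointer = PerAppCheckpointer(data, ckpt)
        for app in apps:
            checkpointer.checkpoint(app)

        # Both applications continue running and change their storage.
        for app in apps:
            (data / app / "state.txt").write_text("v2")

        # Only app_a is rolled back; app_b's newer data is left untouched.
        checkpointer.restore("app_a")
        return {app: (data / app / "state.txt").read_text() for app in apps}


if __name__ == "__main__":
    print(demo())
```

In this sketch, restoring `app_a` returns its file to the checkpointed contents while `app_b` keeps its later changes, whereas a device-level restore would have rolled back both.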
Two references provide a background for understanding aspects of the current invention. The first reference is U.S. patent application Ser. No. 11/213,678 filed on Aug. 26, 2005, incorporated above in its entirety, which describes how to provide transparent and automatic high availability for applications where all the application processes run on one node. The second reference is U.S. Pat. No. 7,293,200 filed on Aug. 26, 2005, which describes how to transparently provide checkpointing of multi-process applications, where all processes run on the same node and are launched from one binary. The present invention relates to applications composed of one or more independent applications, where the independent applications dynamically join and leave the application group over time and where the applications may operate on files located either locally or on the network.