1. Field of the Invention
The present invention relates to techniques for managing processes in computer systems. More specifically, the present invention relates to a method and apparatus for migrating a running process from a source computer system to a target computer system.
2. Related Art
When a computer system within a high-availability computing cluster fails, a redundant computer system can be used to replace the failed computer system. This process of replacing a failed computer system with a redundant computer system is referred to as a “failover” operation. Unfortunately, two problems arise during failover operations: (1) temporary service unavailability, and (2) the long time required for a failover operation. Note that the failover time generally includes three components: (1) file system synchronization time, (2) cluster framework operation time, and (3) data service shutdown and restart time.
Some mission-critical applications, such as financial and banking applications, require a short recovery time after a computer system fails and also require transparent service resumption/migration, so that transactions can be processed consistently without being interrupted. These types of applications are typically executed on traditional fault-tolerant computer systems.
Unfortunately, existing UNIX kernels provide no kernel service to generate a checkpoint for a running process and to resume it when needed. Due to this lack of operating system support, fault-tolerant computer systems (e.g., using cluster technology) need to restart a data service when a failover operation occurs. Thus, the original service needs to be shut down on the source computer system and a new service needs to be started on a target computer system which replaces the source computer system. As a result, when the service is resumed on the target computer system, the target computer system has to reestablish all of the network connections to the clients that were established before the failover. As a consequence, although the failover of a logical host is transparent to the client (i.e., the replacement host uses the same IP address as the host that failed), the failover of a data service is not transparent to the client.
One solution to these problems is to provide an efficient mechanism to migrate a process from a source computer system to a target computer system. Process-migration mechanisms typically fall into two categories: (1) heterogeneous process-migration mechanisms and (2) homogeneous process-migration mechanisms. Heterogeneous process-migration mechanisms provide the ability to migrate an active process between computer systems with different architectures (e.g., SPARC® to Intel®). This technique requires a special operating system and special programming language support. Homogeneous process-migration mechanisms can migrate a process between two systems that have the same architecture (e.g., SPARC® to SPARC®) and the same system configuration.
Several techniques have been developed to support process migration, but most of these techniques are aimed at migrating a process at a high level, instead of migrating a process along with its kernel state. The biggest obstacle to migrating a process along with its kernel state is the lack of operating system support to gather this kernel state. Consequently, no effort has been made to migrate the network kernel state for a live process. In addition to this problem, many existing techniques require the user applications to be recompiled.
Hence, what is needed is a method and an apparatus for migrating a process without the problems described above.