1. Field of the Invention
The present invention relates to process and data migration, and more specifically, to safely transferring active processes from one server to another with minimized down-time.
2. Related Art
Human beings are becoming increasingly dependent on computer systems. From simple home computers, which are used for word processing and Internet access, to high speed, multi-processor systems powering genomic research and simulating nuclear explosions, computers have permeated society. As computer use continues to increase and people become even more dependent on them, people need access to the information stored on computer systems on a continual basis.
One approach to providing continuous access is to make information available through multiple servers, where each of the servers performs the same task. In such an arrangement, the remaining server or servers can continue to provide information in the event one of the servers fails. While such an approach is advantageous for mission-critical information, implementation costs associated with such configurations can be considerable. Not only does such an approach require additional servers, which can significantly increase the costs, but it also requires additional hardware and/or software to properly direct network traffic, balance the load placed on the servers, and perform other related functions.
Although a large share of the information on the Internet, and even within corporate intranets, is not mission-critical, it is still desirable to minimize and plan down-time associated with even non-mission-critical information. For example, the time that a company's web site is unavailable can result in lost sales and decreased customer satisfaction.
An unfortunate reality is that servers must be down for a certain period of time for routine maintenance, such as applying operating system patches or hardware and/or software upgrades. Also, when old servers are replaced with new servers, the transition from one physical server to another requires interruption of service for a period of time. Transitioning, or migrating, processes running on one server to another in such circumstances is generally referred to as “process migration.”
Other reasons for migration may include the need to apply a security patch, the need to reduce the load on a server that is currently overloaded, or a software upgrade. More generally, a system administrator may decide, for any reason, that a particular process should be migrated to a different server.
An example of a conventional process transfer method is illustrated in FIG. 1. As shown in FIG. 1, process migration typically involves first shutting down the computer processes running on a first server, such as processes providing access to E-mail, stored files, or the like (step 100). A file copying process is then initialized (step 110). The file copying process typically copies only files associated with the process of concern (e.g., E-mail, file server, or the like) to the new server (step 120). The file copying process continues until all appropriate files have been copied (step 130). The process or processes of concern are then launched on the new server (step 140).
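The conventional sequence of FIG. 1 can be sketched as follows. This is a minimal illustration only, under the assumption that the two servers are represented by local directories. The function name migrate_stop_copy_launch and the stop_process and launch_process callbacks are hypothetical and are not taken from any particular system.

```python
import shutil
from pathlib import Path


def migrate_stop_copy_launch(source_dir, target_dir, stop_process, launch_process):
    """Stop the process, copy its files, then launch it at the target.

    All names are hypothetical; directories stand in for the two servers.
    Returns the copied file paths, relative to target_dir.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)

    stop_process()  # step 100: the service becomes unavailable here

    copied = []
    # steps 110-130: copy every file associated with the process of concern
    for src in sorted(source_dir.rglob("*")):
        if src.is_file():
            dst = target_dir / src.relative_to(source_dir)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(dst.relative_to(target_dir).as_posix())

    launch_process()  # step 140: the service becomes available again
    return copied
```

In this sketch the service is unavailable for the entire duration of the copy loop, which is the shortcoming of the conventional method.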
Typically, the following conventional methods for service transfer are used. A direct transfer method involves transferring a service process or processes directly from one physical machine to another physical machine. The direct transfer is only possible if both machines use the same operating system and the same set of files.
Direct migration/transfer of all the processes supported by the operating systems yields optimal results, as it requires no server disconnection. With smooth service migration and zero down-time, this method appears to be the most efficient. In reality, however, this method does not always work, because the service has to be supported by its own operating system, so that all of the system settings and parameters must exist at the new machine at all times.
The same is true for a file system. Migration requires that the content of the files residing on both machines be absolutely identical. Such an architecture is quite a challenge for a designer of an operating system. So far, only experimental distributed operating systems and special development projects for operating system kernel modification have been implemented (for example, the MOSIX project developed for the Linux operating system kernel).
Modern conventional operating systems do not support services of this kind due to numerous technological difficulties. For instance, regular implementations of UNIX-type operating systems such as Linux, Sun Solaris, HP-UX, and FreeBSD, various versions of Microsoft Windows operating systems, Apple Mac OS X, and others do not contain the necessary tools for the effective process migration discussed above.
Another method of service migration, with scheduled non-zero down-time, is easier to implement. In this method, the service is first stopped at the original machine and only after that is launched at the new machine. The stoppage of the process, however short it may be, requires more than just a restart of the processes and services on the new machine. It also requires that all of the data be absolutely identical on both machines.
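The requirement that the data be identical on both machines can be illustrated with a simple check. The sketch below assumes that each machine's data is visible as a local directory tree and compares SHA-256 digests of the file contents. The function names tree_digests and data_is_identical are hypothetical and serve only as an illustration.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file path (relative to root) to the SHA-256 of its content."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            digests[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def data_is_identical(root_a, root_b):
    """Return True only if both trees hold the same files with the same content."""
    return tree_digests(root_a) == tree_digests(root_b)
```

A service would be restarted at the new machine only if such a check succeeds. Note that this sketch reads every file in full, which is itself costly for large data sets.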
Use of network file systems, such as Sun Microsystems' NFS (Network File System) for UNIX, allows identical data to be maintained at more than one machine. NFS represents a de facto standard for UNIX operating systems and is implemented (for both the client and the server) in all modern operating systems. The files used by a process or a service are located on a file server. Both machines are equipped with clients for the corresponding file system, which make the files visible. All changes to the files made by the process or the service are traced and copied to the file server by the file system.
The updated files are immediately accessible for use by the migrated processes. Thus, the transferred/migrated process can be immediately launched at the new machine, provided that the proper software has been installed. This system nevertheless has certain shortcomings. One of the disadvantages of this method is that it places very high demands on the network file system. For example, NFS is implemented in such a way that loss of connection between the two machines affects their performance significantly. In some instances, applications using files from a file server and from the operating system slow down or stop their execution completely.
Another disadvantage of this method is the limited scalability of the system. Modern network file systems (such as NFS) are limited in the number of clients that can be successfully served by an average workgroup server. This number ranges from 10 to 100.
Yet another disadvantage is typically referred to as “a single failure point.” The single failure point means that a disconnection of a file server disables all of the machines that use this file server. Thus, the method using a dedicated file server is very often unacceptable. This problem can be solved by synchronizing data using an online mirror backup, a technique also referred to as Redundant Arrays of Independent Disks (RAID).
The principle of a mirror backup is quite simple: as the operating system stores a record into the disk sector where the file data is stored, the record is written both to the disk and to its mirror backup. This provides for precise copying of all of the data stored on the disk. Essentially, this method requires duplicating disks. Therefore, data mirroring is quite expensive and inefficient. It is also difficult and usually inconvenient to implement, because the server that stores the data copies must be determined prior to the service launch. Changing the location of the service launch requires moving all of the data from its old location.
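The mirroring principle described above can be sketched as follows. This is a simplified in-memory model rather than an actual disk driver. The class name MirroredDisk, its methods, and the 512-byte sector size are assumptions made for illustration.

```python
class MirroredDisk:
    """In-memory model of a disk whose every sector write is duplicated."""

    def __init__(self, sector_count, sector_size=512):
        self.sector_size = sector_size
        # Two full copies of the storage: the mirror doubles the disk space used.
        self.primary = [bytes(sector_size) for _ in range(sector_count)]
        self.mirror = [bytes(sector_size) for _ in range(sector_count)]

    def write_sector(self, index, data):
        """Write one record to the primary disk and to its mirror backup."""
        if len(data) != self.sector_size:
            raise ValueError("data must be exactly one sector long")
        self.primary[index] = bytes(data)
        self.mirror[index] = bytes(data)

    def read_sector(self, index, use_mirror=False):
        """Read a sector from the primary disk, or from the mirror on request."""
        return self.mirror[index] if use_mirror else self.primary[index]
```

Because every write is applied to both copies, the mirror always holds the same data as the primary disk, at the cost of twice the storage.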
The easiest way to solve this problem is to copy data directly during the time interval between the stoppage of the service or process at the original machine and its restart at the new machine. However, the long time required to copy a large amount of data is a great disadvantage of this method. The physical time of the transfer in this case equals the planned down-time of the service.
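The relationship between data size and down-time in this method can be illustrated with a simple estimate. The sketch below assumes a constant transfer rate and ignores the time needed to stop and restart the service. The function name planned_downtime_seconds and the figures in the usage note are hypothetical.

```python
def planned_downtime_seconds(total_bytes, bytes_per_second):
    """Estimate the down-time as the time needed to copy all of the data.

    Assumes a constant transfer rate; stop and restart times are ignored.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must not be negative")
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return total_bytes / bytes_per_second
```

For example, under the hypothetical assumption of 50 gigabytes of data and a sustained rate of 100 megabytes per second, planned_downtime_seconds(50 * 10**9, 100 * 10**6) returns 500.0, that is, more than eight minutes during which the service is unavailable.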
As discussed herein, a service transfer or a service migration from one physical machine to another requires server function interruption. A need exists, therefore, for software and hardware updates as well as regular service reorganization and reconfiguration that do not result in scheduled and unscheduled server function interruption.
Furthermore, a process and the services it provides may be unavailable for significant periods of time, depending on the number and size of the files to be copied. Even with respect to non-mission-critical information, the time required to copy the files can have significant and unfortunate effects on a business.
Accordingly, there is a need for an efficient method for process migration with planned minimized down-time.