A virtual machine is a collection of software and hardware that supports distributed computations between separate computers. Distributed computing enables users to exploit a network of computer hardware to solve pieces of larger problems at minimal additional cost. A virtual machine has three basic components. First, a network of computers, or workstations, is the basic resource for process execution. Second, a number of daemon processes reside on the workstations to perform the virtual machine functions. A daemon is a program that is not invoked explicitly, but lies dormant waiting for some condition or conditions to occur. The virtual machine daemons work collectively to provide resource access and management. A computer process can access the virtual machine's services via programming interfaces provided in the form of library routines. The third basic component is a scheduler, i.e., a process or a number of processes that control resource utilization within the virtual machine. The scheduler's functions include bookkeeping and decision-making. Unlike in static distributed environments, such as those supported by PVM and MPI, a scheduler is a necessary component of a dynamic distributed environment such as the Grid environment disclosed by I. Foster and C. Kesselman in The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1998.
PVM (Parallel Virtual Machine) is a known software package that permits a heterogeneous collection of Unix and/or NT computers hooked together by a network to be used as a single large parallel computer, but does not allow migration. Thus, large computational problems can be solved more cost effectively by using the aggregate power and memory of many computers. With tens of thousands of users around the world, PVM has become a well known program for distributed computing. MPI (Message Passing Interface) is a well known programming environment for process communication in a parallel computing system. MPI provides a library of routines that can be called from Fortran and C programs. MPI's advantages over older message passing libraries are that it is both portable, because MPI has been implemented for almost every distributed memory architecture, and fast, because each implementation is optimized for the hardware it runs on.
Process migration moves a process running on one computer to another computer. A “process” is defined as at least a piece of a computer program which is in operation. Thus the definition of “process” herein necessarily implies at least a portion of a computer running the relevant process.
The term “computer” is defined as those components necessary to accomplish the running of a process, including but not limited to a processor, executable software routines, and memory. Thus, the term “computer” implies a location, or locations, where the process may operate.
One “workstation” may contain one or more computers and may run one or more “processes”. Migration, as stated above, will involve the transfer of processing operations from a first computer to a second computer. The person having ordinary skill in the art will understand that in a heterogeneous distributed computing environment, the computers may be, but are not necessarily, physically separated.
Process migration may be available through either a network of similar computers (homogeneous process migration) or over computers with different hardware/software environments (heterogeneous process migration). Motivations for process migration may include, for example, processor load balancing, fault tolerance, data access locality, resource sharing, reconfigurable computing, mobile computing, pervasive computing, system administration, and high performance achieved by utilizing unused network resources. Process migration can also be used for portability, such as migrating processes from one computing platform to an upgraded one. For example, cellular computing from hand-held devices may be especially well suited to process migration.
Process migration is also a fundamental technique needed for the next generation of internet computing. As large-scale distributed applications are becoming more and more popular, it is clear that migration-supported communication protocols may be essential for next-generation network computing. However, despite the need for these advantages, process migration has not been widely adopted due to its design and implementation complexities, especially within a network of heterogeneous computers.
Protocols are algorithms which function mainly to establish orders or rules for interacting processes. Communication protocols are sets of rules that regulate the exchange of messages between computer processes to provide a reliable and orderly flow of information among the communicating processes. Within a virtual machine, where multiple separate, but inter-related, processes may continually need to interact, it is evident that efficient and reliable communication protocols are especially valuable.
Traditionally, data communication between processes running on different computers is conducted with networking protocols such as TCP/IP and ATM. Although these communication protocols are reliable, efficient, and popular point-to-point communication protocols, they do not support process migration. Thus, they may be referred to herein as the “non-migration-supported protocols.” Migrating a process under non-migration-supported protocols may lead to the loss of messages sent from peer computers or of messages held within the migrating process. The known art may implement mechanisms within the body of specific computer applications to prevent message loss. However, the application-specific approach lacks modularity and reusability for a virtual machine environment.
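The two kinds of message loss described above can be illustrated with a minimal sketch. It is a toy model written for illustration only, not an implementation of TCP/IP, ATM, or any known system: the `Network` class, the `naive_migrate` function, and the host names and port are all hypothetical, and the "network" is simply an in-memory table of endpoints bound to fixed addresses.

```python
from collections import deque


class Network:
    """Toy point-to-point network (hypothetical): each inbox is bound to one address."""

    def __init__(self):
        self.endpoints = {}  # address -> inbox (deque of messages)
        self.dropped = []    # messages sent to an address with no listener

    def listen(self, address):
        inbox = deque()
        self.endpoints[address] = inbox
        return inbox

    def close(self, address):
        # Return whatever was still queued; it is lost unless the caller forwards it.
        return self.endpoints.pop(address, deque())

    def send(self, address, message):
        inbox = self.endpoints.get(address)
        if inbox is None:
            self.dropped.append(message)  # peer used a stale address
            return False
        inbox.append(message)
        return True


def naive_migrate(network, old_address, new_address):
    """Move a listener to a new address with no migration support at all."""
    unread = list(network.close(old_address))  # queued messages are abandoned
    new_inbox = network.listen(new_address)
    return new_inbox, unread


def run_demo():
    net = Network()
    old, new = ("hostA", 5000), ("hostB", 5000)

    inbox = net.listen(old)
    net.send(old, "m1")
    received = [inbox.popleft()]  # m1 is consumed before migration

    net.send(old, "m2")           # m2 is queued but unread when migration starts
    inbox, lost_in_process = naive_migrate(net, old, new)

    net.send(old, "m3")           # the peer has not learned the new address
    net.send(new, "m4")           # only messages to the new address arrive
    received.extend(inbox)
    return received, lost_in_process, net.dropped


if __name__ == "__main__":
    received, lost_in_process, lost_from_peer = run_demo()
    print("received:", received)
    print("lost within migrating process:", lost_in_process)
    print("lost from peer:", lost_from_peer)
```

In this sketch, one message is lost inside the migrating process because it was queued but unread when the endpoint closed, and another is lost because the peer kept sending to the stale address. Preventing either loss inside each application is the application-specific approach that the paragraph above describes as lacking modularity.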
There are only a few communication software systems that support process migration in distributed environments. To the inventors' knowledge, the known systems only support process migration in homogeneous distributed systems, i.e., where the computers have the same computing software and hardware. Each of the known systems may also have drawbacks in its practical application to real-world distributed environments, including lack of scalability, lack of modularity, or high resource usage.
Detailed descriptions of the known systems may be found in the “related work” section of the document: K. Chanchio and X. H. Sun, “Communication State Transfer for Mobility of Concurrent Heterogeneous Computing,” Proceedings of the 2001 International Conference on Parallel Processing, September 2001.
Therefore, there is a need in the art for communication and migration protocols which improve parallel computing performance and achieve efficient resource utilization in a distributed computing environment, especially in heterogeneous environments.