The present invention relates generally to multicomputer systems, and more particularly, to a multicomputer system employing a microkernel-based serverized distributed operating system.
Microkernel-based operating system architectures have been employed to distribute operating system services among loosely-coupled processing units in a multicomputer system. For example, in an earlier microkernel-based “serverized” operating system, a set of modular computer software-based system servers sit on top of a minimal computer software microkernel which provides the system servers with fundamental services such as processor scheduling and memory management. The microkernel may also provide an inter-process communication facility that allows the system servers to call each other and to exchange data regardless of where the servers are located in the system. The system servers manage the other physical and logical resources of the system, such as devices, files and high level communication resources, for example. Often, it is desirable for a microkernel to be interoperable with a number of different conventional operating systems. In order to achieve this interoperability, computer software-based system servers may be employed to provide an application programming interface to a conventional operating system.
The block diagram drawing of FIG. 1 shows an illustrative multicomputer system. The term “multicomputer” as used herein shall refer to a distributed non-shared memory multiprocessor machine comprising multiple sites. A site is a single processor and its supporting environment or a set of tightly coupled processors and their supporting environment. The sites in a multicomputer may be connected to each other via an internal network (e.g., Intel MESH(trademark) interconnect), and the multicomputer may be connected to other machines via an external network (e.g., Ethernet network). Each site is independent in that it has its own private memory, interrupt control, etc. Sites use messages to communicate with each other. A microkernel-based “serverized” operating system is well suited to provide operating system services among the multiple independent non-shared memory sites in a multicomputer system.
An important objective in certain multicomputer systems is to achieve a single-system image (SSI) across all sites of the system. An advantage of an SSI is that, from the point of view of the user, the application developer and, for the most part, the system administrator, the multicomputer system appears to be a single computer even though it actually comprises multiple independent computer sites running in parallel and communicating with each other over a high speed interconnect. Some of the benefits of an SSI include simplified installation and administration, ease of use, open system solutions (i.e., fewer compatibility issues), exploitation of multisite architecture while preserving conventional APIs, and ease of scalability. There are several possible beneficial features of an SSI, such as a global naming process, global file access, distributed boot facilities and global STREAMS facilities, for example.
In one earlier system, an SSI is provided which employs a process directory (or name space) which is distributed across multiple sites. Each site maintains a fragment of the process directory. The distribution of the process directory across multiple sites ensures that no single site is unduly burdened by the volume of message traffic accessing the directory. There are challenges in implementing a distributed process directory. For example, such a distributed process directory should be effective in implementing global atomic operations. A global atomic operation (GAO) describes a category of functions which are applied to each process in a set of processes identified in the SSI.
GAOs typically are applied to a set of processes from what is often referred to as a “consistent snapshot” of the system process directory state. The processes that are operated upon by a GAO are often referred to as target processes. A consistent snapshot generally refers to a view of the directory which identifies the processes in the entire SSI at a discrete point in time. However, since process creation and process deletion events occur frequently, a process directory is a dynamic or “living” object whose contents change frequently. Therefore, the consistent snapshot rule generally is relaxed somewhat such that a consistent snapshot may contain all processes which exist both before and after the snapshot is taken. For the purposes of a GAO, it can be assumed that processes which were destroyed during a consistent snapshot were destroyed prior to it, and processes created during the consistent snapshot were created subsequent to it.
An example of a GAO is what is referred to as sending a signal, which is a mechanism by which a process may be notified of, or affected by, an event occurring in the system. Some application program interfaces (APIs) which are provided to the programmer as part of a UNIX specification, for instance, deliver a signal to a set of processes as a group; such an API, for example, mandates that all processes that match the group criteria receive the signal. The delivery of a signal to a set of processes as a group is an example of a GAO. The processes in the group are examples of target processes.
In a multicomputer system that employs a distributed process directory, GAOs, which must be applied to multiple target processes, may have to traverse process directory fragments on multiple sites in the system. This traversal of directory fragments on different sites in search of processes targeted by an operation can be complicated by the migration of processes between sites while the GAO is still in progress. In other words, a global atomic operation and process migration may progress simultaneously. The proper application of a global atomic operation is to apply it once, and only once, to each target process. As processes migrate from site to site during the occurrence of a GAO, however, there arises a need to ensure that a migrating process is neither missed by a GAO nor has the GAO applied to it more than once.
The problem of a GAO potentially missing a migrating process will be further explained through an example involving the global getdents (get directory entries) operation. The getdents operation is used to obtain a “consistent snapshot” of the system process directory. The getdents operation is a global atomic operation. The timing diagram of FIG. 2 illustrates the example. At time=t, process manager server “A” (PM A) on site A initiates a migration of a process from PM A on site A to the process manager server “B” (PM B) on site B (dashed lines). This process migration involves the removal of the process identification (PID) for the migrating process from the process directory fragment on site A and the insertion of the PID for the migrating process into the process directory fragment on site B. Meanwhile, also at time=t, an object manager server (OM) has broadcast a getdents request to both PM A and PM B. At time=t1, PM B receives and processes the getdents request and returns the response to the OM. This response by PM B does not include a process identification (PID) for the migrating process which has not yet arrived at PM B. At time=t2, PM B receives the migration request from PM A. PM B adds the PID for the migrating process to the directory fragment on site B and returns to PM A a response indicating the completion of the process migration. PM A removes the PID for the migrating process from the site A directory fragment. At time=t3, PM A receives and processes the getdents request and returns the response to the OM. This response by PM A does not include the PID for the migrating process since that process has already migrated to PM B on site B.
Thus, the global getdents operation missed the migrating process which was not yet represented by a PID in the site B directory fragment when PM B processed the getdents operation, and which already had its PID removed from the site A directory fragment by the time PM A processed the getdents operation.
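The interleaving described above can be replayed in a few lines. The following is a minimal sketch, assuming each directory fragment is modeled as a plain set of PIDs; the PID values and the function name `run_timeline` are hypothetical and serve only to illustrate the timeline of FIG. 2.

```python
# Hypothetical replay of the FIG. 2 timeline; all names and PID values are illustrative.

def run_timeline():
    fragment_a = {101, 102, 200}  # site A directory fragment; PID 200 will migrate
    fragment_b = {301}            # site B directory fragment
    migrating_pid = 200
    snapshot = set()              # what the OM collects from the getdents responses

    # time = t1: PM B answers getdents before the migrating process has arrived.
    snapshot |= fragment_b

    # time = t2: PM B inserts the PID, then PM A removes it from its own fragment.
    fragment_b.add(migrating_pid)
    fragment_a.discard(migrating_pid)

    # time = t3: PM A answers getdents after the PID has already been removed.
    snapshot |= fragment_a

    # Return the snapshot and the set of PIDs that actually exist in the system.
    return snapshot, fragment_a | fragment_b


snapshot, actual = run_timeline()
missed = actual - snapshot
print(sorted(missed))  # prints [200]
```

The migrating process exists throughout the operation, yet it appears in neither response, which is the failure the passage describes.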
An example of a prior solution to the problem of near simultaneous occurrence of process migrations and global atomic operations involves the use of a “global ticket” (a token) to serialize global operations at the system level and migrations at the site level. More specifically, a computer software-based global operation server issues a global ticket to a site which requests a global operation. In the exemplary prior solution, a number associated with the global ticket monotonically increases every time a new ticket is issued so that different global atomic operations in the system are uniquely identified and can proceed one after the other. Furthermore, each PID has associated with it the global ticket value of the GAO which most recently considered the PID. As each subsequent GAO considers a respective PID, that PID has its global ticket association changed to match the global ticket of the GAO that most recently considered it. Thus, global tickets are used to serialize all GAOs so that they do not conflict and to keep track of which process PIDs already have been considered by a respective GAO and which process PIDs have not yet been considered by such respective GAO.
More specifically, this illustrative prior solution involves a multicast message carrying the global ticket to process managers (PMs) on each site. Each process manager acquires the lock to the process directory fragment of its own site. The applicability of the global atomic operation is considered for each PID entered in the process directory fragment on the site. The global operation may be performed on a respective process corresponding to a respective PID in a respective directory fragment entry only if a global ticket number marked on the entry is lower than the current iteration global ticket number. A global ticket number marked on a process directory fragment PID entry is carried over from a site the process migrates from (origin site) to a site the process migrates to (destination site). It represents the last global operation ticket such process has seen before the migration.
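The per-entry ticket check of this prior solution can be sketched as follows. This is an illustration under stated assumptions, not the prior system's code: the `Entry` record and the function `apply_global_operation` are hypothetical names, and the acquisition of the fragment lock is omitted.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical directory fragment entry: a PID plus its ticket stamp."""
    pid: int
    last_ticket: int  # ticket of the global operation that last considered this PID


def apply_global_operation(fragment, ticket, operation):
    """Apply `operation` to each entry whose stamp is lower than `ticket`.

    The fragment lock described in the text is not modeled here.
    """
    applied = []
    for entry in fragment:
        if entry.last_ticket < ticket:
            operation(entry.pid)
            entry.last_ticket = ticket  # record that this operation considered the PID
            applied.append(entry.pid)
    return applied


fragment = [Entry(10, 4), Entry(11, 5), Entry(12, 3)]
seen = []
first_pass = apply_global_operation(fragment, 5, seen.append)
second_pass = apply_global_operation(fragment, 5, seen.append)
print(first_pass, second_pass)  # prints [10, 12] []
```

Entry 11 is skipped because it already carries ticket 5, and a second pass with the same ticket touches nothing, so no entry sees the operation twice.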
During process migration, in accordance with the exemplary prior solution, a process being migrated acquires a process directory fragment lock on its origin site first. It then marks its corresponding process directory entry as being in the process of migration. The migration procedure stamps the process directory entry of the process with the present global operation ticket number, locks the process directory on the migration destination site and transmits the process directory entry contents to the destination site. The global operation ticket number on the destination site is then copied back in the reply message to the migration origin site. The migration procedure on the origin site is responsible for comparing the global ticket number returned from the destination site with its own. If the global ticket number of the origin site is greater than the number from the destination site, then the global operation already has been performed on the migrating process, although the operation has not yet reached the destination site. The migration is permitted to proceed, but the process directory fragment slot for the migrating process on the destination site is marked with the higher global ticket number. As a result, the global operation will skip the migrated process on the destination site and will not be applied twice to that process. If the global ticket number of the origin site is less than the number from the destination site, then a global operation has been performed on the destination site and has yet to be performed on the origin site and will miss the process currently being migrated. The migration will be denied and retried later.
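The comparison performed on the origin site reduces to a small decision function. The sketch below is illustrative only: the function name and return values are hypothetical, and the handling of equal ticket numbers is an assumption, since the text covers only the greater-than and less-than cases.

```python
def migration_decision(origin_ticket, destination_ticket):
    """Return (decision, stamp) for a migration under the global ticket scheme.

    `stamp` is the ticket number to mark on the destination slot, or None
    when the migration is denied.
    """
    if origin_ticket > destination_ticket:
        # The operation already ran on the origin site; carry the higher stamp so
        # the destination site skips this process when the operation arrives there.
        return ("proceed", origin_ticket)
    if origin_ticket < destination_ticket:
        # The operation already ran on the destination site but not yet on the
        # origin site; migrating now would cause the process to be missed.
        return ("deny", None)
    # ASSUMPTION: equal tickets mean both sites are at the same point, so the
    # migration is allowed. The source text does not describe this case.
    return ("proceed", origin_ticket)


print(migration_decision(7, 6))  # prints ('proceed', 7)
print(migration_decision(6, 7))  # prints ('deny', None)
```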
Unfortunately, there have been problems with the use of global tickets (tokens) to coordinate global operations with process migrations. For instance, the global ticket scheme serializes global operations since only one global operation can own the global ticket at a time. The serialization of global operations, however, can slow down overall system performance. While one global operation has the global ticket, other global operations typically block and await their turns to acquire the global ticket before completing their operations.
Thus, there has been a need for improvement in the application of global atomic operations to processes that migrate between sites in a multicomputer system which employs a distributed serverized operating system. The present invention meets this need.
The present invention provides a method for responding to a computer system call requesting creation of a new process in a multicomputer system which employs a process directory which is distributed across multiple sites such that different site memories include different fragments of the process directory. A new process is created on a respective individual computer site in the multicomputer system. There is provided in electronic memory of a computer site a designation of sites for which respective process directory fragments include at least one unallocated slot. A site is selected from the designation of sites. The new process is referenced in a slot in a respective process directory fragment on the selected site.
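The slot selection step can be sketched as follows. This is a minimal illustration rather than the claimed implementation: `DirectoryFragment`, `create_process` and `sites_with_free_slots` are hypothetical names, and picking the first listed site is an assumed selection policy that the text does not prescribe.

```python
class DirectoryFragment:
    """Hypothetical fixed-size process directory fragment held by one site."""

    def __init__(self, size):
        self.slots = [None] * size  # None marks an unallocated slot

    def has_free_slot(self):
        return None in self.slots

    def allocate(self, pid):
        index = self.slots.index(None)  # first unallocated slot
        self.slots[index] = pid
        return index


def create_process(pid, host_site, fragments, sites_with_free_slots):
    """Reference a new process in a slot on a site taken from the designation."""
    if not sites_with_free_slots:
        raise RuntimeError("no unallocated process directory slot on any site")
    directory_site = sites_with_free_slots[0]  # ASSUMPTION: first listed site is chosen
    fragment = fragments[directory_site]
    slot = fragment.allocate(pid)
    if not fragment.has_free_slot():
        # Keep the designation accurate once the fragment becomes full.
        sites_with_free_slots.remove(directory_site)
    return host_site, directory_site, slot


fragments = {"A": DirectoryFragment(1), "B": DirectoryFragment(2)}
available = ["A", "B"]
first = create_process(100, "B", fragments, available)
second = create_process(101, "B", fragments, available)
print(first, second, available)
```

In the first call the process runs on site B while its directory slot lives on site A, which illustrates that the site hosting a process and the site referencing it need not coincide.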
The novel method described above advantageously permits independent disposition of processes and corresponding process directory fragments referencing such processes in the multicomputer system. That is, a process and a process directory structure fragment referencing the process can be disposed on the same or on different sites. This feature makes possible migration of the process from one site to another site in the multicomputer system while the process directory fragment referencing such migrating process remains unchanged. The use of such fixed process directory fragment references to migratable processes makes it easier to keep track of migrating processes during their migrations. As a result, there can be improved application of global atomic operations to migrating processes.
Accordingly, in another aspect of the present invention, there is provided a novel method of process migration. A process which is operative on a first site, and which is referenced in a slot of a respective process directory fragment on the first site, is transferred from the first site to a second site. Meanwhile, the reference to the transferred process is maintained unchanged in the slot of the respective process directory fragment on the first site.
Thus, a global atomic operation targeted at a process during process migration is less likely to miss the migrating process since a process directory fragment provides a fixed reference to such a migrating process. Moreover, since the process directory fragment referencing such a targeted process does not change, there may be no need to lock the process directory fragment in order to ensure that migrating processes are subject to such global atomic operation. As a consequence, global atomic operations may have less of an impact on overall system performance.
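The effect of a fixed directory reference can be illustrated by replaying the same interleaving as the getdents example of FIG. 2, this time with migration leaving the directory fragments untouched. The sketch below is a simplified model under stated assumptions: fragments are lists of slots, and `running_on` is a hypothetical record of where each process currently executes.

```python
def run_migration_scenario():
    fragment_a = [200, None]   # site A fragment; slot 0 references PID 200
    fragment_b = [None, None]  # site B fragment; no slots allocated
    running_on = {200: "A"}    # hypothetical record of each process's execution site
    snapshot = []

    # PM B answers the directory query first.
    snapshot += [pid for pid in fragment_b if pid is not None]

    # The process migrates from site A to site B. Only its execution site changes;
    # the slot on site A keeps referencing it.
    before = list(fragment_a)
    running_on[200] = "B"
    after = list(fragment_a)

    # PM A answers the directory query after the migration has completed.
    snapshot += [pid for pid in fragment_a if pid is not None]

    return before == after, snapshot, running_on[200]


unchanged, snapshot, site = run_migration_scenario()
print(unchanged, snapshot, site)  # prints True [200] B
```

Because the reference never moves between fragments, the query finds the process exactly once regardless of when the migration happens relative to the two responses.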
Thus, in yet another aspect of the invention, a novel method is provided for implementing a global atomic operation upon a group of processes operative in a multicomputer system. A process directory structure is distributed across multiple sites such that different site memories include different fragments of the process directory structure. Each process directory structure fragment includes a multiplicity of slots. Processes operative on respective sites in the system are referenced in respective slots in the process directory structure. Group information may be associated in respective site memories with respective processes operative on respective sites. This group information indicates group membership, if any, of the associated processes. For example, a group may comprise the processes in a session. A global atomic operation request is issued to a first process manager operative on a first site. The request is directed to a group of processes. A global atomic operation message directed to the group of processes is transferred by the first process manager to process managers operative on other sites. Each process manager that receives such global atomic operation message transfers a respective message to each respective process referenced in a respective process directory structure fragment disposed on the same respective site as such receiving process manager. The transferred messages request performance of the atomic operation. The atomic operation is performed by respective processes that are members of the group. Therefore, during a global atomic operation, fixed process directory fragments are used to locate migratable target processes.
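The message flow of this aspect can be sketched as follows. The classes and names are hypothetical, inter-site messaging is modeled as direct method calls, and the `processes` dictionary stands in for whatever mechanism reaches a process on its current site. The sketch also assumes that the first process manager handles its own fragment in the same way as the others, which the text does not state explicitly.

```python
class Process:
    """Hypothetical process holding its own group information (e.g. a session)."""

    def __init__(self, pid, group=None):
        self.pid = pid
        self.group = group
        self.received = 0  # number of times the operation was performed

    def handle(self, target_group, operation):
        # Only members of the target group perform the operation.
        if self.group is not None and self.group == target_group:
            self.received += 1
            operation(self.pid)


class ProcessManager:
    def __init__(self, site, fragment):
        self.site = site
        self.fragment = fragment  # slots holding a PID, or None when unallocated

    def deliver(self, group, operation, processes):
        # Message every process referenced in this site's fragment.
        for pid in self.fragment:
            if pid is not None:
                processes[pid].handle(group, operation)


def global_atomic_operation(first_pm, other_pms, group, operation, processes):
    # ASSUMPTION: the first PM also walks its own fragment.
    for pm in [first_pm, *other_pms]:
        pm.deliver(group, operation, processes)


processes = {1: Process(1, "s1"), 2: Process(2, "s2"),
             3: Process(3, "s1"), 4: Process(4)}
pm_a = ProcessManager("A", [1, None, 2])
pm_b = ProcessManager("B", [3, 4])
signaled = []
global_atomic_operation(pm_a, [pm_b], "s1", signaled.append, processes)
print(signaled)  # prints [1, 3]
```

Membership is decided by each process from its own group information, so the directory fragments only need to locate processes, not to classify them.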
Another aspect of the invention provides a novel method of failure recovery in a multicomputer system. A process directory structure is distributed across multiple sites such that different site memories include different fragments of the process directory structure. Processes operative on respective sites in the system are referenced in respective slots in the process directory structure. Process structures are provided. These process structures correspond to respective processes and are disposed on the respective sites on which their respective corresponding processes are operative. Furthermore, these process structures provide references to sites which include slots that reference the processes corresponding to these process structures. Whenever a failed site is identified, a reconstruction host site is selected. Process structures on non-failed sites are accessed to identify processes, if any, operative on non-failed sites that are referenced in a process directory fragment of the failed site. The process directory fragment of the failed site is reconstructed on the reconstruction host site such that respective references to respective processes identified in the accessing step are provided in the reconstructed process directory fragment. Also, an attempt is made to contact each process corresponding to a process referenced in any process directory fragment. References to processes that are not successfully contacted are removed from process directory fragments of non-failed sites.
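The recovery steps above can be sketched as follows. This is a simplified model under stated assumptions: `ProcessStructure` and `recover` are hypothetical names, each fragment is a dictionary mapping slot numbers to PIDs, and a process counts as successfully contacted exactly when its host site has not failed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessStructure:
    """Hypothetical per-process record kept on the site where the process runs."""
    pid: int
    host_site: str       # site on which the process is operative
    directory_site: str  # site whose fragment holds the slot referencing the process
    slot: int


def recover(failed_site, structures, fragments):
    """Rebuild the failed site's fragment and prune dead references elsewhere."""
    # Process structures on the failed site are lost along with the site.
    survivors = [p for p in structures if p.host_site != failed_site]

    # Rebuild the failed fragment from surviving structures that point at it.
    rebuilt = {p.slot: p.pid for p in survivors if p.directory_site == failed_site}

    # ASSUMPTION: a contact attempt succeeds exactly for surviving processes.
    reachable = {p.pid for p in survivors}
    for site, fragment in fragments.items():
        if site == failed_site:
            continue
        for slot, pid in list(fragment.items()):
            if pid not in reachable:
                del fragment[slot]
    return rebuilt


structures = [
    ProcessStructure(1, "A", "C", 0),
    ProcessStructure(2, "C", "A", 0),  # runs on the site that will fail
    ProcessStructure(3, "B", "C", 1),
    ProcessStructure(4, "B", "B", 0),
]
fragments = {"A": {0: 2}, "B": {0: 4}, "C": {0: 1, 1: 3}}
rebuilt = recover("C", structures, fragments)
print(rebuilt, fragments["A"], fragments["B"])
```

The caller would then install `rebuilt` on the selected reconstruction host site; how that host is chosen is outside the scope of this sketch.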
These and other features and advantages of the invention will be understood from the following detailed description of the invention in conjunction with the drawings.