The present invention relates to modifications of inter-process relationships in a multiprocessor system and to inter-process signalling. An understanding of certain inter-process operations, described below, is necessary in order to understand the invention. UNIX.TM. is taken as an example.
Recent years have seen a significant rise in the commercial popularity of the UNIX.TM. operating system. Although UNIX.TM. was originally preferred only by computer scientists, computer science students and other extremely technically proficient computer users, the preference for UNIX.TM. as a commercial programming environment is growing as those students matriculate into the work force and carry their developed preferences with them. Accordingly, it behooves a computer manufacturer to provide a UNIX.TM. or UNIX.TM.-like programming environment along with its proprietary hardware.
However, UNIX.TM. has historically been an operating system for uniprocessors: originally the Digital Equipment Corporation's PDP-11, later mainframes, and still later microprocessors with the boom in microcomputers. Even today, only a handful of multiprocessor implementations of UNIX.TM. exist. The assignee of this invention, Tandem Computers Incorporated, is preparing to offer for sale one such multiprocessor implementation under the product name NonStop Kernel Software ("NSK"), release D30.00. The UNIX.TM.-like portion of NSK is referred to as the Open System Services, "OSS," for short.
In UNIX.TM., a "process" is the dynamic, run-time embodiment of a program. The program typically resides in a static state on a storage medium such as disk or tape, while the process is loaded into memory and is executing. UNIX.TM. is a multitasking operating system: Many processes can be executing essentially simultaneously.
In UNIX.TM., a process can create another process by performing the fork() system call. The result of a fork() system call is the creation of a new process which is a copy of the old process, except that, inter alia, it has its own unique process identification number. This procedure of a process creating a copy of itself is called "forking."
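By way of illustration only, forking can be sketched with Python's os.fork(), a thin wrapper around the fork() system call. The helper name fork_and_compare is hypothetical and not part of any system described here; the sketch assumes a POSIX-style host.

```python
import os

def fork_and_compare():
    """Fork once; in the parent, return (parent_pid, child_pid, child_exit_code)."""
    parent_pid = os.getpid()
    child_pid = os.fork()  # returns 0 in the child, the child's PID in the parent
    if child_pid == 0:
        # Child: a copy of the parent, except (among other things) for its own PID.
        # Exit code 0 reports that the PID differs and the parent is as expected.
        ok = os.getpid() != parent_pid and os.getppid() == parent_pid
        os._exit(0 if ok else 1)
    # Parent: wait for the child to exit and collect its status.
    _, status = os.waitpid(child_pid, 0)
    return parent_pid, child_pid, os.WEXITSTATUS(status)

if __name__ == "__main__":
    parent, child, code = fork_and_compare()
    print(f"parent {parent} forked child {child}; child exit code {code}")
```

The child leaves through os._exit() so that it never returns into the caller's code, which would otherwise run twice.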
A process can also "exec," that is, a process can change the program code which it is running by reading in a new program--again typically from disk or tape--overlaying its old program with this new program, and executing the new program from the beginning. A process can accomplish this by calling the exec() system call.
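The fork-then-exec pattern can be sketched as follows. This is a minimal illustration, assuming a POSIX-style host; Python exposes the exec family as os.execv() and related calls, and the helper name fork_then_exec and the exit code 7 are hypothetical choices made for the example. The new program is simply the Python interpreter running a one-line script.

```python
import os
import sys

def fork_then_exec():
    """Fork a child that execs a new program; return the child's exit code."""
    child_pid = os.fork()
    if child_pid == 0:
        try:
            # The new program overlays the child and starts from its beginning.
            # It exits with 7 if its PID equals the PID recorded before the exec,
            # showing that exec changes the program, not the process.
            code = ("import os, sys; "
                    "sys.exit(7 if os.getpid() == int(sys.argv[1]) else 1)")
            os.execv(sys.executable,
                     [sys.executable, "-c", code, str(os.getpid())])
        finally:
            os._exit(127)  # reached only if the exec itself failed
    _, status = os.waitpid(child_pid, 0)
    return os.WEXITSTATUS(status)

if __name__ == "__main__":
    print("child exit code:", fork_then_exec())
```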
In forking, the older process is called the "parent" and the newer process is called the "child." Of course, a parent can have many children, while a child has only one parent. UNIX.TM., however, allows processes to maintain other inter-process relationships, such as the process group relationship. Each process is a member of a process group. The default process group of a process is the process group of its parent. A process can change its process group by executing the appropriate system call, typically setpgid(). Accordingly, a child process can choose to be in the same process group as its parent or some other process group. A process group may have one or more processes as members.
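The default and the changed process group of a child can be observed with a short sketch, assuming a POSIX-style host. The helper name child_process_groups is hypothetical; os.getpgrp() and os.setpgid() are Python's wrappers for the corresponding system calls, and a pipe is used only to report the child's observations back to the parent.

```python
import os

def child_process_groups():
    """Return (parent_pgid, child_pid, inherited_pgid, changed_pgid)."""
    parent_pgid = os.getpgrp()
    read_fd, write_fd = os.pipe()
    child_pid = os.fork()
    if child_pid == 0:
        try:
            os.close(read_fd)
            inherited = os.getpgrp()  # default: the parent's process group
            os.setpgid(0, 0)          # move into a new group named after our own PID
            changed = os.getpgrp()
            os.write(write_fd, f"{inherited} {changed}".encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(child_pid, 0)
    inherited, changed = map(int, data.decode().split())
    return parent_pgid, child_pid, inherited, changed

if __name__ == "__main__":
    print(child_process_groups())
```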
Process group membership is important because the occurrence of an event within the system may need to be communicated to multiple processes. The process group can identify every process which is to be notified of the event. To take a standard example, suppose a user logs onto a UNIX.TM. system. The "login" process by which the user first communicates with the system has a process group. The login process typically execs a command interpreter (a "shell") to enable the user to execute shell commands and programs. The shell's executing a program entails forking and then execing the program, as described above. Thus, the newly executed program will have the same process group as the shell, its parent. In fact, any program executed by the shell, its children, its grandchildren, etc. will have the same process group by default. Now, if the communication line between the user and the shell is broken, intentionally or otherwise, then the preferred action is for the shell and each process which has the shell as an ancestor to be notified of that event and to terminate itself. (The termination of a process is referred to as "exiting." Exiting occurs as a result of calling the exit() system call.)
The mechanism in UNIX.TM. for notifying processes of asynchronous events is called signalling. Processes can send each other signals, using the kill() system call. The operating system itself may send signals to processes. A process or the operating system may send a process group a signal. Sending a signal is referred to as signalling.
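Signalling a whole process group can be sketched as follows, assuming a POSIX-style host on which SIGTERM has its default (terminating) disposition. The helper name signal_process_group is hypothetical; os.killpg() sends one signal to every member of a process group, much as kill() does when given a process group as its target. The children are placed in a group of their own so that the signalling process is not itself a recipient.

```python
import os
import signal
import time

def signal_process_group(n_children=2):
    """Fork children into one new process group, signal the group, and
    return one boolean per child: True if SIGTERM terminated it."""
    children = []
    pgid = 0
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            try:
                signal.signal(signal.SIGTERM, signal.SIG_DFL)
                time.sleep(10)  # idle until the signal arrives
            finally:
                os._exit(1)     # reached only if no signal was delivered
        if pgid == 0:
            pgid = pid          # the first child's PID names the new group
        os.setpgid(pid, pgid)   # the parent places each child in that group
        children.append(pid)
    os.killpg(pgid, signal.SIGTERM)  # a single call signals every member
    results = []
    for pid in children:
        _, status = os.waitpid(pid, 0)
        results.append(os.WIFSIGNALED(status)
                       and os.WTERMSIG(status) == signal.SIGTERM)
    return results

if __name__ == "__main__":
    print(signal_process_group())
```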
From the above, it is apparent that UNIX.TM.'s traditional multi-threading paradigm allows for essentially asynchronous modification of inter-process relationships, e.g., through forking and exiting. However, with such asynchronous inter-process relationship modifications, the question arises, how does a UNIX.TM. operating system guarantee atomic and ordered modifications of inter-process relationships?
Also apparent from the above is that UNIX.TM.'s traditional multi-threading paradigm allows for asynchronous modification of inter-process relationships while signalling occurs. For example, how does a system guarantee atomic, ordered delivery of signals in the presence of forking? A specific example of the signalling problem is presented in POSIX, discussed below, at section B.3.1.1.
In the historical single-processor UNIX.TM. implementations, the asynchronicity of inter-process relationship modification and signalling did not present a significant problem with respect to atomicity and ordering. An implementation of a call to modify an inter-process relationship could involve an uninterruptible (at least at the crucial stage) access to the underlying kernel. Thus, the inter-process relationship modification could be performed on behalf of one process while another process desiring to modify inter-process relationships would be locked out.
Likewise, a call to kill() would result in a single pass through the kernel wherein the kernel generates a signal on behalf of one process and substantially simultaneously delivers that signal to all processes in the signalled process group. While the kernel is performing the mechanics of signalling, it can exclude from execution any processes which would simultaneously modify process group memberships.
The problems of atomic and ordered modification of inter-process relationships and atomic, ordered delivery of signals are much more intractable in multiprocessor implementations of UNIX.TM. Also, multiprocessor environments raise the question of reliability: how does the multiprocessor system ensure consistent inter-process relationships in the presence of a failing processor or processors? How does the multiprocessor system guarantee reliable delivery of signals when processors fail? One of the facts of multiprocessor systems--at least non-shared memory multiprocessors--which increase the intractability of the atomicity, ordering and reliability problems is that the processes in a process group can be and usually are distributed over more than one processor. The uniprocessor solution of having the kernel resolve any potential timing conflicts by single-threading is unavailable in the multiprocessor environment: There are multiple kernels, operating asynchronously, and on each kernel are multiple processes, each running asynchronously. Acting independently, each processor can only ensure the reliable and ordered modification of inter-process relationships on that processor. For example, on a first processor a first process may be generating a signal for delivery to a process group. The process group has processes including the first process and a second process on a second processor. At the same time that the first process is generating a signal for the process group, the second process is forking a third process, which will also be a member of that process group for a limited time and then change its process group membership. Does the third process receive the signal generated by the first process or not? Thus, the much-desired paradigm of the multiprocessor system being simply a more powerful or faster version of the uniprocessor system begins to disintegrate.
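The three-process example above can be made concrete with a toy model. The sketch below is purely illustrative and is not any vendor's algorithm: the event names "signal", "fork" and "leave" and the labels P1, P2 and P3 are assumptions introduced for the example. It enumerates the orders in which a processor might observe the three events and reports, for each order, whether the third process receives the signal.

```python
from itertools import permutations

def recipients(order):
    """Apply the events in the given order; return the set of processes
    that receive the signal."""
    members = {"P1", "P2"}          # initial membership of the process group
    received = set()
    for event in order:
        if event == "fork":         # P2 forks P3, which inherits the group
            members.add("P3")
        elif event == "leave":      # P3 changes its process group
            members.discard("P3")
        elif event == "signal":     # P1's signal reaches the current members
            received |= members
    return received

def possible_outcomes():
    """Map each feasible event order to whether P3 receives the signal."""
    outcomes = {}
    for order in permutations(("signal", "fork", "leave")):
        if order.index("fork") < order.index("leave"):  # P3 must exist before leaving
            outcomes[order] = "P3" in recipients(order)
    return outcomes

if __name__ == "__main__":
    for order, got_it in possible_outcomes().items():
        print(" -> ".join(order), "| P3 signalled:", got_it)
```

In this model the third process is signalled only when the signal is ordered between the fork and the change of group membership. Without an agreed global order, different processors could therefore reach different answers.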
Without a resolution of these atomicity, ordering and reliability problems, the multiprocessor system cannot offer the same services as a uniprocessor UNIX.TM. system implementing signalling. In particular, a multiprocessor system cannot offer the full system services detailed in POSIX.
In fact, the problem has been so intractable in multiprocessor systems that vendors of such systems offer software products without a solution. The LOCUS TNC, MACH, OSF1, and ISIS implementations are described in turn below. Locus has a product called LOCUS TNC. LOCUS TNC implements a UNIX.TM. distributed system, based on a "vproc" abstraction. A vproc is a data structure which only refers to a process. Copies of a single vproc may exist in the memories attached to many processors. An "owning" or "master" processor describes the actual process with a non-referring data structure. At an overview level, the vproc abstraction allows the processors which have vprocs to be out-of-step with the master copy, and the local copy is used for some operations. Thus the system saves the expense of some messages between the master processor and a modifying vproc processor. It is believed that LOCUS TNC does not correctly deal with the atomicity, ordering and reliability conditions described above.
MACH, available from International Business Machines of Armonk, N.Y., and OSF1, available from the Open Software Foundation, Cambridge, Mass., are also multiprocessor UNIX.TM. implementations. The MACH/OSF1 solution involves a "UNIX.TM. server," a single, multi-threaded process which maintains process relationships. This process is not distributed. There is only a single copy. Thus, it does not address the distributed algorithm discussed here.
ISIS solves a similar set of problems for message ordering and process group membership--but using a different definition of a "process group" and not for signalling. ISIS does not attempt to implement UNIX.TM.-like semantics.
There are no known implementations of atomic, ordered and reliable modification of inter-process relationships or signal delivery in a distributed processor system, particularly in a multiprocessor system without shared memory. Indeed, prior to the pending release of NSK with OSS, Tandem Computers Incorporated did not offer such features in its UNIX.TM.-like operating system software.
Along with ancestry and process group memberships, another UNIX.TM. inter-process relationship is the "session." A session is a collection of process groups, allowing users to selectively suspend the execution of processes and resume their execution at a later point. Each process group is a member of a session, and a process is a member of the session of which its process group is a member.
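Session membership can be observed in the same way as process group membership. The sketch below assumes a POSIX-style host; the helper name child_sessions is hypothetical, while os.getsid() and os.setsid() are Python's wrappers for the corresponding system calls. A child inherits its parent's session and, on calling setsid(), becomes the sole member of a new process group in a new session.

```python
import os

def child_sessions():
    """Return (parent_sid, child_pid, inherited_sid, new_sid, new_pgid)."""
    parent_sid = os.getsid(0)
    read_fd, write_fd = os.pipe()
    child_pid = os.fork()
    if child_pid == 0:
        try:
            os.close(read_fd)
            inherited = os.getsid(0)  # default: the parent's session
            os.setsid()               # start a new session and process group
            report = f"{inherited} {os.getsid(0)} {os.getpgrp()}"
            os.write(write_fd, report.encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    data = os.read(read_fd, 128)
    os.close(read_fd)
    os.waitpid(child_pid, 0)
    inherited, new_sid, new_pgid = map(int, data.decode().split())
    return parent_sid, child_pid, inherited, new_sid, new_pgid

if __name__ == "__main__":
    print(child_sessions())
```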
There are other inter-process relationships in UNIX.TM.; the three mentioned are simply the primary ones. The primary ones suffice, however, to illustrate that certain UNIX.TM. functions operate on individual processes or process groups, sessions or other inter-process relationships. In a multiprocessor environment, the simultaneous, asynchronous operations manipulating these inter-process relationships can create numerous race conditions as the processes on various processors modify distributed data structures.