The problem of controlling the allocation of resources in a distributed or multiprocessor system is well known. Multiprocessor systems fall into one of two categories: tightly coupled multiprocessor systems or distributed architecture multiprocessor systems. In a tightly coupled multiprocessor system, the processors share common memory and kernel data structures and schedule processes from a common pool. In a distributed architecture multiprocessor system, the processors are pooled to allow resource sharing, but each processor retains autonomy over its own environment. Each processor or computer is an autonomous unit consisting of a CPU, memory, and peripherals. A computer can be used in the distributed architecture even though it does not have local file storage. The most important feature that distinguishes distributed systems from tightly coupled systems is that the physical memory available to each machine is independent of activity on other machines. Consequently, the kernel on each machine is independent, subject only to the external constraints of running in a distributed environment.
There are three major types of distributed systems. The first type is the satellite system, a tightly clustered group of machines centered on one machine. Normally, the center machine is a larger machine. The satellite processors share the process load with the center processor and refer all system calls to it. The purpose of a satellite system is to increase system throughput and, possibly, to allow dedicated use of a processor for one process in a UNIX system environment. The system runs as a unit; unlike other models of the distributed system, satellite processors do not have real autonomy except, sometimes, in process scheduling and in local memory allocation.
Newcastle distributed systems are the second type of system. A Newcastle distributed system allows access to remote systems by recognizing the names of remote files in the C library. The remote files are distinguished by special characters embedded in the path name or by special path component sequences that precede the file system root. This method can be implemented without making changes to the kernel and is therefore easier to implement than the other types of systems, but it is less flexible. The final type of distributed system is the transparent distributed system. This system allows standard path names to refer to files on other machines; the kernel recognizes that the files are remote. Path names cross machine boundaries at mount points, much as they cross file system mount points on disks.
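The way a standard path name crosses a machine boundary at a mount point can be illustrated with a minimal sketch. The mount table, the machine names, and the `resolve` function are hypothetical and chosen only for illustration; an actual kernel performs this lookup component by component during path name translation rather than by string matching.

```python
from typing import NamedTuple


class Mount(NamedTuple):
    machine: str      # hypothetical name of the remote machine
    remote_root: str  # directory on that machine that the mount point maps to


# Hypothetical mount table: local mount point -> remote location.
MOUNTS = {
    "/usr/remote": Mount("machineB", "/"),
    "/usr/remote/src": Mount("machineC", "/export/src"),
}


def resolve(path, mounts=MOUNTS, local="local"):
    """Return (machine, path on that machine) for an ordinary path name."""
    parts = [p for p in path.split("/") if p]
    # Try the longest prefix first so nested mount points take precedence.
    for n in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:n])
        if prefix in mounts:
            mount = mounts[prefix]
            rest = "/".join(parts[n:])
            if not rest:
                return mount.machine, mount.remote_root
            return mount.machine, mount.remote_root.rstrip("/") + "/" + rest
    # No mount point was crossed: the file is local.
    return local, "/" + "/".join(parts)
```

The caller uses the same path syntax for local and remote files; only the lookup decides which machine serves the request.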
The satellite architecture provides a configuration that improves system throughput by offloading processes from the central processor and executing them on the satellite processors. Each satellite processor has no local peripherals except for those it needs to communicate with the central processor. Each process on a satellite processor has an associated stub process on the central processor. The stub process is created when the process is assigned to the satellite processor. When a process on a satellite processor makes a system call that requires services provided only by the central processor, the satellite process communicates with its stub process on the central processor to satisfy the request. The stub process executes the system call and sends the results back to the satellite processor. The problem with the satellite architecture is that all system calls involving external files or devices must be handled by the central processor, thus reducing the throughput of the system. Further information concerning the satellite architecture may be found in the article by Birrell, et al., "Implementing Remote Procedure Calls", ACM Transactions on Computer Systems, Vol. 2, No. 1, Feb. 1984, pp. 39-59.
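The exchange between a satellite process and its stub can be sketched as follows. This is a simplified, single-machine model under stated assumptions: the class names, the message format, and the choice of `getpid` as a call the satellite can answer locally are all hypothetical, and an in-memory dictionary stands in for the files and devices owned by the central processor.

```python
class CentralProcessor:
    """Owns the files and devices; creates one stub per satellite process."""

    def __init__(self):
        self.files = {}  # stand-in for files and devices (assumption)
        self.stubs = {}

    def assign(self, pid):
        # The stub is created when the process is assigned to a satellite.
        stub = Stub(self, pid)
        self.stubs[pid] = stub
        return stub


class Stub:
    """Executes system calls on the central processor for one process."""

    def __init__(self, central, pid):
        self.central = central
        self.pid = pid

    def serve(self, request):
        call, args = request
        if call == "write":
            path, data = args
            self.central.files[path] = self.central.files.get(path, "") + data
            return len(data)
        if call == "read":
            (path,) = args
            return self.central.files.get(path, "")
        raise ValueError("unsupported call: " + call)


class SatelliteProcess:
    LOCAL_CALLS = {"getpid"}  # illustrative choice of a locally served call

    def __init__(self, pid, stub):
        self.pid = pid
        self.stub = stub
        self.forwarded = 0  # number of calls sent to the central processor

    def syscall(self, call, *args):
        if call in self.LOCAL_CALLS:
            return self.pid
        # Everything else is packaged as a message and sent to the stub.
        self.forwarded += 1
        return self.stub.serve((call, args))
```

The `forwarded` counter makes the bottleneck visible: every call that touches a file or device adds work on the central processor.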
In the Newcastle architecture, the kernel does not participate in determining that a file is remote; instead, C library functions that provide the kernel interface detect that a file access is remote and take the appropriate action. For both naming conventions, the C library parses the first components of a file name to determine that a file is remote. The problems associated with the Newcastle architecture are as follows. System performance may be degraded. Because the C library is larger and duplicates kernel functions, each process takes up more memory even if it makes no remote references. Local requests may execute more slowly because they take longer to get into the kernel, and remote requests may also be slow because they have to do more processing at user level to send requests across a network. Finally, programs must be recompiled with the new libraries to access remote files; old programs and vendor-supplied object modules do not work for remote files unless recompiled. The problem, then, with the Newcastle architecture is that it is not transparent to the user.
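The library-level detection described above can be sketched as a wrapper that inspects the first components of a name before any request reaches the kernel. The two syntaxes used here, `machine!/path` and `/../machine/path`, are assumed examples of the two naming conventions; the function names are hypothetical, and the caller supplies the local and remote open routines.

```python
def split_remote(path):
    """Return (machine, path) for a remote name, or (None, path) if local."""
    # Convention 1 (assumed syntax): a special character embedded in the
    # name, as in "machine!/path".
    head, sep, tail = path.partition("!")
    if sep and head and "/" not in head and tail.startswith("/"):
        return head, tail
    # Convention 2 (assumed syntax): a special component sequence that
    # precedes the file system root, as in "/../machine/path".
    if path.startswith("/../"):
        machine, _, rest = path[4:].partition("/")
        if machine:
            return machine, "/" + rest
    return None, path


def lib_open(path, local_open, remote_open):
    """Library wrapper: decide at user level whether the access is remote."""
    machine, rest = split_remote(path)
    if machine is None:
        return local_open(rest)       # ordinary system call into the kernel
    return remote_open(machine, rest)  # user-level request across the network
```

Because this check runs in every call, even purely local opens pay for it, and a program linked against an older library never reaches `split_remote` at all.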
The transparent distributed architecture has a pool of server processes and assigns them temporarily to handle each remote request as it arrives. After handling a request, the server process re-enters the pool and is available for assignment to other requests. The server process does not remember the user context between system calls, because it may handle system calls for several processes. The server processes are set up by the system administrator at initialization time. The problem with the transparent distributed architecture is that for each remote operation, process-specific information must be transmitted to the server process, thus increasing the amount of information that must be communicated by packet. Another problem is in handling flow control, because each server process is tied up until its operation completes, so that a large number of server processes may be in use at once.
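The server pool and its two drawbacks can be illustrated with a small sketch. The `Request` fields (`uid`, `cwd`), the class names, and the pool size are assumptions made for illustration; the point is that a server keeps no state between calls, so every request must carry the caller's context, and that a busy server is unavailable until it is released.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    uid: int     # process-specific context sent with every request
    cwd: str     # (which fields are needed is an assumption)
    call: str
    args: tuple = ()


class Server:
    def __init__(self, sid):
        self.sid = sid

    def handle(self, req):
        # No user context is remembered between calls: everything the
        # server needs arrives in the request itself.
        return (self.sid, req.uid, req.cwd, req.call)


class ServerPool:
    def __init__(self, size):
        # Servers are created once, at initialization time.
        self.idle = deque(Server(i) for i in range(size))

    def acquire(self):
        # Returns None when every server is tied up (flow-control problem).
        return self.idle.popleft() if self.idle else None

    def release(self, server):
        self.idle.append(server)

    def dispatch(self, req):
        server = self.acquire()
        if server is None:
            raise RuntimeError("no idle server process")
        try:
            return server.handle(req)
        finally:
            self.release(server)  # the server re-enters the pool
```

Two consecutive calls from the same process may be handled by different servers, which is why the context cannot be cached on the server side in this model.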