1. Field of the Invention
The present invention relates to a multiprocessor system, and in particular relates to a multiprocessor system which is capable of reducing overhead due to required synchronization among the processors and to ineffective scheduling. The overhead is reduced as much as possible to improve system performance and to provide effective usage of processor resources.
2. Background of the Problem
The advance of VLSI methods has provided multiprocessor systems with each system having many processors. Parallel processing, which enables one to perform tasks rapidly through the use of a plurality of processors, is also gaining in importance. In such multiprocessor systems, sometimes one processor uses the result of a process performed by another processor. In this situation, acknowledgement of the completion of that process, an important aspect of synchronization, is required. In order for a plurality of processors to operate in cooperation with one another, synchronization among processors is thus seen to be indispensable.
Conventional synchronization techniques are now described. In a computing system, control of real hardware resources is performed by an operating system (hereinafter referred to as an "OS"). A user or programmer describes operations by using the concept of a "process", which virtualizes a real processor. Real processors are allocated to processes, one processor to one process, under control of an OS to perform the operations. Such allocation is referred to as "process scheduling" (hereinafter referred to simply as "scheduling").
In parallel processing, a plurality of processes which should operate in cooperation with one another are created, and parallel processing proceeds while synchronization is maintained among the processes. Conventionally, the following two methods have been employed for synchronization. The first is synchronization through an OS, and the second is synchronization through the use of a memory shared among the processes. Synchronization requires some kind of shared entity which enables the exchange of information among the processes being synchronized. The first method uses an OS as that entity, and the second uses a memory. The problems associated with these two methods are now described. In the case where synchronization is achieved through an OS, a process which cannot yet establish synchronization is removed from its allocated processor and enters a sleeping or idle state, and the freed processor is allocated to another process. In this way, processor resources are used effectively. Synchronization through an OS, however, causes an undesirable overhead. The repetition of entering a sleeping state and thereafter receiving an allocation degrades performance. If the granularity of a program is large enough, the overhead can be neglected. In most cases, however, it is not negligible.
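By way of illustration only, the first method (synchronization through an OS) can be sketched as follows. The sketch is not taken from any cited reference. It assumes that Python threads stand in for processes and that a `threading.Event` stands in for the OS-provided synchronization service; the names `run_os_mediated_sync`, `producer` and `consumer` are hypothetical.

```python
import threading


def run_os_mediated_sync():
    """The waiting thread sleeps until the other thread signals completion."""
    done = threading.Event()  # assumption: stands in for the OS synchronization service
    shared = {}               # stand-in for the result passed between processes

    def producer():
        shared["result"] = sum(range(1000))  # the process whose result is needed
        done.set()                           # acknowledge completion and wake the waiter

    def consumer():
        done.wait()                          # blocks (sleeps) instead of spinning
        shared["seen"] = shared["result"] + 1

    # The consumer is started first so that it normally has to wait.
    threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["seen"]


if __name__ == "__main__":
    print(run_os_mediated_sync())  # prints 499501
```

The waiting thread consumes no processor time while blocked, but each block and wake-up passes through the scheduler, which corresponds to the overhead discussed above.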
In the case that synchronization is achieved by busy waiting on a shared memory rather than through an OS, the above overhead problem can be avoided. However, another problem can occur. As mentioned above, an OS dispatches one process to one processor at a time. During a single scheduling operation, the OS usually cannot assign a plurality of processes to a plurality of processors at one time. For example, consider a program in which a plurality of processes are created for parallel processing and operate in synchronization with one another. Depending on the scheduling operation, some processes in the group can be dispatched to processors while the remaining processes are idle, waiting to be dispatched. In this case, a process can try to establish synchronization with another process which is not dispatched to any processor, and an ineffective busy-waiting condition can then occur. An example is a case where processes are dispatched to processors as shown in FIG. 1, and the processes A1, A2 and A3 are in a busy-waiting loop (for synchronization), waiting to use the operational result of process A4. In such a case, CPU time is consumed, but the program does not proceed until process A4 is dispatched to one of the actual processors upon rescheduling by a time slice operation or the like. In addition to the scheduling problem, when a "barrier synchronization" (that is, a synchronization in which a plurality of processes each wait for the others at a point) is performed through a shared memory, exclusive memory accesses for the synchronization occur in a concentrated fashion in the multiprocessor, thus raising the problem of overhead due to contention for data communication paths and the like.
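By way of illustration only, a barrier synchronization performed through a shared memory with busy waiting can be sketched as follows. The sketch uses a sense-reversing barrier, which is one common construction and is not asserted to be the one used in any cited reference. It assumes that Python threads stand in for processes and that a lock models the exclusive memory access; the names `SpinBarrier` and `run_phases` are hypothetical.

```python
import threading


class SpinBarrier:
    """Sense-reversing barrier held in shared state; waiters busy-wait."""

    def __init__(self, parties):
        self.parties = parties
        self.count = 0                # number of arrivals in the current phase
        self.sense = False            # flipped by the last arrival of each phase
        self.lock = threading.Lock()  # assumption: models the exclusive memory access

    def wait(self):
        with self.lock:               # every arrival contends for the same shared word
            my_sense = not self.sense
            self.count += 1
            if self.count == self.parties:
                self.count = 0
                self.sense = my_sense  # release all waiters of this phase
        while self.sense != my_sense:  # busy waiting: consumes CPU time, no progress
            pass


def run_phases(workers=4, phases=3):
    barrier = SpinBarrier(workers)
    arrived = [0] * phases            # arrivals counted before each barrier
    seen = []                         # value each worker observes after the barrier
    lock = threading.Lock()

    def worker():
        for p in range(phases):
            with lock:
                arrived[p] += 1
            barrier.wait()
            with lock:
                seen.append(arrived[p])  # equals `workers` if the barrier held

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print(run_phases())
```

If one of the workers is not running, the others remain in the `while` loop and consume processor time without making progress, which corresponds to the situation of processes A1, A2 and A3 waiting for process A4. The lock inside `wait` is the point at which the concentrated exclusive accesses occur.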
As indicated above, process synchronization and scheduling are closely correlated. For applications involving certain kinds of parallel processing programs, adjustment of scheduling can improve performance. In a conventional OS, however, all processes are scheduled based on the same algorithm, so that scheduling cannot be adapted to individual processes.
The following are relevant to the background of the present invention.
1. "Stellix: UNIX for a Graphics Supercomputer", Proceedings of the Summer 1988 USENIX Conference, Jun. 20-24, 1988, San Francisco, Calif., USENIX Association, pp. 321-330, Thomas J. Teixeira & Robert F. Gurwitz.
This article appears to disclose that a fault signal is generated by hardware when all the processes are in a wait state during a synchronization operation performed by a special instruction stream for synchronization. However, the article does not even suggest that a process itself should check certain conditions using processor information in a shared memory (as stated later, the information desired includes data on dispatching of processes to processors, on grouping of processes and on process synchronization) in order to issue a rescheduling request and to provide effective process synchronization.
2. IBM Technical Disclosure Bulletin Vol. 32, No. 1, Jun. 1989, pp. 260-262, "DEVICE THAT PROVIDES FOR CONTENTION-FREE BARRIER SYNCHRONIZATION IN A MULTIPROCESSOR".
3. IBM Technical Disclosure Bulletin Vol. 31, No. 11, April 1989, pp. 382-389, "LOW-COST DEVICE FOR CONTENTION-FREE BARRIER SYNCHRONIZATION".
The above articles (2) and (3) disclose hardware configurations for performing barrier synchronization in a concentrated fashion, but do not even suggest any design for synchronization waiting.
4. H. S. Stone, "High Performance Computer Architecture", Addison-Wesley, Reading, Mass., 1987.
This textbook provides a tutorial explanation of barrier synchronization in general.