1. Field of the Invention
This invention relates to improvements in data processing systems having multiple interconnected processors between which data and control information are exchanged.
2. Prior Art
One mechanism frequently used to increase the processing capability of a system is multiprocessing, i.e., the addition of a second or third processor. This increases the number of computer instructions per second that can be applied to a task. The channel interconnecting the processors typically consists of a parallel bus, with transfers being storage-to-storage in nature. Frequently a transfer moves large "blocks" of data from the storage of one processor to the storage of another processor. The data rate of this transfer is of major concern: if it is too slow, the full advantage of multiple processors is not achieved; if it is too fast, the transfer tends to halt effective processing in both processors and to disrupt time-dependent operations such as I/O device servicing and interrupt processing.
One of the problems associated with a multiprocessor system is that the system designer must carefully balance the transfer speed and block size of the processor-to-processor transfer such that neither processor is "locked out" during the transfer, while getting maximum benefit from the additional processors.
In a typical system structure, access to the storage subsystem is through a common address and data bus. Thus, all transfers between processors directly reduce the storage bandwidth available to the processors, and the system never obtains the maximum potential benefit inherent in the multiprocessor system structure. Any "lock out" and the resulting reduction in processing capability may increase interrupt latency beyond desirable or acceptable limits.
A block diagram showing the data flow for a conventional prior art processor-to-processor transfer is shown in FIG. 1. In this example, two processor subsystems are shown with data flowing from P1 to P2. Neglecting initialization and transfer ending service, the data transfer sequence can be subdivided into three phases as follows:
1. The first phase of the operation reads data from the storage unit of processor P1 and transfers it to the interface network of P1. During this phase, processor P1 is prohibited from accessing its system bus.
2. The second phase of the operation transfers the data over the processor-to-processor channel.
3. The data is written into the storage unit of processor P2 during the third phase of the operation. During this phase, processor P2 is prohibited from accessing its system bus.
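The three phases above can be sketched as a small simulation. This is an illustrative model only, not part of the prior art description: the names Processor, transfer_block and stalled_cycles are hypothetical, and the cost of one stalled bus cycle per word in phases 1 and 3 is an assumption chosen for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical processor subsystem: a storage unit plus a stall counter."""
    name: str
    storage: list = field(default_factory=list)
    stalled_cycles: int = 0  # bus cycles during which the processor was locked out


def transfer_block(src, dst, start, length):
    """Move `length` words from src.storage, beginning at `start`, into dst.storage."""
    for offset in range(length):
        # Phase 1: read a word from the source storage unit into the source
        # interface network; the source processor cannot use its system bus.
        word = src.storage[start + offset]
        src.stalled_cycles += 1  # assumed cost: one bus cycle per word

        # Phase 2: the word crosses the processor-to-processor channel.
        in_flight = word

        # Phase 3: write the word into the destination storage unit; the
        # destination processor cannot use its system bus.
        dst.storage.append(in_flight)
        dst.stalled_cycles += 1  # assumed cost: one bus cycle per word


p1 = Processor("P1", storage=[10, 20, 30, 40])
p2 = Processor("P2")
transfer_block(p1, p2, start=0, length=4)
```

In this model each processor loses bus access only during its own storage phase. A design that maximizes the transfer rate would instead charge both processors for all three phases of every word.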
If the system is designed to maximize the processor-to-processor transfer rate, then both processors P1 and P2 will be prohibited from accessing their internal busses during all three phases of the operation for the duration of the block transfer. Both P1 and P2 will be locked out of their respective storage units, and thus stopped from executing instructions during the transfer.
Alternatively, the system can be designed to distribute the interference over a period of time. Access to the storage unit by the processor-to-processor interface network may be interleaved with other activity within the respective system, such as instruction fetching or direct memory access (DMA) traffic. In this environment, processors P1 and P2 will be stopped only during phases 1 and 3, respectively. Thus, instruction execution continues, but at a reduced rate. Compared to the previous example, the interference occurs over a longer period of time, and the accumulated or total interference is greater, both because the two activities (instruction execution and the transfer operation) are asynchronous and because of losses due to repeated arbitration at the internal system bus.
In either example, interference to the processor is directly proportional to the amount of data transferred.
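The proportionality can be illustrated with a minimal sketch. The function name lockout_cycles and the per-word cost used below are hypothetical placeholders, not values taken from any actual system; the per-word cost will differ between the two designs described above, but in each case the total scales linearly with the block size.

```python
def lockout_cycles(words, stall_cycles_per_word):
    """Total bus cycles a processor is locked out while `words` words are moved.

    `stall_cycles_per_word` is an assumed, design-dependent constant: it would
    cover all three phases in a rate-maximizing design, or one storage phase
    plus arbitration losses in an interleaved design.
    """
    return words * stall_cycles_per_word


# Doubling the block size doubles the lockout, whatever the per-word cost.
small_block = lockout_cycles(words=256, stall_cycles_per_word=3)
large_block = lockout_cycles(words=512, stall_cycles_per_word=3)
```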