Large scientific data processing applications can often be partitioned so that they may be carried out by several concurrently operating (parallel) processors, each of which handles a different portion of the problem, thereby reducing the total processing time required. In a concurrent parallel processing system, for a given topology, any node processor can be assigned the role of "master processor" for a given time or application. The master processor controls the distribution of tasks among the slave processors and monitors and directs their progress. The slave processors often share large volumes of data among themselves as required by the particular tasks assigned to each.
The master processor typically provides the primary user interface to the parallel processing system and as such may require rapid access to a large volume of data. These system-level functional requirements in turn place the following requirements on the communication network serving the parallel processing system: (1) flexibility to support dynamic re-configuration and asynchronous communication; (2) wideband communication to support rapid transmission of high-density data; and (3) real-time communication of system commands and system status with reduced software transmission overhead. These requirements are independent of the parallel processing system configuration used (e.g., fixed, dynamic, tree, mesh, cubic, hypercubic) but become increasingly critical as the system's configuration increases in complexity. Increased communication efficiency with respect to time has been a focus of the prior art.
Prior art solutions have taken a variety of approaches to improving data communication speed in the demanding parallel processing environment. These solutions have called for trade-offs among speed, flexibility, the maximum number of nodes permitted, and cost. For example, Cowley U.S. Pat. No. 4,876,641 discloses a parallel processing network with a plurality of processors located on a plurality of chips in multiple rows and columns. The processors are interconnected by a first switching logic means (multiplexer) on each chip, which interconnects the processors in parallel data paths. The processors are also interconnected by a second switching logic means (multiplexer) external to the chips, which connects selected rows and columns between chips in serial data paths.
The Cowley patent system involves communications between parallel processing elements in both parallel and serial data paths. The processing elements are envisioned as simple shift registers with the multiplexer switching between serial and parallel data paths; this is considerably different from the parallel transfer controller envisioned by the present invention. There is no apparent distinction made between the types of information flowing over serial and parallel paths. There is apparently no discussion of transmitting command messages over the serial links and data information over the parallel links.
Call U.S. Pat. No. 4,891,751 discloses a massively parallel processing system which includes a transputer interfacing a processing node to other processing nodes. A separate peripheral processing network of peripheral processing nodes is also interconnected to transfer data back and forth. One network may interface with the other or may bypass the other network as desired. However, data is transferred between nodes by serial link transfers rather than on fast parallel channels.
Kneib U.S. Pat. No. 4,641,238 shows a network of parallel processing nodes communicating with each other over a serial bus. The nodes also communicate over a serial bus to a central global memory which feeds to a master processor. An arbiter controls which of the nodes is to be utilized in various computations required by the master processor. Kneib shows serial interconnection between the nodes. However, the nodes communicate in parallel only indirectly, through the global memory and under the control of the master processor and arbiter. The local nodes do not determine the transmission of data over the parallel bus. Moreover, there is no distinction between transferring control commands on the serial bus and data on the parallel bus as in the present invention.
Therefore, a need exists for an efficient communication solution to support a parallel processing network having a plurality of processing nodes. The present invention meets this need through an optimized transfer of data between processing nodes over a fast parallel channel in response to control commands sent between processing nodes over serial links. The approach of the present invention facilitates efficient reconfiguration of the parallel system topology and re-distribution of tasks among the nodes, and maintains maximum data transfer rates.
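The separation of control traffic from data traffic described above can be illustrated with a minimal software sketch. It is offered only as an aid to understanding and is not the disclosed hardware: the names Command, Node, parallel_transfer, and demo, the "SEND_BLOCK" operation, and the 4096-byte block are all hypothetical, the serial link is modeled as a per-node command queue, and the parallel channel is modeled as a single bulk copy between node memories.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Command:
    """Short control message carried on the (hypothetical) serial link."""
    op: str        # "SEND_BLOCK" asks the receiving node to push a block to `target`
    block_id: str
    target: str


@dataclass
class Node:
    name: str
    memory: dict = field(default_factory=dict)           # block_id -> bytes
    serial_inbox: deque = field(default_factory=deque)   # control path only

    def send_command(self, other: "Node", cmd: Command) -> None:
        # Control path: small messages, modeled here as a queue append.
        other.serial_inbox.append(cmd)

    def service_commands(self, nodes: dict, log: list) -> None:
        # Each queued command triggers one bulk transfer on the data path.
        while self.serial_inbox:
            cmd = self.serial_inbox.popleft()
            if cmd.op == "SEND_BLOCK":
                parallel_transfer(self, nodes[cmd.target], cmd.block_id, log)


def parallel_transfer(src: Node, dst: Node, block_id: str, log: list) -> None:
    # Data path: the whole block moves in one step, standing in for the
    # fast parallel channel; no command traffic travels this way.
    dst.memory[block_id] = src.memory[block_id]
    log.append((src.name, dst.name, block_id, len(src.memory[block_id])))


def demo():
    nodes = {n: Node(n) for n in ("master", "a", "b")}
    nodes["a"].memory["grid0"] = bytes(4096)   # assumed example payload
    log = []
    # The master issues a short command; node "a" then moves the data itself.
    nodes["master"].send_command(nodes["a"], Command("SEND_BLOCK", "grid0", "b"))
    nodes["a"].service_commands(nodes, log)
    return nodes, log
```

In this sketch the master never touches the payload: it sends one short command, and the node holding the data carries out the bulk transfer, which is the division of labor between serial and parallel paths that the paragraph above describes.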