a. Field of the Invention
The present invention relates to an apparatus for controlling data transfer between processor elements in a parallel processing system which interconnects a plurality of processor elements through communication paths.
b. Description of the Related Art
Powerful, high-speed computer systems are being demanded due to recent increases in data quantity and diversification of data processing. A high-speed computer system is normally realized as a parallel processing system, i.e., a computer system with a plurality of processor elements, each of which has its own main memory, connected in the form of a matrix through communication paths. A parallel processing system can improve processing capability by increasing the number of processor elements. However, when the number of processor elements increases, the frequency of information exchange between the processor elements increases, while the quantity of data for each information exchange (hereinafter called a message) can decrease. As a result, a parallel processing system is required to efficiently transfer a great number of messages between the processor elements.
Further, the main memory of a processor element is often hierarchically structured to increase the performance of the processor element. Therefore, a parallel processing system whose processor elements have main memories with a hierarchical memory structure is also required to transfer messages effectively between the processor elements.
FIG. 1 is a block diagram of a conventional parallel processing system. Processor elements 1 are interconnected by communication paths 2 via ports 13 to form a matrix. The processor elements execute the same or different programs while mutually exchanging data (messages) over the communication paths 2.
To write data in a port 13 for transmitting the data to another processor element 1, it must first be determined that the port 13 is ready to receive the data to be written (called Ready Status). Conventionally, data transmission is carried out by writing data in a port 13 either by hardware in a direct memory access (DMA) mode or by software. When data transfer is carried out by hardware, a DMA controller (DMAC) (not shown in FIG. 1), which is connected to the common bus 16 and the port 13, controls data transfer while checking the port 13 for a ready status. On the other hand, when data transfer is carried out by software, a program issues a Move instruction. For example, the program might write data into the port 13 after recognizing a ready status by polling or interrupt processing. In general, to transmit a large quantity of data, DMA transfer by hardware is quite effective. However, data transfer by software is less effective because it takes time for polling or interrupt processing and depends on the instruction execution speed of a central processing unit (CPU).
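The cost of software transfer by polling can be illustrated with a minimal sketch. The `Port` class, its `busy_cycles` counter and the `send_by_polling` function are hypothetical names introduced only for illustration; they model a port that reports not-ready for a fixed number of polls and do not describe any actual hardware interface.

```python
class Port:
    """Hypothetical model of a port exposing a ready status."""

    def __init__(self, busy_cycles):
        # Number of upcoming polls for which the port reports "not ready".
        self.busy_cycles = busy_cycles
        self.written = []

    def ready(self):
        # Each poll of a busy port consumes one busy cycle.
        if self.busy_cycles > 0:
            self.busy_cycles -= 1
            return False
        return True

    def write(self, word):
        self.written.append(word)


def send_by_polling(port, words, busy_after_write=0):
    """Write each word only after the port reports ready.

    Returns the number of polls that found the port not ready,
    i.e. the CPU work spent purely on waiting.
    """
    wasted_polls = 0
    for word in words:
        while not port.ready():
            wasted_polls += 1
        port.write(word)
        # Assumption: the port stays busy for a fixed time after each write.
        port.busy_cycles = busy_after_write
    return wasted_polls
```

In this model every wasted poll is an instruction executed by the CPU that moves no data, which is why the text describes software transfer as dependent on instruction execution speed.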
FIG. 2 is a chart illustrating quantity of data vs. time required to transfer the data in a DMA mode. The chart assumes the CPU 10 of processor element 1 is equipped with a well-known cache memory (CACHE) 11. CPU 10 also needs a flushing function (or flush operation) to write the contents of the cache memory 11 into the main memory (MM) 12 in units of a block (hereinafter called a line) to keep the contents of both memories 11, 12 matched.
Data transfer in a DMA mode is performed between the main memory 12 and a port 13 under the control of an above-mentioned DMAC. Therefore, the time required for DMA transfer is determined by totaling the following three times (1), (2) and (3) illustrated in FIG. 2:
time (1) required to write the data stored in the cache memory 11 into the main memory 12 (flushing),
time (2) required to set parameters for data transfer (e.g. starting address of main memory 12, length of the data transferred, address on a record medium, input/output command, etc.), and
time (3) required for DMA-mode data transfer per se.
Times (1) and (3) are proportional to the data quantity, while time (2) is constant irrespective of the data quantity.
As is understood from FIG. 2, when the data quantity to be transferred is large, parameter setting is not such a large load. On the other hand, when the data quantity is small, as in the case of a message exchanged between the processor elements 1, parameter setting proves to be quite a large load.
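The relationship among times (1), (2) and (3) can be sketched as a simple linear model. The function names and the per-byte and setup costs used below are assumptions chosen for illustration; they are not values taken from FIG. 2.

```python
def dma_transfer_time(n_bytes, flush_per_byte, setup, dma_per_byte):
    """Total DMA transfer time as the sum of times (1), (2) and (3)."""
    flush_time = flush_per_byte * n_bytes    # time (1): proportional to data quantity
    setup_time = setup                       # time (2): constant
    transfer_time = dma_per_byte * n_bytes   # time (3): proportional to data quantity
    return flush_time + setup_time + transfer_time


def setup_share(n_bytes, flush_per_byte, setup, dma_per_byte):
    """Fraction of the total time spent on parameter setting."""
    total = dma_transfer_time(n_bytes, flush_per_byte, setup, dma_per_byte)
    return setup / total
```

With the assumed costs of 1 time unit per byte for flushing, 1 per byte for transfer and 100 for parameter setting, a 10-byte message takes 120 units, of which parameter setting accounts for more than 80%, whereas for a 10,000-byte transfer its share falls below 1%.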
Moreover, since a message, whose quantity may be small, is not generally arranged in a continuous block of data, the data pieces have to be buffered (or gathered) in an area of main memory 12 to prepare a message prior to DMA transfer of the message. On the other hand, a message received by another processor element 1 has to be scattered to the areas of main memory 12 as required.
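The gathering and scattering described above can be sketched as follows. This is a minimal illustration, assuming main memory is modeled as a `bytearray` and each data piece is described by a hypothetical `(offset, length)` pair; the function names are not taken from the source.

```python
def gather(memory, regions):
    """Copy non-contiguous (offset, length) regions into one contiguous buffer.

    This models the CPU preparing a message in a buffer area before DMA transfer.
    """
    buffer = bytearray()
    for offset, length in regions:
        buffer += memory[offset:offset + length]
    return bytes(buffer)


def scatter(memory, regions, message):
    """Distribute a contiguous received message to (offset, length) regions."""
    position = 0
    for offset, length in regions:
        memory[offset:offset + length] = message[position:position + length]
        position += length
```

Both functions copy every byte of the message once more than the transfer itself requires, and `gather` additionally needs a buffer area as large as the message, which illustrates the extra CPU load and memory use attributed to buffering.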
Thus, one problem of the prior art DMA transfer, viewed along a time axis, is that parameter setting, flushing, buffering and scattering are all carried out by the CPU 10, which places a heavy load on a program, and that the areas used for buffering reduce the memory areas available to the user. Another problem of the prior art is that checking a port 13 for ready status by a program loop places a tremendous load on the CPU 10. Moreover, when a message used to control communication between processor elements 1 is transmitted, the transmission delay caused by such buffering and scattering reduces the performance of the parallel processing system as a whole.