1. Field of the Invention
This invention relates to an input/output system for SIMD parallel computers, and more particularly, to a distributed input/output system using a temporary storage buffer, individual for each processing element of the SIMD computer, capable of providing a two-dimensional data transfer scheme that substantially increases the I/O rate of the SIMD system.
2. Discussion of the Prior Art
Scientists and engineers from all disciplines have become dependent upon computers to further their work, and with this dependency they have grown to expect the performance of these computers to increase by an order of magnitude approximately every five years. This trend of order-of-magnitude increases in computer performance is slowing; in fact, the supercomputers presently available may already be within an order of magnitude of their technological limit. Heretofore, the limit was approximately 3 gigaflops, which corresponds to approximately 3 billion floating point operations per second and which is a function of the length of time it takes electrical signals to propagate through various wires and interconnections at approximately one half the speed of light. The drawback of the prior art system is that many of the problems facing today's scientists and engineers can only be solved utilizing computers with performance capabilities far exceeding the 3 gigaflop limit.
Recent advances in supercomputer performance have been achieved by dividing applications among many processors working in parallel. Theoretically, parallel processing computers should provide performance in the teraflop range. While these computers provide increased capacity and speed, they also provide a new set of problems, namely, programming the new computers, handling the input/output operations and manipulating the data. The programming difficulties stem from the fact that no matter how well a program is written, it is extremely hard to achieve 100 percent utilization of multiple processors. The problem of handling input/output (I/O) operations and data manipulation arises because of the sheer volume of data associated with these types of computers. The programming problem may resolve itself with experience while the I/O and data manipulation problems can be lessened by improving the input/output systems for the computers.
As shown in FIG. 1, a conventional SIMD (single instruction multiple data) parallel system includes a SIMD computer 10 that interacts with a host computer 20 via an I/O subsystem 30. The SIMD computer 10 consists of a processor array 11 that includes a plurality of processors 12, numbered P1, P2 . . . PN, each of which is a very simple CPU; a network 13 to connect the processors 12; a memory 14 for each processor, numbered M1, M2 . . . MN; and a control unit 15 to issue instructions and clock pulses to the processors. The I/O subsystem 30 typically comprises a staging memory that is responsible for transferring data between the SIMD computer 10 and the host 20.
In fine-grained, massively parallel SIMD systems, one single instruction after another is broadcast simultaneously to the processor array, with each instruction being applied to different pieces of data.
Traditionally, fine-grained SIMD parallel systems devoted their application emphasis to image-oriented computing, which resulted in the input/output system being designed only to handle regularly structured two-dimensional data such as image or matrix data. The input/output rate of a SIMD computer system was typically low because, for an N-processor SIMD system arranged as a √N × √N mesh, only √N items of data are input to or output from the system per machine cycle. Most fine-grained SIMD parallel systems are connected by mesh networks, and their input/output is done by shifting data between a host and one boundary row/column of the SIMD system. This type of data transfer is considered one-dimensional. In addition, data must be pre-arranged by the host such that a particular datum can be assigned to a desired processor. The low input/output rate and the restricted capability of handling only regular data structures effectively confine SIMD computers to a narrow application domain.
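The one-dimensional transfer described above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions and is not taken from any cited system: the function name `load_mesh_by_row_shifting`, the choice of row 0 as the boundary row, and the list-of-lists model of the mesh are all hypothetical. The sketch shows that loading N data items takes √N machine cycles, and that the host must order the data so that each datum reaches its intended processor.

```python
import math


def load_mesh_by_row_shifting(data, n_processors):
    """Model one-dimensional input into a sqrt(N) x sqrt(N) mesh.

    Assumption: row 0 is the boundary row connected to the host, and
    each machine cycle moves every row one step away from the boundary
    while sqrt(N) new items enter at row 0.
    """
    side = math.isqrt(n_processors)
    if side * side != n_processors:
        raise ValueError("n_processors must be a perfect square")
    if len(data) != n_processors:
        raise ValueError("expected exactly one datum per processor")

    mesh = [[None] * side for _ in range(side)]
    cycles = 0
    for step in range(side):
        # Shift existing rows one position away from the boundary.
        for row in range(side - 1, 0, -1):
            mesh[row] = mesh[row - 1]
        # Only sqrt(N) items enter the system in this cycle.
        mesh[0] = list(data[step * side:(step + 1) * side])
        cycles += 1
    return mesh, cycles


mesh, cycles = load_mesh_by_row_shifting(list(range(16)), 16)
print(cycles)   # number of machine cycles used
print(mesh[3])  # the row farthest from the boundary holds the first items sent
```

In this model the first items sent by the host end up in the row farthest from the boundary, which is why the host must pre-arrange the data before the transfer begins.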
A second disadvantage of the mesh-oriented row/column shifting scheme used in the prior art SIMD input/output systems is the difficulty in programming. Since the input/output function is overlapped with the current task execution, the programmer must interleave the instructions for computing with the instructions for input/output. This situation may lead to very unreadable code and may force the programming to stay at the assembly language level.
A third aspect of the prior art input/output subsystems presently employed by SIMD computers is the handling of the corner turning function. Corner turning is made necessary by the different arrangement of data at the host and at the SIMD system. For example, N 32-bit words are arranged in the host as N consecutive words, each being 32 bits wide. However, in transfer, these data words are distributed among 32 planes of SIMD memory, with each plane containing N bits, each of which is associated with one processor. This situation arises because, in the SIMD system, all processors need to access the same memory location in the same machine cycle, and the plane organization supports such memory accessing. The corner turning of regular data structures such as image or matrix data is supported by mesh-oriented row/column shifting. However, corner turning for irregular data structures is not supported by the prior art row/column shifting I/O scheme.
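The change of data layout involved in corner turning can be illustrated with a minimal Python sketch. The function names `corner_turn` and `corner_turn_back` are hypothetical, and the assignment of plane 0 to the least significant bit is an assumption made only for this illustration; the text above does not specify a bit ordering.

```python
def corner_turn(words, width=32):
    """Convert N host words into `width` bit planes of N bits each.

    Assumption: plane b holds bit b of every word, with b = 0 being
    the least significant bit. Entry p of a plane belongs to
    processor p.
    """
    return [[(word >> bit) & 1 for word in words] for bit in range(width)]


def corner_turn_back(planes):
    """Reassemble host words from bit planes (inverse of corner_turn)."""
    n_words = len(planes[0]) if planes else 0
    return [
        sum(planes[bit][p] << bit for bit in range(len(planes)))
        for p in range(n_words)
    ]


host_words = [5, 0xFFFFFFFF, 0, 0x80000000]
planes = corner_turn(host_words)
print(len(planes), len(planes[0]))  # 32 planes, each N bits wide
print(corner_turn_back(planes) == host_words)
```

Each plane in this model can be read by all N processors in the same machine cycle, one bit per processor, which is the access pattern that the plane organization is meant to support.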
As noted above, prior art input/output systems are presently implemented as a centralized piece of hardware, such as a staging memory. This approach requires the centralized input/output system to connect to all processors and, as a result, many wires are needed for the input/output system. U.S. Pat. No. 4,727,474 to Batcher discloses a staging memory for a massively parallel computer. The staging memory is a very complex interface between host memory and local processor memory. This interface is capable of buffering, permuting, and shuffling the data. The circuitry to implement this scheme is complex, requires several stages and is not easily distributed to a very large number of processors.
The mesh-oriented row/column shifting scheme is a compromise: it connects the input/output system only to the boundary of the mesh in order to save wires, but this, in turn, reduces the input/output rate of the system.
U.S. Pat. No. 4,380,046 to Fung discloses a massively parallel processor computer which utilizes a one-dimensional input/output scheme. The disclosed input/output system serves as a storage element for input and output operations. The instantaneous logical state of a bidirectional data bus utilized by the system can be stored into the input/output system in a one-bit register and, similarly, the logical state of the one-bit register can be read out to the data bus. The disclosed input/output system is capable of shifting bits to the input/output system in neighboring processing elements. The bits are shifted only in a single direction, and thus for an m × m processing element array, a one-bit slice data stream array will require m shifting operations to move the data array into the processing element array. Thus, there is a need for an I/O system that reduces wiring complexity while maintaining a high input/output rate.