This invention relates generally to interconnection circuitry, and more particularly, to interconnect circuits for use in high-speed computers employing a parallel processing architecture. Although there may be other practical uses for the interconnect circuit of the invention, it will be described here in relation to its application in parallel processors.
Modern parallel processors have a number of processing elements, such as arithmetic multipliers and adders, which can operate simultaneously. Such computers are useful in addressing certain classes of computation problems, for the most part in scientific applications, which could not be solved fast enough using a more conventional serial processor, in which each arithmetic or logical operation has to be performed in sequence. A serial computer is typically controlled by a program, of which each instruction is stored in a single "word" of memory. Since there is only one processing element in a serial computer, it is logical to store a program in this manner. The computer retrieves the program word-by-word and thus executes the program of instructions.
By way of contrast, the parallel processor can handle at one time as many instructions as there are processing elements. The instructions are usually stored and retrieved in a single, relatively wide word, hence the term "horizontal processor". It can readily be understood that, depending on the complexity of the computational problem to be addressed, the preparation of an efficient program for such a computer may be a very difficult task. For a serial computer, programming is relatively straightforward, since there is only one processing element to be concerned about. For the horizontal processor, the programmer's goal is not only to program the necessary steps to arrive at a desired arithmetic result, but to do so with the most efficient utilization of the parallel nature of the processing elements.
Complicating the programming task is the nature of the computational problem, which is generally not completely parallel. For example, the output from an adder may be one of two input quantities to be multiplied in a multiplier circuit, but the other required input to the multiplier may have to be derived from the adder output by first adding to it an additional quantity. Thus, the first multiplier input must be stored until the second is also available. In the past, such temporary storage was provided by "scratchpad" memory devices. In this simple example, the first available input to the multiplier would be stored in a scratchpad memory, then later retrieved when the second input was available and the multiplier was also available.
Basically, then, the task of programming a horizontal processor is one of scheduling the times of operation of the available computer resources, to make the best use of those resources in a parallel manner. The computer includes not only the resources or processing elements, but an interconnect circuit, by means of which outputs from selected processing elements can be connected as inputs to other selected processing elements. When an output has to be temporarily stored in a scratchpad memory, the memory is in effect another computer resource, the use of which has to be scheduled by the programmer. Unfortunately, there is a large class of scientific computations for which these scheduling steps are not only non-trivial, but can be performed only on a time-consuming trial-and-error basis.
The class of computations referred to is that involving iterative techniques. Iterative computations, in which an identical, or nearly identical, computational loop is repeated many times to obtain a result, account for a major fraction of the execution time in scientific computations. Scheduling iterative computations for parallel processors is, therefore, of considerable importance. Subject to data dependencies between them, successive iterations can be scheduled in any manner that does not result in conflict for the use of the resources. One way of overlapping iterations is to use an identical schedule for each, and to initiate successive iterations spaced by a fixed interval, referred to as the initiation interval.
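The overlap of identically scheduled iterations can be illustrated with a minimal sketch. It is not part of the described apparatus. It assumes each resource can serve one operation per cycle, that many iterations are in flight, and that data dependencies are already satisfied within the single-iteration schedule. The function name `has_conflict`, the resource names, and the cycle numbers are all hypothetical. Because iteration k uses a resource at cycle t + k * interval, two uses of the same resource collide exactly when their cycle offsets are equal modulo the interval.

```python
from collections import Counter

def has_conflict(schedule, interval):
    """Report whether overlapped iterations would contend for a resource.

    schedule: list of (resource, cycle) pairs describing ONE iteration.
    interval: the initiation interval, in cycles, between iteration starts.
    Assumption: each resource handles one operation per cycle.
    """
    # Fold every use onto its slot within the repeating interval.
    usage = Counter((resource, cycle % interval) for resource, cycle in schedule)
    # A slot claimed more than once means two iterations want the
    # same resource in the same cycle.
    return any(count > 1 for count in usage.values())

# Hypothetical one-iteration schedule: the adder is used at cycles 0 and 3,
# the multiplier at cycle 2.
one_iteration = [("adder", 0), ("multiplier", 2), ("adder", 3)]

conflict_at_3 = has_conflict(one_iteration, 3)  # adder cycles 0 and 3 share slot 0
conflict_at_2 = has_conflict(one_iteration, 2)  # adder cycles fall in slots 0 and 1
```

In this made-up schedule an interval of 3 produces a conflict while an interval of 2 does not, which suggests why a workable interval cannot simply be guessed from the length of the loop.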
The minimum initiation interval is the smallest initiation interval for which a schedule without conflicts can be formulated. Use of the minimum initiation interval results in optimum utilization of the computer resources. An optimum schedule can be arrived at only if the minimum initiation interval is known. This interval will depend in part on the usage that is made of scratchpad memories, since these are also computer resources. But the scratchpad usage can only be determined when a schedule is known. In practice, the programmer selects a likely candidate for the minimum initiation interval and tries to formulate a schedule with no conflicts for the usage of resources. If a schedule cannot be found, the programmer may either keep trying, or may increase the estimated minimum initiation interval and try again, bearing in mind that a schedule using the increased interval will not be as efficient.
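The trial-and-error procedure can be sketched in simplified form. The sketch below is an illustration under strong assumptions, not the programmer's actual method: it holds the single-iteration schedule fixed and only varies the candidate interval, whereas in practice the schedule itself, including scratchpad usage, would be reworked at each attempt. The names `conflicts` and `smallest_interval` and the example schedule are hypothetical, and each resource is assumed to serve one operation per cycle.

```python
from collections import Counter

def conflicts(schedule, interval):
    """True if iterations started every `interval` cycles would collide."""
    usage = Counter((resource, cycle % interval) for resource, cycle in schedule)
    return any(count > 1 for count in usage.values())

def smallest_interval(schedule):
    """Try candidate intervals in increasing order for a FIXED schedule.

    schedule: list of (resource, cycle) pairs describing one iteration.
    Returns the smallest conflict-free interval, or None if the schedule
    conflicts with itself even without overlap.
    """
    # No interval can be shorter than the number of times the busiest
    # resource is used in one iteration.
    uses_per_resource = Counter(resource for resource, _ in schedule)
    lower_bound = max(uses_per_resource.values())
    # Starting iterations one full schedule length apart removes all overlap,
    # so there is no need to search beyond that point.
    last_cycle = max(cycle for _, cycle in schedule)
    for interval in range(lower_bound, last_cycle + 2):
        if not conflicts(schedule, interval):
            return interval
    return None

# Hypothetical schedule: adder at cycles 0 and 2, multiplier at cycle 1.
# The lower bound is 2, but both adder uses fold onto slot 0 at that interval,
# so the candidate has to be increased to 3.
example = [("adder", 0), ("multiplier", 1), ("adder", 2)]
best = smallest_interval(example)
```

Here the first candidate fails and the interval has to be increased, at the cost of a less efficient schedule. A human scheduler would instead try rearranging the operations before giving up on the smaller interval, which is the time-consuming part.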
An important consequence of these scheduling difficulties is the lack of availability of an efficient compiler for horizontal machines. A compiler is a computer program whose only function is to translate a program written in a higher-level programming language into instructions in "machine language" for execution by a particular machine. The higher-level language is designed to be easily understood by scientists or engineers who might use it, and it requires no detailed knowledge of the machine that will be used ultimately to execute the program. In the case of horizontal machines, a compiler should be capable of performing the function of scheduling the activities of the computer resources for optimum parallel utilization. Since the scheduling task, as already discussed, has been accomplished only by trial-and-error methods, an efficient compiler to achieve the same result has proved to be an elusive goal.
Because there has been no efficient compiler available for high-speed parallel processing machines, these machines have for the most part been grossly under-utilized. Faced with the task of programming a horizontal machine for a specific complex task, a person responsible for such a project has had to choose between very high programming costs and very low program efficiency. As discussed above, the scheduling function associated with machine-language programming of horizontal machines is tedious and time-consuming. If the task is limited in time, or if an inefficient compiler is used, the resulting program will not make efficient use of the machine's parallelism, and the relatively high cost of the machine will not be justifiable. On the other hand, a large programming expense for schedule optimization may not be justifiable either.
A compromise solution proposed by manufacturers of horizontal machines is to provide software subroutines or modules for performing commonly encountered computations, such as the fast Fourier transform (FFT) and various vector manipulations. Each module is written in highly efficient machine code, to take maximum advantage of the horizontal architecture. However, if different computations are needed, for which no standard modules are available, the machine reverts to an inefficient mode of operation, unless time is spent programming all of the computations for optimum execution. Moreover, even when the relatively efficient software modules are used, different applications will utilize the modules in different combinations and mixes, and the efficiency may be diminished to some degree.
It will be appreciated from the foregoing that there has been an important need in the field of horizontal computers for an improvement that facilitates scheduling of the computer resources, and thereby permits the production of a compiler for the inexpensive generation of highly efficient machine-language programs. The present invention fulfills this need.