In early digital computers, programs were executed one at a time, and each program had to be completed before a new one could be started. The computer, under control of the program, would read its input, compute, and then generate outputs. This was an inefficient use of computer resources since, while the input and output operations were being performed, the central processing unit (CPU) was not computing and, conversely, while the central processing unit was computing, the input/output capabilities of the computer were idle.
This led to the development of independent peripheral device controllers, or peripheral processors, for handling input/output operations independently of the central processing unit. With the use of such devices, one program could be causing an input operation to be executed while a second program was causing computing, and a third program was causing output to be generated. An example of peripheral processing is shown in "Design of a Computer: The Control Data 6600" by J. E. Thornton, Scott, Foresman & Company (1970), pages 141-154.
As technology evolved and computers became faster, the utilization of other resources became the limiting factor. These resources included the adder, multiplier, and other function units which make up the central processing unit. Program instructions could be fetched, or pulled, from the memory unit much faster than the function units (such as an adder) could execute them.
This then led to development of various techniques and/or systems for increasing the utilization of the function units. Of these various techniques and/or systems, one of the more useful is known as pipelining. The basic idea of pipelining is to provide a set of registers which hold data with the registers being connected with pipe inputs and outputs. A new data word is placed in the pipe each clock time. After a certain number of clock times (for example, six), the data word moves through the pipe and comes out of the other end.
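The movement of data words through such a pipe can be sketched in a few lines of Python. This is a minimal illustration only, not a description of any particular hardware: the function name run_pipe and the use of a fixed-length queue to stand in for the chain of registers are assumptions made here, and the six-stage depth is taken from the example above.

```python
from collections import deque


def run_pipe(words, stages=6):
    """Feed one word into the pipe per clock time.

    Returns a list of (clock, word) pairs, one for each word leaving the pipe.
    Assumes no input word is None, since None marks an empty register.
    """
    # One slot per register; index 0 is the pipe input, index -1 the pipe output.
    pipe = deque([None] * stages, maxlen=stages)
    pending = list(words)
    outputs = []
    clock = 0
    # Keep clocking until every word has been fed in and the pipe has drained.
    while pending or any(w is not None for w in pipe):
        clock += 1
        leaving = pipe[-1]  # word held in the last register exits on this clock
        # Shift every word one register forward and admit the next input, if any.
        pipe.appendleft(pending.pop(0) if pending else None)
        if leaving is not None:
            outputs.append((clock, leaving))
    return outputs


if __name__ == "__main__":
    for clock, word in run_pipe(["w0", "w1", "w2"]):
        print(f"clock {clock}: {word} leaves the pipe")
```

In this sketch a word admitted on clock 1 leaves on clock 7, six clock times later, and each following word leaves one clock time after its predecessor.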
This technique has been heretofore employed to speed up function units, such as an adder. If, for example, an adder requires 600 ns to add two numbers, then, once an addition is started, the adder cannot be used to add a new set of numbers until it has completed the current addition. Thus, its rate of doing work is 600 ns/addition. Pipelining the adder consists of placing registers at strategic locations inside the adder to catch intermediate results. A second pair of numbers can then be fed to the adder as soon as the intermediate result of the first pair is stored in the first register. For example, if six registers are used, the adder accepts a new pair of numbers and delivers a result every 100 ns, so that its input/output rate is 100 ns/addition even though each addition still requires 600 ns.
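The arithmetic of the adder example can be checked with a short calculation. The sketch below assumes that all additions are independent of one another and that the pipeline registers add no delay of their own; the function names are hypothetical and are introduced only for this illustration.

```python
def unpipelined_time_ns(n_adds, latency_ns=600):
    """Total time when each addition must finish before the next one starts."""
    return n_adds * latency_ns


def pipelined_time_ns(n_adds, latency_ns=600, stages=6):
    """Total time when a new addition enters the adder every stage time.

    Assumes independent additions and equal stage delays.
    """
    if n_adds == 0:
        return 0
    stage_ns = latency_ns // stages  # 100 ns for the six-register example
    # The first result appears after the full latency; each later result
    # follows one stage time behind the previous one.
    return latency_ns + (n_adds - 1) * stage_ns


if __name__ == "__main__":
    n = 100
    print("unpipelined:", unpipelined_time_ns(n), "ns")
    print("pipelined:  ", pipelined_time_ns(n), "ns")
```

For 100 independent additions, the unpipelined adder needs 60,000 ns while the six-stage pipelined adder needs 10,500 ns, which approaches the 100 ns/addition rate as the number of additions grows.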
The principal difficulty heretofore encountered with pipelining is what is called the precedence problem. This problem, although quite complex, can be illustrated with a simple example. Suppose the first instruction in a program adds 1 to a variable A, and the second instruction adds the result to B. It is evident that the execution of the second instruction cannot be started until completion of the first. Because the second instruction follows immediately after the first, pipelining of the adder does nothing to increase the execution speed. The addition of A and 1 must move all the way through the adder pipe before the second addition can be started.
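The effect of precedence on the pipelined adder can be put into numbers with a small timing model. This is a sketch under stated assumptions: the function name finish_times_ns and the encoding of dependencies as a list of indices are introduced here for illustration, and the 600 ns latency and six stages are carried over from the adder example.

```python
def finish_times_ns(deps, latency_ns=600, stages=6):
    """Return the completion time of each addition in program order.

    deps[i] is the index of the earlier addition whose result addition i
    needs as an operand, or None if addition i is independent.
    """
    stage_ns = latency_ns // stages
    start, finish = [], []
    for dep in deps:
        # The pipe can accept at most one new addition per stage time.
        earliest = start[-1] + stage_ns if start else 0
        if dep is not None:
            # Precedence: the operand must have left the adder pipe first.
            earliest = max(earliest, finish[dep])
        start.append(earliest)
        finish.append(earliest + latency_ns)
    return finish


if __name__ == "__main__":
    print("independent pair:", finish_times_ns([None, None]))
    print("dependent pair:  ", finish_times_ns([None, 0]))
```

Under this model, two independent additions complete at 600 ns and 700 ns, whereas the dependent pair described above completes at 600 ns and 1200 ns, exactly as if the adder were not pipelined at all.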
Another technique heretofore developed for speeding up central processing unit execution is overlap. Overlap is an attempt to execute several instructions within a program simultaneously. For example, if an add instruction is followed by a multiply instruction, then the central processing unit will try to execute both simultaneously. Although overlap leads to an increase of execution speed, it still is limited in applicability by the precedence problem.
A recent and well-publicized technique which purports to solve the precedence problem is the use of array processing. An array processor executes instructions which specify whole vectors of operands which are to be operated upon by pipelined function units. This technique, however, is applicable only to a certain class of problems (namely those which are vector-oriented), and is not particularly effective for most applications.
The simplest type of processor discussed hereinbefore is called a single instruction, single data stream (SISD) processor, i.e., a processor wherein the central processing unit performs one instruction at a time on a single job (data stream). Other types of processors have also been heretofore suggested and/or developed. A processor which performs a single instruction on several data streams simultaneously is called a single instruction, multiple data stream (SIMD) processor, while a processor which performs multiple instructions on multiple data streams is called a multiple instruction, multiple data stream (MIMD) processor. Such processors are discussed, for example, in "Some Computer Organizations And Their Effectiveness" by Michael J. Flynn, IEEE Transactions on Computers, Vol. C-21, No. 9, September, 1972.
A data flow processor has also been heretofore suggested for parallel processing wherein sections of the processor are connected by interconnection networks. Such a processor is shown, for example, in "Performance Analysis of a Data-Flow Processor", by David P. Misunas in Proceedings of the 1976 International Conference on Parallel Processing (1976), pages 100-105, along with the references cited therein.
In a multiple instruction, multiple data stream (MIMD) processor, several data streams are processed simultaneously and independently by instructions which are also independent (in contrast with a SIMD processor). This type of processor can be implemented either with separate central processing units, one for each data stream, or by the use of one central processing unit which is, in effect, multiplexed among the several data streams.
The main problem with the use of separate central processing units is that the cost is sizable and each of them is still subject to the precedence problem, as are all SISD processors. A practical approach to the implementation of the MIMD processor is therefore to use a central processing unit which is multiplexed among the several data streams in a way that it does not suffer from precedence constraints.
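Why multiplexing can sidestep precedence may be seen from a simple timing model. The sketch below is illustrative only and makes several assumptions not stated in the text: the function unit is a six-stage pipe, every instruction within a data stream depends on the one before it, and the streams are served in round-robin order. The round-robin policy and the name total_clocks are hypothetical choices made for this example, not a description of any particular processor.

```python
def total_clocks(n_streams, adds_per_stream, stages=6):
    """Clock at which the last result leaves a pipelined unit shared by streams.

    Each stream is a chain of dependent additions, so a stream may issue its
    next addition only after its previous result has left the pipe.
    """
    ready_at = [0] * n_streams  # clock at which each stream may issue again
    remaining = [adds_per_stream] * n_streams
    clock = 0
    last_finish = 0
    next_stream = 0
    while any(remaining):
        # Offer this clock's issue slot to the streams in round-robin order.
        for k in range(n_streams):
            s = (next_stream + k) % n_streams
            if remaining[s] and ready_at[s] <= clock:
                remaining[s] -= 1
                ready_at[s] = clock + stages  # result available after full latency
                last_finish = clock + stages
                next_stream = (s + 1) % n_streams
                break
        clock += 1  # an unused slot is a pipeline bubble
    return last_finish


if __name__ == "__main__":
    print("1 stream,  10 dependent adds:", total_clocks(1, 10), "clocks")
    print("6 streams, 10 dependent adds each:", total_clocks(6, 10), "clocks")
```

In this model a single stream completes 10 dependent additions in 60 clocks, while six streams sharing the same pipe complete 60 additions in 65 clocks, because the slots that one stream must leave empty while waiting on its own result are filled by instructions from the other streams.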