The present invention pertains to the field of high speed digital data processors, and more particularly to computing machines adapted for vector processing.
Most prior art computers are organized with an arithmetic unit which can communicate with a memory and with input-output (I/O). To perform a given arithmetic function, each of the operands must be successively brought to the arithmetic unit from memory, the function must be performed, and the result must be returned to the memory. This process typically takes a great deal of time, since a number of clock periods are required for transmission between memory and the arithmetic unit, plus several clock periods for performing the operation. Machines operating with this type of organization are called scalar machines.
There are many circumstances in problem solving with computers where the same operation must be performed repetitively on each successive element of a set of data. In scalar machines, this type of repetitive operation must be carried out by a software program with indexing that successively brings each operand to the arithmetic unit and returns each result to memory. This is very time-consuming because the delays associated with memory-to-arithmetic-unit transfers are incurred for the operation on each individual element of the ordered set.
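As an illustration only, the following Python sketch models the scalar approach for element-by-element addition. The clock-period costs (`MEMORY_TRANSFER_CP`, `OPERATION_CP`) and the function name `scalar_add` are hypothetical assumptions chosen for the example. The model also assumes that every transfer and operation runs strictly in sequence, with no overlap.

```python
# Hypothetical clock-period costs (assumptions for illustration, not from the text).
MEMORY_TRANSFER_CP = 8  # one transfer between memory and the arithmetic unit
OPERATION_CP = 6        # one arithmetic operation in the arithmetic unit


def scalar_add(a, b):
    """Add two equal-length vectors one element at a time, the way a scalar
    machine does under software indexing, and tally the clock periods spent.

    Assumes every transfer and operation runs strictly in sequence (no overlap).
    """
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    result = [0] * len(a)
    clocks = 0
    for i in range(len(a)):              # software loop with an index
        x = a[i]
        clocks += MEMORY_TRANSFER_CP     # bring first operand to the arithmetic unit
        y = b[i]
        clocks += MEMORY_TRANSFER_CP     # bring second operand to the arithmetic unit
        s = x + y
        clocks += OPERATION_CP           # perform the operation
        result[i] = s
        clocks += MEMORY_TRANSFER_CP     # return the result to memory
    return result, clocks


result, clocks = scalar_add([1, 2, 3], [4, 5, 6])
print(result, clocks)  # [5, 7, 9] 90
```

Under these assumed costs every element pays the same 30 clock periods, 24 of which are spent on memory transfers rather than on the arithmetic itself.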
In order to provide more rapid processing of ordered arrays of data, vector machines have been developed. As used in the data processing field, the term vector refers to an ordered set or array of data. An illustrative vector operation performed by a vector machine would be the addition of two operand vectors to produce a result vector. Each element of the result vector in this example would be the sum of the correspondingly ordered elements of the operand vectors. Basically, a vector machine is one which handles these ordered sets of data through its hardware organization, rather than through a software program and indexing, and thus attains a higher speed of operation.
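Stated as an equation, the illustrative addition defines each element of the result vector from the correspondingly ordered operand elements. The symbols A, B, R, and n below are introduced here for illustration and do not appear in the original text.

```latex
R_i = A_i + B_i, \qquad i = 1, 2, \ldots, n
\qquad\text{e.g.}\qquad
A = (1, 2, 3),\; B = (4, 5, 6) \;\Rightarrow\; R = (5, 7, 9)
```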
One prior art vector machine is known as the Illiac, which attempts to provide fully parallel operation on arrays of data. A network of 256 arithmetic unit processors is provided, and entire vectors up to 256 elements long are transmitted in parallel to the arithmetic unit processors. Results are transmitted in parallel back to memory. One obvious problem with this parallel approach to vector processing is the extremely high cost of providing a great number of independent arithmetic unit processors. Another is the inherent difficulty of reliably synchronizing all elements of such a large network.
Another type of vector machine makes use of four programmable arithmetic units, each having high speed paths termed "pipes" to bring operands to the arithmetic unit and to return results to memory. In operation, the vector is read through the pipes under hardware control to supply a stream of data to an arithmetic unit. One type of streaming machine employing pipes has been developed by Texas Instruments.
Another type of streaming system is represented by the Control Data Corporation Star machine. In this machine, a single large stream of data, called a "sword," is provided to the arithmetic unit, rather than four smaller streams through pipes.
All streaming vector machines operate memory to memory. As such, they all suffer from the start-up lag inherent in the path from memory to the arithmetic unit. From 10 to 100 clock periods may elapse after the initial instruction before the first pair of operands arrives at the arithmetic unit through the pipe or sword. In other words, it takes a considerable period of time to fill the pipe with the stream of data. However, once the pipe is filled and the data begins arriving, streaming vector machines deliver successive results at a high rate, much faster than scalar machines. Streaming vector machines are therefore used most efficiently on very long vectors containing a great number of individual elements.
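The effect of the start-up lag can be seen with a simple first-order model: a fixed start-up cost to fill the pipe, followed by one result every fixed number of clock periods. The Python sketch below assumes this model. The function names and the default of one clock period per result are hypothetical; the start-up figure is supplied by the caller, and the text gives a range of 10 to 100 clock periods.

```python
def streaming_clocks(n, startup_cp, cp_per_result=1):
    """Clock periods for a memory-to-memory streaming machine to produce n
    results under a simple model: a fixed start-up lag to fill the pipe,
    then one result every cp_per_result clock periods."""
    if n <= 0:
        return 0
    return startup_cp + n * cp_per_result


def per_element_clocks(n, startup_cp, cp_per_result=1):
    """Average clock periods per result; the start-up lag is spread over n elements."""
    if n <= 0:
        raise ValueError("n must be positive")
    return streaming_clocks(n, startup_cp, cp_per_result) / n


for n in (5, 10, 100, 1000):
    print(n, per_element_clocks(n, startup_cp=50))
```

With an assumed 50 clock period start-up, the average cost falls from 11 clock periods per element at n = 5 to about 1.05 at n = 1000, which is why the start-up lag matters little for very long vectors but dominates short ones.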
The length of the vectors associated with the solution of a given problem is, of course, determined by the nature of the problem. Vector lengths can range from two elements upward, with no theoretical upper limit. Actual experience with vector processing has shown that the most commonly encountered vector problems involve rather short vectors of approximately 5 to 10 elements. With vectors this short, it is generally not worth incurring the start-up time required to operate in vector processing mode on prior art streaming vector machines. As a result, the short vectors routinely encountered in numerous scientific calculations are generally still processed in the time-inefficient scalar mode.
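The trade-off can be expressed as a break-even vector length. If scalar processing costs c clock periods per element and streaming costs a start-up S plus k clock periods per element, streaming is strictly faster only when n·c > S + n·k, that is, when n > S / (c − k). The Python sketch below computes the smallest such n. The symbols, the function name, and the scalar cost used in the usage loop are hypothetical assumptions, not figures from the text.

```python
def break_even_length(startup_cp, scalar_cp_per_element, stream_cp_per_result=1):
    """Smallest vector length n for which a streaming machine, costing
    startup_cp + n * stream_cp_per_result clock periods, is strictly faster
    than scalar processing costing n * scalar_cp_per_element clock periods.
    All arguments are non-negative integer clock-period counts."""
    gain = scalar_cp_per_element - stream_cp_per_result  # clocks saved per element once streaming
    if gain <= 0:
        raise ValueError("streaming never catches up if it is not faster per element")
    # n * gain > startup_cp  <=>  n > startup_cp / gain; smallest such integer:
    return startup_cp // gain + 1


# Assumed scalar cost of 10 clock periods per element (hypothetical).
for startup in (10, 50, 100):
    print(startup, break_even_length(startup, scalar_cp_per_element=10))
```

Under these assumed costs, break-even falls at 2, 6, and 12 elements for start-up lags of 10, 50, and 100 clock periods, so toward the upper end of the start-up range streaming no longer pays off for the 5 to 10 element vectors described as typical.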
The present invention provides a vector processing machine which is uniquely adapted for efficient processing of short and moderate length vectors. Longer vectors can also be accommodated by processing in groups under program control.
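Processing a long vector in groups under program control is commonly called strip mining. The Python sketch below illustrates the general idea only and is not presented as the invention's mechanism: the function name `process_in_groups` and the `group_len` parameter are hypothetical, and the per-group work is ordinary element-wise addition.

```python
def process_in_groups(a, b, group_len):
    """Add two vectors of any length by splitting them into consecutive groups
    of at most group_len elements; each group stands in for one short-vector
    operation issued by a program-controlled loop (strip mining).

    group_len is a hypothetical maximum vector length chosen by the caller.
    Returns the result vector and the number of groups processed."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    if group_len < 1:
        raise ValueError("group_len must be at least 1")
    result = []
    groups = 0
    for start in range(0, len(a), group_len):   # program-controlled outer loop
        stop = min(start + group_len, len(a))   # the final group may be shorter
        result.extend(x + y for x, y in zip(a[start:stop], b[start:stop]))
        groups += 1
    return result, groups


result, groups = process_in_groups(list(range(10)), list(range(10)), group_len=4)
print(groups)  # 3 (groups of 4, 4, and 2 elements)
```

Handling the remainder as a shorter final group keeps every group within the assumed maximum length without padding the operand vectors.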