The present invention pertains to a computer system, and more particularly, to a parallel vector processor in said computer system for rapidly processing a pair of vectors and storing the results of said processing.
A typical vector processor, such as the vector processor shown in FIG. 1, includes a plurality of vector registers, each vector register storing a vector. The vector comprises a plurality of vector elements. A pipeline processing unit is connected to a selector associated with the vector registers. The pipeline processing unit receives corresponding elements of a first vector from a first vector register and utilizes those elements to perform an arithmetic operation on the corresponding elements of a second vector stored in a second vector register. The results of the arithmetic operation are stored in corresponding locations of one of the vector registers, or in corresponding locations of a third vector register.
However, with this configuration, operations on the corresponding elements of the vectors must be performed in sequence. If each vector includes 128 elements, 128 operations must be performed in sequence. The time required to complete operations on all 128 elements of the vector is a function of the cycle time per operation of the pipeline unit as it operates on each of the corresponding elements.
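The sequential behavior described above may be illustrated by the following minimal sketch, which is a software model only and not a description of any actual hardware. The function names `sequential_vector_op` and `total_time`, the use of addition as the arithmetic operation, and the cycle time passed by the caller are all assumptions introduced for illustration. The sketch also assumes, as a simplification, that the total time is the number of operations multiplied by the cycle time per operation, ignoring any pipeline fill latency.

```python
from operator import add

NUM_ELEMENTS = 128  # vector length used in the example in the text


def sequential_vector_op(first, second, op=add):
    """Model a single pipeline unit handling one element pair per operation.

    `first` and `second` stand in for the contents of two vector registers.
    Returns the result vector and the number of operations performed.
    """
    if len(first) != len(second):
        raise ValueError("vectors must have the same number of elements")
    result = []
    operations = 0
    for a, b in zip(first, second):
        result.append(op(a, b))  # one arithmetic operation per element pair
        operations += 1
    return result, operations


def total_time(operations, cycle_time):
    """Simplified model: elapsed time = operations x cycle time per operation."""
    return operations * cycle_time


if __name__ == "__main__":
    v1 = list(range(NUM_ELEMENTS))   # stand-in for the first vector register
    v2 = [1] * NUM_ELEMENTS          # stand-in for the second vector register
    v3, ops = sequential_vector_op(v1, v2)
    # The cycle time of 10 units is a hypothetical value, not from the text.
    print(ops, total_time(ops, 10))
```

Under this simplified model, the elapsed time grows linearly with the number of vector elements, which is the limitation the paragraph above identifies.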
As a result of increasing sophistication of computer systems, there is a need to increase the performance of the vector processor portion of the computer system by decreasing the time required to process or perform arithmetic operations on each of the corresponding elements of a plurality of vectors stored in the vector registers within the computer system.