The present invention relates to a vector processing apparatus for performing vector operations for scientific and technological computation.
As the applications of computers continue to expand, the demand for high-speed data processing is increasing. Today, ultrahigh-speed computers, or supercomputers, are under extensive development to meet this demand. In a supercomputer, the enormous amount of data to be processed is treated as a collection of vector data, i.e., ordered one-dimensional data, which is processed at high speed by a vector processing apparatus operating on a pipeline technique, as described in the publication "The Architecture of Pipelined Computers", pp. 1-5, published in 1981 by McGraw-Hill Book Company. For a vector processing apparatus of this type, reference can be made to U.S. Pat. No. 4,128,880. The disclosed apparatus comprises a plurality of vector registers, each holding an ordered set of data elements; an output selection circuit for delivering, in response to a clock, the successive data elements designated by a vector instruction; a vector operation unit associated with the pipelined vector instruction for processing the data elements; and an input selection circuit for writing the result from the output of the vector operation unit into the vector register designated by the vector instruction.
Where such a prior art vector processing apparatus is used to compute, for example, an inner product, which often appears in fluid equations and other scientific and technological computation, i.e., an operation in which each successive result of multiplication is added to the accumulated preceding results, the intermediate results of the operation are written into a common vector register. Consequently, the addition of intermediate results operates on a single common vector register rather than on the plurality of registers available to the pipeline system, which lowers the processing rate.
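The dependency described above can be illustrated with a minimal sketch, offered for illustration only and not as the circuitry of the cited apparatus. The function name `inner_product` and the list-based operands are assumptions introduced here. The element-by-element multiplication has no dependency between elements, whereas each addition needs the result of the previous addition.

```python
def inner_product(a, b):
    """Illustrative inner product: multiply element by element, then accumulate."""
    # Each product depends only on its own pair of elements, so this step
    # maps naturally onto a pipelined vector operation.
    products = [x * y for x, y in zip(a, b)]

    # Each addition uses the previous sum as an operand, so successive
    # additions cannot start until the preceding result is available.
    acc = 0
    for p in products:
        acc = acc + p
    return acc


print(inner_product([1, 2, 3], [4, 5, 6]))  # prints 32
```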
In more detail, U.S. Pat. No. 4,128,880 describes, by way of example, the cumulative summation of a vector of sixty-four elements in FIG. 9 and col. 17, lines 1-39. In this summation, the elements stored in a vector register V1 are added element by element to an initial value "0", and the result is stored in a vector register V2. The stored result of summation is then fed back to the operation unit as an operand. Assume that n clock pulses are required for one cycle, which begins with reading a certain operand and ends with using it, via the operation unit, as the next operand. The method then yields n partial sums, each accumulating (total number of elements)/n elements taken at intervals of n. In this particular example, since n=8, the vector summation of sixty-four elements yields eight partial sums, each covering eight elements taken at intervals of eight. To obtain the cumulative result of all sixty-four elements from the eight intermediate results, (8-1) scalar operations are required, in each of which the result of one operation is used as an operand for the next. The problem with these scalar operations is that they consume a far longer period of time than vector operations.
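The arithmetic of this example can be checked with a small behavioral model. It is a sketch under stated assumptions, not a description of the hardware in the cited patent: the names `recirculating_sum`, `finish_with_scalar_adds` and `latency` are hypothetical, one element is assumed to enter the adder per clock, and `latency` stands for the n clock pulses of the feedback cycle.

```python
def recirculating_sum(elements, latency):
    """Model an adder whose result returns as an operand `latency` clocks later."""
    results = []
    for i, x in enumerate(elements):
        # The fed-back operand is the result produced `latency` elements
        # earlier; until a result has returned, the initial value 0 is used.
        fed_back = results[i - latency] if i >= latency else 0
        results.append(fed_back + x)
    # The last `latency` results are the interleaved partial sums.
    return results[-latency:]


def finish_with_scalar_adds(partials):
    """Combine the partial sums serially, counting the scalar additions."""
    total = partials[0]
    scalar_ops = 0
    for p in partials[1:]:
        total += p  # each addition needs the previous result as an operand
        scalar_ops += 1
    return total, scalar_ops


v1 = list(range(1, 65))                   # sixty-four illustrative elements
partials = recirculating_sum(v1, latency=8)
total, scalar_ops = finish_with_scalar_adds(partials)
print(len(partials), scalar_ops, total)   # prints 8 7 2080
```

With sixty-four elements and a latency of eight, the model leaves eight partial sums and needs seven serial additions to combine them, matching the (8-1) scalar operations noted above.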