1. Field of the Invention
The present invention relates to a vector-processing-oriented digital computer for executing high-speed vector operations, referred to as a vector processor hereinafter, and particularly to the construction of a vector register included in the vector processor.
2. Description of the Prior Art
Conventionally, various vector processors have been proposed for performing high-speed data processing, for example, calculation of a matrix having a great number of elements which often appears in scientific data processing. One such vector processor has vector registers for improving the operational data transfer performance so that a plurality of pipeline-type arithmetic logic units included in the vector processor are effectively operated concurrently at a high speed.
FIG. 1 illustrates a general block diagram of a vector processor comprising the type of vector registers explained above.
According to FIG. 1, a plurality of vector registers 1 (VR1 to VRn) are each capable of storing a sequence of element data items; for example, each vector register can store 64 elements, each element consisting of eight bytes. Vector elements necessary for an operation are sequentially fetched from the main storage (MS) 5 through fetch data lines 10. Each vector element is distributed by a selector circuit 2 and is written through a write data line 6 into the vector register 1 having the number specified by a vector instruction. Afterwards, each vector element is sequentially read from the proper vector register via a data line 7 and is delivered through a selector circuit 3 to be input as an operand to a desired arithmetic logic unit 4 via an operand line 8. The operation result output from an arithmetic logic unit 4 is fed to the selector circuit 2 via an operation result line 9 and is sequentially written into the specified vector register 1 through the write data line 6. Each arithmetic logic unit 4 is a pipeline-type arithmetic logic unit independent of the other units; for example, it is a floating-point adder, a floating-point multiplier, or the like. The final resultant vector, obtained by repeating the data transfer between the arithmetic logic units 4 and the vector registers 1, is delivered from the vector registers 1 to the selector circuit 3 and is then sequentially stored in the main storage 5 through the write data line 11.
In FIG. 1, reference numeral 13 indicates a timing generator circuit for allowing a vector register (VR) 1 and a pipeline arithmetic logic unit 4 to operate at the same operating speed. Moreover, the operating speed of the main storage 5 is set to be equal to that of both the vector register (VR) 1 and the pipeline arithmetic logic unit 4 using another timing generator circuit (not shown).
Reference numeral 14 is a vector operation control section for controlling operations of the selector circuit 2, the selector circuit 3, the vector register (VR) 1, and the pipeline arithmetic logic unit 4 according to a vector instruction which has been read from the main storage 5.
Features of a vector processor having vector registers like those depicted in FIG. 1 will be explained in conjunction with a simple vector operation example. The following FORTRAN statements will be discussed assuming that the number of vector elements to be operated on in the pertinent vector operation is L.
      DO 10 I=1, L
   10 Y(I)=A(I)+B(I)*C(I)
This processing is expressed as follows by use of vector instructions for each element.
1. Vector Load VR "0" ← A
2. Vector Load VR "1" ← B
3. Vector Load VR "2" ← C
4. Vector Multiply VR "3" ← VR "1" * VR "2"
5. Vector Add VR "4" ← VR "0" + VR "3"
6. Vector Store VR "4" → Y
Here, VR stands for vector register. Each vector instruction performs its operation and data transfer repeatedly L times, that is, once for each of the L elements.
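The six vector instructions above can be traced with a small software model. The following Python sketch is illustrative only: representing main storage as a dictionary of named arrays and the vector registers as a dictionary indexed by register number is an assumption made for this example, as is the function name run_vector_program; none of it is taken from FIG. 1.

```python
def run_vector_program(ms, length):
    """Emulate the six vector instructions for Y(I) = A(I) + B(I) * C(I).

    ms     -- hypothetical main storage: dict mapping vector names to lists
    length -- number of vector elements L to process
    """
    vr = {}  # hypothetical vector register file, keyed by register number

    vr[0] = ms["A"][:length]                        # 1. Vector Load     VR0 <- A
    vr[1] = ms["B"][:length]                        # 2. Vector Load     VR1 <- B
    vr[2] = ms["C"][:length]                        # 3. Vector Load     VR2 <- C
    vr[3] = [b * c for b, c in zip(vr[1], vr[2])]   # 4. Vector Multiply VR3 <- VR1 * VR2
    vr[4] = [a + p for a, p in zip(vr[0], vr[3])]   # 5. Vector Add      VR4 <- VR0 + VR3
    ms["Y"] = list(vr[4])                           # 6. Vector Store    VR4 -> Y

    return ms["Y"]


main_storage = {
    "A": [1.0, 2.0, 3.0],
    "B": [4.0, 5.0, 6.0],
    "C": [7.0, 8.0, 9.0],
}
result = run_vector_program(main_storage, 3)
print(result)
```

The intermediate product held in VR "3" never returns to main storage in this model; only the final vector Y is stored.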
In general, a vector processor having vector registers substantially reduces the number of data transfer operations with the main storage, because vectors obtained as intermediate results of a vector operation are temporarily held in the vector registers and only the final resultant vector is stored in the main storage. Therefore, the data transfer performance necessary for an operation can be guaranteed by providing vector registers that allow high-speed read and write operations, even if the main storage has a lower access speed than the vector registers.
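The reduction in main storage traffic can be made concrete for the example Y(I)=A(I)+B(I)*C(I). The sketch below counts element transfers in two cases. The comparison case, in which every instruction reads its operands from and writes its result to main storage, is an assumption introduced here for illustration and is not a configuration described in this specification; the function names are likewise hypothetical.

```python
def transfers_with_registers(length):
    """Main storage element transfers when intermediates stay in vector registers."""
    loads = 3 * length    # Vector Load of A, B and C
    stores = 1 * length   # Vector Store of Y only
    return loads + stores


def transfers_memory_to_memory(length):
    """Assumed comparison case: every operand and result passes through main storage."""
    multiply = 3 * length  # read B and C, write a temporary product vector
    add = 3 * length       # read A and the temporary, write Y
    return multiply + add


L = 64  # one full vector register of 64 elements
print(transfers_with_registers(L), transfers_memory_to_memory(L))
```

Under these assumptions the register-based sequence needs 4L element transfers against 6L, and the gap widens as more intermediate results are reused from registers.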
Next, the vector instructions 4 and 5 above will be examined in detail. VR "3", which stores the multiplication result of instruction 4, is read as an operand by the following vector addition instruction 5. If operations are controlled so that the vector addition instruction 5 is initiated only after the results for all L elements have been written in VR "3", the arithmetic logic units cannot operate concurrently and a considerable processing time is required. In this way, a succeeding vector instruction must be held in a wait state before it can read, as its operand, a VR that receives the operation result or the fetched data of a preceding instruction. This waiting relationship also exists between vector instruction 4 and vector instruction 2 or 3, between vector instructions 1 and 5, and between vector instructions 5 and 6. A technique called chaining is adopted to solve this waiting problem. Chaining operates as follows: when a data item read from the main storage or an operation result obtained by a vector instruction is written in a vector register, the written data is transferred, immediately after the write operation, to the main storage or to an arithmetic logic unit as an operand of the succeeding vector instruction. This chaining feature enables a plurality of arithmetic logic units to operate effectively even in a polynomial-type vector calculation, thereby improving concurrency and realizing high-speed processing.
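The benefit of chaining for instructions 4 and 5 can be estimated with a simple cycle-count model. The sketch below is a rough illustration under stated assumptions: each pipeline accepts one element per cycle, an element issued at cycle t is written at cycle t plus the pipeline latency, and the latency values used (7 cycles for multiply, 6 for add) are hypothetical figures chosen for the example rather than values from this specification.

```python
def finish_cycle(start, latency, length):
    """Cycle at which the last of `length` result elements is written."""
    return start + latency + length - 1


def multiply_then_add(length, mul_latency, add_latency, chained):
    """Cycle at which Vector Add completes when it depends on Vector Multiply."""
    mul_start = 0
    first_product = mul_start + mul_latency                       # element 0 written to VR3
    last_product = finish_cycle(mul_start, mul_latency, length)   # element L-1 written to VR3

    if chained:
        # Chaining: the add starts reading VR3 right after its first element is written.
        add_start = first_product + 1
    else:
        # No chaining: the add waits until all L products are in VR3.
        add_start = last_product + 1

    return finish_cycle(add_start, add_latency, length)


unchained = multiply_then_add(64, 7, 6, chained=False)
chained = multiply_then_add(64, 7, 6, chained=True)
print(unchained, chained)
```

In this model the saving equals L-1 cycles, since the add no longer waits for the remaining L-1 products before starting.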
As is clear from the foregoing explanation, chaining is a method of speeding up the execution of vector instructions by exploiting the relationship between two consecutive vector instructions. Whether chaining can be carried out satisfactorily depends on the read/write performance of the vector registers.
Vector processing and chaining are taught in Richard M. Russell, "The Cray-1 Computer System," Communications of the ACM, Vol. 21, No. 1, January 1978, pp. 63-72.