The present invention relates to a vector process oriented digital electronic computer adapted for carrying out vector operations at a high speed (hereinafter referred to as a vector processor).
A vector processor is a processor capable of operating at a high speed on a plurality of elements of an ordered set of data (called vector data, or simply a vector) and on another vector, both stored in a main storage.
FIG. 1 diagrammatically shows an operation for summing vectors B and C to produce a vector A. In the illustrated example, corresponding elements b.sub.ij and c.sub.ij of the vectors B and C are summed to obtain an element a.sub.ij of the vector A. In general, the vector processor can carry out, in addition to the operations on the vector data, a process necessary for the preparation of the vector operation (e.g. calculation of a start address of the vector), which is an ordinary scalar operation, and an input/output process. Typical examples of such a vector processor are STAR-100 of CDC and CRAY-1 of Cray Research Inc.
When a FORTRAN program having doubly nested DO loops is to be executed by vector operations, only the inner DO loop is processed in the vector operation form, and the outer loop is usually processed as a software loop, that is, by scalar processing.
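The division of work between the two loops can be sketched in Python as follows. This is a minimal illustration, not the program of FIG. 2: the loop body A(I, J) = B(I, J) + C(I, J) is assumed from the vectors named in this description, and the function names `vector_add` and `add_by_rows` are hypothetical. Each call to `vector_add` stands in for one vector operation over the range of J, while the surrounding `for` loop over I plays the role of the software loop.

```python
def vector_add(x, y):
    """Stand-in for one vector operation: element-wise sum of two vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return [xj + yj for xj, yj in zip(x, y)]


def add_by_rows(b, c):
    """Outer loop over I runs as an ordinary (scalar) software loop;
    the inner loop over J is replaced by a single vector operation."""
    a = []
    for i in range(len(b)):                 # outer DO loop: scalar processing
        a.append(vector_add(b[i], c[i]))    # inner DO loop: vector processing
    return a


# Small assumed data set: 3 values of I, 4 values of J.
B = [[10 * i + j for j in range(4)] for i in range(3)]
C = [[100 for _ in range(4)] for _ in range(3)]
A = add_by_rows(B, C)
```

Every pass of the outer loop has to prepare the next vector operation before it can be issued, which is the scalar work counted in the steps that follow.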
As a result, in the prior art vector processor, the vector processing and the scalar processing are carried out sequentially. For example, an operation shown in FIG. 2 is processed as shown in FIG. 3. In FIG. 3, symbols V and S shown at the respective steps represent vector processing and scalar processing, respectively. The respective steps are explained below.
Step 1: Calculate a length of a vector to be processed. (Scalar processing). In the present example, the length is 100 as determined by a range of J.
Step 2: Set the vector length into a vector length register (VLR) by a SETVL instruction. (Scalar processing)
Step 3: Calculate a start vector address of the vector B (in the present example, the address of the element B (1, 1)). (Scalar processing)
Step 4: Set the start vector address into a vector address register (VAR) by a SETVAR instruction. (Scalar processing)
Step 5: Calculate the address increment between the vector elements of the vector B. (In the present example, the increment is 3 because the vector elements are arranged at an interval of three addresses.) (Scalar processing)
Step 6: Set the increment into a vector address increment register (VAIR) by a SETVAIR instruction. (Scalar processing)
Step 7: Fetch the vector B from the main storage by a LOADVR instruction, referring to the contents of the registers VAR, VAIR and VLR (namely the start address, the address increment and the vector length), and load it into a #0 vector register (VR0). (Vector processing)
Steps 8-11: Carry out steps similar to the steps 3-6 for the vector C. (Scalar processing)
Step 12: Fetch the vector C from the main storage by the LOADVR instruction, referring to the contents of the registers VAR, VAIR and VLR, and load it into a #1 vector register (VR1). (Vector processing)
Step 13: Add the elements of the vectors B and C stored in the vector registers VR0 and VR1, respectively, by an ADDVR instruction for the number of elements corresponding to the vector length specified by the vector length register VLR and store the resulting sum into a #2 vector register (VR2). (Vector processing)
Steps 14-17: Carry out steps similar to the steps 3-6 for the vector A. (Scalar processing)
Step 18: Store the content of the vector register VR2 into the main storage by a STOREVR instruction, referring to the contents of the registers VAR, VAIR and VLR. (Vector processing)
Step 19: Increment the index I by one by an INCREMENT instruction, compare the incremented index I with 100 by a COMPARE instruction, and return to the step 3 by a BRANCH instruction if I is not larger than 100. (Scalar processing)
Thereafter, the steps 3-19 are repeated until I reaches 100. The start vector address is changed from the initial address B (1, 1) to the addresses B (2, 1), B (3, 1), . . . for each repetition. The same is true for the vectors C and A.
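The sequence of steps 1-19 can be traced with a small software model. The following Python sketch is illustrative only and rests on several assumptions. The class `ToyVectorProcessor` and the helpers `element_address` and `run_nested_loop` are hypothetical names. The methods are named after the SETVL, SETVAR, SETVAIR, LOADVR, ADDVR and STOREVR instructions of the steps above, but their behaviour is inferred from this description rather than taken from any real instruction set. The arrays are assumed to be laid out so that consecutive values of I occupy adjacent addresses. The sizes (3 values of I, vector length 4) are chosen for brevity in place of the 100 and 100 of the present example.

```python
class ToyVectorProcessor:
    """Hypothetical model of the registers named in steps 1-19."""

    def __init__(self, memory):
        self.memory = memory       # main storage: a flat list of words
        self.vlr = 0               # vector length register (VLR)
        self.var = 0               # vector address register (VAR)
        self.vair = 0              # vector address increment register (VAIR)
        self.vr = [[], [], []]     # vector registers VR0, VR1, VR2

    # Scalar processing: set-up instructions.
    def setvl(self, length):
        self.vlr = length

    def setvar(self, address):
        self.var = address

    def setvair(self, increment):
        self.vair = increment

    def _addresses(self):
        # Addresses of the VLR elements starting at VAR, spaced by VAIR.
        return [self.var + k * self.vair for k in range(self.vlr)]

    # Vector processing.
    def loadvr(self, n):
        self.vr[n] = [self.memory[a] for a in self._addresses()]

    def addvr(self, dst, src1, src2):
        x, y = self.vr[src1][:self.vlr], self.vr[src2][:self.vlr]
        self.vr[dst] = [xk + yk for xk, yk in zip(x, y)]

    def storevr(self, n):
        for k, a in enumerate(self._addresses()):
            self.memory[a] = self.vr[n][k]


def element_address(base, i, j, rows):
    """Assumed layout: element (i, j), 1-based, with I varying fastest."""
    return base + (i - 1) + (j - 1) * rows


def run_nested_loop(memory, rows, length, base_a, base_b, base_c):
    cpu = ToyVectorProcessor(memory)
    cpu.setvl(length)                                      # steps 1-2
    i = 1
    while i <= rows:
        cpu.setvar(element_address(base_b, i, 1, rows))    # steps 3-4
        cpu.setvair(rows)                                  # steps 5-6
        cpu.loadvr(0)                                      # step 7
        cpu.setvar(element_address(base_c, i, 1, rows))    # steps 8-9
        cpu.setvair(rows)                                  # steps 10-11
        cpu.loadvr(1)                                      # step 12
        cpu.addvr(2, 0, 1)                                 # step 13
        cpu.setvar(element_address(base_a, i, 1, rows))    # steps 14-15
        cpu.setvair(rows)                                  # steps 16-17
        cpu.storevr(2)                                     # step 18
        i += 1                                             # step 19
    return memory


ROWS, LENGTH = 3, 4                   # assumed sizes, not those of FIG. 2
BASE_B, BASE_C, BASE_A = 0, 12, 24    # assumed base addresses
memory = [0] * 36
for i in range(1, ROWS + 1):
    for j in range(1, LENGTH + 1):
        memory[element_address(BASE_B, i, j, ROWS)] = 10 * i + j
        memory[element_address(BASE_C, i, j, ROWS)] = 100 * (10 * i + j)
run_nested_loop(memory, ROWS, LENGTH, BASE_A, BASE_B, BASE_C)
```

In this model only four of the calls inside the loop body do vector work (two loads, one add, one store). Every other call is scalar set-up that is repeated for each value of I.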
In the present example, the scalar processing time occupies approximately 10% of the total processing time. The ratio depends on the number of vector elements to be processed, which usually ranges from 10 to 1000 or more. In actual practice, the number of vector elements which can be processed continuously in one vector processing is limited to the number of vector elements which can be retained in one vector register (i.e. the vector register length). Hence, if the number of vector elements to be processed is larger than the vector register length, the vector processing must be carried out in a plurality of cycles. For example, if the vector register length is 64 and the number of vector elements is 100, the vector processing is completed when the processing of FIG. 3 is repeated twice (64 vector elements in the first cycle and 36 vector elements in the second cycle). In this case, the ratio (overhead) of the scalar processing time to the total processing time amounts to as much as 20%. In general, with the presently available computer hardware technology, it is not always easy to increase the processing speed of the scalar processing or the vector processing. Thus, with the prior art, the scalar overhead imposes a significant limitation on increasing the speed of the vector operation.
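The division of a long vector into cycles can be sketched as follows. This is a minimal Python illustration in which the function name `split_into_cycles` is hypothetical. It only computes how many vector elements each cycle handles when the vector is longer than the vector register. It does not model processing time.

```python
def split_into_cycles(n_elements, vector_register_length):
    """Return the number of vector elements handled in each cycle."""
    if vector_register_length <= 0:
        raise ValueError("vector register length must be positive")
    cycles = []
    remaining = n_elements
    while remaining > 0:
        # One cycle processes at most one vector register's worth of elements.
        vector_length = min(remaining, vector_register_length)
        cycles.append(vector_length)
        remaining -= vector_length
    return cycles


print(split_into_cycles(100, 64))  # prints [64, 36]
```

Because each cycle requires its own scalar set-up (vector length, start addresses and increments), a larger number of cycles means that the scalar processing accounts for a larger share of the total time.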