The present invention relates to a digital computer (which will be called a "vector processor") for executing arithmetic or logical operations on vectors at high speeds.
There has been devised a vector processor with vector registers for high-speed processing of large-scale matrix computations, which frequently appear in scientific and technical computations (as is disclosed in U.S. Pat. No. 4,128,880). This vector processor can add respective components of vector data A and B to obtain a resultant vector C at a high speed.
The vector processor S-810 developed by the present assignee is able to execute arithmetic or logical operations on the front halves L(1,i) (i=1 to N) (of 4-byte length) and the rear halves L(2,i) (i=1 to N) (of 4-byte length) of the respective components (of 8-byte length) of vector data A in a main storage, as shown in FIG. 1A. In this case, vector data (which will be called "vector data B") composed of the front halves L(1,i) and vector data (which will be called "vector data C") composed of the rear halves L(2,i) of the respective components are loaded, as shown in FIG. 1B, into vector registers VR#i and VR#j by a later-described method, and their respective components are subjected to pipelined arithmetic or logical operations. Vector data (which will be called "vector data K") composed of resultant data K(i) (i=1 to N) is stored in a vector register VR#k.
At this time, it is necessary to store the vector components L(1,i) in the front 4-byte portions of the respective component storing regions of the vector register VR#i and the vector components L(2,i) in the front 4-byte portions of the respective component storing regions of the vector register VR#j. In FIG. 1B, the slashed columns of the respective component storing regions of the vector registers VR#i, VR#j and VR#k designate the portions whose data is not directly used in the aforementioned arithmetic or logical operations.
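The register layout described above may be easier to follow as a small software model. The following is only an illustrative sketch, not the S-810 hardware: the function name `split_components`, the big-endian integer encoding, the sample values, the zero fill of the unused rear portions, and the choice of addition as the operation are all assumptions made for the example.

```python
import struct

def split_components(vector_a):
    """Model FIG. 1B: place each 4-byte half in the front of an 8-byte slot.

    The rear 4 bytes of each slot are zero-filled here as a stand-in for the
    slashed (unused) portions; the source does not say what they contain.
    """
    vr_i, vr_j = [], []
    for component in vector_a:
        assert len(component) == 8          # each component of A is 8 bytes
        front, rear = component[:4], component[4:]
        vr_i.append(front + bytes(4))       # vector data B: L(1,i) in front
        vr_j.append(rear + bytes(4))        # vector data C: L(2,i) in front
    return vr_i, vr_j

def front_value(slot):
    """Interpret the front 4 bytes of a slot as a signed 32-bit integer."""
    return struct.unpack(">i", slot[:4])[0]

# Hypothetical sample data: L(1,i) = i and L(2,i) = 100 + i for i = 1..3.
vector_a = [struct.pack(">ii", i, 100 + i) for i in range(1, 4)]
vr_i, vr_j = split_components(vector_a)

# Component-wise operation on the front halves; addition is assumed here.
vr_k = [front_value(b) + front_value(c) for b, c in zip(vr_i, vr_j)]
```

In this model only the front 4 bytes of each slot take part in the operation, which mirrors the requirement that both L(1,i) and L(2,i) sit in the front portions of their respective registers.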
One method of loading the vector registers with the vector data B and C, as shown in FIG. 2, uses two loading instructions. The first writes the vector data A as it is into the vector register VR#i. The second reads out the vector data A from the main storage, shifts each component leftward by the 4-byte length by means of a shift circuit (not shown in the drawing) and then writes the shifted component into the vector register VR#j (i.e., loading with shifting). Even if these two loading instructions are executed simultaneously, the read-out of the vector data B and C from the main storage is delayed because both instructions access the common vector data A.
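The cost of this first method can be illustrated with a simple counting model. This is a sketch under stated assumptions, not a description of the actual hardware: the `MainStorage` class, its read counter, the representation of each 8-byte component as a Python integer, and the sample values are all hypothetical.

```python
MASK64 = (1 << 64) - 1   # keep shifted values within an 8-byte slot

class MainStorage:
    """Hypothetical main storage that counts 8-byte read accesses."""
    def __init__(self, words):
        self.words = list(words)
        self.reads = 0

    def read(self, index):
        self.reads += 1
        return self.words[index]

def load(storage, n):
    """Plain load: write vector data A into the register as it is."""
    return [storage.read(i) for i in range(n)]

def load_with_shift(storage, n):
    """Load with shifting: shift each component left by 4 bytes (32 bits)."""
    return [(storage.read(i) << 32) & MASK64 for i in range(n)]

def front_half(slot):
    """Front 4 bytes of an 8-byte slot."""
    return slot >> 32

# Hypothetical data: L(1,i) = i in the front half, L(2,i) = 100 + i in the rear.
n = 3
storage = MainStorage((i << 32) | (100 + i) for i in range(1, n + 1))

vr_i = load(storage, n)              # front halves hold L(1,i)
vr_j = load_with_shift(storage, n)   # front halves now hold L(2,i)
total_reads = storage.reads          # 2 * n: vector data A is read twice
```

The model reads every component of vector data A twice, once per loading instruction, which corresponds to the contention on the common vector data A noted above.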
Incidentally, in FIG. 2 it is not possible to selectively read out only the components L(2,1), L(2,2), . . . , and L(2,N) from the main storage so as to load the vector data C. This is because the main storage can be accessed only in units of 8-byte length, each starting at an address position spaced at 8-byte intervals. For loading the vector data C, therefore, it is necessary to read out both the components L(1,i) and L(2,i). This requirement creates the above-specified problem.
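The access restriction can be pictured with a minimal sketch. It assumes, for illustration only, that main storage is a flat byte string, that valid access addresses are multiples of 8, and that components are indexed from 0; the names `read_word` and `read_rear_half` are hypothetical.

```python
WORD = 8  # main storage is accessed only in 8-byte units

def read_word(memory, address):
    """Read one 8-byte unit; addresses off the 8-byte grid are rejected."""
    if address % WORD != 0:
        raise ValueError("address must be a multiple of 8")
    return memory[address:address + WORD]

def read_rear_half(memory, i):
    """Fetch the rear half of component i (0-based in this sketch).

    The whole 8-byte component must be read, so the front half comes
    along even though only the rear half is wanted.
    """
    word = read_word(memory, i * WORD)
    return word[4:]

# Hypothetical contents: front half bytes are i, rear half bytes are 0x80 + i.
memory = b"".join(bytes([i] * 4) + bytes([0x80 + i] * 4) for i in range(3))
rear = read_rear_half(memory, 1)
```

An attempt to call `read_word(memory, 4)` raises `ValueError` in this model, reflecting that the rear half cannot be addressed on its own.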
On the other hand, a second method of loading the vector registers VR#i and VR#j with the vector data B and C, as shown in FIG. 3, uses a double loading instruction to load the vector registers VR#i and VR#l simultaneously with the vector data A, followed by an instruction to shift the data in the vector register VR#l leftward by 4 bytes by a shift operator (not shown in the drawing) and to store the shifted data in the vector register VR#j. Since it is necessary in this case to execute both the double loading instruction and the shifting instruction, the overall processing time is lengthened.
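The second method can be sketched in the same style. Again this is only an assumed software model: the function names, the integer representation of 8-byte components, the sample data, and the use of 32-bit addition as the final operation are illustrative choices, not taken from the source.

```python
MASK64 = (1 << 64) - 1   # keep shifted values within an 8-byte slot
MASK32 = (1 << 32) - 1   # results are kept to 4 bytes in this sketch

def double_load(storage_words):
    """Step 1: one pass over main storage fills both VR#i and VR#l with A."""
    vr_i = list(storage_words)
    vr_l = list(storage_words)
    return vr_i, vr_l

def shift_left_4_bytes(vr):
    """Step 2: a separate instruction shifts every slot left by 4 bytes."""
    return [(slot << 32) & MASK64 for slot in vr]

# Hypothetical data: L(1,i) = i in the front half, L(2,i) = 100 + i in the rear.
words = [(i << 32) | (100 + i) for i in range(1, 4)]

vr_i, vr_l = double_load(words)      # first instruction
vr_j = shift_left_4_bytes(vr_l)      # second instruction, must follow the first

# Only now can the operation on the front halves run (addition assumed).
vr_k = [((a >> 32) + (b >> 32)) & MASK32 for a, b in zip(vr_i, vr_j)]
```

Main storage is read only once here, but the shift cannot begin until the double load has supplied VR#l, so the two instructions add up in time, consistent with the lengthened processing time described above.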