1. Field of the Invention
The present invention relates to a vector processing device.
2. Description of the Prior Art
A conventional vector processing device processes list vector instructions in IF statements using masked vector operations and vector compress and vector expand instructions, because it is not provided with any means for masked list vector processing.
FIG. 7 illustrates the vector compress instruction. The vector data A (a1, a2, a3, a4 and a5) are checked against the mask bits M (1, 0, 1, 0, 1) held in the vector control register. From the data A, the elements at the positions where the mask bit is "1" are picked up. The picked-up elements replace the elements of the vector data B (b1, b2, b3, b4, b5), starting from the first position. In the remaining positions of the vector register, the original elements of the vector data B are left. Thus, the data B' (a1, a3, a5, b4, b5) are generated. Such a vector compress instruction is hereinafter referred to as a VCP instruction.
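The compress behaviour described above can be sketched in software as follows. This is only an illustrative model using plain Python lists; the function name vector_compress is hypothetical and does not appear in the specification.

```python
def vector_compress(a, b, mask):
    """Illustrative model of a VCP instruction (hypothetical helper).

    Elements of `a` whose mask bit is 1 are packed, in order, into the
    leading positions of a copy of `b`; the remaining positions keep
    the original elements of `b`.
    """
    result = list(b)
    packed = [x for x, bit in zip(a, mask) if bit == 1]
    result[:len(packed)] = packed
    return result


A = ["a1", "a2", "a3", "a4", "a5"]
B = ["b1", "b2", "b3", "b4", "b5"]
M = [1, 0, 1, 0, 1]
print(vector_compress(A, B, M))  # ['a1', 'a3', 'a5', 'b4', 'b5']
```

The printed result corresponds to the data B' (a1, a3, a5, b4, b5) of FIG. 7.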
FIG. 8 illustrates the vector expand instruction. The elements in the vector data B (b1, b2, b3, b4, b5) are replaced or left depending on the mask bits M (1, 0, 1, 0, 1) in the vector control register. For each position where the mask bit is "1", the next element of the vector data A (a1, a2, a3, a4, a5), taken in order from the first, replaces the element in the data B; where the mask bit is "0", the corresponding element in the data B is left as it is. Thus, the data B' (a1, b2, a2, b4 and a3) are generated. Such a vector expand instruction is hereinafter referred to as a VEX instruction.
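The expand behaviour can be modelled in the same way. The sketch below is illustrative only; vector_expand is a hypothetical name, and the vectors are plain Python lists rather than vector registers.

```python
def vector_expand(a, b, mask):
    """Illustrative model of a VEX instruction (hypothetical helper).

    Elements of `a` are consumed in order from the first; each one is
    written to the next position of a copy of `b` whose mask bit is 1.
    Positions whose mask bit is 0 keep the original element of `b`.
    """
    result = list(b)
    next_source = 0
    for position, bit in enumerate(mask):
        if bit == 1:
            result[position] = a[next_source]
            next_source += 1
    return result


A = ["a1", "a2", "a3", "a4", "a5"]
B = ["b1", "b2", "b3", "b4", "b5"]
M = [1, 0, 1, 0, 1]
print(vector_expand(A, B, M))  # ['a1', 'b2', 'a2', 'b4', 'a3']
```

The printed result corresponds to the expanded data (a1, b2, a2, b4 and a3) of FIG. 8.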
FIG. 9 shows a configuration example of such a conventional vector processing device. In the figure, vector registers 92-1 to 92-8 and a vector control register 93 are connected to operation devices 94-1 to 94-4 via a crossbar 95. The results obtained at the operation devices 94-1 to 94-4 can be stored in any of the vector registers 92-1 to 92-8. In masked operation, the mask data stored at the vector control register 93 controls whether or not to store an operation result to a vector register. More specifically, the operation result is not stored to the element for which the mask data in the vector control register 93 is "0" and is stored to the element for which the mask data is "1". The vector registers 92-1 to 92-8 send data to and receive data from a main storage 97 via a main storage controller 96.
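The masked operation described above may be pictured as follows. This is a minimal sketch under the assumption that the operation is an addition; masked_add is a hypothetical name, and the destination list stands in for a vector register.

```python
def masked_add(x, y, dest, mask):
    """Illustrative model of a masked vector addition (hypothetical helper).

    The sum is computed for every element, but it is stored only to
    the positions whose mask bit is 1; positions whose mask bit is 0
    keep the previous contents of the destination.
    """
    result = list(dest)
    for i, bit in enumerate(mask):
        total = x[i] + y[i]      # the operation itself is not skipped
        if bit == 1:
            result[i] = total    # only the store is controlled by the mask
    return result


print(masked_add([1, 2, 3, 4], [10, 20, 30, 40], [0, 0, 0, 0], [1, 0, 1, 0]))
# [11, 0, 33, 0]
```

Note that in this model the mask controls only the store, so the operation is still performed for all elements.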
Suppose a conditional expression (IF statement) as follows:
DO 10 I=1, N
IF (M(I).EQ.0) THEN
X(I)=A(B(I))+C(D(I))
ELSE
X(I)=1
ENDIF
10 CONTINUE
Such an expression may be generally processed by masked operations for all vector elements (hereinafter referred to as VL) or by utilizing VCP and VEX instructions while limiting the processed elements to those for which the condition is true.
Referring first to the flowchart of FIG. 10, the method in which all vector elements (VL) are subjected to the masked operation will be described below while showing the register contents.
Firstly, a comparison instruction compares M(I) with 0. For a mask generation instruction, the comparison condition can be specified in the operation code. This instruction stores "1" to the vector control register 93 when the condition is true and "0" when it is not true. It is supposed here that the mask data are (1, 1, 0, 1, 0, 1, 1, 0, . . . ). Then, B(I) (b1, b2, b3, b4, . . . ) and D(I) (d1, d2, d3, d4, . . . ) are loaded from the main storage 97 to the vector registers 92-1 and 92-2.
The data in the vector registers 92-1 and 92-2 are read out and sent to the main storage controller 96, and the main storage 97 is then accessed using B(I) and D(I) as the address data. Via the main storage controller 96, the data A(B(I)) (a1, a2, a3, a4 . . . ) and C(D(I)) (c1, c2, c3, c4 . . . ) are loaded to the vector registers 92-3 and 92-4. Such processing, where the main storage is accessed using vector data as addresses for vector data loading, is called list vector loading.
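List vector loading corresponds to what is often called a gather. A minimal sketch is given below, assuming that the main storage is modelled as a Python list and that the addresses are zero-based indices into it (the Fortran subscripts in the example loop are one-based); list_vector_load is a hypothetical name.

```python
def list_vector_load(memory, addresses):
    """Illustrative model of list vector loading (hypothetical helper).

    Each element of `addresses` is used as an index into `memory`,
    and the values found there are returned as a new vector.
    """
    return [memory[address] for address in addresses]


memory = [100, 101, 102, 103, 104, 105]
addresses = [4, 0, 2]
print(list_vector_load(memory, addresses))  # [104, 100, 102]
```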
The data in the vector registers 92-3 and 92-4 are read out and input to the operation device 94-1 for processing. Among the results, only those for the elements having "1" as the mask data are stored to the vector register 92-5.
Next, a mask reverse instruction is issued so as to reverse the bits in the vector control register 93 (M' (0, 0, 1, 0, 1, 0, 0, 1 . . . )). The elements in the vector register 92-5 corresponding to "1" in such mask data are replaced with "1". Then, the contents of the vector register 92-5 are stored to the main storage 97.
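The sequence of FIG. 10 can be summarized in the following sketch. It is a software model only, under several assumptions: arrays are Python lists with zero-based indices, the function name masked_method is hypothetical, and the second return value (the number of elements fetched by list vector loading) is added purely to make the cost visible.

```python
def masked_method(m, a, b, c, d):
    """Illustrative model of the masked method (hypothetical helper)."""
    n = len(m)                                        # VL
    mask = [1 if m[i] == 0 else 0 for i in range(n)]  # mask generation
    # List vector loading is performed for all VL elements.
    va = [a[b[i]] for i in range(n)]
    vc = [c[d[i]] for i in range(n)]
    gathered = 2 * n
    x = [0] * n
    for i in range(n):
        total = va[i] + vc[i]         # the addition runs for every element
        if mask[i] == 1:
            x[i] = total              # masked store of the result
    inverted = [1 - bit for bit in mask]              # mask reverse
    for i in range(n):
        if inverted[i] == 1:
            x[i] = 1                  # ELSE branch: X(I)=1
    return x, gathered


m = [0, 0, 5, 0, 7, 0, 0, 9]          # gives the mask (1, 1, 0, 1, 0, 1, 1, 0)
a = [10, 11, 12, 13, 14, 15, 16, 17]
b = [7, 6, 5, 4, 3, 2, 1, 0]
c = [100, 101, 102, 103, 104, 105, 106, 107]
d = [0, 1, 2, 3, 4, 5, 6, 7]
print(masked_method(m, a, b, c, d))
# ([117, 117, 1, 117, 1, 117, 117, 1], 16)
```

In this model all 16 list vector accesses are made, although only five of the eight elements satisfy the condition.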
This method is advantageous because it eliminates the need for auxiliary operations using VCP and VEX instructions. However, it has the drawback that all vector elements must be processed even when the rate of "true" is low.
Next, referring to the flowchart of FIG. 11, the other method, where only the elements for which the condition is true are processed by utilizing VCP and VEX instructions, will be described while showing the register contents.
Firstly, a comparison instruction compares M(I) with 0, and a mask generation instruction stores the mask data to the vector control register 93. The mask data are supposed here to be (1, 1, 0, 1, 0, 1, 1, 0, . . . ). The data B(I) and D(I) are loaded from the main storage 97 to the vector registers 92-1 and 92-2. A VCP instruction compresses the data B(I) and D(I) in the vector registers 92-1 and 92-2 according to the contents of the vector control register 93 so as to generate the data B'(I) (b1, b2, b4, b6, b7 . . . ) and D'(I) (d1, d2, d4, d6, d7 . . . ), which are stored to the vector registers 92-3 and 92-4.
Here, the bits having "1" in the data in the vector control register 93 are counted (PCNT instruction), and the counted value is used as the vector processing element number (VL': VL' < VL). Thereafter, VL' is used for processing until VL is reset. Then, the data in the vector registers 92-3 and 92-4 are read out and sent to the main storage controller 96, and the main storage 97 is accessed using B'(I) and D'(I) as addresses. Via the main storage controller 96, the data A(B'(I)) (a1, a2, a4, a6, a7) and C(D'(I)) (c1, c2, c4, c6, c7 . . . ) are loaded to the vector registers 92-5 and 92-6. The data in the vector registers 92-5 and 92-6 are read out and input to the operation device (adder) 94-1 for processing. The operation results are stored to the vector register 92-7. Then, the VL is reset to the original value, the data in the vector register 92-7 are expanded by a VEX instruction, and the results are stored to the vector register 92-8.
Next, a mask reverse instruction is issued so as to reverse the bits in the vector control register 93 (M' (0, 0, 1, 0, 1, 0, 0, 1 . . . )). The positions in the vector register 92-8 corresponding to "1" in such mask data are replaced with "1". Then, the contents in the vector register 92-8 are stored to the main storage 97.
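The sequence of FIG. 11 can likewise be sketched in software. The model below makes the same assumptions as before (Python lists, zero-based indices); the names compress, expand and compress_expand_method are hypothetical, and the second return value counts the elements fetched by list vector loading so that the effect of VL' is visible.

```python
def compress(v, mask):
    """Pack the elements of `v` whose mask bit is 1 (VCP-like step)."""
    return [x for x, bit in zip(v, mask) if bit == 1]


def expand(packed, base, mask):
    """Spread `packed` over the mask-1 positions of `base` (VEX-like step)."""
    result = list(base)
    source = iter(packed)
    for i, bit in enumerate(mask):
        if bit == 1:
            result[i] = next(source)
    return result


def compress_expand_method(m, a, b, c, d):
    """Illustrative model of the VCP/VEX method (hypothetical helper)."""
    n = len(m)                                        # VL
    mask = [1 if m[i] == 0 else 0 for i in range(n)]  # mask generation
    b_packed = compress(b, mask)                      # B'(I)
    d_packed = compress(d, mask)                      # D'(I)
    vl_prime = sum(mask)                              # PCNT gives VL'
    # List vector loading and addition run for VL' elements only.
    va = [a[b_packed[k]] for k in range(vl_prime)]
    vc = [c[d_packed[k]] for k in range(vl_prime)]
    gathered = 2 * vl_prime
    sums = [va[k] + vc[k] for k in range(vl_prime)]
    x = expand(sums, [0] * n, mask)                   # VL reset, then VEX
    for i in range(n):
        if mask[i] == 0:                              # reversed mask is 1
            x[i] = 1                                  # ELSE branch: X(I)=1
    return x, gathered


m = [0, 0, 5, 0, 7, 0, 0, 9]          # gives the mask (1, 1, 0, 1, 0, 1, 1, 0)
a = [10, 11, 12, 13, 14, 15, 16, 17]
b = [7, 6, 5, 4, 3, 2, 1, 0]
c = [100, 101, 102, 103, 104, 105, 106, 107]
d = [0, 1, 2, 3, 4, 5, 6, 7]
print(compress_expand_method(m, a, b, c, d))
# ([117, 117, 1, 117, 1, 117, 117, 1], 10)
```

In this model only 10 list vector accesses are made for the five true elements, at the price of the additional compress, count and expand steps.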
This method requires shorter operation time when the rate of "true" is low because processed elements are limited to those for which the condition is "true" in the IF statement. It requires, however, auxiliary operations such as VCP, VEX and PCNT instructions, which take much time when the rate of "true" is high.
Suppose now a conditional expression as follows:
DO 10 I=1, N
IF (M(I).EQ.0) GO TO 10
X(C(I))=A(I)+B(I)
10 CONTINUE
Conventionally, such an expression is processed by utilizing VCP and VEX instructions while limiting the processed elements to those for which the condition is true. Referring to the flowchart of FIG. 12, this method is described below while showing the register contents.
Firstly, a comparison instruction compares M(I) with 0, and a mask generation instruction stores the mask data to the vector control register 93. The mask data are supposed here to be (1, 1, 0, 1, 0, 1, 1, 0, . . . ). The data A(I), B(I) and C(I) are loaded from the main storage 97 to the vector registers 92-1, 92-2 and 92-3. The data in the vector registers 92-1 and 92-2 are read out and input to the operation device (adder) 94-1 for processing. The results are stored to the vector register 92-4 only for the positions for which the mask data is "1".
Then a VCP instruction compresses the addition results (store data) in the vector register 92-4 and the address data C(I) in the vector register 92-3 according to the contents of the vector control register 93 and stores the results to the vector registers 92-5 and 92-6. Here, the bits having "1" in the data in the vector control register 93 are counted (PCNT instruction), and the counted value is used as the vector processing element number (VL'). Thereafter, VL' is used for processing until VL is reset. Then, the store data stored in the vector register 92-5 and the address data (C'(I)) stored in the vector register 92-6 are read out and sent to the main storage controller 96. The store data are stored to the main storage 97 using the data C'(I) as the addresses. An advantage of this method is that the vector elements to be stored are limited to those for which the condition is true and no other elements are stored. However, the need for auxiliary operations such as VCP and PCNT instructions results in a lengthy operation.
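The sequence of FIG. 12 can be sketched as follows. This is a software model only: arrays are Python lists with zero-based indices, compressed_list_store is a hypothetical name, and the mask is assumed to be "1" for the elements that are not skipped by the GO TO (that is, where M(I) is not 0), which is one reading of the description above.

```python
def compressed_list_store(m, a, b, c, x):
    """Illustrative model of the compressed list vector store (hypothetical)."""
    n = len(m)                                        # VL
    # Assumption: mask bit is 1 where the store is to be performed.
    mask = [1 if m[i] != 0 else 0 for i in range(n)]
    sums = [0] * n
    for i in range(n):
        total = a[i] + b[i]
        if mask[i] == 1:
            sums[i] = total                           # masked addition
    data_packed = [sums[i] for i in range(n) if mask[i] == 1]  # VCP, store data
    addr_packed = [c[i] for i in range(n) if mask[i] == 1]     # VCP, C'(I)
    vl_prime = sum(mask)                              # PCNT gives VL'
    out = list(x)
    for k in range(vl_prime):
        out[addr_packed[k]] = data_packed[k]          # list vector store
    return out


m = [1, 1, 0, 1, 0, 1, 1, 0]
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
c = [7, 6, 5, 4, 3, 2, 1, 0]
x = [0] * 8
print(compressed_list_store(m, a, b, c, x))
# [0, 77, 66, 0, 44, 0, 22, 11]
```

Only the five selected sums reach the modelled main storage; the two compress steps and the count are the auxiliary operations referred to above.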
As described above, a conventional vector processing device has drawbacks concerning processing of list vectors in conditional expressions. When it processes all vector elements by masked operation, it eliminates the need of auxiliary operations, but has to perform many unnecessary operations because all vector elements are processed even when the rate of "true" is low.
When the vector processing device adopts the other method, where only the true elements are processed using VCP and VEX instructions, the operation time becomes shorter when the rate of "true" is low because only the true vector elements are processed, but the method requires auxiliary operations such as VCP and VEX instructions.
Besides, in order to take the maximum advantage of a vector processing device, it is necessary to adopt the most suitable method for the program. For this purpose, a conventional device requires a compiler which examines the rate of "true" and other information for the conditional expression in the program so as to select the suitable one from the two methods above.