1. Field of the Invention
The present invention relates to a vector processing device that processes a large quantity of vector data with a single instruction, and more particularly to serializing control of instruction execution using a post instruction and a wait instruction in a vector processing device in which a data buffer is provided between an instruction executing unit and a main memory unit.
A vector processing device has a plurality of operation pipelines and performs complex processing with these pipelines in order to process a large quantity of data at high speed. Generally, the vector processing device includes an instruction executing unit having a vector unit and a scalar unit, a main memory unit, and a memory control unit. The vector unit includes a vector register and operation pipelines for addition/logic operations, multiplication and division. Data to be operated on (operation data) is read from the main memory unit and stored in a data buffer provided in the memory control unit. The operation data is then loaded into the vector register in the vector unit and processed by the operation pipelines in accordance with a vector instruction. The resulting operation result is transferred to and stored in the data buffer in the memory control unit, and is then stored in the main memory unit.
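The data flow described above (main memory, data buffer, vector register, operation pipeline, and back) can be illustrated with a toy model. This is a minimal Python sketch, assuming that the main memory and the data buffer can be modeled as dictionaries keyed by a hypothetical vector address and that each vector is a plain list of elements. The class and method names (`VectorMachine`, `vload`, `vadd`, `vstore`) are illustrative only and are not taken from the device; pipelining, timing and buffer replacement are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VectorMachine:
    """Toy model: main memory -> data buffer -> vector register -> pipeline -> back."""
    main_memory: dict[int, list[float]]  # hypothetical: vector address -> elements
    data_buffer: dict[int, list[float]] = field(default_factory=dict)  # buffer in the MCU
    vregs: dict[str, list[float]] = field(default_factory=dict)        # vector register

    def vload(self, vr: str, addr: int) -> None:
        # Operation data is read from main memory into the data buffer,
        # then loaded from the data buffer into the vector register.
        self.data_buffer[addr] = list(self.main_memory[addr])
        self.vregs[vr] = list(self.data_buffer[addr])

    def vadd(self, src1: str, src2: str, dst: str) -> None:
        # A single instruction operates on all elements (addition pipeline).
        self.vregs[dst] = [a + b for a, b in zip(self.vregs[src1], self.vregs[src2])]

    def vstore(self, vr: str, addr: int) -> None:
        # The result goes to the data buffer first, then to main memory.
        self.data_buffer[addr] = list(self.vregs[vr])
        self.main_memory[addr] = list(self.data_buffer[addr])


m = VectorMachine(main_memory={0: [1.0, 2.0, 3.0], 1: [10.0, 20.0, 30.0]})
m.vload("VR1", 0)
m.vload("VR2", 1)
m.vadd("VR1", "VR2", "VR3")
m.vstore("VR3", 2)
print(m.main_memory[2])  # [11.0, 22.0, 33.0]
```

Every load and store in this sketch passes through `data_buffer`, mirroring the role of the data buffer in the memory control unit.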
In some cases, a system such as the one described above employs preceding control and parallel execution between a scalar instruction and a vector instruction and/or between vector instructions. In such a case, the order in which the main memory unit is referenced may not be ensured. Hence, to ensure this order, vector instructions must be serialized.
Serializing vector instructions means terminating a main memory operand access generated during execution of a first instruction before a main memory operand access of a second instruction, executed after the first instruction, is generated. Serializing matters for main memory operand accesses between vector instructions or between a vector instruction and a scalar instruction; it need not be considered between scalar instructions, because the order in which scalar instructions refer to the main memory unit is already ensured. To meet the demand for higher speed in recent computer systems for scientific and technical applications, the present invention is intended to reduce the overhead caused by serializing vector instructions.
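The serializing condition defined above can be written as a check over a trace of memory accesses. This is a minimal sketch, assuming each main memory operand access is summarized by a hypothetical integer cycle at which it is generated and a cycle at which it terminates; the names `MemoryAccess`, `needs_serialization` and `is_serialized` are illustrative and do not come from the device.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryAccess:
    instr: str      # instruction that generated the main memory operand access
    issued: int     # hypothetical cycle at which the access is generated
    completed: int  # hypothetical cycle at which the access terminates


def needs_serialization(first_unit: str, second_unit: str) -> bool:
    # The reference order between two scalar instructions is already ensured,
    # so only pairs involving a vector instruction need serializing.
    return first_unit == "vector" or second_unit == "vector"


def is_serialized(first: list[MemoryAccess], second: list[MemoryAccess]) -> bool:
    # Every access of the first instruction must terminate before any access
    # of the second instruction is generated.
    if not first or not second:
        return True
    return max(a.completed for a in first) < min(b.issued for b in second)


vstore = [MemoryAccess("VSTORE", issued=0, completed=12)]
early_load = [MemoryAccess("LOAD", issued=5, completed=7)]
late_load = [MemoryAccess("LOAD", issued=13, completed=15)]
print(needs_serialization("vector", "scalar"))  # True
print(is_serialized(vstore, early_load))        # False
print(is_serialized(vstore, late_load))         # True
```

The strict `<` reflects the wording "terminate ... before ... is generated"; whether a same-cycle hand-off would be allowed in hardware is not stated and is left out of the model.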
2. Description of the Prior Art
FIG. 1 is a block diagram of a conventional vector processing device. The vector processing device shown in FIG. 1 is made up of an instruction executing unit 100, a memory control unit (MCU) 200 and a main memory unit (MSU) 300. The instruction executing unit 100 includes a vector unit VU and a scalar unit SU. The vector unit VU processes a plurality of elements with one instruction, whereas the scalar unit SU processes one element per instruction. The vector unit VU includes a vector instruction execution control unit VI, an operation pipeline unit VE and a vector memory control unit VS. Similarly, the scalar unit SU includes a scalar instruction execution control unit I, a scalar operation unit E and an access unit S.
An address and store data from the scalar unit SU are sent to the main memory unit 300 via the memory control unit 200. An instruction and load data are sent to the scalar unit SU via the memory control unit 200. Similarly, an address and store data from the vector unit VU are sent to the main memory unit 300 via the memory control unit 200, and load data from the main memory unit 300 is sent to the vector unit VU via the memory control unit 200.
FIG. 2 is a block diagram showing the details of the vector processing device shown in FIG. 1. The access pipeline part VS of the vector unit VU includes a vector register 12, a mask register 14, a controller 16 and data transfer pipelines 18 and 20. The operation pipeline part VE includes an addition/logic operation pipeline ADD, a multiplication pipeline MLT, a division pipeline DIV and a mask pipeline MSK. The vector instruction execution control part VI shown in FIG. 1 is not shown in FIG. 2 for the sake of simplicity. The scalar operation unit E of the scalar unit SU includes a scalar operation unit 22, and the access part S includes a scalar register 24 and a buffer 26. The scalar instruction execution control part I shown in FIG. 1 is likewise omitted for the sake of simplicity. The memory control unit 200 includes a data buffer 30 and a controller 32.
The vector processing device shown in FIG. 2 operates as follows. By way of example, the execution of a vector addition will now be described. The vector addition is executed by the following instruction sequence:
VLOAD VR1
VLOAD VR2
VADD VR1, VR2, VR3
VSTORE VR3
At the start of the vector addition, the first vector load instruction VLOAD is executed, so that data is loaded from the main memory unit 300 into a register VR1 formed in the vector register 12 via the data buffer 30 of the memory control unit 200 and the load pipeline 18 of the vector unit VU. Next, the second vector load instruction VLOAD is executed, and data is loaded from the main memory unit 300 into a register VR2 in the vector register 12 via the same route. Then the vector addition instruction VADD is executed using the addition/logic pipeline ADD, and the data stored in the register VR1 and the data stored in the register VR2 are added. The result of the addition is stored in a register VR3 formed in the vector register 12. Finally, the vector store instruction VSTORE is executed, and the operation result stored in the register VR3 is stored in the main memory unit 300 via the store pipeline 20 and the data buffer 30. Accesses from the scalar unit SU to the main memory unit 300 are likewise carried out via the memory control unit 200.
Operating the scalar unit SU and the vector unit VU in parallel is desirable for improving system processing performance. However, when a vector instruction or a scalar instruction in a program uses as an operand data that is the result of executing a preceding vector instruction or scalar instruction, the order of execution of these instructions must be ensured. This order is ensured by serializing control.
In conventional systems, serializing control in vector operation is carried out by means of a post instruction and a wait instruction. This control ensures that the main memory operand of an instruction executed before the post instruction is referenced before the main memory operand of an instruction executed after the wait instruction. Hence, the main memory operands of instructions sandwiched between the post instruction and the wait instruction are excluded from the serializing control of the vector operation.
Conventionally, serializing is ensured for each combination of vector and scalar instructions as follows.
(1) Serializing between vector instructions
(a) Ensure a vector load instruction preceding a post instruction until the priority order is assigned.
(b) Ensure a vector store instruction preceding a post instruction until the priority order is assigned.
(2) Serializing between a scalar instruction and a vector instruction
(a) Ensure a scalar load instruction preceding a post instruction: the order is originally ensured.
(b) Ensure a scalar store instruction preceding a post instruction until the priority order is assigned.
(3) Serializing between a vector instruction and a scalar instruction
(a) Ensure a vector load instruction preceding a post instruction until the priority order is assigned.
(b) Ensure a vector store instruction preceding a post instruction until buffer invalidation to the scalar unit SU is completely reflected in the scalar unit.
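The conventional rules (1) to (3) can be read as a lookup table that a wait instruction consults before letting a memory-referencing instruction after it proceed. This is a sketch under stated assumptions: the status of each instruction before the post instruction is reduced to two hypothetical boolean flags (`priority_assigned`, `invalidation_reflected`), and the names `Cond`, `RULES`, `Pending` and `wait_may_release` are illustrative only, not part of the described hardware.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Cond(Enum):
    ORIGINALLY_ENSURED = "order originally ensured"
    PRIORITY_ASSIGNED = "until the priority order is assigned"
    INVALIDATION_REFLECTED = "until buffer invalidation is reflected in SU"


# (unit before POST, its operation, unit after WAIT) -> condition the WAIT observes
RULES = {
    ("vector", "load", "vector"): Cond.PRIORITY_ASSIGNED,        # (1)(a)
    ("vector", "store", "vector"): Cond.PRIORITY_ASSIGNED,       # (1)(b)
    ("scalar", "load", "vector"): Cond.ORIGINALLY_ENSURED,       # (2)(a)
    ("scalar", "store", "vector"): Cond.PRIORITY_ASSIGNED,       # (2)(b)
    ("vector", "load", "scalar"): Cond.PRIORITY_ASSIGNED,        # (3)(a)
    ("vector", "store", "scalar"): Cond.INVALIDATION_REFLECTED,  # (3)(b)
}


@dataclass
class Pending:
    unit: str  # "vector" or "scalar"
    op: str    # "load" or "store"
    priority_assigned: bool = False       # hypothetical status flag
    invalidation_reflected: bool = False  # hypothetical status flag


def satisfied(p: Pending, follower_unit: str) -> bool:
    # Scalar-to-scalar pairs are absent from RULES: their order is already ensured.
    cond = RULES.get((p.unit, p.op, follower_unit), Cond.ORIGINALLY_ENSURED)
    if cond is Cond.ORIGINALLY_ENSURED:
        return True
    if cond is Cond.PRIORITY_ASSIGNED:
        return p.priority_assigned
    return p.invalidation_reflected


def wait_may_release(before_post: list[Pending], follower_unit: str) -> bool:
    # The instruction after WAIT may reference memory only when every
    # instruction before POST meets the condition for its combination.
    return all(satisfied(p, follower_unit) for p in before_post)


vst = Pending("vector", "store", priority_assigned=True)
print(wait_may_release([vst], "vector"))  # True
print(wait_may_release([vst], "scalar"))  # False: invalidation not yet reflected
```

The table makes visible that only case (3)(b) depends on the buffer invalidation in the scalar unit, which is the case the following paragraphs examine.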
Item (3)(b) above means that, when data is written into the main memory unit 300 by a vector store instruction preceding a post instruction and old data is held in the buffer 26 of the scalar unit SU, an invalidating process that invalidates all the data in the buffer 26 is carried out, and a scalar instruction that follows the wait instruction and needs to refer to the main memory is prevented from being executed until this invalidation is completely reflected in the scalar unit SU.
Owing to its mechanism, however, the buffer invalidating process cannot separate out only the vector store instructions preceding the post instruction and process them alone. Consequently, the end of the buffer invalidating process is detected only after execution of the vector store instructions between the post instruction and the wait instruction has also been completed. A scalar instruction that follows the wait instruction and needs to refer to the main memory is therefore not executed until the buffer invalidating process resulting from all the vector store instructions preceding the wait instruction is completed.
FIG. 3 shows an instruction sequence for carrying out serializing control using the post instruction and the wait instruction. In FIG. 3, VSTORE denotes a vector store instruction, POST a post instruction, WAIT a wait instruction and LOAD a scalar load instruction. The scalar load instruction LOAD is not executed until execution of the vector store instruction VSTORE before the post instruction POST is completed, so the order of referring to the main memory between that vector store instruction VSTORE and the scalar load instruction LOAD is ensured. However, serializing control is not applied to the vector store instruction between the post instruction POST and the wait instruction WAIT, and its main memory reference order is not ensured.
In the conventional structure, the end of the buffer invalidating process for the vector store instruction VSTORE before the post instruction POST cannot be distinguished from that of the buffer invalidating process for the vector store instruction VSTORE after the post instruction POST. Hence, the wait instruction WAIT cannot be carried out until the buffer invalidating processes for all the vector store instructions VSTORE preceding the wait instruction WAIT are completed. As a result, execution of an instruction that follows the wait instruction and refers to the main memory is held up by the vector store instruction VSTORE sandwiched between the post instruction POST and the wait instruction WAIT, even though that store does not require its reference order to be ensured, and must wait until execution of that store instruction is completed. This degrades processing efficiency.
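The cost described above can be quantified with a small timing model. This is a minimal sketch, assuming an instruction order patterned on FIG. 3 (VSTORE, POST, VSTORE, WAIT, LOAD) and hypothetical cycle numbers at which each store's buffer invalidation is completely reflected in the scalar unit. The conventional case waits for every VSTORE before WAIT; the comparison case is an idealized mechanism, assumed here for illustration only, that waits just for the VSTOREs before POST. The function name `release_cycle` is illustrative.

```python
from __future__ import annotations


def release_cycle(program: list[str], inval_done: dict[int, int], discriminate: bool) -> int:
    """Earliest cycle at which the instruction after WAIT may reference memory.

    program: opcode sequence containing exactly one POST and one WAIT.
    inval_done: cycle at which the buffer invalidation caused by the VSTORE
        at each program index is completely reflected in the scalar unit.
    discriminate: False for the conventional behavior (all VSTOREs before WAIT
        are waited for); True for an idealized one (only VSTOREs before POST).
    """
    limit = program.index("POST") if discriminate else program.index("WAIT")
    done = [inval_done[i] for i, op in enumerate(program[:limit]) if op == "VSTORE"]
    return max(done, default=0)  # 0 means no store constrains the release


program = ["VSTORE", "POST", "VSTORE", "WAIT", "LOAD"]  # assumed order, patterned on FIG. 3
inval_done = {0: 20, 2: 45}                              # hypothetical cycle counts
conventional = release_cycle(program, inval_done, discriminate=False)
idealized = release_cycle(program, inval_done, discriminate=True)
print(conventional, idealized, conventional - idealized)  # 45 20 25
```

With these made-up numbers, the LOAD after WAIT is delayed by 25 cycles solely because of the store sandwiched between POST and WAIT, which is the overhead the conventional structure cannot avoid.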