1. Technical Field
The present invention relates to a processor and a vector load instruction execution method and, particularly, to a processor and a vector load instruction execution method for speculatively executing a vector load instruction.
2. Background Art
A vector processing device typically includes a plurality of vector registers and a vector arithmetic unit. The vector registers store vector data loaded from a main memory, intermediate results of vector operations, and the like. The vector arithmetic unit performs arithmetic operations on the vector data stored in the vector registers. With the plurality of vector registers and the vector arithmetic unit, the vector processing device can process large amounts of data at high speed.
However, the access speed of the main memory is generally lower than the speed of vector operations. Therefore, the above-described vector processing device includes load buffers that temporarily store vector data between the main memory and the vector registers, in order to speed up the loading of vector data into the vector registers. The vector processing device starts reading vector data from the main memory into the load buffers when the vector load instruction is decoded.
Japanese Patent No. 3726092 (Japanese Unexamined Patent Application Publication No. 2005-25693) discloses a technique related to a vector processing device in which data can be transferred from the load buffers to the vector registers not in the order in which the instructions are issued, but in the order in which each load satisfies two conditions: all of its elements are stored in the load buffer, and its destination vector register is not busy.
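The transfer condition described above can be illustrated with the following minimal Python sketch. The class `LoadBuffer` and the function `select_transfers` are hypothetical names introduced only for illustration; they are not taken from Japanese Patent No. 3726092, and the model ignores timing and hardware structure. It shows only that a younger, complete load may transfer before an older, incomplete one.

```python
from dataclasses import dataclass, field


@dataclass
class LoadBuffer:
    """Hypothetical model of one load buffer serving one vector load."""
    issue_order: int                 # position of the vector load in issue order
    dest_reg: int                    # destination vector register number
    length: int                      # number of elements the load will deliver
    elements: list = field(default_factory=list)  # elements received so far

    def is_complete(self) -> bool:
        # True once every element of the vector has arrived from memory.
        return len(self.elements) == self.length


def select_transfers(buffers, busy_regs):
    """Return the buffers whose data may move to the vector registers now.

    Both conditions must hold: all elements are stored in the buffer, and the
    destination register is not busy. An older, incomplete load does not block
    a younger, complete one, so transfers follow readiness, not issue order.
    """
    return [b for b in buffers
            if b.is_complete() and b.dest_reg not in busy_regs]


older = LoadBuffer(issue_order=0, dest_reg=1, length=4, elements=[10, 11])
younger = LoadBuffer(issue_order=1, dest_reg=2, length=2, elements=[20, 21])
print([b.issue_order for b in select_transfers([older, younger], busy_regs=set())])  # [1]
```

Here the older load has received only two of its four elements, so only the younger load qualifies. Marking register 2 as busy would hold back the younger load as well.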
A vector load instruction is executed in the vector processing device disclosed in Japanese Patent No. 3726092 as follows. First, the vector processing device allocates a load buffer that temporarily stores the vector load data. Next, the vector processing device accesses the memory. Then, the vector processing device stores all elements of the vector data received from the memory into the load buffer. The vector processing device then transfers the data to the vector register. After that, the vector processing device executes the subsequent vector instruction.
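The execution sequence above can be sketched in Python as follows, assuming a toy model in which main memory is a list and the vector register file is a dictionary keyed by register number. The functions `execute_vector_load` and `vector_add` are hypothetical and serve only to make the ordering of the steps concrete: the transfer to the vector register happens only after every element has been stored in the load buffer, and the subsequent vector instruction runs afterwards.

```python
def execute_vector_load(memory, base, length, vregs, dest):
    """Hypothetical model of the load sequence: buffer, fill, then transfer."""
    # Allocate a load buffer that temporarily holds the vector load data.
    load_buffer = []
    # Access memory and store every returned element into the load buffer.
    for i in range(length):
        load_buffer.append(memory[base + i])
    # All elements are now in the buffer; transfer them to the vector register.
    vregs[dest] = list(load_buffer)


def vector_add(vregs, dst, src_a, src_b):
    """A subsequent vector instruction that consumes the loaded registers."""
    vregs[dst] = [a + b for a, b in zip(vregs[src_a], vregs[src_b])]


memory = list(range(100))   # toy main memory: address i holds value i
vregs = {}                  # toy vector register file keyed by register number
execute_vector_load(memory, base=8, length=4, vregs=vregs, dest=0)
execute_vector_load(memory, base=20, length=4, vregs=vregs, dest=1)
vector_add(vregs, dst=2, src_a=0, src_b=1)
print(vregs[2])  # [28, 30, 32, 34]
```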
Further, for scalar instructions following a branch instruction in pipeline processing, branch target prediction, which predicts the branch target of the branch instruction, is performed so that the pipeline can operate efficiently. In this case, the scalar instructions at the predicted branch target are executed speculatively. When the branch target prediction succeeds, execution continues with the speculatively executed scalar instructions, thereby reducing the processing latency. Even when the branch target prediction fails, a correct result can be obtained by discarding the speculative results and newly executing the instructions at the correct branch target.
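A minimal Python sketch of speculative scalar execution with branch target prediction is given below. It assumes a simplified model in which the register state is a dictionary, each branch target is a function that updates that state, and the speculative work is done on a copy so it can be discarded on a misprediction. The function `execute_with_prediction` and the helper functions are hypothetical illustrations, not a description of any particular processor.

```python
def execute_with_prediction(regs, predicted_taken, branch_taken, taken_path, fallthrough_path):
    """Return (final register state, whether the prediction hit).

    `regs` is the committed state before the branch and is not modified.
    """
    # Speculatively execute the predicted target on a copy of the state.
    speculative = dict(regs)
    (taken_path if predicted_taken else fallthrough_path)(speculative)

    # The branch outcome becomes known later.
    actual_taken = branch_taken(regs)
    if actual_taken == predicted_taken:
        # Prediction succeeded: the speculative work is kept, saving latency.
        return speculative, True

    # Prediction failed: discard the speculative copy and run the correct target.
    corrected = dict(regs)
    (taken_path if actual_taken else fallthrough_path)(corrected)
    return corrected, False


def r1_positive(r):
    return r["r1"] > 0


def double_r1(r):        # instructions at the taken target
    r["r2"] = r["r1"] * 2


def set_minus_one(r):    # instructions at the fall-through target
    r["r2"] = -1


regs = {"r1": 5, "r2": 0}
hit_state, hit = execute_with_prediction(regs, True, r1_positive, double_r1, set_minus_one)
miss_state, miss_hit = execute_with_prediction(regs, False, r1_positive, double_r1, set_minus_one)
print(hit_state, hit)        # {'r1': 5, 'r2': 10} True
print(miss_state, miss_hit)  # {'r1': 5, 'r2': 10} False
```

Both calls end in the same correct state; only the hit avoids re-executing the branch target.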
However, if a vector load instruction following a branch instruction is simply executed speculatively, a subsequent vector instruction may use incorrect data when the branch target prediction fails. For example, when a vector load instruction is issued speculatively, the vector data obtained by the memory access is stored into the load buffer and is then transferred to the vector register even if the branch target prediction has failed.
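The hazard described above can be reproduced with a deliberately naive Python model, assuming a toy list-based memory and a dictionary-based vector register file. The function `naive_speculative_vector_load` is hypothetical: it transfers the load buffer to the vector register as soon as the buffer is full and never consults the branch outcome, which is exactly the behavior that lets a mispredicted load corrupt a register.

```python
def naive_speculative_vector_load(memory, base, length, vregs, dest):
    """Hypothetical naive model: speculative load with no misprediction check."""
    # The memory access fills the load buffer with all elements...
    load_buffer = [memory[base + i] for i in range(length)]
    # ...and the buffer is transferred as soon as it is complete. Nothing here
    # consults the outcome of the branch that led to this speculative load.
    vregs[dest] = load_buffer


memory = list(range(100))
vregs = {0: [1, 1, 1, 1]}   # value v0 must hold on the correct (non-predicted) path

# Predicted path: a vector load into v0 is issued speculatively.
naive_speculative_vector_load(memory, base=40, length=4, vregs=vregs, dest=0)

# The branch then resolves as mispredicted. The correct path contains no load
# into v0, yet a subsequent vector instruction there reads the speculative data.
print(vregs[0])  # [40, 41, 42, 43] instead of [1, 1, 1, 1]
```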
Therefore, in the vector processing device disclosed in Japanese Patent No. 3726092, a vector load instruction following a branch instruction cannot be executed until the branch target of the branch instruction is determined. That is, the vector load instruction cannot be executed speculatively, and performance is degraded by the memory access latency.