1. Field of the Invention
The present invention relates to a vector processing apparatus that carries out load/store in response to a vector instruction, and to an overtaking control circuit.
2. Description of the Related Art
A vector processing apparatus executes a vector instruction to access a main memory for load/store. Various techniques have been proposed for making this memory access faster.
A vector processing apparatus is disclosed in Japanese Laid Open Patent Application (JP-P2002-366538A: first conventional example). In this vector processing apparatus, control is carried out such that a preceding store-based instruction is overtaken by a following load-based instruction, in order to make the memory access faster. The vector processing apparatus contains a first overtaking determining section and a second overtaking determining section. The first overtaking determining section determines, when instructions are generated from a software program, whether the address region accessed by the preceding store-based instruction overlaps the address region accessed by the following load-based instruction, sets an overtaking bit accordingly, and refers to the overtaking bit to carry out the overtaking control of the following load-based instruction. The second overtaking determining section calculates the address region accessed by the store-based instruction and the address region accessed by the load-based instruction, and refers to the overlap between the calculated address regions to carry out the overtaking control.
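The region-overlap test used by the second overtaking determining section can be illustrated with a minimal sketch. The names `VectorAccess`, `address_region` and `load_may_overtake`, the byte-addressed strided access model and the default element size of 8 bytes are assumptions made for illustration only; they are not taken from the cited application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorAccess:
    """Hypothetical model of a strided vector memory access."""
    start: int          # byte address of the first element
    stride: int         # byte distance between consecutive elements (may be negative)
    length: int         # number of elements, assumed to be at least 1
    elem_size: int = 8  # bytes per element (assumed value)


def address_region(acc: VectorAccess) -> tuple[int, int]:
    """Return the inclusive (lowest, highest) byte addresses touched by the access."""
    last = acc.start + acc.stride * (acc.length - 1)
    low = min(acc.start, last)
    high = max(acc.start, last) + acc.elem_size - 1
    return low, high


def load_may_overtake(store: VectorAccess, load: VectorAccess) -> bool:
    """A following load may overtake a preceding store only if their regions are disjoint."""
    s_low, s_high = address_region(store)
    l_low, l_high = address_region(load)
    return s_high < l_low or l_high < s_low


store = VectorAccess(start=0x1000, stride=8, length=256)
far_load = VectorAccess(start=0x2000, stride=8, length=256)
near_load = VectorAccess(start=0x1400, stride=16, length=4)
print(load_may_overtake(store, far_load))   # regions are disjoint
print(load_may_overtake(store, near_load))  # regions overlap
```

A region comparison of this kind is conservative: two interleaved strided accesses that never touch the same element are still treated as overlapping, because only the lowest and highest addresses are compared.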
A vector store instruction overtaking control apparatus is disclosed in Japanese Laid Open Patent Application (JP-A-Heisei 9-231203: second conventional example). In this conventional vector store instruction overtaking control apparatus, in order to make the memory access faster, a following load-based instruction is executed ahead of a preceding vector store instruction so that its memory access is carried out early, if the address region accessed by the following load-based instruction does not overlap the address region accessed by the preceding vector store instruction. The conventional vector store instruction overtaking control apparatus contains a request receiving section, at least one vector store instruction holding section, a following instruction holding section, an access region calculation pipeline section, a region holding section, an overtaking determining section, an instruction pipeline holding section and a selecting section. The request receiving section receives a memory request issued from an instruction issuing section. The vector store instruction holding section holds only a vector store instruction among the instructions received by the request receiving section. The following instruction holding section holds the following instructions other than the vector store instruction. The access region calculation pipeline section calculates, in a pipeline manner, the address region accessed by the vector store instruction held in the vector store instruction holding section and the address regions accessed by the load-based instructions held in the following instruction holding section. The region holding section holds the address regions calculated by the access region calculation pipeline section for the following load-based instructions and the vector store instruction. The overtaking determining section detects, in a pipeline manner, from the address regions held in the region holding section, that the following load-based instructions can overtake the vector store instruction.
The instruction pipeline holding section holds the instructions in a pipeline manner during the region calculation and the overtaking check. The selecting section selects the instruction that is to access the memory from among the vector store instruction holding section, the instruction pipeline holding section and the request receiving section, in accordance with the result of the overtaking check.
An instruction sequence control system is disclosed in Japanese Laid Open Patent Application (JP-A-Heisei 8-12661: third conventional example). In this conventional instruction sequence control system, in order to make the memory access faster, control is carried out such that a preceding vector store instruction is overtaken by a following vector load instruction. The conventional instruction sequence control system contains a vector operating unit and a main memory processing unit. The vector operating unit is configured from one or more pipelined operating units, a plurality of vector registers, and a network for connecting the operating units and the vector registers. The main memory processing unit carries out load/store in units of vector instructions between a main memory and the vector registers. The conventional instruction sequence control system further contains a first section, a second section and a third section. The first section holds a group of instructions to be inputted to the vector operating unit and the main memory processing unit. The second section holds the statuses of the vector registers, the operating units and the main memory that are used by the instructions under execution. The third section determines the instructions to be inputted to the vector operating unit and the main memory processing unit from the group of instructions held by the first section, in accordance with the statuses of the various resources held by the second section and irrespective of the instruction input order specified by a program.
The third section supplies the vector load instruction to the vector operating unit and the main memory processing unit prior to the vector store instruction when all of the following conditions hold: the distance between vector elements specified by the vector store instruction is equal to the distance between vector elements specified by the vector load instruction; the store start point address specified by the vector store instruction is not equal to the load start point address specified by the vector load instruction; and the difference between the store start point address and the load start point address is smaller than the distance between the vector elements specified by the vector load instruction.
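The three conditions of the third conventional example can be written as a single predicate. The following is a minimal sketch: the function names, the use of element-granular addresses, the assumption of a positive distance between elements, and the use of the absolute value of the start address difference are illustrative assumptions, not details of the cited application.

```python
def load_first_allowed(store_start: int, store_stride: int,
                       load_start: int, load_stride: int) -> bool:
    """Return True when the vector load may be supplied before the vector store.

    Addresses and strides are in units of elements (assumption), and the
    strides are assumed to be positive.
    """
    return (store_stride == load_stride
            and store_start != load_start
            and abs(store_start - load_start) < load_stride)


def element_addresses(start: int, stride: int, length: int) -> set[int]:
    """Hypothetical helper: the set of element addresses touched by an access."""
    return {start + i * stride for i in range(length)}


# Equal strides with start points offset by less than one stride: the two
# accesses interleave and never touch the same element.
print(load_first_allowed(0, 4, 1, 4))
print(element_addresses(0, 4, 8).isdisjoint(element_addresses(1, 4, 8)))
```

When the strides are equal and the start points differ by a nonzero amount smaller than the stride, the store elements and the load elements fall in different residue classes modulo the stride, which is why no element address can coincide.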
A vector gather instruction overtaking circuit is disclosed in Japanese Laid Open Patent Application (JP-P2002-297566A: fourth conventional example). In this conventional vector gather instruction overtaking circuit, in order to make the memory access faster, control is carried out such that a preceding vector gather instruction is overtaken by a following vector load instruction. The conventional vector gather instruction overtaking circuit contains a scalar unit, a vector unit and a memory access request generating section. It further contains an instruction buffer, a first flip-flop group, a first decoder, a second flip-flop group, a second decoder, a third flip-flop group, a vector gather instruction issuing determining section, a selection signal generating section and a selecting section. In the instruction buffer, a vector gather instruction and a following instruction are held until the addresses of the vector gather instruction are ready in the vector unit. The first flip-flop group holds an instruction validity flag indicating validity/invalidity of the instruction stored in each stage of the instruction buffer. The first decoder determines whether or not an instruction inputted from the scalar unit is the vector gather instruction. The second flip-flop group holds a vector gather instruction flag indicating whether or not the instruction stored in each stage of the instruction buffer is the vector gather instruction, in accordance with an output signal of the first decoder. The second decoder determines whether or not the instruction inputted from the scalar unit is a load instruction. The third flip-flop group holds a load instruction flag indicating whether or not the instruction stored in each stage of the instruction buffer is the load instruction, in accordance with an output signal of the second decoder.
The vector gather instruction issuing determining section counts vector gather instruction issue permissions from the vector unit and determines whether the vector gather instruction stored in the instruction buffer can be issued. The selection signal generating section determines whether or not the preceding vector gather instruction can be overtaken by the following load instruction, in accordance with the output signal of the vector gather instruction issuing determining section, the instruction validity flag, the vector gather instruction flag and the load instruction flag, and generates a selection signal based on the result. The selecting section selects the instruction to be sent to the memory access request generating section from the inputted instructions and the instructions stored in the instruction buffer, in accordance with the selection signal from the selection signal generating section.
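One way to picture how the three flags and the issue-possible signal could combine into a selection is sketched below. The cited application's actual selection logic is not reproduced here; the function name `select_stage`, the convention that stage 0 holds the oldest instruction, and the rule that a load may pass only waiting vector gather instructions are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def select_stage(valid: Sequence[bool],
                 is_gather: Sequence[bool],
                 is_load: Sequence[bool],
                 gather_issuable: bool) -> Optional[int]:
    """Return the instruction buffer stage to issue, or None if nothing may issue.

    Stage 0 is assumed to hold the oldest instruction.
    """
    passed_waiting_gather = False
    for stage, ok in enumerate(valid):
        if not ok:
            continue  # empty stage: nothing to issue here
        if is_gather[stage] and not gather_issuable:
            passed_waiting_gather = True  # this gather must wait; look at younger stages
            continue
        if passed_waiting_gather and not is_load[stage]:
            return None  # assumed rule: only a load may overtake a waiting gather
        return stage
    return None


# A waiting gather in stage 0 followed by a load in stage 1: the load is selected.
print(select_stage([True, True], [True, False], [False, True], gather_issuable=False))
# Once the gather becomes issuable it is selected in program order.
print(select_stage([True, True], [True, False], [False, True], gather_issuable=True))
```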
A vector gather/scatter instruction execution order control circuit is described in Japanese Laid Open Patent Application (JP-A 2002-32361, fifth conventional example). This conventional vector gather/scatter instruction execution order control circuit includes a processing unit and an overtaking control circuit, in order to make the memory access faster. Fields for specifying registers that store a head address and an end address of the memory area to be accessed are provided in the instruction portion of each of the vector gather instruction and the vector scatter instruction. As a result, the overtaking control circuit can know the memory area to be accessed by a vector gather instruction or a vector scatter instruction from the registers specified by the instruction, in the same way as for a vector load instruction or a vector store instruction. Therefore, the vector gather instruction and the vector scatter instruction are also handled as targets of the overtaking control.
Also, an information processing apparatus is disclosed in Japanese Laid Open Patent Application (JP-A-Heisei 4-369773). In this conventional example, one data path is provided between a main memory unit and a vector operation processing unit, and another data path is provided between the main memory unit and a scalar operation processing unit. An instruction circuit issues a load/store instruction of vector data/scalar data to the vector operation processing unit or the scalar operation processing unit in accordance with a program command. The vector operation processing unit contains a vector buffer memory circuit to store a copy of data of the main memory unit, an address storage circuit in which an address of the main memory unit corresponding to the data stored in the vector buffer memory circuit is registered, and an address control circuit to newly register the address or invalidate the registered address. The scalar operation processing unit contains a scalar buffer memory circuit to store a copy of data of the main memory unit, a tag storage circuit in which a block address of the main memory unit corresponding to the data stored in the scalar buffer memory circuit is registered, and a tag control circuit to newly register the block address or invalidate the registered block address. The scalar operation processing unit further contains a tag registration invalidation instructing circuit and a vector store address calculating section. The tag registration invalidation instructing circuit checks whether a store address of each of a plurality of vector elements generated in accompaniment with a vector data store instruction is registered in the tag storage circuit, and outputs an invalidation instruction to the tag control circuit when the store address is registered in the tag storage circuit. The vector store address calculating section outputs a store start address and a store end address as an address region on the main memory unit corresponding to the vector data store instruction.
The scalar operation processing unit further contains a region detecting circuit to check whether the scalar data load address of a scalar data load instruction is in the address region indicated by the vector store address calculating section, when the following scalar data load instruction is received from the instruction circuit before the tag registration invalidation instructing circuit completes the invalidation operation in response to the vector data store instruction. The scalar operation processing unit further contains a cache control circuit. When an in-region detection signal is outputted from the region detecting circuit in response to the scalar data load instruction from the instruction circuit, the cache control circuit carries out control to load the scalar data from the vector buffer memory circuit if the address given by the scalar data load instruction is registered in the address storage circuit, and to load the scalar data from the main memory unit if that address is not registered in the address storage circuit.
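The decision made by the region detecting circuit and the cache control circuit can be summarized in a short sketch. The function name `scalar_load_source`, the returned labels, and the fallback to the scalar buffer memory when no in-region detection signal is raised are assumptions for illustration. The sketch also assumes that the main memory unit is used when the load address is not registered in the address storage circuit.

```python
def scalar_load_source(load_addr: int,
                       store_region: tuple[int, int],
                       vector_buffer_addrs: set[int],
                       invalidation_done: bool) -> str:
    """Pick where a scalar load that follows a vector store reads its data from.

    store_region is the inclusive (store start address, store end address) pair
    produced for the preceding vector data store instruction.
    """
    start, end = store_region
    # In-region detection only matters while tag invalidation is still in progress.
    in_region = (not invalidation_done) and start <= load_addr <= end
    if not in_region:
        return "scalar_buffer"   # assumed normal path; not described in the text
    if load_addr in vector_buffer_addrs:
        return "vector_buffer"   # address is registered in the address storage circuit
    return "main_memory"         # address is not registered


region = (0x1000, 0x17FF)
registered = {0x1000, 0x1008}
print(scalar_load_source(0x1008, region, registered, invalidation_done=False))
print(scalar_load_source(0x1100, region, registered, invalidation_done=False))
print(scalar_load_source(0x3000, region, registered, invalidation_done=False))
```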