A vector processing device (vector processor) performs calculation processing or the like in a pipelined manner according to an instruction on array-type data stored in a vector register file. The vector processing device has a plurality of execution pipelines as illustrated in FIG. 18, and the execution pipelines process the array data respectively.
FIG. 18 is a block diagram illustrating a structural example of the vector processing device. In FIG. 18, IF denotes an instruction fetch stage, ID denotes an instruction decoding stage, and EX denotes a calculation execution stage. The vector processing device has an instruction buffer 101, a data dependence detecting unit 102, an instruction issuance control unit 103, execution pipelines 104, a vector register file 105, and a multiplexer circuit 106. FIG. 18 illustrates a vector processing device having four execution pipelines 104, which are pipelines A, B, C, and D.
The instruction buffer 101 stores an instruction (vector instruction) read from a storage device. The data dependence detecting unit 102 determines whether or not a vector register specified by a preceding instruction overlaps with a vector register specified by a succeeding instruction which follows the preceding instruction, so as to detect a data dependence relation between the preceding instruction and the succeeding instruction. The instruction issuance control unit 103 receives an instruction stored in the instruction buffer 101 and a detection result from the data dependence detecting unit 102, and issues the instruction to an execution pipeline 104. The instruction issuance control unit 103 requests a next instruction from the instruction buffer 101 when there is a vacancy in the execution pipelines 104, determines which execution pipeline 104 an instruction is to be issued to according to the data dependence relation and vacant states of the execution pipelines 104, and issues the instruction.
The execution pipelines 104 execute processing on array data by complying with the instruction received from the instruction issuance control unit 103. Each of the execution pipelines 104 has a sequencer 107 and a calculating unit 108. The sequencer 107 performs control related to execution of the instruction received from the instruction issuance control unit 103. For example, the sequencer 107 instructs instruction execution or instructs execution of reading or writing of data from or to the vector register file 105. The calculating unit 108 has a plurality of calculators 109 and executes processing according to an instruction from the sequencer 107. Here, in this specification, for convenience of explanation, it is assumed that the calculating unit 108 has eight 16-bit calculators 109, and 32-bit data are processed using two calculators.
The vector register file 105 stores array data. The array data stored in the vector register file 105 are supplied to the execution pipelines 104 via the multiplexer circuit 106. Note that array data which have already been generated as a calculation result but have not yet been written in the vector register file 105 can also be supplied to the execution pipelines 104 via the multiplexer circuit 106.
The size of the array data, that is, the number of array elements, is specified by a vector length (VL). The array elements whose number is specified by the vector length (VL) form one array register, and one logical vector register number corresponds to one array register. The size of each array element is assigned according to a data word length handled by the vector processing device. A head value of the physical vector register number corresponding to the logical vector register number is a multiple of a power of two. When the vector length (VL) is a power of two, a value obtained by multiplying the vector length (VL) by a logical vector register number is the start value of the physical vector register number corresponding to this logical vector register number. Further, when the vector length (VL) is not a power of two, a value obtained by multiplying the smallest value among powers of two equal to or larger than the vector length (VL) by a logical vector register number is the start value of the physical vector register number corresponding to this logical vector register number.
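The rule for the start value described above can be sketched as follows. This is only an illustrative sketch; the function names padded_length and head_physical_number are hypothetical and do not appear in the specification.

```python
def padded_length(vl: int) -> int:
    """Smallest power of two that is equal to or larger than vl (vl >= 1)."""
    if vl < 1:
        raise ValueError("vector length must be positive")
    return 1 << (vl - 1).bit_length()


def head_physical_number(logical: int, vl: int) -> int:
    """Start value of the physical vector register number for a logical number."""
    return padded_length(vl) * logical


# VL = 32 is a power of two, so vr7 starts at 32 * 7.
print(head_physical_number(7, 32))
# VL = 40 is rounded up to 64, so vr1 starts at 64 * 1.
print(head_physical_number(1, 40))
```

Under this sketch, a vector length of 32 gives a stride of 32 between logical registers, and a vector length of 40 gives a stride of 64.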
In the following description, using i, j as indexes, vri represents the register of a logical vector register number i, and vr[j] represents the register of a physical vector register number j. When the data word length handled by the vector processing device is Halfword (16 bits), one of registers vr[j] corresponds to one array element, and when the data word length is Word (32 bits), a pair of registers vr[j] corresponds to one array element.
For example, the correspondence of the logical vector register number and the physical vector register number of the vector register when the vector length (VL) is 32, which is a power of two, and the order of processing when calculation processing is executed are as illustrated in FIGS. 19A and 19B. Further, for example, the correspondence of the logical vector register number and the physical vector register number of the vector register when the vector length (VL) is 40, which is not a power of two, and the order of processing when calculation processing is executed are as illustrated in FIGS. 20A and 20B.
As illustrated in FIG. 19A, when the data word length is Halfword, a vector register vr[32×i] to vr[32×i+31] with a physical number (32×i) to (32×i+31) corresponds to a vector register vri with a logical number i. Then, for example, when a vector register vr0 with a logical number 0 is specified by a Halfword calculation instruction, vector registers vr[0] to vr[7] with physical numbers 0 to 7 are targets of processing in a first cycle, and vector registers vr[8] to vr[15] with physical numbers 8 to 15 are targets of processing in a second cycle. Further, vector registers vr[16] to vr[23] with physical numbers 16 to 23 are targets of processing in a third cycle, and vector registers vr[24] to vr[31] with physical numbers 24 to 31 are targets of processing in a fourth cycle.
Further, as illustrated in FIG. 19B, when the data word length is Word, a vector register vr[32×i] to vr[32×i+63] with a physical number (32×i) to (32×i+63) corresponds to the vector register vri with the logical number i. Then, for example, when the vector register vr0 with the logical number 0 is specified by a Word calculation instruction, the vector registers vr[0] to vr[7] with the physical numbers 0 to 7 are targets of processing in the first cycle, and the vector registers vr[8] to vr[15] with the physical numbers 8 to 15 are targets of processing in the second cycle. The vector registers vr[16] to vr[23] with the physical numbers 16 to 23 are targets of processing in the third cycle, the vector registers vr[24] to vr[31] with the physical numbers 24 to 31 are targets of processing in the fourth cycle, and vector registers vr[32] to vr[39] with physical numbers 32 to 39 are targets of processing in a fifth cycle. Further, vector registers vr[40] to vr[47] with physical numbers 40 to 47 are targets of processing in a sixth cycle, and vector registers vr[48] to vr[55] with physical numbers 48 to 55 are targets of processing in a seventh cycle, and vector registers vr[56] to vr[63] with physical numbers 56 to 63 are targets of processing in an eighth cycle.
As illustrated in FIG. 20A, when the data word length is Halfword, a vector register vr[64×i] to vr[64×i+39] with a physical number (64×i) to (64×i+39) corresponds to the vector register vri with the logical number i. Then, for example, when the vector register vr0 with the logical number 0 is specified by the Halfword calculation instruction, registers to be targets of processing in the first cycle to the fourth cycle are the same as those when the vector length (VL) is 32. Moreover, the vector registers vr[32] to vr[39] with the physical numbers 32 to 39 are targets of processing in the fifth cycle.
Further, as illustrated in FIG. 20B, when the data word length is Word, a vector register vr[64×i] to vr[64×i+79] with a physical number (64×i) to (64×i+79) corresponds to the vector register vri with the logical number i. Then, for example, when the vector register vr0 with the logical number 0 is specified by the Word calculation instruction, registers to be targets of processing in the first cycle to the eighth cycle are the same as those when the vector length (VL) is 32. Moreover, vector registers vr[64] to vr[71] with physical numbers 64 to 71 are targets of processing in a ninth cycle, and vector registers vr[72] to vr[79] with physical numbers 72 to 79 are targets of processing in a tenth cycle.
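The order of processing described for FIGS. 19A to 20B can be reproduced with a short sketch. It assumes, as in this specification, that eight physical vector registers are processed per cycle (eight 16-bit calculators) and that a Word array element occupies a pair of physical registers. The names REGS_PER_CYCLE and cycle_targets are hypothetical, and the handling of a vector length that is not a multiple of eight is an assumption of the sketch.

```python
REGS_PER_CYCLE = 8  # assumption: eight 16-bit calculators, one register each


def padded_length(vl: int) -> int:
    """Smallest power of two that is equal to or larger than vl."""
    return 1 << (vl - 1).bit_length()


def cycle_targets(logical: int, vl: int, word: bool) -> list:
    """Return (first, last) physical numbers processed in each cycle."""
    head = padded_length(vl) * logical
    count = vl * (2 if word else 1)  # a Word element uses two registers
    end = head + count
    return [
        (start, min(start + REGS_PER_CYCLE, end) - 1)
        for start in range(head, end, REGS_PER_CYCLE)
    ]


# vr0, VL = 40, Word: ten cycles, the last two covering vr[64]..vr[79].
for cycle, (first, last) in enumerate(cycle_targets(0, 40, True), start=1):
    print(f"cycle {cycle}: vr[{first}] to vr[{last}]")
```

For vr0 the sketch yields four cycles for Halfword and eight cycles for Word when the vector length is 32, and five and ten cycles respectively when the vector length is 40.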
Upon reception of an instruction “INS A, B, C”, the vector processing device illustrated in FIG. 18 performs calculation processing corresponding to the instruction INS using corresponding data in a vector register with a logical vector register number A and a vector register with a logical vector register number B, and stores a calculation result in a vector register with a logical vector register number C.
For example, suppose that a Halfword calculation instruction “vaddh vr1, vr6, vr7” is issued to a certain execution pipeline 104. The instruction “vaddh vr1, vr6, vr7” causes a result of adding data in the vector register vr1 with the logical number 1 and the vector register vr6 with the logical number 6 to be stored in the vector register vr7 with the logical number 7. The execution pipeline 104 which has received this instruction executes the following calculation processing in the first cycle (the physical numbers shown are those for a vector length (VL) of 32).
vr[224] = vr[32] + vr[192]
vr[225] = vr[33] + vr[193]
…
vr[231] = vr[39] + vr[199]
Thereafter, when the vector length (VL) is 32, calculation processing is performed while changing the vector registers which are targets of processing in each cycle until the fourth cycle, and when the vector length (VL) is 40, calculation processing is performed while changing the vector registers which are targets of processing in each cycle until the fifth cycle.
Further, for example, suppose that a Word calculation instruction “vadd vr2, vr4, vr0” is issued to a certain execution pipeline 104. The instruction “vadd vr2, vr4, vr0” causes a result of adding data in the vector register vr2 with the logical number 2 and the vector register vr4 with the logical number 4 to be stored in the vector register vr0 with the logical number 0. The execution pipeline 104 which has received this instruction executes the following calculation processing in the first cycle, where, for example, vr[1−0] denotes the pair of registers vr[1] and vr[0] that holds one Word array element.
vr[1−0] = vr[65−64] + vr[129−128]
vr[3−2] = vr[67−66] + vr[131−130]
vr[5−4] = vr[69−68] + vr[133−132]
vr[7−6] = vr[71−70] + vr[135−134]
Thereafter, when the vector length (VL) is 32, calculation processing is performed while changing the vector registers which are targets of processing in each cycle until the eighth cycle, and when the vector length (VL) is 40, calculation processing is performed while changing the vector registers which are targets of processing in each cycle until a tenth cycle.
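The physical registers touched by the two example instructions in a given cycle can be listed with the following sketch. The names operand and cycle_ops are hypothetical. The sketch assumes eight physical registers per cycle, and it does not check whether the requested cycle lies within the vector length.

```python
def padded_length(vl: int) -> int:
    """Smallest power of two that is equal to or larger than vl."""
    return 1 << (vl - 1).bit_length()


def operand(logical: int, offset: int, stride: int, word: bool):
    """Physical number for Halfword, or a (high, low) pair for Word."""
    n = stride * logical + offset
    return (n + 1, n) if word else n


def cycle_ops(src_a: int, src_b: int, dst: int, vl: int, word: bool, cycle: int = 1):
    """List (dst, a, b) operand triples handled in the 1-based cycle."""
    stride = padded_length(vl)
    base = (cycle - 1) * 8          # eight physical registers per cycle
    step = 2 if word else 1         # a Word element spans two registers
    return [
        (operand(dst, off, stride, word),
         operand(src_a, off, stride, word),
         operand(src_b, off, stride, word))
        for off in range(base, base + 8, step)
    ]


# "vaddh vr1, vr6, vr7" with VL = 32, first cycle.
print(cycle_ops(1, 6, 7, 32, word=False)[0])
# "vadd vr2, vr4, vr0" with VL = 32, first cycle.
print(cycle_ops(2, 4, 0, 32, word=True)[0])
```

Each triple reads as destination = first source + second source, which mirrors the per-cycle additions listed above for the Halfword and Word instructions.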
Thus, in the vector processing device, one instruction is executed across plural cycles in one execution pipeline. The execution pipeline is occupied across the plural cycles until processing is completed regarding the one instruction. Further, the respective execution pipelines included in the vector processing device are operable in parallel. Therefore, when the register specified by a preceding instruction overlaps with the register specified by a succeeding instruction, the issuance timings of the instructions are adjusted relative to each other so that access to the overlapping register is performed in the proper order and is correctly reflected in the respective processing of the preceding instruction and the succeeding instruction. For this purpose, the vector processing device determines presence of the data dependence relation between the preceding instruction and the succeeding instruction when issuing the instructions.
Hazards related to the data dependence relation (data hazards) include a RAW (read after write) hazard and a WAR (write after read) hazard. The RAW hazard occurs when the succeeding instruction is to read a vector register that the preceding instruction writes, but the reading by the succeeding instruction is performed before the writing by the preceding instruction, so that the succeeding instruction uses data that do not yet reflect the result of the preceding instruction. The WAR hazard occurs when the succeeding instruction is to write to a vector register that the preceding instruction reads, but the writing by the succeeding instruction is performed before the reading by the preceding instruction, so that the preceding instruction reads data already overwritten by the succeeding instruction.
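Whether a pair of instructions is a candidate for these hazards can be checked by comparing physical register ranges, as in the sketch below. The names Instr, physical_range and data_hazards are hypothetical. The sketch only tests whether whole register ranges overlap and does not model the cycle in which each access occurs.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    src_a: int   # logical number of the first source register
    src_b: int   # logical number of the second source register
    dst: int     # logical number of the destination register
    word: bool   # True for Word (32 bits), False for Halfword (16 bits)


def padded_length(vl: int) -> int:
    """Smallest power of two that is equal to or larger than vl."""
    return 1 << (vl - 1).bit_length()


def physical_range(logical: int, vl: int, word: bool):
    """Inclusive (first, last) physical numbers of a logical register."""
    head = padded_length(vl) * logical
    return head, head + vl * (2 if word else 1) - 1


def overlaps(a, b) -> bool:
    """True when two inclusive ranges share at least one register."""
    return a[0] <= b[1] and b[0] <= a[1]


def data_hazards(prev: Instr, succ: Instr, vl: int) -> set:
    """Return the subset of {"RAW", "WAR"} that the pair may cause."""
    prev_srcs = [physical_range(r, vl, prev.word) for r in (prev.src_a, prev.src_b)]
    succ_srcs = [physical_range(r, vl, succ.word) for r in (succ.src_a, succ.src_b)]
    prev_dst = physical_range(prev.dst, vl, prev.word)
    succ_dst = physical_range(succ.dst, vl, succ.word)
    hazards = set()
    if any(overlaps(s, prev_dst) for s in succ_srcs):
        hazards.add("RAW")   # succeeding reads what preceding writes
    if any(overlaps(succ_dst, s) for s in prev_srcs):
        hazards.add("WAR")   # succeeding writes what preceding reads
    return hazards


# "vadd vr2, vr4, vr0" followed by "vaddh vr1, vr6, vr7", VL = 32.
print(data_hazards(Instr(2, 4, 0, True), Instr(1, 6, 7, False), 32))
```

With a vector length of 32, the Word register vr0 spans vr[0] to vr[63] and therefore overlaps the Halfword register vr1, which spans vr[32] to vr[63]; this is why the pair above is reported as a RAW candidate.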
When the data dependence relation is detected between the preceding instruction and the succeeding instruction, the vector processing device performs control to delay issuance of the succeeding instruction for a certain number of cycles until the relevant processing by the preceding instruction has been performed, thereby avoiding the data hazard by a stall. FIGS. 21A and 21B are diagrams illustrating an operation example of avoiding the data hazard. Note that the vector length (VL) in the example illustrated in FIGS. 21A and 21B is 32.
FIG. 21A illustrates an example of an issuance timing of instruction related to avoidance of the RAW hazard. FIG. 21A illustrates an example in which “vadd vr2, vr4, vr0” is issued as the preceding instruction to a pipeline A, and as the succeeding instruction thereafter, “vaddh vr1, vr6, vr7” is issued to a pipeline B. The instruction “vadd vr2, vr4, vr0” is a Word calculation instruction for storing a result of adding data in the vector register vr2 with the logical number 2 and the vector register vr4 with the logical number 4 in the vector register vr0 with the logical number 0. Further, the instruction “vaddh vr1, vr6, vr7” is a Halfword calculation instruction for storing a result of adding data in the vector register vr1 with the logical number 1 and the vector register vr6 with the logical number 6 in the vector register vr7 with the logical number 7. In FIG. 21A, for the instruction “vadd vr2, vr4, vr0”, the physical number (head value) of the vector register vr0 with the logical number 0 which is a destination register in which the calculation result is written is illustrated in every cycle. Further, for the instruction “vaddh vr1, vr6, vr7”, the physical number (head value) of the vector register vr1 with the logical number 1 which is a source register from which data used in the calculation operation is read is illustrated in every cycle.
In processing by the preceding instruction “vadd vr2, vr4, vr0” and processing by the succeeding instruction “vaddh vr1, vr6, vr7”, the vector registers vr[32] to vr[63] with the physical numbers 32 to 63 overlap. For example, in the vector registers vr[32] to vr[39] with the physical numbers 32 to 39, reading of data is performed in the beginning cycle in processing of the succeeding instruction “vaddh vr1, vr6, vr7”, but writing of data is performed in the fifth cycle in processing of the preceding instruction “vadd vr2, vr4, vr0”. In order to reflect a processing result of the preceding instruction on processing of the succeeding instruction, reading of data from the vector registers vr[32] to vr[39] with the physical numbers 32 to 39 by the succeeding instruction “vaddh vr1, vr6, vr7” needs to be performed after the cycle 5. Accordingly, in the cycle 2 to cycle 5, a stall due to the RAW hazard is made to occur, and the succeeding instruction “vaddh vr1, vr6, vr7” is issued in the cycle 6.
FIG. 21B illustrates an example of an issuance timing of instruction related to avoidance of the WAR hazard. FIG. 21B illustrates an example in which “vadd vr0, vr4, vr2” is issued as the preceding instruction to the pipeline A, and as the succeeding instruction thereafter, “vaddh vr6, vr7, vr1” is issued to the pipeline B. The instruction “vadd vr0, vr4, vr2” is a Word calculation instruction for storing a result of adding data in the vector register vr0 with the logical number 0 and the vector register vr4 with the logical number 4 in the vector register vr2 with the logical number 2. Further, the instruction “vaddh vr6, vr7, vr1” is a Halfword calculation instruction for storing a result of adding data in the vector register vr6 with the logical number 6 and the vector register vr7 with the logical number 7 in the vector register vr1 with the logical number 1. In FIG. 21B, for the instruction “vadd vr0, vr4, vr2”, the physical number (head value) of the vector register vr0 with the logical number 0 which is a source register from which data used in the calculation operation is read is illustrated in every cycle. Further, for the instruction “vaddh vr6, vr7, vr1”, the physical number (head value) of the vector register vr1 with the logical number 1 which is a destination register in which the calculation result is written is illustrated in every cycle.
In processing by the preceding instruction “vadd vr0, vr4, vr2” and processing by the succeeding instruction “vaddh vr6, vr7, vr1”, the vector registers vr[32] to vr[63] with the physical numbers 32 to 63 overlap. For example, in the vector registers vr[32] to vr[39] with the physical numbers 32 to 39, writing of data is performed in the beginning cycle in processing of the succeeding instruction “vaddh vr6, vr7, vr1”, but reading of data is performed in the fifth cycle in processing of the preceding instruction “vadd vr0, vr4, vr2”. In order to perform processing of the preceding instruction before a processing result of the succeeding instruction is written, writing of data in the vector registers vr[32] to vr[39] with the physical numbers 32 to 39 by the succeeding instruction “vaddh vr6, vr7, vr1” may be performed after the cycle 5. Accordingly, in the cycle 2 to cycle 5, a stall due to the WAR hazard is made to occur, and the succeeding instruction “vaddh vr6, vr7, vr1” is issued in the cycle 6.
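The issuance timing in the examples of FIGS. 21A and 21B can be reproduced with a simplified timing model. The model is an assumption of this sketch and not a description of the actual circuit: an instruction issued in cycle c accesses its k-th group of eight physical registers in cycle c + k − 1, and the succeeding instruction must access every overlapping register in a later cycle than the preceding instruction does. The names access_cycles and earliest_issue are hypothetical.

```python
def access_cycles(head: int, count: int, issue_cycle: int) -> dict:
    """Map each physical number to the cycle in which it is accessed."""
    return {head + k: issue_cycle + k // 8 for k in range(count)}


def earliest_issue(prev_head: int, prev_count: int,
                   succ_head: int, succ_count: int,
                   prev_issue: int = 1) -> int:
    """Earliest cycle in which the succeeding instruction may be issued."""
    prev = access_cycles(prev_head, prev_count, prev_issue)
    issue = prev_issue + 1  # without a hazard, issue in the next cycle
    for k in range(succ_count):
        reg = succ_head + k
        if reg in prev:
            # The succeeding access happens in cycle (issue + k // 8) and
            # must come after the preceding access in cycle prev[reg].
            issue = max(issue, prev[reg] + 1 - k // 8)
    return issue


# Preceding Word instruction on vr0 (vr[0]..vr[63]) issued in cycle 1,
# succeeding Halfword instruction on vr1 (vr[32]..vr[63]).
issue = earliest_issue(0, 64, 32, 32)
print(f"issue in cycle {issue}, stall in cycles 2 to {issue - 1}")
```

Because the model treats reads and writes alike, the same calculation applies to both the RAW example and the WAR example, and in both cases it gives issuance in cycle 6 after a stall in cycles 2 to 5.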
Further, in Patent Document 1 below, there is proposed a technique such that, when there is register interference (the data dependence relation exists between the preceding instruction and the succeeding instruction) and the preceding instruction needs a longer processing time than the succeeding instruction, the starting time of the succeeding instruction is set to eliminate the necessity to wait until execution of the preceding instruction is completed, so as to improve processing performance.
[Patent Document 1] Japanese Laid-open Patent Publication No. 60-178580
However, in the vector processing device, due to handling of array-type data, there is a problem that the stall period becomes long when the stall due to the data hazard is made to occur. For example, there is a problem that when a vector register storing a processing result from the middle of processing of the preceding Word instruction is used by the succeeding Halfword instruction, the succeeding instruction is stalled for a long period until writing by the preceding instruction in the vector register used in the succeeding instruction is completed.