A vector processor is used for vector processing, in which the same operation is repeated on a large quantity of element data forming an array. With a vector processor, the element data of an array can be processed continuously using a single instruction, so that high operation throughput can be obtained.
A vector processor has, for example, load/store pipelines and operation pipelines. An operation pipeline fetches and decodes a single operation instruction, then sequentially and continuously reads element data from a register (hereafter called a vector register) and executes an arithmetic operation or other operation on it. The operation pipeline stores the element data indicating the operation results in a vector register in the order of processing.
In a vector processor, operation instructions are processed for element data with different bit widths (for example, 8 bits, 16 bits, 32 bits, or 64 bits). In general, an operation pipeline has a plurality of operation units, each of which performs an operation of a prescribed bit width, and processes a plurality of array elements in one cycle. Hence when the bit width of the element data differs depending on the operation instruction, the number of array elements processed in one cycle also differs depending on the operation instruction. In a vector processor, the number of array elements is set to be the same for each instruction, and so when the bit width of the element data differs depending on the operation instruction, the number of processing cycles also differs from one operation instruction to another. For example, under prescribed conditions, a half-word instruction with a bit width of 16 bits requires four cycles, whereas a full-word instruction with a bit width of 32 bits requires eight cycles.
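The relation between bit width and processing cycles can be illustrated with a small calculation. This is only a sketch: the array size of 32 elements and the 128-bit operation width per cycle are assumed values, chosen because they reproduce the four-cycle and eight-cycle figures given above, and the function name `cycles_needed` is hypothetical.

```python
from math import ceil


def cycles_needed(num_elements: int, element_bits: int, datapath_bits: int) -> int:
    """Cycles required to process num_elements items of element_bits each."""
    # Number of elements the operation units can handle in one cycle.
    elements_per_cycle = datapath_bits // element_bits
    return ceil(num_elements / elements_per_cycle)


# Assumed conditions (not taken from the text): 32 elements per instruction,
# 128 bits processed per cycle.
NUM_ELEMENTS = 32
DATAPATH_BITS = 128

half_word_cycles = cycles_needed(NUM_ELEMENTS, 16, DATAPATH_BITS)  # 8 elements per cycle
full_word_cycles = cycles_needed(NUM_ELEMENTS, 32, DATAPATH_BITS)  # 4 elements per cycle
print(half_word_cycles, full_word_cycles)  # prints "4 8"
```

Under these assumed conditions, doubling the bit width halves the number of elements processed per cycle and therefore doubles the number of cycles.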
When operation instructions for element data with different bit widths are processed in this way, delays may occur in issuing subsequent operation instructions. For example, suppose that a half-word instruction (for example, four cycles) is processed subsequent to a preceding full-word instruction (for example, eight cycles). Suppose further that the subsequent half-word instruction processes the element data that the preceding full-word instruction processes in its latter four cycles.
In this case, if the subsequent half-word instruction were issued in the cycle immediately after the preceding full-word instruction is issued, the preceding full-word instruction would not yet have finished processing the element data that the subsequent half-word instruction is to process. Hence the vector processor waits for processing of that element data to end before issuing the subsequent half-word instruction. As a result, issuing of the subsequent half-word instruction is delayed, and operation throughput falls.
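The issue delay can be sketched with a simplified timing model. The model is an assumption made for illustration, not something specified in the text: the register contents are divided into eight equal chunks, the full-word instruction finishes one chunk per cycle starting at cycle 0, and a chunk is treated as usable one cycle after it is finished. The function `earliest_issue` and the chunk numbering are hypothetical.

```python
def earliest_issue(ready_cycle, chunks_read, first_slot=1):
    """Earliest cycle at which the subsequent instruction can be issued.

    ready_cycle[c] is the cycle in which the preceding instruction finishes
    chunk c. The subsequent instruction reads chunks_read[k] in cycle
    (issue + k); a chunk is assumed usable one cycle after it is finished.
    """
    issue = first_slot
    for k, chunk in enumerate(chunks_read):
        issue = max(issue, ready_cycle[chunk] + 1 - k)
    return issue


# Preceding full-word instruction, issued at cycle 0: eight chunks, one per cycle.
full_word_ready = {chunk: chunk for chunk in range(8)}

# Subsequent half-word instruction: four cycles, reading the chunks that the
# full-word instruction processes in its latter four cycles.
half_word_reads = [4, 5, 6, 7]

issue = earliest_issue(full_word_ready, half_word_reads)
print(issue, issue - 1)  # prints "5 4": issued at cycle 5, four cycles late
```

In this model the half-word instruction cannot be issued at cycle 1, because its first chunk is not finished until cycle 4; the exact length of the delay in a real processor depends on pipeline details not modeled here.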
Hence when a vector processor has a plurality of operation pipelines, a full-word instruction requiring many processing cycles is divided, for example, into two operation instructions, and the divided operation instructions are processed using separate operation pipelines. By this means, processing of the element data which is to be processed by a subsequent half-word instruction ends sooner, and the vector processor can issue the subsequent half-word instruction earlier. As a result, the fall in operation throughput is suppressed.
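The effect of dividing the full-word instruction can be sketched with a simplified timing model. The model rests on assumptions not stated in the text: the register contents are divided into eight equal chunks, each pipeline finishes one chunk per cycle, both divided instructions start at cycle 0, and a chunk is usable one cycle after it is finished. The names `earliest_issue` and `ready_cycles` are hypothetical.

```python
def earliest_issue(ready_cycle, chunks_read, first_slot=1):
    """Earliest cycle at which the subsequent instruction can be issued.

    The subsequent instruction reads chunks_read[k] in cycle (issue + k);
    a chunk is assumed usable one cycle after ready_cycle[chunk].
    """
    issue = first_slot
    for k, chunk in enumerate(chunks_read):
        issue = max(issue, ready_cycle[chunk] + 1 - k)
    return issue


def ready_cycles(num_chunks, num_pipelines):
    """Cycle in which each chunk is finished when the instruction is divided
    evenly across num_pipelines pipelines that all start at cycle 0."""
    per_pipeline = num_chunks // num_pipelines
    return {chunk: chunk % per_pipeline for chunk in range(num_chunks)}


# The half-word instruction reads the latter four of the eight chunks.
half_word_reads = [4, 5, 6, 7]

undivided = earliest_issue(ready_cycles(8, 1), half_word_reads)
divided = earliest_issue(ready_cycles(8, 2), half_word_reads)
print(undivided, divided)  # prints "5 1"
```

With the instruction divided in two, the second pipeline finishes the latter chunks in cycles 0 to 3, so in this model the half-word instruction can be issued at cycle 1 rather than cycle 5.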
In a vector processor having a plurality of operation pipelines, if operators with a large circuit scale, such as multipliers, and operators with low frequency of use are implemented in every operation pipeline, the circuit scale of the processor as a whole becomes large. Hence operators with a large circuit scale and operators with low frequency of use are implemented in only some of the plurality of operation pipelines.
Vector processors are described, for example, in Japanese Patent Publication No. 2544770 and Japanese Patent Application Publication No. 2009-193378.