Generally, a processor such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) is an arithmetic processing unit that is included in an information processor, a portable telephone, or the like and that executes various types of arithmetic processing. In recent years, as processors have become smaller and their processing performance has improved, their fields of application have diversified.
The configuration of a processor and the pipeline of the processor will now be explained with reference to FIG. 10 and FIG. 11. FIG. 10 is a diagram illustrating a configuration example of a processor. FIG. 11 is a diagram illustrating an example of the pipeline of the processor.
For example, as illustrated in FIG. 10, the processor sends an instruction from a primary instruction cache to an instruction decoder, which decodes the instruction in a decode (D) cycle (see FIG. 11). Decoding is performed in order (in program sequence), and the decoded instructions are passed to an RS (Reservation Station), which functions as an execution queue that stores instructions to be executed.
For example, the RSs include an RSA (Reservation Station for Address) for load and store instructions, an RSE (Reservation Station for Execution) for fixed point arithmetic instructions (integer arithmetic instructions), an RSF (Reservation Station for Floating point) for floating point arithmetic instructions, an RSBR (Reservation Station for BRanch) for branch instructions, and the like.
The processor registers all instructions decoded by the instruction decoder in a CSE (Commit Stack Entry), which manages all the instructions, and also registers them in the reservation stations, which perform out-of-order execution control so that an executable instruction is executed first regardless of program sequence. Next, in a priority (P) cycle (see FIG. 11), the processor selects an executable instruction from each RS in an out-of-order manner.
After that, the processor reads a register in a buffer (B) cycle (see FIG. 11) and executes arithmetic processing in an instruction execution (X) cycle (see FIG. 11). The result of the arithmetic processing is stored in an updating buffer in a register updating (U) cycle (see FIG. 11). The instruction then waits for the commit process, which is the instruction completion process.
Then, the processor receives, in the CSE, reports such as completion of arithmetic processing, completion of a data transfer process from a primary data cache, and completion of a branch determination process from a branch control mechanism, and performs the commit process in order. Next, the processor writes information from the updating buffer to the register in a register writing (W) cycle (see FIG. 11) and updates the PC (Program Counter) and the NPC (Next Program Counter), which is the next PC.
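The in-order commit described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the CSE is modeled as a queue of instruction identifiers in program order and that completion reports are collected in a set; the names commit_in_order, cse, and done are hypothetical and do not appear in the figures.

```python
from collections import deque

def commit_in_order(cse, completed):
    """Commit instructions from the head of the CSE while they are complete.

    cse: deque of instruction ids in program order (oldest first).
    completed: set of ids for which a completion report has arrived.
    Returns the ids committed in this call, in program order.
    """
    committed = []
    # Only the oldest uncommitted instruction may commit; a younger
    # instruction that finished early waits behind it.
    while cse and cse[0] in completed:
        committed.append(cse.popleft())
    return committed

cse = deque([0, 1, 2, 3])
done = {1, 2}                         # instructions 1 and 2 finished out of order
first = commit_in_order(cse, done)    # nothing commits: instruction 0 is not done
done.add(0)
second = commit_in_order(cse, done)   # instructions 0, 1, 2 commit in order
```

In this sketch, instructions 1 and 2 complete before instruction 0 but are held until instruction 0 completes, which corresponds to out-of-order execution combined with in-order commit.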
Meanwhile, one technique for improving the utilization ratio of resources required for instruction execution, such as a cache, a pipeline, and an arithmetic unit, and thereby drawing out the performance of the processor is the superscalar method illustrated in FIG. 12. FIG. 12 is a diagram explaining an out-of-order process performed by the superscalar method.
For example, as illustrated in FIG. 12, in the superscalar method, a processor that includes a plurality of arithmetic units includes reservation stations (RSs) corresponding to the respective arithmetic units. A decoder assigns instructions to the RSs. Each arithmetic unit executes the instruction dispatched (issued) from its RS, where dispatch priority goes to the instruction that entered the RS first among those whose execution preparation is complete (oldest ready).
In FIG. 12, the processor includes four arithmetic units and can therefore execute up to four instructions simultaneously in an out-of-order manner. The processor also performs the fetch, decode, and commit of a plurality of instructions in order. In other words, in FIG. 12, even though the arithmetic units preferentially execute, out of order, the instructions that entered first and are ready for execution (oldest ready), the processor performs the commit process in order.
Next, an RS for a plurality of asymmetric fixed point arithmetic units will be explained with reference to FIG. 13. FIG. 13 is a diagram explaining an example of RS that has a plurality of asymmetric fixed point arithmetic units. Here, when an arithmetic unit A and an arithmetic unit B exist together, asymmetry means that the operations that can be executed by the arithmetic unit A differ from those that can be executed by the arithmetic unit B. As an example of asymmetry, the arithmetic unit A can perform only addition and subtraction and the arithmetic unit B can perform only multiplication and division.
For example, as illustrated in FIG. 13, when a processor includes asymmetric fixed point arithmetic units (EX: EXecution unit) EXA and EXB, a decoder determines whether an instruction can be executed by only the EXA, and the instruction is registered in the RSE (RSEA or RSEB) corresponding to the EXA or the EXB. Then, instructions are dispatched to the EXA or the EXB in accordance with instruction priority in each RSE.
Next, dispatch to an arithmetic unit will be explained with reference to FIG. 14. FIG. 14 is a diagram explaining an example of the dispatch to an arithmetic unit. FIG. 14 covers the case where up to four instructions are decoded simultaneously. Moreover, EXA_ONLY in FIG. 14 indicates an instruction that can be executed by only the EXA.
For example, as illustrated in FIG. 14, when up to four instructions can be decoded simultaneously, the decoded instructions are assigned to RSEA or RSEB and are registered in queue entries “0 to 3” of RSEA or RSEB. At this time, as described above, instructions that can be executed by only the EXA are registered in RSEA.
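The assignment of decoded instructions to RSEA and RSEB can be sketched as follows. The only rule taken from the description is that EXA_ONLY instructions are registered in RSEA. How the remaining instructions are distributed is not stated, so the balancing rule below (prefer the less occupied RSE) is an assumption made for illustration, as are the function name assign_to_rse and the instruction names. Queue capacity is ignored.

```python
def assign_to_rse(decoded, rsea, rseb):
    """Assign decoded instructions to RSEA or RSEB.

    decoded: list of (name, exa_only) tuples in program order.
    rsea, rseb: lists modeling the two reservation stations.
    """
    for name, exa_only in decoded:
        if exa_only:
            # From the description: EXA_ONLY instructions go to RSEA.
            rsea.append(name)
        elif len(rseb) <= len(rsea):
            # Assumed policy: other instructions go to the emptier RSE.
            rseb.append(name)
        else:
            rsea.append(name)
    return rsea, rseb

# Up to four instructions decoded in the same cycle (hypothetical names).
decoded = [("add", False), ("mulx", True), ("sub", False), ("divx", True)]
rsea, rseb = assign_to_rse(decoded, [], [])
```

With these four hypothetical instructions, the two EXA_ONLY instructions end up in RSEA and the other two in RSEB.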
While older instructions registered in each queue are moved toward higher-numbered entries (for example, 4 to 9) every cycle, that is, while a bubble up is performed, a priority circuit selects the oldest-ready instruction from each of RSEA and RSEB. Next, when “+INH_EXA(B)_P_TGR”, which is a dispatch inhibition flag for each of the arithmetic units EXA and EXB, is not on, “EXA(B)_VALID”, which is a priority selection signal of the arithmetic unit, becomes valid and the selected instruction is dispatched.
In the configuration illustrated in FIG. 14, the priority circuit after bubble up transmits and receives information such as various types of flags and conditions, while resources required for operations, such as real data and addresses, are transmitted and received by a tag part illustrated to the right of the priority circuit. In brief, when the dispatch inhibition flag is not on (that is, in the “−INH_EXA(B)_P_TGR” state), the arithmetic unit EXA or EXB performs arithmetic processing on real data.
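The gating of dispatch by the inhibition flag can be expressed as a small sketch. It assumes that “EXA(B)_VALID” is simply the logical AND of "some entry was selected by the priority circuit" and "the dispatch inhibition flag is off"; the actual circuit may include further terms. The function name dispatch_valid and its parameters are hypothetical.

```python
def dispatch_valid(p_sel, inh_p_tgr):
    """Return True when the selected entry may be dispatched this cycle.

    p_sel: one-hot selection vector from the priority circuit (list of 0/1).
    inh_p_tgr: True when the dispatch inhibition flag (+INH_..._P_TGR) is on.
    """
    entry_selected = any(p_sel)
    # Assumed relation: VALID = (an entry is selected) AND NOT (inhibit flag).
    return entry_selected and not inh_p_tgr

sel = [0] * 10
sel[9] = 1                                  # entry 9 selected as oldest ready
exa_valid = dispatch_valid(sel, inh_p_tgr=False)
blocked = dispatch_valid(sel, inh_p_tgr=True)
idle = dispatch_valid([0] * 10, inh_p_tgr=False)
```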
Next, the functional block of the priority circuit of RSEA will be explained with reference to FIG. 15. FIG. 15 is a diagram explaining the functional block of the priority circuit of RSEA.
For example, as illustrated in FIG. 15, the priority circuit takes as input the READY conditions of the ten entries “+P_RSEA_0” to “+P_RSEA_9” and sets only one bit of the signal “+P_EXA_SEL[0:9]”, which selects the entry to be dispatched to EXA, to “1”.
A READY condition of each entry includes, for example, “Condition 1: the entry is valid and has not been dispatched”, “Condition 2: the source 1 register has no dependency relation or can be bypassed”, and “Condition 3: the source 2 register has no dependency relation or can be bypassed”.
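The three conditions above can be combined into a single READY predicate, sketched below. The Entry fields are hypothetical names chosen for this illustration; only the three conditions themselves come from the description.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    valid: bool             # the entry holds an instruction
    dispatched: bool        # the instruction has already been dispatched
    src1_dependent: bool    # source 1 register has a dependency relation
    src1_bypassable: bool   # source 1 value can be bypassed
    src2_dependent: bool    # source 2 register has a dependency relation
    src2_bypassable: bool   # source 2 value can be bypassed

def is_ready(e):
    """Return True when the entry satisfies Condition 1 to Condition 3."""
    cond1 = e.valid and not e.dispatched
    cond2 = (not e.src1_dependent) or e.src1_bypassable
    cond3 = (not e.src2_dependent) or e.src2_bypassable
    return cond1 and cond2 and cond3

# Source 1 has a dependency but can be bypassed; source 2 has none.
ready_entry = Entry(True, False, True, True, False, False)
# Source 2 has a dependency that cannot be bypassed.
waiting_entry = Entry(True, False, False, False, True, False)
```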
Next, the flow performed by the priority circuit of RSEA will be explained with reference to FIG. 16. FIG. 16 is a diagram explaining the flow performed by the priority circuit of RSEA.
For example, as illustrated in FIG. 16, the priority circuit evaluates the READY conditions of “P_RSEA_0” to “P_RSEA_9” (Step S1). Then, the priority circuit determines whether Condition 1 to Condition 3 are satisfied as the READY condition of “P_RSEA_9” (Step S2).
Next, when all three conditions are satisfied (Step S2: YES), the priority circuit selects the queue “9” (“+P_EXA_SEL[9]=1”) as the instruction to be dispatched to the arithmetic unit EXA (Step S3). In this case, as illustrated in FIG. 17, the priority circuit, which takes the three conditions as input, examines the queues starting from the oldest, that is, from the queue “9” among “0 to 9”, and selects a queue when Condition 1 to Condition 3 are satisfied. FIG. 17 is a diagram illustrating the details of the priority circuit of RSEA.
Moreover, when the three conditions are not satisfied (Step S2: NO), the priority circuit does not select the queue “9” (“+P_EXA_SEL[9]=0”) (Step S4). Then, the priority circuit determines whether Condition 1 to Condition 3 are satisfied as the READY condition of “P_RSEA_8” (Step S5). Next, when all three conditions are satisfied (Step S5: YES), the priority circuit selects the queue “8” (“+P_EXA_SEL[8]=1”) as the instruction to be dispatched to the arithmetic unit EXA (Step S6).
Moreover, when the three conditions are not satisfied (Step S5: NO), the priority circuit does not select the queue “8” (“+P_EXA_SEL[8]=0”) (Step S7). Then, the priority circuit determines whether Condition 1 to Condition 3 are satisfied as the READY condition of “P_RSEA_7” (Step S8).
Next, when all three conditions are satisfied (Step S8: YES), the priority circuit selects the queue “7” (“+P_EXA_SEL[7]=1”) as the instruction to be dispatched to the arithmetic unit EXA (Step S9). On the other hand, when the three conditions are not satisfied (Step S8: NO), the priority circuit performs the same process on the queues “0 to 6”.
According to this flow, as illustrated in FIG. 18, the priority circuit of RSEA selects “9” when “+P_RSEA_9_READY” holds, for example. On the other hand, in the “−P_RSEA_9_READY” state, where “9” is not selected, the priority circuit selects “8” when “+P_RSEA_8_READY” holds.
Moreover, in the case of “−P_RSEA_9_READY”, “−P_RSEA_8_READY”, and “+P_RSEA_7_READY”, the priority circuit selects “7”. FIG. 18 is a diagram illustrating the details of the priority circuit of RSEA. Furthermore, dispatch from RSEB is performed by a priority circuit in the same manner as for RSEA.
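The selection flow of the priority circuit, which checks the entries from “9” down to “0” and selects the first one that is READY, can be sketched as follows. The sketch is given in two equivalent forms: a sequential loop that mirrors the flow of Steps S2 to S9, and a per-bit logic expression in which entry i is selected only when it is READY and no older entry is READY. The function names are hypothetical, and the READY inputs are modeled as a list of booleans indexed by entry number.

```python
def select_oldest_ready(ready):
    """Sequential form: scan from the oldest entry (highest index) downward.

    ready[i] is True when entry i satisfies its READY condition.
    Returns a one-hot list modeling +P_EXA_SEL (all zeros if none is ready).
    """
    sel = [0] * len(ready)
    for i in reversed(range(len(ready))):
        if ready[i]:
            sel[i] = 1
            break
    return sel

def select_by_equation(ready):
    """Logic form: SEL[i] = READY[i] AND NOT (READY[i+1] OR ... OR READY[n-1])."""
    return [int(ready[i] and not any(ready[i + 1:])) for i in range(len(ready))]

ready = [False] * 10
ready[7] = True   # entries 9 and 8 are not ready, entry 7 is ready
ready[3] = True   # a younger ready entry that must not be selected
sel = select_oldest_ready(ready)
```

In this example only bit 7 of the selection vector is set, matching the case of “−P_RSEA_9_READY”, “−P_RSEA_8_READY”, and “+P_RSEA_7_READY” described above.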
However, as described above, when each fixed point arithmetic unit has its own RSE, the mounting area of the circuit increases. In a processor that executes floating point arithmetic programs, such as those for HPC (High Performance Computing), which impose a higher load than fixed point arithmetic programs, all the entries of an RSE are rarely filled. It is therefore preferable for such a processor to utilize the RSE resource effectively and thereby reduce the mounting area.
Therefore, in recent years, as illustrated in FIG. 19, a technique has been used in which an RSE is shared to reduce the mounting area: instead of providing an RSE for each fixed point arithmetic unit, instructions are registered in a single RSE. FIG. 19 is a diagram explaining an RSE sharing example.
However, the technology for sharing an RSE described above has a problem in that an operation for an instruction that can be executed by only a predetermined arithmetic unit produces errors. Specifically, because the technology does not take into account that the instructions executable by each arithmetic unit differ, an instruction may be dispatched to an arithmetic unit that cannot execute it, and the operation for an instruction that can be executed by only a predetermined arithmetic unit therefore produces errors.
[Patent Document 1] Japanese Laid-open Patent Publication No. 2000-105699
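The problem with a shared RSE can be illustrated with a minimal sketch. It assumes, for illustration only, that a shared RSE sends the oldest ready instruction to the EXA and the second-oldest ready instruction to the EXB without checking the EXA_ONLY attribute; the description does not specify the selection rule at this level of detail. All function and instruction names are hypothetical.

```python
def naive_shared_dispatch(entries):
    """Pick the two oldest ready entries from a shared RSE.

    entries: list of (name, exa_only, ready) tuples, oldest first.
    Returns (to_exa, to_exb); either may be None if too few entries are ready.
    Assumed behavior: the EXA_ONLY attribute is not consulted.
    """
    ready = [e for e in entries if e[2]]
    to_exa = ready[0] if len(ready) > 0 else None
    to_exb = ready[1] if len(ready) > 1 else None
    return to_exa, to_exb

def exb_can_execute(entry):
    """The EXB cannot execute an instruction marked EXA_ONLY."""
    return entry is None or not entry[1]

shared_rse = [("add", False, True), ("mulx", True, True), ("sub", False, True)]
to_exa, to_exb = naive_shared_dispatch(shared_rse)
misdispatched = not exb_can_execute(to_exb)
```

In this sketch the second-oldest ready instruction is EXA_ONLY, so it is sent to the EXB, which cannot execute it. This corresponds to the error case described above.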