When a single thread is executed, various factors (an instruction cache miss, a data cache miss, an inter-instruction penalty, or the like) cause the processor to fall into a stalled state.
A cycle in which the processor is stalled decreases the execution efficiency (i.e., the throughput) of the processor during execution of the single thread.
In view of the above, a multi-thread processor having an architecture capable of simultaneously executing a plurality of threads has been proposed for improving the execution efficiency of the processor.
For a multi-thread processor capable of simultaneously executing threads, a method is known in which the thread to be executed is switched at predetermined time intervals, using round-robin scheduling, for example.
In this case, the time period required for processing each of the threads is longer than in the case where a single thread is executed. However, even if a certain thread is in the stalled state, another thread is executed after a predetermined time period elapses. Therefore, the overall throughput is improved in comparison with the case where a single thread is executed, thereby improving the execution efficiency of the processor.
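The effect described above can be illustrated with a minimal simulation sketch. The model is an assumption made only for illustration: each thread is a list of stall latencies (one entry per instruction), one instruction issues per cycle, and the function name `simulate_round_robin` and the parameter `quantum` are hypothetical and do not appear in the cited literature.

```python
def simulate_round_robin(threads, quantum=1):
    """Simulate fixed time-slice thread switching.

    threads: list of lists; each inner list holds the stall latency
             (in cycles) that follows each instruction of one thread.
    quantum: number of cycles per time slice (assumed parameter).
    Returns (total_cycles, busy_cycles).
    """
    pcs = [0] * len(threads)       # next instruction index per thread
    ready_at = [0] * len(threads)  # first cycle at which a thread may issue again
    cycle = busy = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        # The thread owning this cycle is fixed by the rotation,
        # regardless of whether it is stalled.
        tid = (cycle // quantum) % len(threads)
        if pcs[tid] < len(threads[tid]) and ready_at[tid] <= cycle:
            latency = threads[tid][pcs[tid]]
            pcs[tid] += 1
            ready_at[tid] = cycle + 1 + latency
            busy += 1              # an instruction was issued in this cycle
        cycle += 1
    return cycle, busy


single = simulate_round_robin([[0, 3, 0]])
dual = simulate_round_robin([[0, 3, 0], [0, 3, 0]])
print(single, dual)
```

Under this model a single thread issues 3 instructions in 6 cycles, while two such threads issue 6 instructions in 8 cycles: each thread finishes later than it would alone, but fewer cycles are lost to stalls overall.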
However, if the threads are simply switched at predetermined time intervals, it is difficult to estimate the time period required to complete the processing. For this reason, such switching is of little practical use in fields which require real-time performance. In addition, since the execution order of the threads is not optimal, the execution efficiency of the processor is low.
In view of the above, the techniques disclosed in Patent Literatures 1 and 2 have been proposed to switch among threads in a manner that makes the processing time period easier to estimate and improves the execution efficiency.
Patent Literature 1 discloses a thread scheduler, with reference to diagrams which show its overall configuration in detail. With this configuration, the priority among threads is taken into account, and a thread can be selected so that the execution efficiency of a thread having a higher priority increases.
Patent Literature 2 discloses a technique in which the instruction-issuance groups in a thread are counted and the number of cycles necessary for processing the thread is calculated, so that a plurality of threads are switched efficiently while taking the priority among the threads into account.
A more specific description is given below, with reference to FIG. 12.
FIG. 12 is a diagram which shows an overall configuration of a multithread processor including a conventional instruction-issuance controlling device.
The overall configuration diagram assumes a multithread processor capable of simultaneously executing N threads. However, the number of threads which can be executed is not inherently limited.
As shown in FIG. 12, the multithread processor including a conventional instruction-issuance controlling device includes an instruction cache memory 101, an instruction fetch unit 102, an instruction buffer 103, an instruction-issuance controlling device 104, an instruction execution unit 107, a data cache memory 108, and a register file 109.
The instruction cache memory 101 supplies an instruction to the instruction buffer 103, in response to a request from the instruction fetch unit 102. If the requested instruction is not cached, the instruction is obtained from a main memory (not shown) outside the multithread processor.
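The hit and miss behavior described above can be sketched as follows. This is only an illustrative model under stated assumptions: the class name `InstructionCache`, the dictionary standing in for the main memory, and the `misses` counter are hypothetical, and no replacement policy or line size is modeled.

```python
class InstructionCache:
    """Toy model of an instruction cache backed by a main memory."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # assumed: dict mapping address -> instruction
        self.lines = {}                 # instructions currently cached
        self.misses = 0                 # number of fetches served from main memory

    def fetch(self, address):
        # On a miss, obtain the instruction from main memory and keep a copy.
        if address not in self.lines:
            self.misses += 1
            self.lines[address] = self.main_memory[address]
        return self.lines[address]


cache = InstructionCache({0: "add", 4: "mul"})
first = cache.fetch(0)    # miss: loaded from main memory
second = cache.fetch(0)   # hit: served from the cache
```

In a real processor the miss path takes many cycles, which is one of the stall factors mentioned at the start of this section.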
The instruction fetch unit 102 fetches the instruction from the instruction cache memory 101 to the instruction buffer 103, in response to a request from the instruction buffer 103.
N instruction buffers 103 are provided, each corresponding to one of the threads. Each of the instruction buffers 103 stores the instruction stream to be executed by its corresponding thread.
The instruction-issuance controlling device 104 sends a control signal to the instruction buffer 103 so that the instructions to be executed next are issued to the instruction execution unit 107 as an instruction group.
The instruction execution unit 107 is a processing unit including a plurality of calculating units, such as an adder and a multiplier, and executes the instruction group issued from the instruction buffer 103.
The data cache memory 108 supplies data necessary for calculation to the instruction execution unit 107, in response to a request from the instruction execution unit 107. If the data cache memory 108 does not store the requested data, the data cache memory 108 obtains the requested data from the main memory (not shown) outside the multithread processor, and then supplies the obtained data to the instruction execution unit 107.
N register files 109 are provided, each corresponding to one of the threads and serving the register accesses made during instruction execution of that thread. Each of the register files 109 is a group of registers which hold the data read and written when the instruction stream of the corresponding thread, stored in the instruction buffer 103, is executed.
The instruction-issuance controlling device 104 includes an instruction grouping unit 105 and a thread selection unit 106.
From among the instructions in the instruction buffer 103 which corresponds to the thread selected by the thread selection unit 106, the instruction grouping unit 105 groups, as an instruction group, one or more instructions which can be simultaneously executed, according to the dependencies among those instructions.
In other words, instructions which can be issued in a single cycle are grouped as one group.
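The grouping described above can be sketched as follows. This is a minimal illustration, not the grouping logic of the conventional device: the `Instr` record, the function `next_group`, the in-order scan, and the `max_width` issue limit are all assumptions introduced here, and only register dependencies (read-after-write and write-after-write) are checked.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str                 # mnemonic, for display only
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers


def next_group(buffer, max_width=3):
    """Return the leading instructions of `buffer` that can issue together."""
    group = []
    written = set()  # registers written by instructions already in the group
    for instr in buffer:
        if len(group) == max_width:
            break
        # Stop at the first instruction that reads or rewrites a register
        # produced earlier in the same group.
        if written & set(instr.srcs) or instr.dest in written:
            break
        group.append(instr)
        written.add(instr.dest)
    return group


buffer = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("mul", "r4", ("r5", "r6")),
    Instr("sub", "r7", ("r1", "r4")),  # depends on both earlier results
]
group = next_group(buffer)
print([i.op for i in group])
```

Here `add` and `mul` are independent and form one group, while `sub` must wait for a later cycle because it reads `r1` and `r4`.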
The thread selection unit 106 determines the thread to be executed next from among the N threads, based on a priority which is determined in advance or varies dynamically.
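Priority-based selection of this kind can be sketched as follows. The function `select_thread`, the `ready` flags, and the tie-breaking rule (lowest thread number wins) are assumptions made for illustration; the source states only that the selection is based on a priority.

```python
def select_thread(priorities, ready):
    """Pick the ready thread with the highest priority.

    priorities: list of numbers, one per thread (larger means more urgent).
    ready:      list of booleans, one per thread (assumed readiness flags).
    Returns the selected thread index, or None if no thread is ready.
    """
    candidates = [tid for tid, ok in enumerate(ready) if ok]
    if not candidates:
        return None
    # Ties are broken in favor of the lowest thread index (an assumption).
    return max(candidates, key=lambda tid: (priorities[tid], -tid))


chosen = select_thread([1, 5, 3], [True, True, True])
fallback = select_thread([1, 5, 3], [True, False, True])
```

A dynamically varying priority would correspond to updating the `priorities` list between calls.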
In other words, in the conventional instruction-issuance controlling device 104 shown in FIG. 12, the thread selection unit 106 determines the thread to be executed based on a priority which is determined in advance or varies dynamically. In addition, the conventional instruction-issuance controlling device 104 groups, into issuable groups, the instruction stream stored in the instruction buffer 103 corresponding to the thread to be executed, and issues each group to the instruction execution unit 107.