1. Field of the Invention
The present invention relates to a data processing device, such as a microprocessor, and more particularly to an effective means of thread management in a multi-thread processor. A multi-thread processor is a processor capable of executing a plurality of threads, either on a time-multiplexed basis or simultaneously, without requiring the intervention of software such as an operating system. Each thread is a flow of instructions having at least its own program counter, and the threads may share a register file.
2. Prior Art
Many different methods are available for executing a serial execution flow at higher speed by raising the effective parallelism above that of serial execution: (1) use of an SIMD (Single Instruction Multiple Data) instruction or a VLIW (Very Long Instruction Word) instruction, in which a plurality of mutually independent operations are combined into a single instruction and executed simultaneously; (2) a superscalar method for simultaneous execution of a plurality of mutually independent instructions; (3) an out-of-order execution method, which executes instructions in an order different from that of the serial execution flow, thereby reducing stalls due to data dependency and resource conflict and preventing the degradation of effective parallelism; (4) software pipelining, in which a program is executed after the natural order of the serial execution flow has been rearranged in advance to achieve the highest possible level of effective parallelism; and (5) a method of dividing the serial execution flow into a plurality of instruction sequences, each consisting of a plurality of instructions, and having these instruction sequences executed by a multi-processor or a multi-thread processor. Methods (1) and (2) are basic methods of parallel processing, (3) and (4) are methods for increasing the amount of local parallelism extracted, and (5) is a method for extracting general parallelism.
Intel's Merced, described in MICROPROCESSOR REPORT, vol. 13, no. 13, Oct. 6, 1999, pp. 1 and 6–10, employs the VLIW system referred to in (1) above, and is further provided with a total of 256 64-bit registers, 128 each for integer and floating-point values, for use in the software pipelining system mentioned in (4). The large number of registers permits parallelism extraction on the order of tens of instructions.
Compaq's Alpha 21464, described in MICROPROCESSOR REPORT, vol. 13, no. 16, Dec. 6, 1999, pp. 1 and 6–11, employs the superscalar system referred to in (2) above, the out-of-order system stated in (3) and the multi-thread system mentioned in (5). It extracts parallelism on the order of tens of instructions with a large-capacity instruction buffer and reorder buffer, further extracts more general parallelism by the multi-thread method, and performs parallel execution by the superscalar method. It is therefore considered capable of extracting overall parallelism. However, as it does not analyze the dependency relationships among a plurality of threads, it cannot simultaneously execute a plurality of threads that are dependent on one another.
NEC's Merlot, described in MICROPROCESSOR REPORT, vol. 14, no. 3, March 2000, pp. 14–15, is an example of the multi-processor referred to in (5). Merlot is a tightly coupled on-chip four-way parallel processor that executes a plurality of threads simultaneously. It can also simultaneously execute a plurality of threads that are dependent on one another. To facilitate dependency analysis, a constraint is imposed that a new thread is generated only by the latest existing thread, and that the new thread comes last in the order of serial execution.
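The thread generation constraint described above can be illustrated with a minimal sketch. The names used below (ThreadOrder, fork) are hypothetical and are not taken from the Merlot design; the sketch models only the ordering rule stated in the text, not the processor itself.

```python
class ThreadOrder:
    """Hypothetical model: live threads kept in serial-execution order."""

    def __init__(self):
        self.order = [0]      # thread 0 is the initial thread
        self._next_id = 1

    def fork(self, parent):
        """Generate a new thread from `parent` and return its id."""
        # Constraint from the text: only the latest existing thread may fork.
        if parent != self.order[-1]:
            raise ValueError("only the latest thread may generate a new thread")
        child = self._next_id
        self._next_id += 1
        # The new thread always comes last in the order of serial execution.
        self.order.append(child)
        return child
```

Under this rule the list is only ever extended at its tail, so a thread can depend only on threads that precede it in the list, which is presumably what keeps the dependency analysis simple.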
A CPU (Central Processing Unit) in the “speculative parallel instruction threads” in JP-A-8-249183 is an example of the multi-thread processor referred to in (5). It is a multi-thread processor for simultaneously executing a main thread and future threads. The main thread is a thread for serial execution, and a future thread is a thread for speculatively executing a part of the program that would be executed in the future in serial execution. The register or memory data used by a future thread are the data as of the time the future thread is started, and they may be renewed before the point at which the future thread would start in serial execution. If they are renewed, the data used by the future thread are not correct and the result of the future thread is discarded; if they are not renewed, the result is retained. Whether or not renewal has taken place is judged by checking, from the directions of the conditional branches, the program flow taken up to the point at which the future thread would start in serial execution, and determining whether or not that flow executes a renewal instruction. For this reason, it has the characteristic of requiring no analysis of dependency among the plurality of threads.
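The discard-or-retain decision for a future thread can be sketched as follows. This is a simplified illustration under stated assumptions, not the mechanism of JP-A-8-249183: the table WRITES_BY_DIRECTION, the function names and the register and memory labels are all hypothetical, and the main-thread flow is reduced to a single conditional branch that may be executed more than once.

```python
# Hypothetical main-thread flow between the fork point and the point where
# the future thread would start in serial execution. Each direction of the
# conditional branch leads to a block that writes (renews) these locations.
WRITES_BY_DIRECTION = {
    True: {"r1", "mem[0x10]"},   # branch taken
    False: {"r7"},               # branch not taken
}


def renewal_on_path(directions, future_reads):
    """Return True if the flow selected by the observed branch directions
    renews any location that the future thread read."""
    return any(WRITES_BY_DIRECTION[d] & future_reads for d in directions)


def resolve_future(directions, future_reads, future_result):
    """Discard the speculative result (return None) if its input data were
    renewed on the path actually taken; otherwise retain it."""
    if renewal_on_path(directions, future_reads):
        return None
    return future_result
```

For example, resolve_future([True], {"r1"}, 42) discards the result because the taken path renews r1, whereas resolve_future([False], {"r1"}, 42) retains it. Only the branch directions of the main thread are consulted, so no dependency analysis between threads is needed.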