1. Field of the Invention
This invention relates to the field of data processing systems. More particularly, this invention relates to the field of instruction issue control, and the processing associated therewith, for multi-threaded in-order superscalar processors.
2. Description of the Prior Art
It is known to provide multi-threaded superscalar processors for reasons including increasing instruction execution throughput. Most known multi-threaded superscalar processors are out-of-order machines such as the Intel Pentium 4, IBM Power5 and Compaq Alpha 21464 microprocessors. Typically such out-of-order superscalar processors keep separate issue queues for different instruction types, such as branch, ALU and memory access instructions. Instructions that are ready to be issued are issued to the corresponding execution unit from the separate queues. Techniques such as register renaming are used to increase the ability to issue multiple instructions in parallel without generating data hazards and to support the pipelined nature of the execution units. The control of the issue of instructions in such out-of-order systems is highly complex and the issue stage will often comprise a large number of gates, consume considerable power and give rise to speed limiting critical paths. Nevertheless, when maximum instruction execution throughput is the overriding priority, the overhead of supporting such complexities within the issue stage is justified.
It is also possible to apply multi-threaded processing to in-order superscalar processors. Following the program order within each thread, and accordingly accepting the static scheduling of the instructions imposed by the programmer when writing the program, significantly reduces the overhead associated with the issue stage of the instruction pipeline. Multi-threaded execution using superscalar execution mechanisms, which permit multiple instructions to execute in parallel, may nevertheless be supported to increase performance, with multiple program threads executing in parallel and the program order being followed within each of those threads. Processors following this model may have a better combination of features such as performance, complexity, cost, power consumption and the like for certain applications, e.g. mobile processing applications where a high power consumption resulting in a short battery life is undesirable. One known multi-threaded in-order superscalar processor is the Porthos network packet processor, having 32-thread support and capable of scheduling three instructions to the functional units in parallel. A discussion of the Porthos processor can be found in S. Melvin, M. Nemirovsky, E. Musoll, J. Huynh, R. Milito, H. Urdaneta and K. Saraf, “A Massively Multithreaded Packet Processor”, Workshop on Network Processors held in conjunction with The 9th International Symposium on High-Performance Computer Architecture, February 2003.
The Porthos processor issues at most one instruction from any one thread at a time, and this can adversely impact overall performance in certain cases. For example, if only one thread is ready to issue instructions, then only one instruction can be issued at a time even though two other issue slots are available. This restriction allows the Porthos processor to use a simplified issue stage, since it reduces the requirement for data hazard checking and for the other forms of checking and control that are necessary when more aggressive issue is performed. More aggressive issue seeks to increase performance by issuing as many instructions as possible in parallel, even when they originate from the same thread, and accordingly requires significant data hazard and other checking before issue.
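By way of illustration only, the effect of restricting issue to one instruction per thread per cycle may be modelled with the following minimal sketch. The function name, the representation of ready instructions as a per-thread count, and the lowest-thread-number-first selection order are all assumptions made for this example and are not taken from the Porthos design; only the three issue slots and the one-instruction-per-thread restriction come from the description above.

```python
def issue_one_per_thread(ready, slots=3):
    """Model one issue cycle under a one-instruction-per-thread restriction.

    ready: dict mapping a thread number to the count of instructions that
           thread has ready to issue, in program order (hypothetical model).
    slots: number of issue slots available in the cycle.
    Returns the list of thread numbers that issue one instruction each.
    """
    issued = []
    # Assumed selection policy: lowest thread number first.
    for thread in sorted(ready):
        if len(issued) == slots:
            break
        if ready[thread] > 0:
            # At most one instruction is taken from any one thread,
            # however many that thread has ready.
            issued.append(thread)
    return issued


# Only thread 0 is ready: one slot is used and two remain idle.
single_ready = issue_one_per_thread({0: 4, 1: 0, 2: 0})

# Four threads are ready: all three slots are filled from different threads.
many_ready = issue_one_per_thread({0: 1, 1: 2, 2: 1, 3: 5})
```

In the first call, thread 0 has four instructions ready but only one is issued, which is the under-utilisation of issue slots described above. In the second call, the slots are filled without any check between instructions of the same thread, which is the simplification the restriction provides.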
It is desirable to maintain the simplicity and speed of a multi-threaded in-order superscalar processor relative to the out-of-order processors. Nevertheless, it is desirable to be able to support relatively aggressive instruction issue control so as to increase instruction throughput whilst not rendering the issue stage within the instruction pipeline a performance limiting part of the overall design, such as by introducing one or more critical paths within a complex issue stage capable of supporting aggressive instruction issue control. Furthermore, reducing the circuit overhead associated with instruction issue control whilst still achieving a good degree of performance is desirable.