1. Field
Various embodiments of the invention pertain to processor operation and architectures, and particularly to a multi-threaded processor that internally reorders output threads, thereby avoiding the need for an external reorder buffer.
2. Background
Multi-threaded processors are designed to improve processing performance by efficiently executing multiple streams of encoded data (i.e., threads) at once within a single processor. Multiple storage registers are typically used to maintain the state of multiple threads at the same time. Multi-threaded architectures often provide more efficient utilization of various processor resources, and particularly the execution logic or arithmetic logic unit (ALU) within the processor. By feeding multiple threads to the ALU, clock cycles that would otherwise have been idle due to a stall or other delays in the processing of a particular thread may be utilized to service a different thread.
A conventional multi-threaded processor may receive multiple threads and process each thread so as to maintain the same input thread order at the output stage. This means that the first thread received from a program is the first thread outputted to the program.
Programmable multi-threaded processors often include flow control capabilities. This permits programs to send flow control instructions to the programmable multi-threaded processor, and these instructions may cause threads to be processed out of order. For example, a first input thread may not finish execution first; in some cases, it may finish execution last. However, programs expect to receive outputted threads in the order in which they were sent to the processor.
One approach to maintaining the order of a sequence of threads for a particular program or application is to add a large buffer to reorder the threads. This buffer is typically external to the multi-threaded processor core and requires additional logic to implement. Adding a large external buffer increases the cost of implementing a multi-threaded processor and also takes up much-needed space.
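By way of illustration only, the behavior of such an external reorder buffer may be sketched in software as follows. This is a minimal sketch under stated assumptions, not a description of any particular hardware implementation: the class name `ReorderBuffer`, the method `complete`, and the use of an integer sequence number assigned to each thread in input order are all hypothetical and do not appear in the text above.

```python
class ReorderBuffer:
    """Hypothetical software model of an external reorder buffer.

    Assumption: each thread is tagged with a sequence number
    (0, 1, 2, ...) in the order it was received from the program.
    """

    def __init__(self):
        self._pending = {}   # finished results waiting on earlier threads
        self._next_seq = 0   # sequence number of the next thread to output

    def complete(self, seq, result):
        """Record a finished thread and return every result that can now
        be released in the original input order (possibly none)."""
        self._pending[seq] = result
        released = []
        # Drain the buffer for as long as the next expected thread is present.
        while self._next_seq in self._pending:
            released.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return released

    def occupancy(self):
        """Number of finished results currently held back."""
        return len(self._pending)


# Three threads received in order 0, 1, 2 finish in reverse order.
rob = ReorderBuffer()
first = rob.complete(2, "thread-2")    # held: thread 0 has not finished
second = rob.complete(1, "thread-1")   # held: thread 0 has not finished
held = rob.occupancy()                 # two results are being buffered
third = rob.complete(0, "thread-0")    # releases all three in input order
```

In this sketch, when the first input thread finishes last, every other result must be held until it completes, so the buffer needs capacity for nearly all in-flight threads. This is the storage and logic cost that the external buffer approach incurs.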
Thus, a way is needed to reorder a sequence of threads for a particular program so that they are outputted by a multi-threaded processor in the same order as they are received without the need for an additional reorder buffer.