Field of the Invention
The present invention generally relates to multi-threaded computer architecture, and, more particularly, to an approach for a configurable phase-based priority scheduler.
Description of the Related Art
A common practice in parallel processing systems is to design a processor that executes multiple threads simultaneously. In a typical thread sequence, the processor executes a series of instructions and then performs an operation to load data from memory. The load operation retrieves one or more data items from memory that the processor then processes during the following execution cycle. For example, the load operation could retrieve data from system memory representing a texture map to be applied to a graphics object. In another example, the load operation retrieves data stored in a file on a storage disk subsystem.
Because the time needed to retrieve the data items is indeterminate, the processor waits for the memory load operation to complete. The processor may execute some instructions during the waiting period, so long as those instructions do not depend on the data retrieved during the load operation. Otherwise, the processor suspends execution of instructions while the load operation is pending. Once the load operation completes, the processor resumes execution of instructions until the next memory load operation is encountered. The processor then suspends execution again, pending completion of the next load operation. During these suspension periods, the processor does not execute instructions, resulting in a loss of performance. This sequence of execution cycles interspersed with memory load operations is typical of operations that may be performed by single-instruction multiple-thread (SIMT) and single-instruction multiple-data (SIMD) processors.
In such cases, one approach to improving processor performance is to schedule a second thread to execute during the waiting period. As the first thread suspends execution and enters a waiting period, the processor executes the second thread while the memory load operation for the first thread is pending. This approach improves performance because the processor executes instructions for the second thread during the waiting period associated with the first thread. However, one drawback to this approach is that the duration of the execution cycle for the second thread may differ from the duration of the waiting period for the first thread. If the execution cycle for the second thread is shorter than the waiting period for the first thread, then the second thread enters a waiting period while the first thread is still in a waiting period. In such cases, the processor suspends execution of instructions for both threads until at least one of the memory load operations completes.
If, on the other hand, the execution cycle for the second thread is longer than the duration of the waiting period for the first thread, then the first thread may preempt execution of instructions by the second thread. In such cases, the first thread resumes execution of instructions until the first thread encounters the next memory load operation. The processor then suspends the first thread again and completes the remaining portion of the execution cycle for the second thread until the second thread encounters the next memory load operation. The processor then suspends the second thread and waits for at least one of the memory load operations to complete. Again, such a toggled approach to suspending and resuming execution across different threads results in performance losses.
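The two cases described above may be illustrated with a minimal cycle-level sketch. The sketch is illustrative only and rests on several assumptions not found in the foregoing description: each thread alternates between an execution cycle and a memory load of fixed durations (actual load times are indeterminate), the processor executes one thread per clock cycle, and a lower-numbered thread preempts a higher-numbered thread whenever both are ready. The function name simulate and its parameters are hypothetical.

```python
def simulate(threads, total_cycles):
    """Count clock cycles in which no thread can execute.

    threads: list of (exec_cycles, wait_cycles) pairs, one per thread.
             Both durations are fixed here, which is an assumption;
             real memory load latencies are indeterminate.
    total_cycles: number of clock cycles to simulate.
    """
    remaining_exec = [e for e, _ in threads]   # cycles left in current execution cycle
    remaining_wait = [0] * len(threads)        # cycles left on a pending load
    idle = 0

    for _ in range(total_cycles):
        # Threads with no pending load are ready at the start of this cycle.
        ready = [i for i in range(len(threads)) if remaining_wait[i] == 0]

        # Pending loads progress in parallel with instruction execution.
        for i in range(len(threads)):
            if remaining_wait[i] > 0:
                remaining_wait[i] -= 1

        if not ready:
            idle += 1          # every thread is waiting on memory
            continue

        # Assumed policy: the lowest-numbered ready thread runs,
        # so the first thread preempts the second.
        t = ready[0]
        remaining_exec[t] -= 1
        if remaining_exec[t] == 0:
            # The execution cycle ends with a memory load operation.
            remaining_wait[t] = threads[t][1]
            remaining_exec[t] = threads[t][0]

    return idle


if __name__ == "__main__":
    first = (4, 6)  # 4 cycles of execution, then a 6-cycle load
    print(simulate([first], 20))            # single thread
    print(simulate([first, (2, 6)], 20))    # second thread shorter than the wait
    print(simulate([first, (10, 6)], 20))   # second thread longer than the wait
```

Under these assumed durations, the single thread leaves the processor idle for 12 of 20 cycles. Adding a second thread with a 2-cycle execution cycle reduces the idle count to 8, because both threads end up waiting at the same time. A second thread with a 10-cycle execution cycle reduces the idle count to 2, but the first thread preempts it partway through its execution cycle and both threads still end up waiting concurrently.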
As the foregoing illustrates, what is needed is a more effective way to schedule threads for execution.