Field of the Invention
This invention relates to computing systems, and more particularly, to efficient thread arbitration in a threaded processor with dynamic resource allocation.
Description of the Relevant Art
The performance of computer systems depends on both hardware and software. To increase the throughput of computing systems, tasks are parallelized as much as possible. To this end, compilers may extract parallel tasks from program code, and many modern processor core designs have deep pipelines configured to perform multi-threading.
In software-level multi-threading, an application program uses a process, or a software thread, to stream instructions to a processor for execution. A multi-threaded software application generates multiple software processes within the same application. A multi-threaded operating system manages the dispatch of these and other processes to a processor core. In hardware-level multi-threading, a simultaneous multi-threaded processor core executes hardware instructions from different software processes at the same time. In contrast, single-threaded processors operate on a single thread at a time.
Often, threads and/or processes share resources. Examples of resources that may be shared between threads include queues utilized in a fetch pipeline stage, a load and store memory pipeline stage, rename and issue pipeline stages, a completion pipeline stage, branch prediction schemes, and memory management control. These resources are generally shared between all active threads. Dynamic resource allocation between threads may result in the best overall throughput performance on commercial workloads. In general, resources may be dynamically allocated within a resource structure, such as a queue that stores instructions of multiple threads within a particular pipeline stage.
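As an illustration only, dynamic allocation within a shared queue may be modeled in software as follows. This is a minimal sketch, not a description of any particular hardware; the class name SharedQueue and its methods are hypothetical and are introduced solely for this example. Any thread may claim any free entry, and no entries are reserved per thread.

```python
from collections import Counter


class SharedQueue:
    """Hypothetical model of a fixed-capacity queue shared by all threads."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []  # thread id that owns each allocated entry

    def allocate(self, thread_id):
        """Claim one entry for thread_id; return False if the queue is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(thread_id)
        return True

    def deallocate(self, thread_id):
        """Release the oldest entry held by thread_id; return False if none."""
        if thread_id in self.entries:
            self.entries.remove(thread_id)  # removes the first (oldest) match
            return True
        return False

    def occupancy(self):
        """Return the number of entries currently held by each thread."""
        return Counter(self.entries)


queue = SharedQueue(capacity=4)
for _ in range(3):
    queue.allocate(0)            # thread 0 takes three of the four entries
first = queue.allocate(1)        # thread 1 takes the last free entry
second = queue.allocate(1)       # queue is full, so this request fails
```

Because entries are granted on demand, a single thread may hold most of the queue at a given time, which is the property that allows the imbalance discussed next.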
Over time, shared resources can become biased toward a particular thread, especially with respect to long-latency operations such as loads that miss a last-level data cache. A thread hog results when a thread accumulates a disproportionate share of a shared resource and is slow to deallocate the resource. For certain workloads, thread hogs can cause dramatic throughput losses not only for the thread hog, but also for other threads sharing the same resource.
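The effect may be illustrated with a small cycle-by-cycle simulation. This is a sketch under stated assumptions: the function simulate, the fixed per-thread latencies, the one-request-per-thread-per-cycle policy, and the rotating request order are all hypothetical simplifications chosen for the example, not features of any described design.

```python
def simulate(capacity, latencies, cycles):
    """Model threads competing for entries in one shared queue.

    latencies[t] is the assumed number of cycles that an entry allocated
    by thread t stays occupied before it is deallocated.
    Returns (occupancy per thread, completed operations per thread).
    """
    num_threads = len(latencies)
    in_flight = []  # (release_cycle, thread_id) for each occupied entry
    completed = [0] * num_threads
    for cycle in range(cycles):
        # Deallocate every entry whose operation has finished.
        remaining = []
        for release_cycle, tid in in_flight:
            if release_cycle <= cycle:
                completed[tid] += 1
            else:
                remaining.append((release_cycle, tid))
        in_flight = remaining
        # Each thread requests one entry per cycle, in rotating order.
        for i in range(num_threads):
            tid = (cycle + i) % num_threads
            if len(in_flight) < capacity:
                in_flight.append((cycle + latencies[tid], tid))
    occupancy = [sum(1 for _, tid in in_flight if tid == t)
                 for t in range(num_threads)]
    return occupancy, completed


# Thread 0 models loads that miss the last-level cache (long latency);
# thread 1 models short-latency operations.
hog_occupancy, hog_done = simulate(capacity=8, latencies=[200, 2], cycles=100)
# Baseline: both threads have short latencies and the queue never fills.
base_occupancy, base_done = simulate(capacity=8, latencies=[2, 2], cycles=100)
```

In this model the long-latency thread ends up holding most of the queue while completing nothing within the window, and the short-latency thread completes fewer operations than it does in the baseline run, even though its own latency is unchanged.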
In view of the above, methods and mechanisms for efficient thread arbitration in a threaded processor with dynamic resource allocation are desired.