1. Technical Field
The present disclosure relates generally to information processing systems and, more specifically, to prefetching of data via speculative helper threads in chip multiprocessors.
2. Background Art
In order to increase performance of information processing systems, such as those that include microprocessors, both hardware and software techniques have been employed. On the hardware side, microprocessor design approaches to improve microprocessor performance have included increased clock speeds, pipelining, branch prediction, super-scalar execution, out-of-order execution, and caches. Many such approaches have led to increased transistor count, and have even, in some instances, resulted in transistor count increasing at a rate greater than the rate of improved performance.
Rather than seek to increase performance through additional transistors, other performance enhancements involve software techniques. One software approach that has been employed to improve processor performance is known as “multithreading.” In software multithreading, an instruction stream is split into multiple instruction streams, or “threads”, that can be executed concurrently.
In one approach, known as time-slice multithreading or time-multiplex (“TMUX”) multithreading, a single processor switches between threads after a fixed period of time. In still another approach, a single processor switches between threads upon occurrence of a trigger event, such as a long latency cache miss. In this latter approach, known as switch-on-event multithreading, only one thread, at most, is active at a given time.
Increasingly, multithreading is supported in hardware. For instance, processors in a multi-processor system, such as a chip multiprocessor (“CMP”) system, may each act on one of the multiple threads concurrently. In another approach, referred to as simultaneous multithreading (“SMT”), a single physical processor is made to appear as multiple logical processors to operating systems and user programs. In SMT, multiple threads can be active and execute concurrently on a single processor without switching. That is, each logical processor maintains a complete set of the architecture state, but many other resources of the physical processor, such as caches, execution units, branch predictors, control logic and buses are shared. With CMP and SMT approaches, the instructions from multiple threads execute concurrently and may make better use of shared resources than TMUX multithreading or switch-on-event multithreading.