1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to an enhanced load lookahead prefetch in single threaded mode for a simultaneous multithreaded microprocessor.
2. Description of Related Art
There is a continual desire by computer users to maximize performance and a corresponding pressure on the computer industry to increase the computing power and efficiency of microprocessors. This is especially evident in the server computer field where entire businesses are dependent on their computer infrastructure to carry out and monitor day-to-day activities that affect revenue, profit, and the like. Increased microprocessor performance will provide additional resources for computer users while providing a mechanism for computer manufacturers to distinguish themselves from the competition.
Over the years, state-of-the-art microprocessors have evolved from fairly straightforward systems to extremely complex integrated circuits having many millions of transistors on a single silicon substrate. One of the many improvements made to microprocessors is the ability to execute more than one instruction per cycle. This type of microprocessor is typically referred to as being “superscalar.” A further performance enhancement is the ability of microprocessors to execute instructions “out of order.” This out-of-order operation allows instructions having no dependencies to bypass other instructions that are waiting for certain dependencies to be resolved. The IBM® Power™ and PowerPC® series of microprocessors are examples of superscalar systems that provide out-of-order processing of instructions. Microprocessors may provide varying levels of out-of-order execution support, meaning that the ability to identify and execute instructions out of order may be limited.
One major motivation for limiting out-of-order execution support is the enormous amount of complexity required to identify which instructions can execute early, and to track and store the out-of-order results. Additional complexities arise when instructions executed out of order are determined to be incorrect per the in-order execution model, for example when an older instruction causes an exception; in that case, their execution must not impact the architected state of the processor. As processor speeds continue to increase, it becomes more attractive to eliminate some of the complexities associated with out-of-order execution. Doing so removes the logic normally used to track out-of-order instructions, along with that logic's corresponding chip area, or “real estate,” thereby making additional “real estate” available for use by other processing functions.
As is known in the art, certain conditions that occur when instructions are executed by a microprocessor will cause a stall, during which instruction execution is limited or halted until the condition is resolved. One example is a cache miss, which occurs when data required by an instruction is not available in a level one (L1) cache and the microprocessor is forced to wait until the data can be retrieved from a slower cache or main memory. Obtaining data from main memory is a relatively slow operation and, when out-of-order execution is limited due to the aforementioned complexities, subsequent instructions cannot be fully executed until valid data is received from memory.
More particularly, an older instruction that takes a long time to execute can create a stall that may prevent any younger or subsequent instructions from executing until the time-consuming instruction completes. For example, in the case of a load instruction that requires access to data not in the L1 cache (a cache miss), a prolonged stall can occur while data is fetched from a slower cache or main memory. Without facilities to support all out-of-order execution scenarios, it may not be possible to change instruction ordering such that forward progress through the instruction stream can be made while the missed data is retrieved.
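The effect of such a stall can be illustrated with a minimal sketch of a toy timing model. It is not taken from any actual microprocessor design. The latencies (`L1_HIT_CYCLES`, `MISS_CYCLES`), the `Instr` record, and both functions are hypothetical names and values chosen only for illustration. The first function models a pipeline in which each instruction must complete before the next one starts; the second models an idealized machine in which an instruction waits only for the instructions it actually depends on.

```python
from dataclasses import dataclass

# Assumed, purely illustrative latencies (not taken from any real processor).
L1_HIT_CYCLES = 1    # instruction that needs no data beyond the L1 cache
MISS_CYCLES = 100    # load that must fetch from a slower cache or main memory


@dataclass
class Instr:
    name: str
    l1_miss: bool = False   # True for a load whose data is not in the L1 cache
    deps: tuple = ()        # indices of older instructions this one depends on


def latency(instr):
    """Cycles the instruction needs once it is allowed to start."""
    return MISS_CYCLES if instr.l1_miss else L1_HIT_CYCLES


def blocking_finish_times(program):
    """Finish cycle of each instruction when every instruction must
    complete before the next (younger) one may start."""
    finish, cycle = [], 0
    for instr in program:
        cycle += latency(instr)
        finish.append(cycle)
    return finish


def ideal_finish_times(program):
    """Finish cycle of each instruction in an idealized machine where an
    instruction waits only for the older instructions it depends on."""
    finish = []
    for instr in program:
        start = max((finish[d] for d in instr.deps), default=0)
        finish.append(start + latency(instr))
    return finish


if __name__ == "__main__":
    program = [
        Instr("add"),
        Instr("load", l1_miss=True),   # cache miss
        Instr("sub", deps=(1,)),       # needs the loaded data
        Instr("mul"),                  # independent of the load
    ]
    print(blocking_finish_times(program))
    print(ideal_finish_times(program))
```

In this sketch the independent `mul` instruction cannot finish in the blocking model until after the missed load has completed, whereas in the idealized model it finishes immediately. Only the dependent `sub` instruction genuinely has to wait for the missed data.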